volume 28 number 7 july 2010
editorial 629
Consortia and commodities
© 2010 Nature America, Inc. All rights reserved.
Crystal structure of the WD40 domain of the yeast SCFCdc4 E3 ubiquitin ligase showing the binding of an allosteric inhibitor and the subsequent displacement of the substrate. Orlicky et al. and Aghajan et al. present the first specific inhibitors of cullin-RING ubiquitin ligases, the largest class of enzymes conferring specificity on the ubiquitin-proteasome system. (pp 733 and 738)

news
631 Pharma embraces open source models
633 New eyes on old drugs
633 Genetic testing clamp down
634 New tech transfer models gain traction with deal flow
635 Sequencing firms vie for diagnostics market, tiptoe round patents
635 French IPO spate
635 Industrial biotech to boom?
636 Merck ditches biogeneric
636 Investors fight Charles River/WuXi merger
637 Microcap public biotechs access new pool of VC funding
637 Genzyme partners TJAB
637 China’s heparin billionaires
638 Italian GM rebels
639 newsmaker: Agios Pharmaceuticals
640 data page: Drug pipeline: Q210
641 News feature: Sunshine on conflicts
bioentrepreneur: building a business
644 Ask your doctor Jeffrey J Stewart, Jeron Eaves & Ben Bonifant
VCs looking for microcap pearls, p 637
opinion and comment

CORRESPONDENCE
647 PeptideClassifier for protein inference and targeted quantitative proteomics
650 Minimum information about a protein affinity reagent (MIAPAR)
654 Guidelines for reporting the use of column chromatography in proteomics
654 Guidelines for reporting the use of capillary electrophoresis in proteomics
655 Guidelines for reporting the use of gel image informatics in proteomics
656 The 20-year environmental safety record of GM trees
Nature Biotechnology (ISSN 1087-0156) is published monthly by Nature Publishing Group, a trading name of Nature America Inc. located at 75 Varick Street, Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York, NY 10013-1917. Tel: (212) 726 9335, Fax: (212) 696 9753. Annual subscription rates: USA/Canada: US$250 (personal), US$3,520 (institution), US$4,050 (corporate institution). Canada add 5% GST #104911595RT001; Euro-zone: €202 (personal), €2,795 (institution), €3,488 (corporate institution); Rest of world (excluding China, Japan, Korea): £130 (personal), £1,806 (institution), £2,250 (corporate institution); Japan: Contact NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746. POSTMASTER: Send address changes to Nature Biotechnology, Subscriptions Department, 342 Broadway, PMB 301, New York, NY 10013-3910. Authorization to photocopy material for internal or personal use, or internal or personal use of specific clients, is granted by Nature Publishing Group to libraries and others registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification code for Nature Biotechnology: 1087-0156/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed by Publishers Press, Inc., Lebanon Junction, KY, USA. Copyright © 2010 Nature America, Inc. All rights reserved. Printed in USA.
COMMENTARY 659
The pros and cons of peptide-centric proteomics Mark W Duncan, Ruedi Aebersold & Richard M Caprioli
feature 665
Proteomics retrenches Peter Mitchell

patents
Peptide-based proteomics, p 659
671 Intellectual property, technology transfer and manufacture of low-cost HPV vaccines in India Swathi Padmanabhan, Tahir Amin, Bhaven Sampat, Robert Cook-Deegan & Subhashini Chandrasekharan
679 Recent patent applications in stem cells
NEWS AND VIEWS 681
Multiple-signal integration, p 681
Paring down signaling complexity Kevin A Janes
see also p 727
682 Inhibitors for E3 ubiquitin ligases John R Lydeard & J Wade Harper
see also pp 733 & 738
684 Systematic phenotyping of mouse mutants Wolfgang Wurst & Martin Hrabe de Angelis
686 Splicing by cell type Mauricio A Arias, Shengdong Ke & Lawrence A Chasin
687 A synthetic DNA transplant Mitsuhiro Itaya
689 Antibiotic leads challenge conventional wisdom Markus Elsner
690 Research highlights
see also p 749
computational biology commentary 691
research
Cloud computing and the DNA data race Michael C Schatz, Ben Langmead & Steven L Salzberg
perspective
695 Proteomics: a pragmatic perspective Parag Mallick & Bernhard Kuster
710 Options and considerations when selecting a quantitative proteomics strategy Bruno Domon & Ruedi Aebersold
Mouse phenotypic screening, p 684
letters
723 Live attenuated influenza virus vaccines by computer-aided rational design S Mueller, J R Coleman, D Papamichail, C B Ward, A Nimnual, B Futcher, S Skiena & E Wimmer
Controlling kinase activation, p 743
727 Pairwise agonist scanning predicts cellular signaling responses to combinatorial stimuli M S Chatterjee, J E Purvis, L F Brass & S L Diamond
see also p 681
733 An allosteric inhibitor of substrate recognition by the SCFCdc4 ubiquitin ligase S Orlicky, X Tang, V Neduva, N Elowe, E D Brown, F Sicheri & M Tyers
see also p 682
738 Chemical genetics screen for enhancers of rapamycin identifies a specific inhibitor of an SCF family E3 ubiquitin ligase M Aghajan, N Jonai, K Flick, F Fu, M Luo, X Cai, I Ouni, N Pierce, X Tang, B Lomenick, R Damoiseaux, R Hao, P M del Moral, R Verma, Y Li, C Li, K N Houk, M E Jung, N Zheng, L Huang, R J Deshaies, P Kaiser & J Huang
see also p 682
743 Engineered allosteric activation of kinases in living cells A V Karginov, F Ding, P Kota, N V Dokholyan & K M Hahn

resource
Mouse knockout library, p 749
749 A mouse knockout library for secreted and transmembrane proteins T Tang, L Li, J Tang, Y Li, W Yu Lin, F Martin, D Grant, M Solloway, L Parker, W Ye, W Forrest, N Ghilardi, T Oravecz, K A Platt, D S Rice, G M Hansen, A Abuin, D E Eberhart, P Godowski, K H Holt, A Peterson, B P Zambrowicz & F J de Sauvage
see also p 684
756 Corrigenda and errata
careers and recruitment
757 Advancing the careers of life science professionals of Indian origin Jagath R Junutula, Praveena Raman, Darshana Patel, Holly Butler & Anula Jayasuriya
760 People
in this issue

Whither proteomics?
Mass spectrometry–based proteomics has come to play an integral role both in basic biological research and in more applied questions, such as how best to develop better drugs and diagnostics. Notwithstanding the field’s many accomplishments over the past decade, the phenomenal potential of proteomics still seems far from fully realized. Mallick and Kuster provide a comprehensive overview of how far the field has progressed, distinguishing what can now be accomplished routinely from the types of experiments that remain challenging even for specialized laboratories. To help biologists without extensive technical expertise calibrate their expectations of collaborations with specialists, the authors systematically discuss a range of commonly encountered research questions in the context of the capabilities of current technologies. Expanding on the theme that indiscriminate use of otherwise powerful technologies can do more to harm the reputation of a field than to promote it, Caprioli and colleagues consider some underappreciated caveats of so-called ‘bottom-up’, or peptide-centric, approaches, concluding their account of these assumptions and their potential implications with recommendations for how the issues can be addressed. Recent years have also witnessed a shift from primarily exploratory proteomics methods toward targeted strategies that allow researchers to focus on specific proteins of interest, moving protein mass spectrometry toward serving as a routine assay system as well as a discovery tool. Domon and Aebersold discuss the trade-offs to weigh when selecting among the so-called shotgun, directed and targeted quantitative proteomics strategies. Although targeted strategies are sometimes perceived as competing alternatives to discovery-based strategies, the authors highlight the benefits of using the two in a complementary manner. [Perspective, p. 695, 710; Commentary, p. 659; Feature, p. 665] PH & AM
Written by Markus Elsner, Michael Francisco, Peter Hare, Craig Mak, Andrew Marshall & Lisa Melton

Codon pair–deoptimized influenza vaccine
AstraZeneca’s (London) FluMist, currently the only live, attenuated influenza virus vaccine marketed in the United States, was developed by serial passage through pathogen-free primary chick kidney cells and subsequent culture in eggs. Wimmer and colleagues demonstrate an alternative strategy for producing a live, attenuated influenza vaccine. Their strategy, previously applied only to poliovirus, involves changing the nucleotide sequences of viral genes to introduce rarely used pairs of codons, without altering the overall codon bias or affecting the amino acid sequences of the viral proteins. The modified viral mRNAs, which carry hundreds of nucleotide changes, must use codon pairs that are thought to be translated poorly by the host organism. As a result, the ‘deoptimized’ virus is weakened but still presents the host immune system with wild-type viral proteins that stimulate a beneficial immune response. In mice infected with codon pair–deoptimized influenza, viral load in the lung is reduced over 1,000-fold compared with wild-type influenza, resulting in a controlled infection, no overt disease symptoms and effective protection, with a wide safety margin, against subsequent exposure to wild-type virus. As polio and influenza viruses have very different genomic characteristics, this study demonstrates the potential of the codon pair–deoptimization strategy to be effective across a broad range of viruses. [Letters, p. 723] CM

Exploring kinase activity
Kinases are important components of intracellular signaling cascades, and the ability to experimentally control their activation with high temporal resolution would facilitate elucidation of their physiological functions. Hahn and colleagues now describe a small protein insert that can activate a protein kinase upon addition of the small molecule rapamycin. In principle, the 88-amino-acid fragment (iFKBP) of the rapamycin-binding protein FKBP12 can be inserted into a conserved loop of the catalytic domain of any protein kinase. In the absence of the ligand, this fragment is highly flexible and distorts the catalytic site of the kinase. Upon binding of rapamycin, the flexibility of iFKBP is reduced, the kinase domain resumes its natural conformation and activity is restored. Co-expression of the FKBP-binding protein FRB enhances the sensitivity to rapamycin. The authors successfully test their approach on two tyrosine kinases, focal adhesion kinase (FAK) and Src, as well as on a serine/threonine kinase (p38). The activation is rapid; FAK can be activated in living cells within minutes. The authors demonstrate the utility of their approach by showing that induction of membrane ruffling by FAK requires the kinase’s catalytic activity. Nonimmunosuppressive analogs of rapamycin may enable in vivo application of the approach. [Letters, p. 743] ME

Unclotting complex signaling
Cells are exposed to complex combinations of stimuli that control physiological processes and influence responses to drugs. But because there are too many combinations to exhaustively assay, Diamond and colleagues devise a method for assaying cellular responses to pairs of stimuli and then use these data to predict responses to complex stimulus cocktails. They apply the method to study intracellular signaling in platelets treated with agonists that modulate blood clotting. The researchers load 384-well microtiter plates with all possible pairwise combinations chosen from six agonists at varying concentrations (0.1×, 1× and 10× the EC50 of each), add the agonist pairs to platelets and then
track intracellular calcium mobilization over a 4-minute time course. A neural network trained on the pairwise data is able to successfully predict responses to complex cocktails of three to six stimuli. Diamond and colleagues also harvest platelets from ten human donors, profile the cells’ responses to pairs of stimuli and observe donor-specific phenotypes, which may be useful for stratifying patients according to their predicted platelet responses to blood-clotting drugs. More broadly, this study suggests that, in some cases at least, higher-order signaling phenotypes may be predictable from responses to pairs of stimuli. [Letters, p. 727; News and Views, p. 681] CM
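The pairwise agonist scanning design is easy to sketch in code. The snippet below is an illustrative enumeration, not the authors’ code: the agonist names and the three relative dose levels are assumptions for illustration, standing in for whatever panel and dose grid an experiment actually uses.

```python
from itertools import combinations, product

# Hypothetical agonist panel and dose grid, for illustration only; the
# study pairs six platelet agonists at doses spanning each agonist's EC50.
AGONISTS = ["ADP", "convulxin", "U46619", "SFLLRN", "AYPGKF", "PGE2"]
DOSES = [0.1, 1.0, 10.0]  # multiples of each agonist's EC50

def pairwise_design(agonists, doses):
    """Every unordered agonist pair at every combination of doses."""
    return [((a, da), (b, db))
            for a, b in combinations(agonists, 2)
            for da, db in product(doses, repeat=2)]

design = pairwise_design(AGONISTS, DOSES)
# 15 unordered pairs x 9 dose combinations = 135 wells, leaving room on a
# 384-well plate for single-agonist conditions and buffer controls.
print(len(design))  # 135
```

Each condition then becomes one training example for the predictive model (input: the six-agonist dose vector; output: the measured calcium time course).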
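The codon pair–deoptimization strategy described earlier in this section can likewise be illustrated. The sketch below is a toy, hypothetical version: the published work recodes whole genes against a codon-pair bias table computed from host genomes using a global optimization, whereas this greedy pass and its made-up pair scores only demonstrate the core invariant, i.e., that swapping synonymous codons toward “rare” pairs leaves the encoded protein unchanged.

```python
from itertools import product

CODON_TABLE = {  # toy fragment of the genetic code
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "GAA": "E", "GAG": "E",                          # glutamate
}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Hypothetical codon-pair scores: pretend any pair ending in a G-ending
# codon is rare in the host (negative score); lower total = more deoptimized.
CPS = {(a, b): (-1.0 if b.endswith("G") else 1.0)
       for a, b in product(CODON_TABLE, repeat=2)}

def deoptimize(codons):
    """Greedy left-to-right pass: at each position choose the synonymous
    codon forming the rarest pair with the codon placed just before it."""
    out = [codons[0]]
    for codon in codons[1:]:
        out.append(min(SYNONYMS[CODON_TABLE[codon]],
                       key=lambda c: CPS[(out[-1], c)]))
    return out

def translate(codons):
    return "".join(CODON_TABLE[c] for c in codons)

original = ["GCU", "AAA", "GAA", "GCC"]
recoded = deoptimize(original)
assert translate(recoded) == translate(original)  # protein unchanged
print(recoded)  # ['GCU', 'AAG', 'GAG', 'GCG']
```

In the published work the recoded genes carry hundreds of such silent changes, enough to attenuate the virus while every encoded protein remains wild type.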
E3 ubiquitin ligase inhibitors
E3 ubiquitin ligases confer specificity on the ubiquitin-proteasome system for regulated protein degradation. Ubiquitin ligases of the cullin-RING type represent the largest family of E3 enzymes and are involved in the regulation of a wide range of cellular processes; several have been implicated as oncogenes. Most commonly, they consist of a core complex that is recruited to specific substrates by different F-box adaptor subunits. Despite the potential medical importance of individual cullin-RING ligases, only general inhibitors of the whole family had been discovered previously. In this issue, papers by Tyers and colleagues and Huang and colleagues present the first specific inhibitors of cullin-RING ligases. Using a fluorescence polarization assay, Tyers and colleagues screen a chemical library for compounds that displace a peptide, corresponding to a substrate targeting sequence recognized by the yeast SCFCdc4 ligase, from the Cdc4 F-box subunit. A crystal structure of Cdc4 bound to the lead compound, SCF-I2, shows that the inhibitor causes an allosteric modulation of the structure of the WD40-repeat domain that recognizes the targeting sequence. Also working in yeast, Huang and colleagues screen for enhancers of the cytotoxic effect of rapamycin. The most potent compound, SMER3, inhibits the ubiquitin ligase SCFMet30. In contrast to the inhibitor of SCFCdc4, SMER3 disrupts the binding of the F-box subunit to the core complex of the E3 ligase. Because WD40 repeats are common protein interaction domains well beyond ubiquitin ligases, the papers suggest that both E3 ubiquitin ligases and WD40 domains might be promising drug targets. [Letters, p. 733, 738; News and Views, p. 682] ME

Phenotypic screen of a knockout mouse library
Although the generation of knockout mice is an important tool for studying gene function, comprehensive phenotypic screens spanning multiple physiological processes are seldom performed. Tang and colleagues present a systematic and comprehensive phenotypic analysis of a collection of 472 mouse strains with disruptions in genes encoding secreted and transmembrane proteins, chosen on the basis of their membership in prominent protein families, their homology to known human disease–associated proteins and their tissue-specific expression. The phenotypic screen comprises 85 different assays designed to uncover the involvement of each gene in diverse processes, including embryonic development, metabolism and the functioning of the immune, nervous and cardiovascular systems. Eighty-nine percent of the genes had discernible effects on at least one organ system. Moreover, for a substantial number of genes, the phenotypes did not correspond directly to the tissues where the gene is most prominently expressed. Both the phenotyping data and the mouse strains are publicly available and will provide valuable leads for more detailed secondary phenotyping and mechanistic studies. [Resource, p. 749; News and Views, p. 684] ME

Patent roundup
Next-generation genome sequencing firms are moving into diagnostics, but as they race to apply novel platforms to investigate disease, issues of patent ownership loom in the background. [News, p. 635] New funds—closely linked to tech transfer offices—are springing up and acting as brokers for multiple laboratories with overlapping intellectual property, easing the path to commercialization. [News, p. 634] Padmanabhan et al. study the patent, licensing and manufacturing landscape of human papillomavirus vaccines in India and suggest strategies to help improve vaccine affordability and access in low- and middle-income countries. [Patent Article, p. 671] Recent patent applications in stem cells. [New Patents, p. 679]
Next month in
• Controlling HIV-1 with zinc-finger nucleases
• Epigenetic memory in iPS cells
• Trackable multiplex recombineering
• Annotating the human genome using chromatin states
• Genome of a model mushroom
• A nonhuman sugar in therapeutic antibodies
• Characterizing ubiquitinylation
www.nature.com/naturebiotechnology
EDITORIAL OFFICE
[email protected] 75 Varick Street, Fl 9, New York, NY 10013-1917 Tel: (212) 726 9200, Fax: (212) 696 9635 Chief Editor: Andrew Marshall Senior Editors: Laura DeFrancesco (News & Features), Kathy Aschheim (Research), Peter Hare (Research), Michael Francisco (Resources and Special Projects) Business Editor: Brady Huggett Associate Business Editor: Victor Bethencourt News Editor: Lisa Melton Associate Editors: Markus Elsner (Research), Craig Mak (Research) Editor-at-Large: John Hodgson Contributing Editors: Mark Ratner, Chris Scott Contributing Writer: Jeffrey L. Fox Senior Copy Editor: Teresa Moogan Managing Production Editor: Ingrid McNamara Senior Production Editor: Brandy Cafarella Production Editor: Amanda Crawford Senior Illustrator: Katie Vicari Illustrator/Cover Design: Kimberly Caesar Senior Editorial Assistant: Ania Levinson
MANAGEMENT OFFICES NPG New York 75 Varick Street, Fl 9, New York, NY 10013-1917 Tel: (212) 726 9200, Fax: (212) 696 9006 Publisher: Melanie Brazil Executive Editor: Linda Miller Chief Technology Officer: Howard Ratner Head of Nature Research & Reviews Marketing: Sara Girard Circulation Manager: Stacey Nelson Production Coordinator: Diane Temprano Head of Web Services: Anthony Barrera Senior Web Production Editor: Laura Goggin NPG London The Macmillan Building, 4 Crinan Street, London N1 9XW Tel: 44 207 833 4000, Fax: 44 207 843 4996 Managing Director: Steven Inchcoombe Publishing Director: Peter Collins Editor-in-Chief, Nature Publications: Philip Campbell Marketing Director: Della Sar Director of Web Publishing: Timo Hannay NPG Nature Asia-Pacific Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843 Tel: 81 3 3267 8751, Fax: 81 3 3267 8746 Publishing Director — Asia-Pacific: David Swinbanks Associate Director: Antoine E. Bocquet Manager: Koichi Nakamura Operations Director: Hiroshi Minemura Marketing Manager: Masahiro Yamashita Asia-Pacific Sales Director: Kate Yoneyama Asia-Pacific Sales Manager: Ken Mikami DISPLAY ADVERTISING
[email protected] (US/Canada)
[email protected] (Europe)
[email protected] (Asia) Global Head of Advertising and Sponsorship: Dean Sanderson, Tel: (212) 726 9350, Fax: (212) 696 9482 Global Head of Display Advertising and Sponsorship: Andrew Douglas, Tel: 44 207 843 4975, Fax: 44 207 843 4996 Asia-Pacific Sales Director: Kate Yoneyama, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746 Display Account Managers: New England: Sheila Reardon, Tel: (617) 399 4098, Fax: (617) 426 3717 New York/Mid-Atlantic/Southeast: Jim Breault, Tel: (212) 726 9334, Fax: (212) 696 9481 Midwest: Mike Rossi, Tel: (212) 726 9255, Fax: (212) 696 9481 West Coast: George Lui, Tel: (415) 781 3804, Fax: (415) 781 3805 Germany/Switzerland/Austria: Sabine Hugi-Fürst, Tel: 41 52761 3386, Fax: 41 52761 3419 UK/Ireland/Scandinavia/Spain/Portugal: Evelina Rubio-Hakansson, Tel: 44 207 014 4079, Fax: 44 207 843 4749 UK/Germany/Switzerland/Austria: Nancy Luksch, Tel: 44 207 843 4968, Fax: 44 207 843 4749 France/Belgium/The Netherlands/Luxembourg/Italy/Israel/Other Europe: Nicola Wright, Tel: 44 207 843 4959, Fax: 44 207 843 4749 Asia-Pacific Sales Manager: Ken Mikami, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746 Greater China/Singapore: Gloria To, Tel: 852 2811 7191, Fax: 852 2811 0743 NATUREJOBS
[email protected] (US/Canada)
[email protected] (Europe)
[email protected] (Asia) US Sales Manager: Ken Finnegan, Tel: (212) 726 9248, Fax: (212) 696 9482 European Sales Manager: Dan Churchward, Tel: 44 207 843 4966, Fax: 44 207 843 4596 Asia-Pacific Sales & Business Development Manager: Yuki Fujiwara, Tel: 81 3 3267 8765, Fax: 81 3 3267 8752 SPONSORSHIP
[email protected] Global Head of Sponsorship: Gerard Preston, Tel: 44 207 843 4965, Fax: 44 207 843 4749 Business Development Executive: David Bagshaw, Tel: (212) 726 9215, Fax: (212) 696 9591 Business Development Executive: Graham Combe, Tel: 44 207 843 4914, Fax: 44 207 843 4749 Business Development Executive: Reya Silao, Tel: 44 207 843 4977, Fax: 44 207 843 4996 SITE LICENSE BUSINESS UNIT Americas: Tel: (888) 331 6288 Asia/Pacific: Tel: 81 3 3267 8751 Australia/New Zealand: Tel: 61 3 9825 1160 India: Tel: 91 124 2881054/55 ROW: Tel: 44 207 843 4759
[email protected] [email protected] [email protected] [email protected] [email protected]
CUSTOMER SERVICE www.nature.com/help Senior Global Customer Service Manager: Gerald Coppin For all print and online assistance, please visit www.nature.com/help Purchase subscriptions: Americas: Nature Biotechnology, Subscription Dept., 342 Broadway, PMB 301, New York, NY 100133910, USA. Tel: (866) 363 7860, Fax: (212) 334 0879 Europe/ROW: Nature Biotechnology, Subscription Dept., Macmillan Magazines Ltd., Brunel Road, Houndmills, Basingstoke RG21 6XS, United Kingdom. Tel: 44 1256 329 242, Fax: 44 1256 812 358 Asia-Pacific: Nature Biotechnology, NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 3 3267 8751, Fax: 81 3 3267 8746 India: Nature Biotechnology, NPG India, 3A, 4th Floor, DLF Corporate Park, Gurgaon 122002, India. Tel: 91 124 2881054/55, Tel/Fax: 91 124 2881052 REPRINTS
[email protected] Nature Biotechnology, Reprint Department, Nature Publishing Group, 75 Varick Street, Fl 9, New York, NY 10013-1917, USA. For commercial reprint orders of 600 or more, please contact: UK Reprints: Tel: 44 1256 302 923, Fax: 44 1256 321 531 US Reprints: Tel: (617) 494 4900, Fax: (617) 494 4960
Editorial
Consortia and commodities The rise of open source drug R&D in consortia involving big pharma should prompt some biotech companies to re-examine their businesses.
Precompetitive collaborations among pharmaceutical companies are increasingly in vogue. They take the form of public–private partnerships or consortia, in which drug makers swap knowledge, data and resources with one another, as well as with government agencies, nonprofits and academic institutions, for the benefit of all. Their aim is to tackle collectively shared bottlenecks in early-stage biomedical research, both to spur innovation and to increase the productivity of drug research. As a byproduct, these consortia disrupt the business space for biotech companies, radically transforming the intellectual property (IP) landscape for biomedical technologies and platforms and severely eroding the market. Indeed, businesses that depend on big pharma paying premium prices for access to proprietary technologies should probably rethink their strategy if consortia become active in their field. Tackling the problems that hamper pharmaceutical R&D and drive the high attrition of new drug compounds is a big ask—often too big for companies to tackle alone. All of this is leading big pharma to embrace precompetitive collaborations. Around 50 of these public–private partnerships exist today, the biggest of which is the European Union’s Innovative Medicines Initiative (IMI; http://imi.europa.eu/index_en.html). IMI is attempting to address challenges defined by drug companies in areas such as predictive pharmacology and toxicology, patient recruitment and the validation of biomarkers. Biomarkers are also the focus of the Biomarkers Consortium, a 2006 public–private initiative of the Foundation for the National Institutes of Health, and of the Predictive Safety Testing Consortium. Other collegial approaches involve (some) open access to what companies once regarded as their precious pharmaceutical resources.
Thus, in May GlaxoSmithKline and Novartis of Basel between them deposited over 18,000 chemical structures active against the malaria parasite into the European Bioinformatics Institute’s open source database ChEMBL Neglected Tropical Disease archive (p. 675). The same month, Pfizer invited outside collaborators to screen a structure-blinded subset of its own compounds in return for limited co-development rights on any optimized leads. And last year, Eli Lilly launched its Phenotypic Drug Discovery initiative, sourcing compounds from outside organizations and offering to screen them against its own set of biochemical, cell-based and secondary assays. Most of these consortium arrangements involve a mix of other large drug makers, government agencies, not-for-profit organizations and academic institutions. Pharma sees the benefit from avoiding duplication of research and breaking down preexisting ‘silos’ of expertise in early stage research. But there remains a question: what, if anything, is in them for innovative biotech companies? The truth is that many biotechs stand to lose more than they will gain. Open consortia can severely undermine their businesses, particularly those based on providing platform technologies and techniques. When a pharma company puts its resources into a collaboration, it is contributing
not only a minuscule proportion of its total assets, but also assets from which it is currently deriving little value. A biotech’s contribution may be much smaller in absolute terms, but it is still likely to be a larger slice of that company’s IP. Thus, if the consortium achieves its goal, pharma R&D is facilitated, but commercial opportunities for biotech firms previously operating in the area are likely to be reduced. For example, imagine that a consortium finds a way around the predictive toxicity challenge. Both pharma and biotech get better toxicity studies—only the small companies don’t have compounds to do toxicology on. Similar arguments could be made for biomarkers of disease progression or treatment outcomes. And co-development rights are of little value to a small company without the resources for co-development. Similarly, the Pistoia Alliance (http://pistoiaalliance.org/) consortium, which is attempting to streamline noncompetitive elements of the drug discovery workflow by developing open standards for common scientific and business terms, relationships and processes, offers little incentive for biotech companies currently offering systems modeling packages to participate. Why would they, when Pistoia’s goal essentially undermines their IP and expertise in controlled vocabularies, data structures and modeling tools—tools that have taken huge amounts of investment to create? Of course, there are upsides if biotech companies are fleet and agile enough to recognize them. First, although consortia rarely pay premium rates for access to technology, there may nevertheless be fee-for-service elements and, ultimately, some form of technology licensing that, at the least, helps a biotech with nondilutive cash flow in constrained times.
Second, biotechs participating in consortia should be able to better benchmark the value of their own contribution, both because of access to innovative research coming from academic partners (the future technology threat) and because of access to pharma partners in the consortium (potential customers). Third, the IP agreements that fence in many of the consortia will help clarify freedom-to-operate challenges, like those that currently beset such areas as stem cells or gene patents for diagnostics. And finally, the biggest advantage of all to a biotech company may be that the consortium’s very existence sends a message about a forthcoming change in the market. That message is very loud and very clear: we, the pharma industry, have identified a tractable problem and we are going to solve it with whatever help we need. When we do, the value of biotech’s parallel solutions will plummet and the market will become commoditized. The formation of R&D consortia does not necessarily signal impending doom for biotechs. Instead, it should be a sign that they need to participate, take what they can get and reorient their business in a direction that does not compete with the consortium’s outputs. In essence, the formation of a consortium should act as a Damascene conversion for biotech management, prompting a search for new ways of developing the business. Crass as it sounds, as far as consortia are concerned, biotechs should take the money and then run with it—preferably in a new direction.
news

in this section
FDA clamps down on genetic testing p633
Third-generation sequencing firms move into diagnostics p635
Investors find pearls in microcap public biotechs p637
Pharma embraces open source models
On May 19, two large pharmaceutical companies participated in the unprecedented deposition of hundreds of thousands of potential leads for new malaria drugs into an open source database. The two companies, London-based GlaxoSmithKline (GSK) and Novartis of Basel, together with the St. Jude Children’s Research Hospital in Memphis, Tennessee, submitted the chemical structures of 328,100 compounds active against the malaria parasite Plasmodium falciparum to the European Bioinformatics Institute’s ChEMBL Neglected Tropical Disease archive (http://www.ebi.ac.uk/chemblntd). This willingness to cooperate in nonproprietary collaborations goes beyond diseases neglected by commercial developers to other aspects of drug discovery research. A flurry of open source collaborations has sprung up recently, aimed at extracting value from precompetitive information. Merck, of Whitehouse Station, New Jersey, signed up with Sage Bionetworks, a Seattle-based nonprofit collaborative information platform run by former Merck scientists and executives, and New York-based Pfizer has entered into a similar arrangement. These and other deals mark the beginning of a radical reconfiguration of the initial stages of the drug discovery process, which were traditionally carried out within companies. For the pharma industry, the impetus to adopt an open source or open innovation strategy is driven by a need to refocus resources on driving the best compounds through the pipeline and to collaborate on those early parts of drug discovery R&D where problems are shared by other companies and often, indeed, across the industry. Bringing a drug to market, when all failures are added to the ledger, currently costs an astronomical $1.8 billion, according to evidence produced by Bernard Munos and researchers at the Eli Lilly Corporate Center in Indianapolis (Nat. Rev. Drug Discov. 9, 203–214, 2010).
This is prompting some—in industry, universities and government—to propose new types of collaboration that attempt to pool resources, with a particular focus on new target biology and biomarker development. As the University of Toronto’s Aled Edwards puts it, a root cause of our failure to find new drugs is our ignorance of basic biology. “We are trying to discover drugs while we have a really bleak understanding of how basic physiology works and no one organization has the resources to understand it all,” he says. One way to cut the rate and cost of failures is to imitate the information openness and collective data sharing that gave rise to Wikipedia, YouTube and open source software such as Linux. The proposition is for companies to deemphasize intellectual property rights, at least on early biology, and to be more open about sharing negative results so that knowledge advances faster in drug discovery research. One instantiation of this approach is InnoCentive (http://www.nature.com/openinnovation/index.html), an initiative spun out of Eli Lilly in 2001, in which the wisdom of crowds is harnessed to solve problems posted with cash awards. Edwards knows firsthand the challenges of coordinating public–private collaborations from his work in the Structural Genomics Consortium (SGC). In 2004, he and his colleagues at the University of Toronto pooled resources with the University of Oxford, and later with Sweden’s Karolinska Institute in Stockholm, to determine the three-dimensional structures of medically relevant proteins from humans and parasites. The effort is funded in part by the Canadian and Swedish governments and the UK’s Wellcome Trust, but also by companies such as GSK, Merck and Novartis. An important aspect of the project is that discoveries are made publicly available without any restrictions on their use. Initial results have been impressive. Since 2005, the 200 researchers at the SGC have contributed >20% of the novel human protein structures lodged in the Protein Data Bank each year. Drug developers are keen: 20% of the requests for SGC structures have originated from industrial, rather than university, researchers. The SGC is just one of several efforts focusing on precompetitive research and data sharing that have sprung up in recent years (Table 1).
These include the Europe-based Innovative Medicines Initiative Joint Undertaking (Nat. Biotechnol. 26, 717–718, 2008), the Alzheimer's Disease Neuroimaging Initiative (http://www.adni-info.org/), the Biomarkers Consortium
(Clin. Pharmacol. Ther. 87, 539–542, 2010), the Predictive Safety Testing Consortium (Nat. Biotechnol. 28, 432–433, 2010) and Lilly's Phenotypic Drug Discovery Initiative (Nat. Rev. Drug Discov. 9, 87–88, 2010). There are also individual efforts, such as the one spearheaded by GSK in January to freely provide 13,500 antimalarial compounds from its own library for others to test and develop. The GSK data set was loaded into ChEMBL's free medicinal chemistry and drug discovery database, acquired from the Mechelen, Belgium–based biotech company Galapagos in 2008 and currently hosted on servers of the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory at Hinxton. Novartis deposited its Malaria Box data set of over 5,600 compounds tested against the malaria parasite in May, and researchers at St. Jude Children's Research Hospital released data on 310,000 chemicals, of which 1,100 compounds have confirmed activity against malaria (Nature 465, 305–310, 311–315, 2010).

Table 1 Selected open source collaborations involving pharma

Merck Oncology Collaborative Trials Network (set up June 2010)
Partners: Merck; the National Cancer Institute of Brazil; Princess Margaret Hospital and Ontario Cancer Institute; Institut Gustave Roussy; Chaim Sheba Medical Center; Seoul National University Hospital; Netherlands Cancer Institute; Oslo University Hospital; National Taiwan University Hospital; Mayo Clinic Cancer Center; The University of Texas MD Anderson Cancer Center; Memorial Sloan Kettering Cancer Center; and others
Purpose: The research sites will lead the design and conduct of phase 0 to 2a clinical studies of Merck's investigational oncology candidates. Every year, the network will enroll ~1,200 patients in 30–40 clinical trials.
Terms: These studies will include investigator- and company-sponsored trials. Infrastructure to consolidate data, specimen-testing results, imaging-testing results and patient outcomes is being developed.

European Bioinformatics Institute ChEMBL Neglected Tropical Diseases (ChEMBL-NTD) (set up May 2010)
Partners: GlaxoSmithKline; Novartis Genomics Institute; St. Jude Children's Research Hospital
Purpose: A repository for open access primary screening and medicinal chemistry data directed at neglected diseases: the endemic tropical diseases of the developing regions of Africa, Asia and the Americas.
Terms: The primary purpose of ChEMBL-NTD is to provide a freely accessible and permanent archive and distribution center for deposited data.

The Coalition Against Major Diseases (CAMD), through the Critical Path Institute (set up June 2010)
Partners: National Institute of Neurological Disorders and Stroke (NINDS); National Institute on Aging; Engelberg Center for Health Care Reform at the Brookings Institution; FDA; European Medicines Agency (EMA); Abbott Laboratories; AstraZeneca; Bristol-Myers Squibb; Eli Lilly and Co.; Roche's Genentech; Forest Laboratories; GlaxoSmithKline; Johnson & Johnson; Novartis; Pfizer; Sanofi-Aventis Group
Purpose: A new shared and standardized database that currently contains information from ~4,000 Alzheimer's subjects from eleven industry-sponsored clinical trials. Members will ultimately define clinical data standards and establish a pooled database of the control groups of pharmaceutical clinical trials in order to develop quantitative disease progression models for both Alzheimer's and Parkinson's diseases.
Terms: CAMD members will collaborate to gather and submit the evidence necessary for the FDA and EMA to officially designate such tools as "qualified for use" in drug development. These newly qualified tools will be made publicly available for use by scientists and commercial developers.

Lilly Phenotypic Drug Discovery Initiative (PD2) (set up August 2009)
Partners: Eli Lilly; University of Cincinnati
Purpose: Phenotypic drug discovery directly interrogates complex biological systems composed of multiple or unknown biochemical components and/or pathways. Phenotypic drug discovery, or chemical genomics, enables the discovery of compounds that modulate biology in a target- and mechanism-agnostic manner.
Terms: Lilly will provide no-cost access to its phenotypic assay panel for external investigators. The PD2 panel includes disease-relevant assays. Compound submission is confidential, via a web-based interface, and a full data report is provided to the investigator. Promising findings can serve as the basis for a collaboration agreement.

The Fred Hutchinson Cancer Research Center in Seattle hosts the nonprofit Sage Bionetworks, a pioneer in open source data sharing.

But perhaps the greatest media buzz surrounding open collaboration was generated in March, when Stephen Friend, a former senior vice president of cancer research at Merck & Co., and others announced the creation of the Seattle-based Sage Bionetworks. Sage is a nonprofit, open source research company whose goals include the development and sharing of large-scale predictive network models of disease. Sage grew directly out of Friend's frustrations with the systemic failure of old-style drug discovery. It arose out of his sudden realization that Merck's seemingly successful partnership with the H. Lee Moffitt Cancer Center in Tampa, Florida, was statistically underpowered. Moffitt was providing Merck with expression-profiled and imaged tissue samples from cancer patients. The hope was that uncovering genetic differences in the patients' tumors would allow Merck to decide ahead of time which people were best suited to receive which experimental drugs. After 5,000 tumor samples had been obtained, "I had this 'aha' moment," says Friend. Although the sample numbers seemed huge, his analysis showed they actually needed to be 100 times larger. When you calculated the cost and complexity of getting the larger numbers, "you said to yourself, 'the first thing we need is open access to other data sets'," says Friend. The attractiveness of open access as a means of creating larger, openly analyzed data sets also made sense to Merck. In 2009, it gave Sage an estimated $150 million worth of its global genomic data sets along with their clinical outcomes. The donation included the analytic software and the in-house expertise and know-how used to create it. The availability of new data sets has already led to Sage partnerships with nonprofits such as the Canary Foundation in Palo Alto, California, which seeks to identify cancer at very early stages, and the CHDI Foundation, which funds research into Huntington's disease. In January of this year, Sage announced an agreement with Pfizer. Both Pfizer and Sage will explore the company's data sets and publicly
available data sets in an effort to create the holy grail of computer-generated, predictive disease models. The hope is both to find new targets for drugs, particularly as they relate to cancer, and to be able to zero in on the kind of personalized clinical trial information the Moffitt data set was not big enough to provide. Pfizer is also inviting smaller companies and institutions to screen against its internal compound library. However, the Pfizer collaboration with Sage is what might be termed semi-open access, as the agreement states that Pfizer does not have to make the data it and Sage generate publicly available until a year after a given project concludes. This conflict between Sage's theoretical openness and its actual semi-privacy hints at the difficulties that precompetitive, open access, public-private collaborations face. In a recent article (Clin. Pharmacol. Ther. 87, 527–529, 2010), four F. Hoffmann-La Roche authors outline a series of challenges to the open access, precompetitive approach. One issue is standardization of methods. "Different partners may be using different metrics to measure the clinical outcomes of interest," they write. Creating a standardized database
format is another problem. Managing the complexity that arises when large numbers of institutions and individuals collaborate creates another twist in the road toward open participation. This is particularly worrisome in light of a recent counterintuitive finding by Jonathon Cummings of Duke University, who studied 491 National Science Foundation–funded research collaborations. He found that more can turn into less if a project is not carefully managed. "Our study found that projects with more collaborating institutions, on average, were less likely to have published papers and patent applications compared with projects with fewer collaborating institutions," says Cummings. Another issue is that although companies theoretically give all their data to an open collaboration, "often only select data are permitted to be shared," says Bruno Boutouyrie, who heads the F. Hoffmann-La Roche clinical pharmacology central nervous system division and was one of the paper's co-authors. This leads to the general argument that open source collaboration works best when all parties agree ahead of time what will and won't be publicly available. But, says Boutouyrie, "controlling the direction of research in a network may be the most important problem." That is to say: who decides that a given line of research is exhausted and it is time to move on? Such issues may challenge open collaborations between academia and industry. Whereas industry researchers are used to axing projects if a target turns out to be undruggable or a lead series has unacceptable toxicities or equivocal efficacy, this kind of abrupt stop to a research program creates tensions with university investigators, whose graduate students' funding may be cut in the midst of a PhD. To deal with this, "you will have to get some transitional funding arrangement so an institution is not disadvantaged internally," says Colin Dollery, who works for GSK and also argues that closer pharmaceutical industry–academic cooperation is the future of drug development. The overriding question, however, may well be how to judge the success of precompetitive, open innovation research. It is not easy, because "open innovation spans a great variety of models and has become something of a catch-all term," remarks Lilly's Munos. On the one hand, Friend says Sage assumes the present patent structure will stay in place for compounds and biologics, a situation that "would allow companies to have an ability to develop something and have a return on it others couldn't copy." Edwards, on the other hand, argues for collaborative precompetitive research going right up to clinical trials. But with the attrition rate so high in drug discovery, and open collaborations still relatively young in terms of drug development timelines, it is hard to track which successes might have resulted from precompetitive research. "We have to develop very structured metrics as opposed to feel-good, 'oh, look we are getting together and working together' arguments," remarks Edwards. Stephen Strauss, Toronto

in brief

An unusual open source collaboration has been struck between Pfizer and Washington University in St. Louis. New York-based Pfizer agreed in May to provide university researchers with information on more than 500 drug candidates, giving them the opportunity to identify new uses for these compounds. The agreement entitles Washington University to $22.5 million over five years and access to proprietary data, which are not normally released to university groups. "By allowing others to consider the additional use of our compounds, we hope to identify new opportunities for truly unmet medical needs," says Don Frail, chief scientific officer of Pfizer's indications discovery unit. The advantage for the academic researchers is that Pfizer's compounds have been extensively studied and their mechanisms are well understood, shaving off time needed for evaluation. Under the new collaboration, when the researchers find a promising new application for a compound, they can propose a research project to Pfizer. The university will have the opportunity to negotiate the commercialization terms for its discoveries. Stephen Strauss, Toronto

in their words

"There's no doubt in my mind that this is a major achievement. But is it artificial life? Of course not." Steen Rasmussen, professor of physics at the University of Southern Denmark (New York Times, 31 May 2010)

"Synthesizing and cloning a genome with 1.08 million base pairs might seem to be a trivial extension of the 1984 synthesis of a gene containing about 300 base pairs…This paper shows that it was not." Steven Benner, Foundation for Applied Molecular Evolution, Gainesville, Florida (Nature, 27 May 2010)

"An interesting result." The Vatican (CNN, 22 May 2010)

"I hope very much these patents won't be accepted because they would bring genetic engineering under the control of the J Craig Venter Institute. They would have a monopoly on a whole range of techniques." John Sulston, University of Manchester (BBC News, 24 May 2010)

"This milestone and many like it should be celebrated. But has the JCVI created 'new life' and tested vitalism? Not really…Printing out a copy of an ancient text isn't the same as understanding the language." George Church, Harvard (Nature 465, 422–424, 2010)

"A marvelous advance, but it doesn't immediately open up or enable new studies for the broad community." James Collins, Boston University (New Scientist, 26 May 2010)

Genetic testing clamp down

The US Food and Drug Administration (FDA) has told five genetic test manufacturers that their products need the agency's blessing before they can be sold to consumers. On June 10, the agency sent letters to Illumina, of San Diego; Pathway Genomics, also of San Diego; NaviGenics; 23andMe; and deCODE Genetics, of Reykjavik, Iceland, explaining that their genetic tests are considered medical devices and must be approved. The FDA had no specific plans to regulate these direct-to-consumer tests until recently, when Pathway Genomics announced its intention to market a kit at pharmacy chain Walgreens. Customers would buy the Pathway Genomics Insight Saliva Collection Kit at most of Walgreens' 7,500 stores for $20 to $30 and send their saliva sample to Pathway to undergo what the company terms "comprehensive genotyping." They could then order individualized Genetic Insight Reports for Drug Response ($79), Pre-Pregnancy Planning ($179), Health Conditions ($179) or a combination of all three ($249). The FDA quickly sent a letter to Pathway stating that agency staffers were "unable to identify any Food and Drug Administration clearance or approval number" for the kits, a clear indication that it expected to find one. Pathway responded that its laboratory is Clinical Laboratory Improvement Amendments (CLIA)-certified, which it believed sufficient. That scuffle prompted Walgreens to announce that it would postpone offering the kits "until we have further clarity on this matter." The furor even caught the interest of Congress: the House Energy and Commerce Committee requested information about their tests from Pathway, 23andMe, of Mountain View, and Navigenics, of Foster City, both in California. After many months of regulatory uncertainty, the FDA's stance is welcome (Nat. Biotechnol. 27, 875, 2009). All of these companies have been selling such services from their websites for more than a year and will be allowed to continue. But it appears that the agency will no longer be satisfied with just CLIA certification for genotyping facilities, which is how most of these firms operate. According to an e-mail from Dick Thompson of the FDA Office of Public Affairs, "The agency has been meeting with several companies to understand their claims and business models." The FDA will hold a public meeting on July 19 and 20 to discuss how the agency will oversee laboratory-developed tests. Malorye Allison

New eyes on old drugs

New tech transfer models gain traction with deal flow

One view of the acquisition in June of respiratory drug discovery company Respivert by Centocor Ortho Biotech of Horsham, Pennsylvania, is that it is just another commonplace example of an established public biotech company swallowing a minnow. Another perspective is that the deal represents a whole new take on tech transfer, providing seed investors with proof of concept that early-stage life sciences technology not only has value, but also can return value tangibly and quickly. Imperial Innovations, the tech transfer group for Imperial College London, invested a total of £2 million ($2.8 million) in London-based Respivert in 2007 and 2008. The sale of its 13.4% stake in the company yielded £9.5 million in cash, a 4.7-fold return on its three-year investment. It also yielded profits for co-investors, the global firm SV Life Sciences, London-based Advent Venture Partners and Fidelity Biosciences of Cambridge, Massachusetts. Although this is not the first time that Imperial Innovations has profited from the disposal of a biotech asset, the deal is much more financially significant than the December 2008 sale of its peptide obesity drug firm, Thiakis, to Wyeth Pharmaceuticals (now Pfizer, New York), which generated £2.9 million in cash upfront. "We are probably now the most active early-stage investor in the UK," says Susan Searle, CEO of Imperial Innovations. "This may be because the venture capital investors have largely moved upstream, leaving this investment gap that you need to cross—which is where we specialize." Imperial Innovations is not a typical tech transfer organization. It is a public limited company that raised £26 million in July 2006, when it listed on London's Alternative Investment Market, and another £30 million in October 2007. It has invested significantly in its portfolio companies, with over £16 million invested in 2009 and nearly £6.0 million so far in 2010. This has meant it can attract co-investors to its portfolio companies. Even so, the current economic climate has made it "more challenging to find investors in this early-stage space," says Searle. But she is hopeful that more firms will co-invest as more successful exits are made. The technology transfer picture is changing elsewhere, too, in different ways. For instance, more groups of universities are channelling their commercialization efforts through inter-institutional technology management groups. One of the earliest models was the Flanders Institute for Biotechnology (VIB) in Ghent, Belgium, established back in 1995. Backed with regional government funds, VIB acts both as a funder of research and as a commercialization arm for biotech projects from four Flemish universities. Some 15 years later, Wallonia, the French-speaking region of Belgium, is adopting a similar model. WelBio (Walloon Excellence in Life Science and Biotechnology) has received a €15 million ($18.5 million) commitment from the Wallonia government to fund basic research projects at the Catholic University of Louvain, the University of Liège and the French-speaking Université Libre de Bruxelles, and is gearing up to launch soon. Jean Stéphenne, president and chairman of GlaxoSmithKline (GSK) Biologicals in Rixensart, Wallonia's largest life science company, says the idea is to create dynamic research groups that will provide added value in future. "If we generate IP [intellectual property], it will lead to spin-offs and, in the long run, WelBio will become self-financing." At least initially, WelBio will commercialize only technology arising directly from the €15 million worth of research projects it has funded, rather than the universities' broader research activities. Stéphenne's colleague at London-based GSK, Pierre Hauser, says that it is still a "relatively touchy" subject for the universities. Facilitating tech transfer through the provision of research funding is undoubtedly a way of winning research cooperation. However, it doesn't really address the absence of significant early-stage investment. To fill this gap, tech transfer offices are turning to 'soft' money. In the UK, for instance, there is some support for translational research from the Wellcome Trust, the Technology Strategy Board, the Medical Research Council and seed investment funds associated with universities. However, Sam Ogunsalu, principal executive, commercial development at Queen Mary College, University of London, points out that accessing that money means dealing with granting agencies that are inundated with applications. Another evolving tech transfer model is that of PBL Technology, a group established in Norwich, UK, to commercialize the research outputs of some of the institutes of the UK's Biotechnology and Biological Sciences Research Council (BBSRC). PBL has an established reputation in agricultural biotech. As well as commercializing work from BBSRC institutes, PBL draws deal flow from European universities in Belgium, Denmark, Finland, France and Spain, as well as further afield in Argentina and the US. PBL's managing director, Jan Chojecki, points out that PBL can act as a broker for single or multiple pieces of IP. "If there are two bits of overlapping IP from two different laboratories—not only co-inventions, but also completely synergistic bits of IP—to have someone independent handle things may make it easier to commercialize," Chojecki argues. "Companies like that," he says, "because we come with at least a worthwhile package if not the full freedom to operate." One example is a package of plant gene silencing patents that PBL has pooled from both Yale University and the Sainsbury Laboratory in Norwich. PBL has noticed a much greater interest in its services from university departments. "Perhaps now they are seeing the advantage of having a specialist [in agbio] deal with selected IP," he adds. John Hodgson, Cambridge, UK
news
in brief
format is another problem. Managing the complexity of collaborations that occur when large numbers of institutions and individuals collaborate creates another twist in the road toward open participation. This is particularly worrisome in the light of a recent counterintuitive finding by Jonathon Cummings of Duke University, who studied 491 National Science Foundation–funded research collaborations. He found more can turn into less if the project is not carefully managed. “Our study found that projects with more collaborating institutions, on average, were less likely to have published papers and patent applications compared with projects with fewer collaborating institutions,” says Cummings. Another issue is that although companies theoretically give all their data to open collaboration, “often only select data are permitted to be shared,” says Bruno Boutouyrie, who heads up the F. Hoffmann-La Roche clinical pharmacology central nervous system division, and who
was one of the paper’s co-authors. This leads to the general argument that open source collaboration is probably best arrived at when all parties agree ahead of time what will and won’t be publicly available. But, says Boutouyrie, “controlling the direction of research in a network may be the most important problem.” That is to say: who decides a given line of research is exhausted and one must just move on. Such issues may challenge open collaborations between academia and industry. Thus, whereas industry researchers are used to axing projects if a target turns out to be undruggable or a lead series has unacceptable toxicities or equivocal efficacy, this kind of abrupt stop to a research program creates tensions with university investigators whose graduate students’ funding may be cut in the midst of a PhD. To deal with this “you will have to get some transitional funding arrangement so an institution is not disadvantaged internally,” says Colin Dollery,
in their words
© AAAS
© 2010 Nature America, Inc. All rights reserved.
A unique example of open source collaboration has been struck between Pfizer and Washington University in St. Louis. New York-based Pfizer agreed, in May, to provide university researchers with information on more than 500 drug candidates to give them the opportunity to identify new uses for these compounds. The agreement entitles Washington University to $22.5 million over five years and access to proprietary data, which are not normally released to university groups. “By allowing others to consider the additional use of our compounds, we hope to identify new opportunities for truly unmet medical needs,” says Don Frail, chief scientific officer of Pfizer’s indications discovery unit. The advantage for the academic researchers is that Pfizer’s compounds have been extensively studied and their mechanisms are well understood, shaving off time needed for evaluation. In the new collaboration, when the researchers find a promising new application for a compound, they can propose a research project to Pfizer. The university will have the opportunity to negotiate the commercialization terms for its discoveries. Stephen Strauss Toronto
“There’s no doubt in my mind that this is a major achievement. But is it artificial life? Of course not.” Steen Rasmussen, a professor of physics at the University of Southern Denmark (New York Times, 31 May 2010).
“Synthesizing and cloning a genome with 1.08 million base pairs might seem to be a trivial extension of the 1984 synthesis of a gene containing about 300 base pairs…This paper shows that it was not.” Steven Benner, Foundation for Applied Molecular Evolution, Gainesville, Florida, (Nature, 27 May 2010)
“An interesting result.” The Vatican (CNN, May 22 2010) “I hope very much these patents won’t be accepted because they would bring genetic engineering under the control of the J Craig Venter Institute. They would have a monopoly on a whole range of techniques.” John Sulston, University of Manchester (BBC News, 24 May, 2010) “This milestone and many like it should be celebrated. But has the JCVI created ‘new life’ and tested vitalism? Not really…Printing out a copy of an ancient text isn’t the same as understanding the language.” George Church, Harvard (Nature 465, 422–424, 2010) “A marvelous advance, but it doesn’t immediately open up or enable new studies for the broad community.” James Collins, Boston University (New Scientist, 26 May 2010)
nature biotechnology volume 28 number 7 JULY 2010
Genetic testing clamp down The US Food and Drug Administration (FDA) has told five genetic test manufacturers that their products need the agency’s blessing before they can be sold to consumers. On June 10th, the Consumer takes a agency sent letters 23andMe genetic test. to Illumina, of San Diego, Pathway Genomics also of San Diego, NaviGenics, 23andMe and deCODE Genetics, of Reykjavik, Iceland, explaining that their genetic tests are considered medical devices and must be approved. The FDA had no specific plans to regulate these direct-to-consumer tests until recently when Pathway Genomics announced its intention to market a kit at pharmacy chain Walgreens. Customers would buy the Pathway Genomics’ Insight Saliva Collection Kit at most of Walgreen’s 7,500 stores for $20 to $30 and send their saliva sample to Pathway to undergo what the company terms “comprehensive genotyping.” They could then order individualized Genetic Insight Reports for Drug Response ($79), Pre-Pregnancy Planning ($179), Health Conditions ($179) or a combination of all three ($249). The FDA quickly sent a letter to Pathway stating that agency staffers were “unable to identify any Food and Drug Administration clearance or approval number,” for the kits, a clear indication that they expected to find that. Pathway responded that their laboratory is Clinical Laboratory Improvement Amendments (CLIA)approved, which they believed sufficient. That little scuffle prompted Walgreens to announce that it would postpone offering the kits “until we have further clarity on this matter.” The furor even caught the interest of Congress. The House Energy and Commerce Committee requested information about their tests from Pathway, 23 and Me, of Mountain View, and Navigenics of Foster City, both in California. After many months of regulatory uncertainty, the FDA’s stance is welcome (Nat. Biotech, 27, 875, 2009). 
All of these companies have been selling such services from their websites for more than a year and will be allowed to continue. But it appears that the agency will no longer be satisfied with just CLIA certification for genotyping facilities, which is how most of these firms operate. According to an e-mail from Dick Thompson of the FDA Office of Public Affairs, “The agency has been meeting with several companies to understand their claims and business models.” The FDA will hold a public meeting on July 19 and 20 to discuss how the agency will oversee laboratory-developed tests. Malorye Allison News.com
New eyes on old drugs
633
NEWS who works for GSK and also argues that closer pharmaceutical industry–academic cooperation is the future of drug development. The over-riding question, however, may well be how to judge the success of precompetitive, open innovation research. It is not easy because “open innovation spans a great variety of models and has become something of a catch-all term,” remarks Lilly’s Munos.
On the one hand, Friend says Sage assumes the present patent structure is staying in place for compounds and biologics, a situation that “would allow companies to have an ability to develop something and have a return on it others couldn’t copy.” Edwards, on the other hand, argues for collaborative precompetitive research going right up to clinical trials. But with the attrition rate so high in drug
discovery and open collaborations still relatively young in terms of drug development timelines, it is hard to track which successes might have resulted from precompetitive research. “We have to develop very structured metrics as opposed to feel-good, ‘oh, look we are getting together and working together’ arguments,” remarks Edwards. Stephen Strauss Toronto
New tech transfer models gain traction with deal flow

One view of the acquisition in June of respiratory drug discovery company RespiVert by Centocor Ortho Biotech of Horsham, Pennsylvania, is that it is just another commonplace example of an established public biotech company swallowing a minnow. Another perspective is that the deal represents a whole new take on tech transfer, providing seed investors with proof of concept that early-stage life sciences technology not only has value, but also can return value tangibly and quickly. Imperial Innovations, the tech transfer group for Imperial College London, invested a total of £2 million ($2.8 million) in London-based RespiVert in 2007 and 2008. The sale of its 13.4% stake in the company yielded £9.5 million in cash, a 4.7-fold return on its three-year investment. It also yielded profits for co-investors: the global firm SV Life Sciences, London-based Advent Venture Partners and Fidelity Biosciences of Cambridge, Massachusetts. Although this is not the first time that Imperial Innovations has profited from the disposal of a biotech asset, the deal is much more financially significant than the December 2008 sale of its peptide obesity drug firm, Thiakis, to Wyeth Pharmaceuticals (now Pfizer, New York), which generated £2.9 million in cash upfront. “We are probably now the most active early-stage investor in the UK,” says Susan Searle, CEO of Imperial Innovations. “This may be because the venture capital investors have largely moved upstream, leaving this investment gap that you need to cross—which is where we specialize.” Imperial Innovations is not a typical tech transfer organization. It is a public limited company that raised £26 million in July 2006, when it listed on London’s Alternative Investment Market, and another £30 million in October 2007. It has invested significantly in its portfolio companies: over £16 million in 2009 and nearly £6.0 million so far in 2010.
This has meant it can attract co-investors to its portfolio companies. Even then, the current economic climate has made it “more challenging to find investors in this early-stage space,” says Searle. But she is hopeful that more firms will co-invest as more successful exits are made. The technology transfer picture is changing elsewhere, too, in different ways. For instance, more groups of universities are channelling their commercialization efforts through inter-institutional technology management groups. One of the earliest models was the Flanders Institute for Biotechnology (VIB) in Ghent, Belgium, established way back in 1995. Backed with regional government funds, VIB acts both as a funder of research and a commercialization arm for biotech projects from four Flemish universities. Some 15 years later, Wallonia, the French-speaking region of Belgium, is adopting a similar model. WelBio (Walloon Excellence in Life Science and Biotechnology) has received a €15 million ($18.5 million) commitment from the Wallonia government to fund basic
research projects at the Catholic University of Louvain, the University of Liège and the French-speaking Université Libre de Bruxelles, and is gearing up to launch soon. Jean Stéphenne, the president and chairman of GlaxoSmithKline (GSK) Biologicals in Rixensart, Wallonia’s largest life science company, says the idea is to create dynamic research groups that will provide added value in future. “If we generate IP [intellectual property], it will lead to spin-offs and, in the long run, WelBio will become self-financing.” At least initially, WelBio will commercialize only technology arising directly from the €15 million worth of research projects it has funded, rather than the universities’ broader research activities. Stéphenne’s colleague at London-based GSK, Pierre Hauser, says that it is still a “relatively touchy” subject for the universities. Facilitating tech transfer through the provision of research funding is undoubtedly a way of winning research cooperation. However, it doesn’t really address the absence of significant early-stage investment. To fill this gap, tech transfer offices are turning to ‘soft’ money. In the UK, for instance, there is some support for translational research from the Wellcome Trust, the Technology Strategy Board, the Medical Research Council or seed investment funds associated with universities. However, Sam Ogunsalu, principal executive, commercial development at Queen Mary College, University of London, points out that accessing that money means dealing with granting agencies that are inundated with applications. Another evolving tech transfer model is that of PBL Technology, a group established in Norwich, UK, to commercialize the research outputs of some of the UK’s Biotechnology and Biological Sciences Research Council (BBSRC) institutes. PBL has an established reputation in agricultural biotech.
As well as work from BBSRC institutes, PBL’s deal flow includes projects from European universities in Belgium, Denmark, Finland, France and Spain, as well as from further afield in Argentina and the US. PBL’s managing director, Jan Chojecki, points out that PBL can act as a broker for single or multiple pieces of IP. “If there are two bits of overlapping IP from two different laboratories—not only co-inventions, but also completely synergistic bits of IP—to have someone independent handle things may make it easier to commercialize,” Chojecki argues. “Companies like that,” he says, “because we come with at least a worthwhile package if not the full freedom to operate.” One example is a package of plant gene silencing patents that PBL has pooled from Yale University and the Sainsbury Laboratory in Norwich. PBL has noticed a much greater interest in its services from university departments. “Perhaps now they are seeing the advantage of having a specialist [in agbio] deal with selected IP,” he adds. John Hodgson Cambridge, UK
Sequencing firms vie for diagnostics market, tiptoe round patents
French IPO spate
Genome sequencing companies are moving into clinical diagnostics, with the number of deals soaring despite an uncertain patent landscape. This past April, Cambridge, Massachusetts–based personal genomics company Knome announced a strategic partnership with French company bioMérieux to develop sequencing-based in vitro diagnostics. A few weeks later, Helicos BioSciences, also based in Cambridge, restructured its financially struggling business to focus on diagnostic applications for its sequencing platform. Industry leaders Illumina and Life Technologies are also racing to apply their ‘next-generation sequencing’ platforms to the investigation of cancer and other diseases. At the same time, issues around patent ownership are being put aside, at least for the moment, in the deal-making flurry. The idea of using genome sequencing as a diagnostic tool is catching on fast. In May, a collaboration between South San Francisco–based Genentech and Complete Genomics of Mountain View, California, revealed a staggering 50,000 single-nucleotide genomic mutations in a tumor from the lung of a heavy smoker that were absent in unaffected lung tissue (Nature 465, 473, 2010). In another recent study, Victor Velculescu’s team at Johns Hopkins Medical Institute in Baltimore partnered with Carlsbad, California–based Life Technologies to identify genomic translocations in colorectal and breast tumors that proved suitable as patient-specific biomarkers (Sci. Transl. Med. 2, 20ra14, 2010). In June, Life Technologies spearheaded the Genomic Cancer Care Alliance—a collaboration between the company and the Fox Chase Cancer Center in Philadelphia, Scripps Genomic Medicine in San Diego, and the Translational Genomics Research Institute in Phoenix, Arizona—to study whether whole-genome sequencing can help guide treatment decisions in oncology.
“In some ways, I think this has probably surprised all of us in the industry, and certainly me,” says Shaf Yousaf, division president of molecular and cell biology at Life Technologies. He and others credit changes in price and throughput as the primary drivers. The price of sequencing an individual genome has fallen below $10,000 on many platforms, as manufacturers and service providers slash prices with the fervor of salesmen on a car lot. In parallel, these systems now deliver complete sequences in under a week. “We’re getting to the point where a genome can be extracted in a single experiment in a short time at an affordable cost and at increasingly high quality and repeatability,”
in brief
Smoker’s lung tumors contain up to 50,000 single-nucleotide mutations. Sequencing offers an entirely new approach to cancer diagnosis, and manufacturers are jumping into the space.
says David Bentley, chief scientist and vice president at San Diego–based Illumina. In June, the company announced the launch of its individual genome sequencing service, which costs $19,500 but drops to $14,500 if a physician orders five or more at a time, and to $9,500 if an individual has a serious medical condition. Meanwhile, newcomers like Pacific Biosciences are promoting ‘single-molecule’ sequencing systems that offer longer read lengths and faster turnaround times, although many of these instruments are still awaiting formal release. In June, Harvard University spinout GnuBio shook up this year’s Consumer Genetics Conference by announcing plans for a microfluidics-based system capable of turning out a full human genome for around $30. This fast and furious price-slashing suggests the ingredients may soon be in place for an entirely new approach to diagnostics. “At Massachusetts General Hospital, they’re already doing genotyping for every tumor,” says Ari Kiirikki, vice president of sales and business development at Knome. “There’s no doubt that when the cost becomes a little bit more reasonable, they’ll sequence every single tumor and sequence it multiple times throughout the course of treatment.” This enthusiasm, however, is increasingly tempered by awareness of a potential intellectual property (IP) minefield. Nearly 30 years of gene patenting have enabled individuals and institutions to lay claim to an estimated 20% of known human genes—and at least one study suggests this is an underestimate (Science 322, 198, 2008). More importantly, these patents diverge wildly in terms of specified claims, ranging from isolated cDNA or genomic sequences to diagnostic platforms. The restrictions enacted by these patents also
Three French companies have floated on the stock market in rapid succession, in what appears to be a sign of financial maturity and investor interest in the local biotech sector. On April 21, Paris-based Neovacs listed on NYSE Alternext (the NYSE Euronext market for small and mid-sized companies). Industrial biotech Deinove, of Paris, floated next, on April 27, and within a month, medtech concern Carmat of Vélizy-Villacoublay began the initial public offering (IPO) process, expected to complete in July. The listings are surprising, given investors’ current reluctance to bankroll small and medium-sized firms. “Selected top-notch companies can IPO even in shaky markets,” says Philippe Pouletty, who sits on the board of all three companies and is managing partner of Paris private equity firm Truffle Capital. What they have in common, he says, is “strong proprietary technology, major product candidates for large markets, experienced management teams and committed historical investors wanting to reinvest upon IPO.” Neovacs is developing vaccine-induced polyclonal antibody therapies, Deinove is exploiting Deinococcus bacteria to develop biofuels and Carmat is developing an implantable artificial heart for heart failure. “In France, the past crunch has not significantly affected the ability to raise capital for mature biotechs,” says France Biotech director André Choulika. “The downturn in private rounds is more worrying.” Emma Dorey
Industrial biotech to boom?

In the next 20 years, industrial biotech will surge, according to a new analysis from the Organisation for Economic Co-operation and Development (OECD). The report, entitled The Bioeconomy to 2030, forecasts that biotech will grow from the current 0.5–1% to 2.7% of gross domestic product, driven mostly by industrial biotech. “We should really be concentrating on industrial and agricultural biotech because these are areas that are going to be extremely important in the future,” says report co-author David Sawaya, of the Paris-based OECD. Industrial biotech will contribute 39% to the sector, agriculture 36% and health 25%. The numbers, however, are at odds with current R&D investment, where 87% is focused on health and 2% on industrial applications. The report’s potential weakness is that the data predate the economic crisis. The statistics were sourced from a 2008 US Department of Agriculture report, and these were, in turn, based in part on a 2005 presentation by Rolf Bachmann, then an analyst at global management consulting firm McKinsey & Co. To meet the report’s predictions, the current 2% contributed by bio-based materials to the industrial chemical economy must rise tenfold. Growth will depend on rapid developments in fermentation techniques, favorable environmental legislation and high oil prices pushing demand toward cheaper alternatives. “There might have been a bit of over-enthusiasm initially,” says Jens Riese, a partner at McKinsey and Bachmann’s collaborator at the time, “but the overall trend is heading there.” Daniel Grushkin
Merck ditches biogeneric

Merck of Whitehouse Station, New Jersey, has halted development of its lead biogeneric product, MK-2578, a PEGylated erythropoiesis-stimulating agent for treating anemia. The decision, announced on May 11, followed a request from regulatory authorities for a cardiovascular outcomes assessment, an expensive and time-consuming process, says Peter Kim, president of Merck Research Laboratories. MK-2578, in phase 2 trials, was Merck’s most advanced biosimilar—similar to Amgen’s blockbuster Aranesp (darbepoetin alfa). “Other biosimilar counterparts will have to face [similarly] strict regulatory hurdles,” says Swetha Shantikumar, research associate at Frost & Sullivan, Chennai, India. The difficulties may dissuade small and medium-sized companies from developing biosimilars, but large companies remain undeterred. Merck itself has two other biogeneric candidates in development: MK-4214, a G-CSF (granulocyte colony-stimulating factor), and MK-6302, a recombinant PEGylated G-CSF. Moreover, the news boosted share values for Affymax in Palo Alto, California, which is developing a competing product to treat anemia. And Samsung, of Seoul, South Korea, subsequently announced plans to invest about $1.72 billion in biosimilars, hoping to take advantage of biologics patent expiries expected by 2016. Merck’s decision does not change the dynamics of the biosimilars market, says Shantikumar. “It is a definite reminder that it is strikingly different from the traditional generics market.” Emma Dorey
Investors fight Charles River/WuXi merger

In a vote of confidence for China, leading outsourcing company Charles River Laboratories (CRL) of Wilmington, Massachusetts, plans to spend $1.6 billion to buy Chinese contract research organization WuXi PharmaTech of Shanghai. The transaction would create the first global contract research company to offer a fully integrated drug development service, from molecule creation to early clinical studies. But activist hedge fund Jana Partners, Charles River’s largest shareholder, argues that the price paid for WuXi is unjustified and intends to stop the merger. Should the deal go ahead, “The new company will be able to provide lower-cost services, though price is probably the least important metric—more significant are quality, know-how and full-service capabilities,” says Ross Muken of Deutsche Bank Securities in New York. “There have been quality issues in China in the past, but with the support of the Chinese government these have improved.” Companies engaging these integrated services will also gain better access to the booming Chinese market. “Carrying out R&D in China will speed up Chinese drug launches and allow companies to optimize therapeutics for Asian people,” says Johnny Huang of Frost & Sullivan. Some analysts have suggested that WuXi’s animal testing facility will attract companies that no longer want to face Western animal rights campaigners, but Muken does not believe this to be a deciding factor. Suzanne Elvidge
vary widely. A recent study from the Catholic University of Leuven in Belgium analyzed European and American patent families pertaining to the diagnosis of 22 different genetic disorders. It found that of the 145 gene patents examined, 35 contained a ‘blocking claim’ that is impossible to circumvent with an alternative diagnostic strategy (Nat. Biotechnol. 27, 903, 2009). “If you read somebody’s DNA sequence and gave them information about their sequence related to a disease—that is, if you did whole-genome sequencing—you would be infringing at least one patent in each case for those 15 [medical] conditions,” says Robert Cook-Deegan, director of the Duke Institute for Genome Sciences and Policy in Durham, North Carolina. The recent ACLU v. Myriad decision, which rejected Myriad Genetics’ claims on isolated sequences for the breast cancer risk genes BRCA1 and BRCA2, as well as methods for identifying mutations in those genes, has garnered much press in this regard. “It challenges one of the fundamental premises of biotechnology patents, which is that you can just go and patent genes,” says Daniel Vorhaus, an attorney at Robinson, Bradshaw & Hinson and editor of the Genomics Law Report website. Although the decision stunned many in the patent law world, its impact remains limited to Myriad’s patents, and it will almost certainly be appealed, and possibly overturned. The true ‘main event’ in diagnostic IP law, some observers believe, is ‘association patents’. “Some of the disease-association patents are much more broadly written and problematic for some of these next-generation [sequencing] applications,” says Vorhaus. The Supreme Court has yet to rule on so-called association patents, which link a biological state with a medical condition. The only exception is a nonbinding dissent filed in 2006 by Justice Stephen Breyer in LabCorp v.
Metabolite, where he argued against the validity of a claim for an assay of homocysteine levels as a means for gauging vitamin B deficiency on the grounds that this association was an unpatentable natural phenomenon. The Supreme Court refused to hear that case, but will soon issue a highly anticipated decision on an equally relevant case, In re Bilski. Although this case relates to patentability of business methods, it has clear relevance for clinical diagnostics; the Federal Circuit decision established a test for such patents requiring that any patentable method must employ a “machine or transformation,” and although the meaning of this phrase remains ambiguous, it could theoretically prohibit patents based on mere identification or comparison of naturally occurring entities, such as DNA sequences (Nat. Biotechnol. 27, 586–587, 2009).
The Bilski decision could also constrain the controversial US patent 5,612,179 held by Genetic Technologies of Fitzroy, Australia. This patent, recently upheld by the US Patent and Trademark Office, covers any amplification-based sequencing of intronic DNA sequences, and cases of perceived infringement have been vigorously litigated by the company—most recently against Beckman Coulter and eight other defendants this past January. “These are method claims and they are quite broad,” says Cook-Deegan. “But they would not necessarily be infringed by all forms of full-genome sequencing; single-molecule sequencing almost certainly would not infringe because it entails no amplification step.” The current system is not popular with the Secretary’s Advisory Committee on Genetics, Health, and Society (SACGHS) of the US Department of Health and Human Services. The SACGHS issued a draft report in February (Nat. Biotechnol. 28, 381, 2010) that explicitly defends gene patents but calls for exemptions from infringement liability for patient care or research purposes. These recommendations, which have been condemned by the Biotechnology Industry Organization (Washington, DC) as having the potential to “do more harm than good,” are unlikely to change patent policy. But they may stir the industry to take the initiative for reform. Given that most grievances surrounding gene patents are actually condemnations of business practices related to licensing and litigation, reforms may arise from companies hoping to avoid messy, unpopular lawsuits. “I don’t think that any company wants to be in the position of losing the PR battle the way Myriad has been for years,” says Cook-Deegan. Patent pools or clearinghouses represent one opportunity for compromise, as in a plan recently put forward by Larry Horn, CEO of MPEG LA, for a ‘supermarket’ for the simple, nonexclusive licensing of patents related to specific disorders.
This could ensure a modicum of profit for patent-holders while expanding IP access, but constructing such a system will not be easy. An important consideration, however, is that much of the unique power of whole-genome sequencing lies in sophisticated data analysis, and that this is likely to spur previously unforeseen business models and categories of IP in the diagnostic sector. “In the future, when you can do a whole genome within hours in a doctor’s office, our service of shipping things around the world won’t make sense—we’ll have to become a software company,” says Knome’s Kiirikki. “And because it’s digital it’s going to grow exponentially and be exciting and it will have speed bumps, but there will be all kinds of things we can’t imagine now.” Michael Eisenstein Philadelphia
Microcap public biotechs access new pool of VC funding
Venture capital (VC) firm Abingworth Management has invested €33.1 million ($40.6 million) in Epigenomics, a publicly traded German diagnostics firm, in a deal known as a VIPE—a venture investment in public equity. The late March offering aimed to help the company build its commercial infrastructure to launch a novel blood-based colon cancer detection kit (Nat. Biotechnol. 27, 1066, 2009). Such a sizeable investment in a company that has already ‘gone public’ is unusual, because VC firms have traditionally focused earlier in the company creation process, funding a portfolio of startups. But with poor historical returns and a lack of current exits—either through a sale to another firm or an initial public offering (IPO)—VC firms now prefer to invest in more mature, publicly traded companies, whose share prices have slumped since the Lehman Brothers crash. “There are opportunities in the public markets where biotech as a sector has been beaten up badly on share valuations,” says Jamie Topper, general partner at VC firm Frazier Healthcare Ventures in Menlo Park, California. “The quality players have been hit along with the dross.” Some leading VC firms—such as London-based Abingworth; Venrock of Palo Alto, California; ProQuest of Princeton, New Jersey; and Frazier, with offices in Seattle and Menlo Park—are now switching their interest away from privately held startups toward these later-stage public firms. It is not an entirely new strategy: private investments in public equity (PIPEs) have been around for years. In a typical PIPE, the private equity firm identifies an undervalued company, invests a small amount for the short term and waits for the share price to recover before exiting at a profit. The drawback of the PIPE strategy for venture capitalists has been the difficulty of identifying prime candidates.
Biotech shares were not always so wildly underpriced as they are now, and most such companies typically need several more financings before their share prices show a worthwhile gain. Under these circumstances, the risk of failure at some intermediate point, such as a disappointment in the clinic, is high. “So the shares of these companies typically traded sideways, or more often down, as events played out,” says David Pinniger of SV Life Sciences Advisers, London, a leading British VC company. This slump was exacerbated as hedge funds preyed on the company stocks, finding them relatively easy targets for short-selling (that is, betting that the price will drop). For these reasons, Pinniger
Genzyme partners TJAB
VC firms are searching for biotech pearls in an undervalued public market.
reckons returns to VC life-science specialists from PIPEs have been very poor so far. “Most are likely to have lost a significant amount of capital over the past five years,” he says. The attractiveness of such investments in public companies is now increasing, though, because the valuations of many early-stage, publicly listed biotech companies are so low and many companies are in dire need of cash. To take advantage of this, several VC groups are reinventing the PIPE as the VIPE (Table 1). Under a typical VIPE arrangement, a VC syndicate does a very large fundraising—enough to see the company through the critical development phase to proof of concept, or until it reaches a major milestone where there is a significant uptick in valuation, such as a partnering deal or product approval. This could be several years down the line, says Pinniger. At that point, the idea is that venture capitalists will be able to cash in their holdings at perhaps 2.5 to 3 times the amount they originally paid. The profit is amplified because, when they first invested, the company stock would have been bought at a discount to the already heavily undervalued market price. “This can provide a lifeline for earlier-stage public biotech companies with high-quality assets,” says Pinniger. But the capital often comes at a price. “Venture capitalists are able to extract very aggressive terms for these financings, often more or less wiping out the value of investments held by the company’s existing investors and perhaps also the company founders,” warns Pinniger. One VC firm that has aggressively pursued VIPE financings in biotech is Abingworth. “A lot of [investor] money has gone out of the stock market sector consisting of small, risky companies, leaving a lot of them in a very sticky position with inadequate cash reserves,” says Abingworth’s Joe Anderson. “But there are some very compelling development programs in that sector.” Abingworth began its VIPE foray in
Genzyme of Cambridge, Massachusetts, and the Tianjin International Joint Academy of Biotechnology and Medicine (TJAB) in China agreed last month to form a partnership that will bring Genzyme’s products to China. TJAB, co-founded by a public consortium of federal and municipal governments, opened officially in 2009. Its brand-new public biotech platform was built to accelerate the process of biological discovery through to clinical trials. Richard Gregory, Genzyme’s head of research, cites TJAB’s creative thinking and systematic approach as incentives for the partnership. Through the collaboration, Genzyme hopes to capture innovation from the ground up, while offering TJAB the industrial experience it currently lacks. The partnership may also generate future employees for Genzyme and help consolidate the company’s presence in China, says Gregory. Genzyme has invested $70 million in a major R&D facility in the Zhongguancun Life Science Park in Beijing and sponsors academic groups across the country. Roger Xie, head of TJAB’s US operations, says that working with Genzyme “will be a giant step moving forward.” Genzyme may kick-start TJAB’s pipeline by offering several products already in preclinical and clinical development, and Xie expects that many jointly developed therapies will be relevant for patients worldwide. Details of the partnership, including financial incentives, are still under discussion. Jennifer Rohn
China’s heparin billionaires

On May 6, Li Li and his wife Li Tan became China’s richest couple when their company, Hepalink Pharmaceutical of Shenzhen, floated on the Shenzhen Stock Exchange. Although they lost the crown soon after, when stock prices slumped in mid-May, this is the first time the biopharma sector has produced China’s top billionaires. When the stock surged to 148 yuan ($21.80) per share—the highest on record for a Chinese stock—the Li couple’s 70% ownership was valued at 46.5 billion yuan ($6.8 billion). Hepalink is China’s largest producer of the blood-thinning drug heparin, which is sourced and purified from pig intestines. Analysts commented that the hike in Hepalink’s share price shows that investors are still optimistic about the sector despite the contamination debacle of 2008, which linked over 80 deaths to heparin sourced in China and manufactured by Baxter of Deerfield, Illinois (Nat. Biotechnol. 26, 477–478, 2008). Although most Chinese heparin producers have been beset by trouble since then, investors’ enthusiasm for Hepalink possibly reflects the fact that it is currently the only Chinese company approved by both the US Food and Drug Administration and the European Directorate for the Quality of Medicines and HealthCare to export heparin. But Zhaohui Peng, former president of Shenzhen-based SiBiono GeneTech, notes that to maintain their fortune, the Li couple must invest in developing new drugs, because the technological threshold for producing heparin is too low to fight off competitors. Hepeng Jia
Table 1  How PIPEs differ from VIPEs

Investment aspect | PIPE | VIPE
Typical new-investor profile | Specialized institutional; private equity | Venture capital syndicate
Size of financing | Intermediate, with further financings expected later | Large-scale, taking company through development stages
Share of company equity taken | 5–10% | 20–50% (including warrants)
Time to exit^a | 1–2 years | 5–7 years
Exit strategy | Unload shares on open market after lock-up period | Exit at key inflection point, e.g., trade sale or licensing agreement
Target exit multiple | 50–100% | 2.5–3×
Role of investors in management | Passive | Active
Impact on existing investors | Moderately diluting | Highly diluting

^a …on investment.
February 2009, when it joined a $35 million fundraising by Algeta, a Norwegian biotech company based in Oslo. Last October, it participated in a similar fundraising, worth $70 million, by Amarin of Dublin. Then, in April this year, it sealed its third VIPE deal, with Epigenomics. For a microcap public biotech, one of the advantages of receiving a VC investment, says Anderson, is that it raises the company’s profile and improves its negotiating position. Take Algeta: it had good phase 2 data on its radiopharmaceutical therapeutic, yet at that stage no pharma company was prepared to pick up the asset because of Algeta’s weak cash position. But once cash had been injected in the financing round of March 2009, to which Abingworth contributed its VIPE funding, Algeta could progress to phase 3. This enabled it to attract a pharma partner, Bayer, of Leverkusen, Germany. Algeta signed a big licensing deal on favorable terms, says Anderson. “If they’d been in a weaker position, they might have been driven by expediency [to make a less favorable deal],” he notes. Another biotech company that took $35 million of private equity cash last October, in a financing round that included VC firm Frazier, is Threshold Pharmaceuticals of Redwood City, California. The biotech’s CEO, Barry Selick, is upbeat about the new VC interest in post-IPO companies. “It has increased the pool of potential funding for our companies and driven competition for deals, which I believe has led to better financing terms,” he says. Previously, he says, only a very few institutional investors were willing to invest in microcap biotechs, and they were very choosy. Moreover, he says, an investment from a high-quality VC firm is important validation of a company’s prospects to the financial markets in general. Company management, however, must also be prepared for the additional strings that come attached to VIPE funding and the complexities of close ties with VC firms.
According to Selick, "VC investors also tend to want to play an active role in advising and helping to build the company, and they are generally quite good at it." But they also want to preserve their ability to
trade stock, which raises insider trading issues. "In some cases, we will bring an investor 'over the wall', with a confidentiality agreement that prevents them from trading our shares [while sensitive issues are being resolved]," says Selick. In other cases, the VC investor is such a valuable asset that he or she joins the biotech firm's board and is thus automatically bound by rules governing commercial confidentiality and share trading restrictions.

Abingworth concedes, however, that fundraising via the VIPE route may not be to every biotech's taste. "We want our company to raise a substantial sum to get to the endgame, and not just a sum sufficient to get them through to the next stepping stone," Anderson says. "Not all companies want to do that when their share price is still at a very low level, because of the [severe] dilution for existing shareholders." VC firms' demands for extra 'warrants' exacerbate the dilution effect. (Warrants are options allowing the investor to take up yet more shares in future, at a favorable price.) "The warrant
coverage is sometimes as high as 100% in these deals," says Pinniger. "That can often put off new investors, as the true current value (or cost for a prospective new buyer) is increased significantly." On the other hand, Frazier's Topper points out that a VC concern would only exercise the warrants if the company is succeeding and the stock has gone up, limiting the damage to other investors.

Another risk emerges if a VC investor distributes the shares directly to its limited partners (LPs), rather than husbanding them and distributing the proceeds in a thoughtful and controlled fashion. "When this happens, there is always a risk that the LPs will sell the stock in a less organized fashion and cause pressure on the stock price," says Selick, noting that some institutional investors may be sensitive to this. Another downside of a VIPE is that a sudden sale of a large chunk of company stock could affect the liquidity of the remaining shareholders. Frazier's Topper concedes that this is a possibility but again stresses that VC exits typically occur only when the company's stock is riding high. Moreover, exit instability is limited by the fact that venture capitalists prefer not to take too large a holding in public equity; for example, Abingworth has acquired ~20% in each of its three VIPEs so far, whereas Frazier has taken only 5–10%. "Overall, I think, the benefits provided by VC investors far outweigh the perceived risk," says Selick. "And suddenly the range of biotech companies that can secure financing has broadened considerably."

Peter Mitchell, London
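The dilution arithmetic behind these concerns can be sketched with hypothetical numbers (the share counts below are illustrative only, not from any deal described above). With 100% warrant coverage, every new share sold in the placement comes with a warrant for one more share, so existing holders face a second round of dilution if the stock rises and the warrants are exercised:

```python
# Illustrative VIPE dilution sketch. Hypothetical figures: a microcap with
# 20 million shares outstanding sells 5 million new shares, with 100% warrant
# coverage (one warrant per new share, exercisable later at a favorable price).

def ownership_after(existing, new_shares, warrant_coverage=1.0,
                    warrants_exercised=False):
    """Fraction of the company still held by pre-deal shareholders."""
    issued = new_shares * (1 + warrant_coverage) if warrants_exercised else new_shares
    return existing / (existing + issued)

existing = 20_000_000
new = 5_000_000

print(ownership_after(existing, new))                           # 0.8 -> 20% dilution at closing
print(ownership_after(existing, new, warrants_exercised=True))  # ~0.667 -> ~33% if all warrants convert
```

As Topper notes, the second step only occurs when the stock has already risen, which softens the blow to other investors.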
Italian GM rebels
Libertarian farmer Giorgio Fidenato and former journalist Leonardo Facco have sown six genetically modified seeds in an act of civil disobedience. Fidenato, who grows conventional corn, is one of a few hundred farmers wanting to plant genetically modified crops in Italy. The MON810 seedlings are growing at an undisclosed site near Vivaro, in the north of Italy, and their progress is being posted on YouTube. Although MON810 is approved for planting in the EU, it is still unclear whether the six GM plants are legal: the Italian Ministry of Agriculture never authorized the sowing, but neither did it invoke a safeguard clause in directive 2001/18 to enforce a ban. The symbolic harvest is expected in mid-September and will be displayed on YouTube (http://www.youtube.com/watch?v=JS7nEDL3CzE).
Anna Meldolesi
newsmaker: Agios Pharmaceuticals
Agios has brought cancer metabolism into vogue and is making hay from tumor cells' well-known hunger for glucose.

For a three-year-old company to land a licensing deal worth $130 million upfront is surprising, even more so when the assets are preclinical. But in April, the fledgling Agios netted the head-turning deal with Celgene of Summit, New Jersey, on the strength of a biochemical observation about cancer cell metabolism that dates back more than 85 years.

Founders Craig Thompson of the University of Pennsylvania, Tak Mak of the University of Toronto and Lewis Cantley, a signal transduction expert at Harvard Medical School and Beth Israel Deaconess Medical Center in Boston, set up Agios in 2007, with headquarters in Cambridge, Massachusetts. They rekindled a discovery made by Otto Warburg in 1924 that virtually all malignant cells favor aerobic glycolysis, an inefficient way to burn glucose that nets only two ATP molecules, over the usual oxidative phosphorylation, which yields 30. Because tumor cells must therefore consume more glucose to maintain ATP levels, glucose withdrawal seemed a promising therapeutic route. But despite manifold efforts, blocking glycolysis failed to yield any anticancer agents.

So it is perhaps surprising that, in 2004, Thompson applied for a National Cancer Institute grant to study how rapid uptake of glucose triggers a fundamental reprogramming of metabolism in cancer cells. Mak, a molecular biologist and immunologist, was asked to conduct a site visit to assess the application. Thompson recalls how Mak hated the idea, and returned to his Toronto lab determined to quash the notion. Two years later, however, Mak had become a convert to the cause. What is more, he had generated a significant body of evidence to show that tumor cell metabolism would be an ideal target for a new generation of anticancer agents that would have little impact on healthy cells.
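The glucose arithmetic behind Warburg's observation can be made explicit, using the per-glucose ATP yields quoted above (2 via aerobic glycolysis versus 30 via oxidative phosphorylation):

```python
# Back-of-envelope Warburg arithmetic, using the ATP yields per glucose
# molecule quoted in the article. A purely glycolytic tumor cell must take
# up many times more glucose than a normal cell to generate the same ATP.

ATP_GLYCOLYSIS = 2   # net ATP per glucose via aerobic glycolysis
ATP_OXPHOS = 30      # ATP per glucose via oxidative phosphorylation

fold_more_glucose = ATP_OXPHOS / ATP_GLYCOLYSIS
print(fold_more_glucose)  # 15.0
```

This roughly fifteen-fold excess glucose appetite is what made glucose withdrawal look like a therapeutic opening in the first place.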
In 2007, Mak and Thompson organized a symposium on the subject at the American Association for Cancer Research national meeting in Los Angeles. One of the speakers was Cantley, who had discovered the phosphoinositide 3-kinase pathway that Thompson believed was an important link between metabolism and malignant transformation. Relaxing at a table after the session, the three scientists began throwing ideas around and, before the night was over, had the basics of a company sketched out on a napkin. Taking advantage of a connection that
Mak had to angel investors, the three procured "a couple million dollars" and hired Shin-San Su from the Biomedical Engineering Research Laboratory in Taiwan to run Agios' scientific efforts and help the three scientific founders develop a business plan.

In 2008, the Agios team ran a proteomic screen for phosphotyrosine-binding proteins: HeLa cell lysates prepared using stable isotope labeling by amino acids in cell culture (SILAC) were flowed over phosphotyrosine/unphosphorylated peptide library affinity matrices and analyzed by liquid chromatography–tandem mass spectrometry (Nature 452, 181–186, 2008). The screen showed, for the first time, a mechanistic link between an enzyme involved in glucose metabolism—the phosphotyrosine-binding pyruvate kinase M2 isozyme (PKM2)—and tumor cell growth.

With these early data in hand, Agios headed to the venture well and in July 2008 closed $33 million in Series A funding. The cash, provided by Boston-based Third Rock Ventures, Flagship Ventures in Cambridge, Massachusetts and ARCH Venture Partners of Chicago, allowed Agios to move forward quickly. Last November, Agios researchers also reported an association between a single amino acid substitution in the isocitrate dehydrogenase 1 (IDH1) enzyme and the development of brain cancer (Nature 462, 739–744, 2009). And just a few months later, the company clinched the lucrative deal with Celgene, the latter taking an exclusive option to license any clinical candidates from their discovery and early development work at the end of phase 1 trials. Agios could receive a further $120 million in milestones plus royalties on each licensed program, and the option to codevelop and co-market the products.

Agios' platform combines large-scale metabolomic profiling to initially identify enzymes, followed by a genetic approach to search for mutations in the pathway and X-ray crystallography to identify a specific site on the enzyme.
In the 2009 Nature paper, the researchers show that three different mutations in arginine 132 of the IDH1 enzyme result in an entirely new function capable of reducing α-ketoglutarate to R(–)-2-hydroxyglutarate. Concentrations of the latter molecule increase ~100-fold in human brain tumors that contain mutant IDH1 (~70% of gliomas and glioblastomas), pointing to a common pathogenic mechanism and the potential
Agios founders (left to right): Lewis Cantley, Craig Thompson and Tak Mak.
of IDH1 as a therapeutic intervention point. Rather than sequence the IDH1 gene to identify mutations, however, Agios plans to screen for metabolites, such as R(–)-2-hydroxyglutarate. David Schenkein, who was senior vice president for clinical hematology/oncology at Genentech before joining Agios, says that the company intends to base go/no-go decisions on the availability of biomarker-therapeutic pairs, in a similar manner to the South San Francisco, California-based big biotech, now part of Roche.

Agios is also developing imaging agents to visualize tumors by identifying hotspots of metabolic activity. For instance, the glucose analog 2-[18F]-fluoro-2-deoxy-d-glucose, used as a radioactive tracer, is taken up rapidly but metabolized slowly by cancer cells—the very effect Warburg described. This uptake can be easily quantified using positron emission tomography (PET) to provide a real-time assessment of tumor metabolism. Targets disclosed thus far by Agios and Celgene are IDH1 and PKM2, but others will be added, says Schenkein.

According to Matthew Vander Heiden of the Koch Institute for Integrative Cancer Research at the Massachusetts Institute of Technology in Cambridge, Massachusetts, the firm's biggest challenge will be to stay focused on its strengths and resist sliding back into being a traditional gene-focused biotech company. Vander Heiden notes that many researchers are now looking back and realizing that maybe they didn't learn all the biochemistry in the 1940s, and that there is still some hay to be made. Matej Orešič, systems biology and bioinformatics professor at the VTT Technical Research Centre of Finland, believes that despite past failures, with the emergence and maturing of metabolomics there is a strong case for looking into cancer metabolism. He points out, however, that although IDH1 is a promising case study, it is still very far from the clinic.
Given the systemic complexity of metabolism, says Orešič, the case for the Agios therapeutic strategy will be much stronger once it is demonstrated in a physiological setting.

Joe Alper, with additional reporting by Lisa Melton
data page
Drug pipeline: Q210
Wayne Peng

New drug approvals were off to a slow start in 2010 but addressed indications outside the usual areas. In April, the first autologous cell therapy, Provenge (sipuleucel-T), was approved, and last month Amgen's RANK ligand antagonist Prolia (denosumab) was also registered for marketing.
Fingolimod, the first synthetic sphingosine-1-phosphate receptor agonist for multiple sclerosis, was given a favorable recommendation, and positive trial data came in for the antisense drug mipomersen, as well as for ipilimumab, epratuzumab and pertuzumab, which addresses a new epitope on HER2.
[Figure: FDA approvals by therapeutic indication. Bar chart of the number of FDA approvals (0–40) per year, from 1998 through Jan 1–Jun 15, 2010, broken down by category: Oncology, Infectious disease, Neurology, Cardiovascular, Immunology, Endocrine, Metabolic, Ophthalmology, Gastroenterology, Psychiatry, Respiratory, Rheumatology, Other.]
Oncology, infectious, neurological and cardiovascular diseases have been absent from drug approvals this year
Source: U.S. Food and Drug Administration Center for Drug Evaluation and Research (FDA CDER); BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/).
Notable regulatory approvals (March–June 2010)
Drug name | Indication | Company | Approval
Prolia (denosumab) | Post-menopausal osteoporosis | Amgen | FDA, 6/1/10; EMA, 5/28/10
Provenge (sipuleucel-T) | Prostate cancer, castration-resistant | Dendreon | FDA, 4/29/10
Menveo (MenACWY-CRM vaccine) | Meningococcal disease prevention for adults age 11–55 | Novartis | FDA, 2/22/10; EMA, 3/18/10
Lumizyme (alglucosidase alfa) | Pompe disease | Genzyme | FDA, 5/25/10 (sBLA)
Source: BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/). sBLA, supplemental Biologic License Application; FDA, US Food and Drug Administration; EMA, European Medicines Agency.
Notable regulatory setbacks (Mar–Jun 2010)
Drug name | Indication | Company | Setback summary
Naproxcinod (nitronaproxen) | Pain, arthritis | NicOx | 5/12/10 FDA advisory panel voted 16 to 1 against approval. In a phase 3 trial, naproxcinod was superior to placebo (primary endpoint) but failed to achieve statistical noninferiority against the secondary-endpoint comparator naproxen (Aleve) (Osteoarthritis and Cartilage 18, 629–639, 2010).
Belatacept (LEA29Y) | Kidney transplantation rejection | Bristol-Myers Squibb | 5/1/10 FDA complete response letter requested 36-month data from the ongoing phase 3 study; the initial BLA filing included only 24-month data.
Albinterferon alfa-2b (Zalbin, a.k.a. Albuferon or Joulferon) | Hepatitis C | Human Genome Sciences/Novartis | 4/19/10 marketing authorization application (MAA) withdrawal due to unfavorable EMA opinion. FDA issued an unfavorable discipline review letter on 6/14/10.
Cerepro (sitimagene ceradenovec) | Malignant glioma | Ark Therapeutics | 3/9/10 MAA withdrawal due to unfavorable recommendation from EMA advisory panel, following MAA resubmission in 02/10. FDA response to BLA expected in 06/10.
Source: BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/). BLA, biologic license application.
Notable trial results (Mar–Jun 2010)
Company/drug name | Indication | Result summary
Bristol-Myers Squibb/Ipilimumab | Metastatic melanoma | Phase 3 study showed monotherapy or combination with gp100 peptide vaccine significantly prolonged overall survival (primary endpoint) from 6.4 months to 10 months (N. Engl. J. Med., published online, doi:10.1056/NEJMoa1003466, 5 June 2010).
Genzyme–Isis Pharmaceuticals/Mipomersen, s.c. (ISIS-301012) | Homozygous familial hypercholesterolemia | Phase 3 study met primary endpoint (low-density lipoprotein (LDL) cholesterol concentration decrease in treatment versus placebo; P < 0.003) as well as secondary and tertiary endpoints (Lancet 375, 998–1006, 2010).
Tolerx–GlaxoSmithKline/Otelixizumab (ChAglyCD3) | Diabetes mellitus, type I | Although the primary endpoint (suppression of rise in daily insulin requirement) was not met in all subgroups, phase 3 study showed efficacy over 48 months, depending on patient's age and initial beta cell function (Diabetologia 53, 614–623, 2010).
UCB–Immunomedics/Epratuzumab | Systemic lupus erythematosus (SLE) | Phase 2b study showed clinically meaningful improvements in patients with moderate to severe SLE (Abstract, 2010 Annual Congress of the European League Against Rheumatism, 16 June 2010).
Vical/Velimogene aliplasmid (Allovectin-7) | Metastatic melanoma | High-dose therapy well tolerated in single-arm, open-label phase 2 study, with 11.8% response rate among 127 patients (Melanoma Res. 20, 218–226, 2010).
MolMed/NGR-hTNF (Arenegyr) | Mesothelioma | Phase 2 study met primary endpoint; overall, 46% of patients achieved disease control, with median progression-free survival increased from 2.8 months to 4.7 months (J. Clin. Oncol., published online, doi:10.1200/JCO.2009.27.3649, 20 April 2010).
Roche–Genentech/Pertuzumab (2C4) | Breast cancer, HER2 positive | Single-arm phase 2b study showed the combination with trastuzumab is active and well tolerated in patients with metastatic HER2+ breast cancer responsive to previous Herceptin treatment (J. Clin. Oncol. 28, 1138–1144, 2010).
Source: BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/).
Notable upcoming approvals Q310
Company/drug name | Indication | Expected approval
Theratechnologies/Tesamorelin (Egrifta/ThGRF/somatorelin) | HIV-associated lipodystrophy | 7/27/10 PDUFA date. FDA panel voted 16 to 0 in favor of approval on 5/27/10. Phase 3 study showed treatment met primary endpoint (J. AIDS 53, 311–322, 2010).
Savient Pharmaceuticals/Krystexxa (pegloticase) | Gout | 9/14/10 PDUFA date. Biologic license application resubmitted in 03/10 to correct deficiencies cited by FDA in 08/09, following favorable panel vote (14 to 1) on 6/16/09.
Novartis/Gilenia (fingolimod) | Multiple sclerosis | 9/21/10 PDUFA date. FDA advisory panel voted in favor of approval on 6/10/10. Phase 3 study met primary endpoint (Abstract, Am. Acad. Neurol., 15 April 2010).
Roche–Genentech/Lucentis (ranibizumab) | Diabetic macular edema; retinal vein occlusion | H2 2010 supplemental MAA approval expected.
Source: BioMedTracker, a service of Sagient Research (http://www.biomedtracker.com/). PDUFA, Prescription Drug User Fee Act. MAA, marketing authorization application.
Wayne Peng, Emerging Technology Analyst, Nature Publishing Group
news feature
Sunshine on conflicts
US drug companies are preparing for draconian new provisions for reporting their financial relationships with academia. Will efforts to increase transparency prove burdensome to researchers and the industry? Virginia Hughes investigates.

In late May, the US National Institutes of Health announced a draft set of rules for managing conflicts of interest among its grantees1. In late April, Senator Charles Grassley (R-IA) sent a stern inquiry to the Centers for Disease Control and Prevention in Atlanta following a government report claiming that the agency was lax in policing financial conflicts of interest between experts serving on advisory committees and the pharmaceutical and biotech industries2. This is only the latest fallout from Grassley's long campaign for increased transparency between physician researchers and industry. Thanks to his efforts, the massive healthcare reform legislation passed in March includes provisions mandating that every pharmaceutical, biotech or medical device company disclose, on a publicly searchable website, all payments of $10 or more made to physicians or teaching hospitals.

Some stakeholders contend that strict disclosure rules add unnecessary and unjust burdens to an already struggling biotech industry—particularly to fledgling companies with few sales. "One thing we can be absolutely, guaranteed sure about is that the industry's going to have to spend money to do it," says Tom Stossel, director of translational medicine at Brigham and Women's Hospital in Boston and founder of a small biotech called Critical Biologics. "At a time when the biotechnology industry profitability is low and the investment ecosystem is completely seized up, is this what we want to throw money at?" he asks.

Push and pull
Industry and academia have a symbiotic relationship. The type of blue-sky research that is undertaken in academia is typically too risky to be carried out in industry.
But when researchers hit upon something clinically useful, they need companies to scale up their work, develop products and guide them through the long and expensive regulatory road. In return, industry gains access to innovative therapies, as well as to patients, a rigorous clinical trial infrastructure and the public relations bonus of being affiliated with distinguished universities and hospitals. Eighty percent of clinical departments at US medical schools receive industry funding of some kind—from research support to
faculty lunches—according to a 2007 survey3. And the money seems to move things along: clinical trials are published eight to ten times faster when one of the investigators is affiliated with industry4.

Although customary now, financial entanglements between the two realms were practically nonexistent before 1980. That's when the Bayh-Dole Act deemed that companies, universities or nonprofits could own the intellectual property resulting from federally funded research, taking it out of the public domain. Suddenly, academic researchers and their institutions could apply for patents and license discoveries to companies. Many credit the act for the rapid rise of the US biotech industry.

Those relationships are growing more numerous, thanks to shrinking federal research budgets and industry's stagnant product pipeline. In the past two years, big pharma has forged a dozen multi-million-dollar research collaborations with prominent medical centers. Industry is particularly valuable in nascent fields that require expertise in several areas of biology, such as stem cells, notes Brock Reeve, executive director of the Harvard Stem Cell Institute in Cambridge, Massachusetts. He estimates that 40–50% of the institute's budget comes from industry. "To do research in a multidisciplinary area like stem cells, no one company, and no one lab, is going to have all the necessary resources," Reeve says. "These relationships are critical to innovation."

The hitch is that what's best for a company isn't always what's best for patients. And doctors who are paid consultants can sometimes have a hard time managing these competing interests. Some studies suggest that physicians who receive research funding, honoraria, gifts or meals from industry are more likely to prescribe newer, more expensive drugs, even when an effective generic is available5. A survey of authors of clinical practice guidelines—which outline standard treatments and influence the decisions of many physicians—found that 38% had been pharmaceutical company consultants and 58% received research support from industry6.

In addition to prescribing patterns, clinical research may be biased as well. For example, a report evaluating 1,140 papers published between 1980 and 2002 found that industry-sponsored studies are more likely to yield results that benefit the company7. Another review analyzed data submitted to the US Food and Drug Administration (Silver Spring, Maryland) for 74 clinical trials of 12 common antidepressants. In roughly half of the trials, the FDA deemed the drug effective. Only 51 studies were ever published, and 48 of them (94%) were reported to have positive results8.

It's official. With the passage of the healthcare reform act came new mandates for reporting industry/academia partnerships. (Source: The White House)

Culture shift
The issue hit the headlines in June 2008, when the New York Times reported that Senator Grassley's investigators had found large discrepancies between what pharmaceutical companies said they paid three Harvard psychiatrists and what those researchers actually disclosed to the university. Each reported only a fraction of the more than $1 million received through various company relationships. Grassley's team has since found similar inconsistencies in disclosure statements from dozens of other academic researchers. Subsequent surveys of physicians suggest that these aren't isolated cases.

For example, a study published in the New England Journal of Medicine analyzed disclosure statements submitted by physicians who presented work at the annual meeting of the American Academy of Orthopedic Surgeons, and compared the payment figures to those published on device manufacturers' websites (Box 1). Only 71% of payments were disclosed9.

"There probably is a minority of surgeons who intentionally did not disclose, but I think a large part of it is that the disclosure requirements are so confusing," says lead investigator Mininder Kocher, associate professor of orthopedic surgery at Harvard Medical School in Boston. Kocher, who serves as a surgical consultant to the orthopedics industry, says that these relationships are common and necessary. "But they clearly also have the potential to be
negative. So the solution that's most commonly advocated is disclosure." There is some debate, however, as to whether disclosure rules should be mandated by the federal government. "You can't legislate morality," says Peter Corr, co-founder of Celtic Therapeutics, a biotech investment company located in the US Virgin Islands, and former head of worldwide R&D at Pfizer. Last year, Corr sat on an Institute of Medicine committee on conflicts of interest in medicine, which published a report urging the professional community to create a "culture of accountability." Corr says that any legislation that impinges on the relationship between industry and academia is problematic. "I would hope that the profession can police itself," he says. "Otherwise, I think the government will end up doing things with unintended consequences that would be sad for society as a whole." Others counter that because it helps to protect consumers, physician disclosure is a government matter. "Transparency is necessary for patients and the public to be able to assess the relationships and be fully informed," says Allan Coukell, director of the nonprofit Pew Prescription Project in Washington, DC. "While this is a pure transparency bill, we hope that it will continue to help the process of culture change that's already underway," he adds.

Federal moves
The stories of highly paid physician consultants certainly got the public's attention, and paved the way for Grassley and fellow Senator Herb Kohl (D-WI) to introduce the Physician Payment Sunshine Act in January 2009. The act was later folded into the healthcare reform legislation passed earlier this year. Companies must start recording payments on January 1,
2012, and submit their first annual report to the Department of Health and Human Services by March 31, 2013. The information will appear on a public website—searchable by physician name—by September 30, 2013. For each payment of $10 or more, companies must record the form of the payment (cash or stock), the nature of the payment (gift, royalty, consulting fee) and, if applicable, the drug or device that's related to the payment.

Providing this detailed accounting was of foremost concern to doctors who consulted with Grassley's team during the development of the legislation. "If the reporting lumps all of the payments into one, and lacks context, it can create a false impression. A lunch is different from a royalty is different from a research project," notes Christopher Armstrong, investigative counsel to the Senate Committee on Finance, who wrote most of the bill's language. Armstrong talked to hundreds of physicians and industry representatives when putting the bill together. The provisions also mandate that companies report contributions to research. But to protect intellectual property, research support does not have to be disclosed for four years, or until the product is approved, whichever comes first.

Jumping on the disclosure bandwagon, several large companies have already set up their own websites listing physician payments. Eli Lilly of Indianapolis and Pfizer were required to do so as part of the terms of legal settlements with the federal government over illegal marketing of drugs. Their websites, however, are often difficult to navigate and don't specify what the payments are for (Table 1).

So far, six legislatures—in the District of Columbia, Maine, Massachusetts, Minnesota, Vermont and West Virginia—have passed disclosure rules. Four apply only to drug companies, and three require that the information be made public. Publicly available databases have been set up by the Attorney General of Vermont (http://www.atg.state.vt.us/issues/pharmaceutical-manufacturer-payment-disclosure.php) and the Minnesota Board of Pharmacy (http://extra.twincities.com/CAR/doctors/). According to Michael Gonzalez-Campoy, CEO of the Minnesota Center for Obesity, Metabolism and Endocrinology, a private institution outside of St. Paul that conducts industry-sponsored research, many medical institutions in the state have banned interactions between their physicians and industry. This has made recruiting top talent difficult owing to the hostile environment created by Minnesota's law, he says. One justification for creating a federal database is to standardize the reporting from all of these preexisting sites, according to Armstrong. The federal legislation preempts all state disclosure laws, unless the state requires information that is not covered in the federal laws. "If the information is all in one place, companies have one rule to follow, not 80 rules, and the public only has one website to consult," Armstrong says.

Box 1 Orthopedics' disclosure drama
The medical device industry has found itself at the center of conflict-of-interest storms because, unlike drugs, orthopedic devices are often invented or modified by surgeons. "A lot of advances we've had in orthopedics came from relationships between physician innovators and industry," says Mininder Kocher of Harvard Medical School, who consults with device companies. But sometimes there's a downside to those interactions. The best-known example occurred in March 2005, when US federal prosecutors began investigating five manufacturers of artificial joints for bribing doctors to exclusively use their products. The companies—Biomet, Smith & Nephew, Stryker Orthopedics, Zimmer Holdings and the DePuy Orthopedics unit of Johnson & Johnson—represent roughly 95% of the market for hip and knee implants. The government investigated physician relationships forged as early as the late 1990s and brought formal charges against the companies in 2007. The companies settled for a combined $310 million in penalties, although none admitted any wrongdoing. Some say the settlement spurred interest in the new federal Sunshine provisions. "I'd say it was a pretty significant part of the motivation behind the [new] legislation," says Bill Kolter, of Biomet. VH

Industry reactions
In a 2008 statement, industry group Pharmaceutical Research and Manufacturers of America, located in Washington, DC, came out in favor of disclosure and praised the Sunshine Act for superseding local legislation. The "confusing myriad" of state rules, it said, is "overly burdensome and costly for those required to report." But although the additional costs of conforming to the legislation might not be burdensome to big pharma, they certainly will represent a drain in time and money for smaller companies. For now, the biotech industry's lobbying group, the Biotechnology Industry Organization, also based in Washington, DC, is sitting back. "We'll monitor their implementation and weigh in with the designated agency as appropriate," says general counsel Thomas DiLenge.
Leaders of Adolor, a biopharmaceutical company in Exton, Pennsylvania, say they welcome increased transparency, but also point out that complying with the new legislation will affect their business operations. “It will be necessary to allocate resources to purchase the systems to track these criteria and dedicate personnel to manage the process,” notes Eliseo Salinas, senior vice president of R&D at Adolor. “This expense will, unfortunately, shift dollars away from our ongoing drug development programs.” But Kay Dickersin, director of the Center for Clinical Trials at Johns Hopkins University in Baltimore, says that’s just part of the cost of doing business. “It’s like saying you have to have an office, or a lawyer,” she says.
Table 1 Companies' disclosure websites
Company/website | Earliest period reported | Data reported | Format/search ability
Eli Lilly (http://www.lillyfacultyregistry.com/Pages/index.aspx) | Q1–Q4 2009 | Doctor payments for consulting and speaking only | Flash website, not downloadable or searchable
GlaxoSmithKline (http://gsk-us.com/docs-pdf/responsibility/hcp-feedisclosure-2q-4q2009.pdf) | Q2–Q4 2009 | Doctor payments for consulting and speaking only | PDF, extremely small font
Merck (Whitehouse Station, New Jersey) (http://www.merck.com/corporate-responsibility/docs/business-ethics-transparency/APA_4Q09_Grant_Trans_Data_v15_051010.pdf) | Q3–Q4 2009 | Doctor payments for speaking only | PDF, extremely small font
Pfizer (http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp) | Q3–Q4 2009 | Lists doctors making a total of at least $500; lists individual payments of $25 or more; includes money for research collaborations | HTML, searchable by doctor name
Cephalon (Frazer, Pennsylvania) (http://www.cephalon.com/our-responsibility/fees-for-services-2009/fees-for-services-2009.shtml) | 2009 calendar year | Lists total amount paid to individual doctors for speaking and consulting; does not list individual payments | HTML, searchable by doctor name
The penalty for companies that unknowingly fail to disclose is up to $10,000 per payment, not to exceed $100,000 per year. For intentionally not reporting, the fines go up to $100,000 per payment, with a $1 million annual cap. Although big pharma might not have trouble complying with the new laws, some experts say that small companies and startups will take a hit. “Where it is a casualty is where a company has to have a full-time person who decides whether it’s OK to buy their collaborator lunch,” says Stossel. Last year, Stossel founded the Association of Clinical Researchers and Educators to advocate on behalf of physician-industry partnerships. “Companies that have sales can do it. But the companies that have few sales are going to have a terrible time with it,” he says. Joel Martin, president and CEO of Altair Therapeutics, an eight-person company in San Diego, says the regulations are “incredibly stringent,” particularly for companies like his that are still in early development phases of their products. Altair is collaborating with several Canadian academic medical centers to carry out a phase 2 clinical trial. It’s entirely possible, he says, that a strong backlash against pharma will cause more academics to bow out of industry relationships. “And if that happened, I would be tremendously disappointed. You don’t want drugs developed in a vacuum.” This is exactly what happened to Velico Medical, a ten-person company in Beverly, Massachusetts. Velico CEO Doug Clibourn says that the company tried—and failed—to retain a renowned expert from an elite institution as a device consultant. “He would have had to go through enormous hoops,” to comply with his institution’s rules, Clibourn says. “We don’t even have a product, we’re just trying to figure out a product. But still he’s not able to talk with us.” For similar reasons, Velico no longer has a scientific advisory board, Clibourn says.
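The penalty structure above is simple cap arithmetic. A minimal sketch of the worst-case exposure (the function name is ours, and assuming the statutory maximum per payment is applied; actual fines are discretionary, only "up to" these amounts):

```python
def sunshine_penalty(unreported_payments, intentional=False):
    """Illustrative worst-case fine under the caps described in the article.

    Unknowing failure to disclose: up to $10,000 per payment, capped at
    $100,000 per year. Intentional non-reporting: up to $100,000 per
    payment, capped at $1 million per year.
    """
    per_payment = 100_000 if intentional else 10_000
    annual_cap = 1_000_000 if intentional else 100_000
    return min(unreported_payments * per_payment, annual_cap)
```

For example, fifteen unknowingly unreported payments already hit the $100,000 annual cap, while three would carry a maximum of $30,000.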
“We’ve operated in this universe for decades with an amazing synergy between the clinical community and companies that are developing new products,” he adds. “From our perspective, the ethical questions are sort of absurd.”

Public perceptions
For his part, Grassley does not deny that these relationships are essential for developing new medical treatments. But, he says, patients deserve to know about their doctors’ conflicts of interest, which the new laws will lay bare. “My work on the disclosure issue, since 2007, has focused on oversight of what is happening in the real world,” Grassley says.

A few doctors have already publicly stepped down from academic positions over conflict-of-interest rules. This January, for example, allergy specialist Lawrence DuBuske resigned from his clinical position at Brigham and Women’s Hospital—and lost his academic position at Harvard Medical School—after the partner institutions announced that they would no longer allow their doctors to be paid speakers for the pharmaceutical industry. DuBuske reportedly made $99,375 last year from GlaxoSmithKline of London for giving 40 talks in three months, and has similar agreements with six other companies.

“Academia will be losing more and more smart people” because of its growing anti-industry sentiment, says Antonio Hardan, associate professor of psychiatry at Stanford in Palo Alto, California, who has consulting and research relationships with several pharmaceutical companies. “You’re going to see more people deciding either to go straight into industry or to not do research at all.” Thomas Sullivan, president of Rockpointe Corporation, a medical education company in Columbia, Maryland, takes it a step further. “This kind of anti-industry culture that’s being permeated is pretty rapidly moving research and
development, and even commercialization, into other regions of the world,” he says.

The laws perpetuate the myth among the general public that all doctor-industry relations are bad, says Gonzalez-Campoy. “A lot of harm comes from the implication that doctors are corruptible, that they don’t do what they think or know is best for their patients,” he says. He believes that there will be a “significant delay” in the implementation of new treatments in the US, and a growing number of drug developers going abroad.

But Tom Insel, head of the National Institute of Mental Health in Bethesda, Maryland, says the public’s disapproval of physician-industry relationships is precisely why disclosure laws are so important. “In terms of the ability at least to put everything in the public domain, and to try to restore public trust, this is a step in the right direction,” he says.

In the end, perhaps the best way to gain the public’s trust is to develop useful treatments—and not be shy about it, says Derek Lowe, a blogger and chemist in the pharmaceutical industry. “Overall, the laws are probably a good thing, because the less we look like we have something to hide, the better off we are.”

Virginia Hughes, Brooklyn, New York

1. Rockey, S.J. & Collins, F.S. J. Am. Med. Assoc. published online, doi:10.1001/jama.2010.774 (24 May 2010).
2. Levinson, D. CDC’s ethics program for special government employees on federal advisory committees (CDC, Atlanta, Georgia, December 2009). http://oig.hhs.gov/oei/reports/oei-04-07-00260.pdf
3. Campbell, E.G. et al. J. Am. Med. Assoc. 298, 1779–1786 (2007).
4. Contopoulos-Ioannidis, D.G. et al. Am. J. Med. 114, 477–484 (2003).
5. Wazana, A. J. Am. Med. Assoc. 283, 373–380 (2000).
6. Choudhry, N.K. et al. J. Am. Med. Assoc. 287, 612–617 (2002).
7. Bekelman, J.E. et al. J. Am. Med. Assoc. 289, 454–465 (2003).
8. Turner, E.H. et al. N. Engl. J. Med. 358, 252–260 (2008).
9. Okike, K. et al. N. Engl. J. Med. 361, 1466–1474 (2009).
building a business
Ask your doctor Jeffrey J Stewart, Jeron Eaves & Ben Bonifant
When seeking a licensing partner for a product in development, market projections and strategies require substantiation. This can be provided through physician interviews.
Do you know the market for your products? You may think you do, but a surprisingly large number of companies end up having erroneous ideas about their customers’ needs. Flawed market projections can jeopardize your ability to find a licensing partner or, worse still, lead to failure of a product launch. Although you may understand what experts in the field believe, prescribing physicians often look for very different things in a new product than do so-called key opinion leaders. Thus, on the basis of expert advice, you may believe you understand your market when, in fact, you understand only a small segment. This is why physician interviews can prove extremely useful in building your business. We have personally conducted more than 700 60-minute interviews with physicians about products in development. We use these interviews to assist in valuations and preparing partnership discussion materials. For drug developers, there are two general reasons to conduct physician interviews: to find ways to improve your product and to convince a partner to work with you on terms favorable to your company. In the remainder of this article, we discuss how to manage physician interviews to help achieve these goals.
Jeffrey J. Stewart is a senior consultant, Jeron Eaves is an associate practice executive and Ben Bonifant is vice president at Campbell Alliance, Raleigh, North Carolina, USA. e-mail: [email protected]

Box 1 Creating a meaningful target product profile
Creating a target product profile is a pivotal part of the process of obtaining feedback from physicians. Here are five rules you should follow when preparing such a document:
• Describe the finished product in US Food and Drug Administration label language
• Avoid advocacy—don’t go beyond the label claims in the target product profile
• Focus on (projected) clinical results
• Assume nothing about dosing and administration
• Discuss the concerns you have about the product

The target product profile
Before talking to physicians, you need something to discuss. Physicians must have enough information to evaluate your product and provide meaningful feedback (Box 1). A one- to five-page document, called a target product profile (TPP), may be used to describe the product to physicians. This is where a science-driven company may err, as doctors are more familiar with evaluating products based on US Food and Drug Administration (FDA) labels than on scientific papers. Rather than writing a technical paper, you should present the highlights from what you believe your FDA label will eventually look like. You should familiarize yourself with FDA label language from products similar to yours, which can be found on the FDA’s website.

Make sure you do not go beyond the anticipated FDA label in the TPP. Sales representatives will be able to address these ‘label claims’ but will be restrained from discussing most data outside the label. What does this mean for you? If your regulatory expert believes the indication will be mild-to-moderate asthma, don’t say “and may be appropriate for other respiratory conditions, including severe cases” in the TPP. Advocacy is important, but a physician interview is a place for analysis alone.

It is good practice to project the number of patients that will be tested in phase 3 even if you are in an earlier stage of development. If you present the patient numbers you have from earlier stages, or if you have no clinical data at all, then physicians will be biased against the data you do present. Because you are attempting to gauge physician adoption of your product once all clinical trials are
complete, present projected phase 3 trial patient numbers in the TPP. If otherwise unknown, phase 3 trial sizes may be estimated roughly from other FDA labels for products approved for the same indication. Presenting preclinical data (even if otherwise compelling) may simply convince physicians that the product is in early stages and cannot yet be evaluated. “I’ve never once treated a rat,” one interviewed physician said to us when presented with otherwise strong preclinical data for a melanoma product. One area in which companies often trip up is in providing a minimal description of dosing and administration. These things matter, and they especially matter to physicians. What’s more, these areas are the ones in which most scientifically minded companies (and key opinion leaders) have the least common ground with their customers. In our experience, some of the most consequential but solvable problems are in dosing and administration (Box 2). These problems may make or break a product launch if not discovered and resolved. Finally, if you have a concern that could damage your product, discuss it with physicians. Include a projected adverse event table along with projected warnings and contraindications. You may be surprised to find that what you think matters actually does not bother physicians. If the factor is a concern to
physicians, you may ask about ways to mitigate the downside. This will allow you, when you do approach partners with your primary interview results, to say confidently that you have spoken with physicians and they do not believe issue X will be a problem, or that they suggested solutions Y and Z.

Conducting the interview
A friendly physician interview is unlikely to be useful. Instead, you are better off interviewing physicians at arm’s length—ideally through a skilled third party and certainly with physicians who do not have an existing relationship with you or your company. Once you have lined up your physicians (you will typically pay an honorarium), there are two general steps to a productive interview: make sure you understand both the physician’s current practice and his or her response to the TPP.

About half of a 60-minute interview is typically spent understanding the usage patterns and patient population surrounding different pharmaceuticals. Patients may exist in practical ‘buckets’—meaning different subsets of patients receive different types of treatment. Examples of bucketing include differences based on age, disease severity or co-morbidities. Before a physician interview, you might have an idea of how patients are segmented by practicing physicians, and scientific papers and treatment guidelines are sometimes helpful. Still, the only way to get real-world information is to ask. Once you understand the patient buckets and how each is treated differently, you are in a position to discuss how your product fits in. Present the TPP and then ask about advantages, disadvantages and anticipated use in each patient bucket until you thoroughly understand how often, for whom and why this physician might use your product.
If there are additional pieces of information (especially anything that would be in a publication but not on the FDA label), you may present that to test if the physician is ‘promotionally sensitive’ to particular messages (a company’s medical science liaison is able to discuss scientific literature outside of the FDA label, so it’s meaningful to explore what scientific information would be compelling to physicians). Finally, we have not found it useful to discuss price with physicians in most cases. Physicians are quick to say that price is a major factor for every product. However, these same physicians are often unable to say what similar, existing products cost. Instead of asking about price, ask physicians how they will react to secondary effects of price.
Box 2 Just what the doctor ordered Physicians can provide feedback that is useful for many aspects of product development. Here is a real-world example: an intravenous drug has to be infused in the office and an alternative dosage form would be oral and taken at home. Which would physicians prefer? We have worked on projects in which physicians declared each dosage form the clear winner. For a chronic fatigue syndrome treatment, physicians believed getting the patient to come into the office often would be difficult and would monopolize office space, so the oral drug was much preferred. For an oncology product we reviewed, however, physicians were reimbursed more favorably for infused products than for oral products and also could ensure compliance by using the infused product, so that form was strongly preferred. We have observed enough counterintuitive reactions to product strength, dosing regimen, packaging and methods of administration to recommend that companies pay close attention to dosing and administration in the target product profile.
That is, physicians work in a world in which third parties seek to restrict what doctors prescribe. These restrictions may include formulary placement, step edits (required use of drug A before drug B may be prescribed), prior-authorization requirements, unfavorable product reimbursement or medical exemption requirements. Ask physicians what restrictions (from such payers as the government, insurance companies or hospital administration) they believe will be placed on your product and how their use of it might change if additional restrictions are added. Payer interviews will then inform you of what restrictions may be in place at different price points, so you may estimate changes in use from changes in price.

Improving the product
The results of primary physician research can often provide crucial insights into actions you may take to improve perceptions of, and ultimately use of, your product. In short, you can learn the product features that matter most to physicians and then figure out how to craft convincing marketing messages. Clinical results are often the primary drivers for market uptake of a product, so it makes sense to listen to what physicians have to say about your planned clinical trials. Are the proposed endpoints the right ones? Is the comparator arm relevant to the physicians? Will patient numbers be large enough to convince a community physician? For a diagnostic, do physicians care most about sensitivity, specificity, positive predictive value or negative predictive value? (Different trial designs will tend to maximize different values.)

One example we found was a walking test. For neurologists treating patients who had multiple sclerosis, walking was an ideal endpoint that meaningfully described a patient’s ability to function. In another setting (chronic fatigue), physicians believed the amount of time walking on a treadmill had little clinical
relevance. Physicians viewed what appears to be the same endpoint (walking) very differently in different contexts. In another example, the time the physician had to spend monitoring the patient for adverse events after administering a cardiac diagnostic was viewed as a ‘straight-to-the-pocketbook’ endpoint (lost physician time), and we were able to advise our client to include patient monitoring time in the planned phase 3 trial. In our experience, understanding what clinical data would support effective marketing messages is well worth the effort before phase 3 begins.

Another area that is often overlooked—and can leave substantial gaps in product valuation efforts if ignored—is a thorough understanding of health economics (how the use of a treatment ultimately may save money for the payer) for a new drug. Studies that provide supportive rationale for reimbursement decisions are not always at the front of physicians’ minds. However, knowing how doctors are incentivized to use your product should be an area of focus during primary research. Our example of oral versus infusion dosing (Box 2) highlights how a lack of objective, realistic discussions with treating physicians can prevent a product from being developed in a formulation doctors will be inclined to use. In many therapeutic areas, especially those affecting elderly populations, physicians carefully weigh the impact of adding another drug to a patient’s regimen versus the impact on quality of life. Too often, those details are not even considered in clinical studies or are uncovered only during partnering due diligence (or, even worse, in a post-launch analysis of a product’s poor performance).

Gaining a thorough understanding of the logistics of how healthcare providers actually use your product can also help maximize value and minimize barriers to adoption. Often, physicians and nurses have preferred
packaging and dosage forms, and preference may well trump efficacy and safety for many products. In one instance, physician interviews showed us that the time necessary to warm our client’s product to room temperature was a concern given the circumstances in which it would most likely be used. In another, our client proposed supplying its product in vials that did not contain enough active pharmaceutical ingredient to dose the typical range of patients, which meant multiple vials, wasted product and wasted time. Dosage and packaging have dramatic impacts on market share because these things matter to physicians.

Use in partnering
In our experience in partnering discussions, we’ve noticed that whichever party has talked to the larger number of physicians has more credibility. Quotes and other qualitative observations trump unsubstantiated beliefs held by companies and their potential partners (as they should). To get a large physician sampling, use quantitative Internet surveys following qualitative interviews. Because credibility is the keystone of partnering discussions, partners must believe that you understand the market in detail and that any pitfalls have been uncovered. Your partner
must see that the TPP used in your physician interviews was not an advocacy piece but an unbiased analysis piece. Partners will be generating and comparing revenue estimates based on their own physician interviews. If you have presented your product in detail to physicians, your market share estimates will increase in credibility.

There is an important translation between what physicians say they will do in terms of market share and what they actually do. Physicians (and consumers more generally) are thought to overestimate when asked to predict future use of a new product. Some companies automatically apply a 33%–67% reduction on market share results or adjust usage share by zeroing out responses that came from physicians who did not make the top one or two boxes on an intent-to-prescribe scale (answered 6 or 7 on a 1–7 scale). There is no universally adopted rule for translating physician intent into physician action, but credibility in the process helps in defending a high projected market share during negotiation.
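The two adjustment approaches just described can be sketched in a few lines. Both function names are illustrative, and the default flat reduction is an arbitrary midpoint of the 33%–67% range cited above, not a recommendation:

```python
def flat_discount(stated_share, reduction=0.5):
    # Apply a flat reduction to a stated market share
    # (companies reportedly use values in the 0.33-0.67 range).
    return stated_share * (1 - reduction)

def top_box_share(responses, top_boxes=(6, 7)):
    # Zero out respondents below the top boxes of a 1-7
    # intent-to-prescribe scale; return the qualifying fraction.
    qualifying = sum(1 for r in responses if r in top_boxes)
    return qualifying / len(responses)
```

For example, if 4 of 8 interviewed physicians answer 6 or 7, the top-box share is 0.5; a stated 40% share discounted by 33% becomes roughly 26.8%.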
There are three reasons why partners discount market share projections: the TPP was an advocacy piece, potential pitfalls were not discussed with physicians or physicians were ‘detailed’ (sold) on the product. You need to come down on the right side of all three aspects. It has been our experience that if you are rigorous about bucketing patients, discussing pitfalls in detail and presenting your product in an unbiased manner, partners are much more willing to accept your market projections.

Conclusions
Talking to your customers via physician interviews is the strait and narrow gate to product improvement and successful partnering. A meaningful interview process may lead to course corrections before it’s too late and can help position a company for success in the market. A credible physician interview process will allow your partners to accept the market values you project and will help you reach a partnership based on mutual trust.
To discuss the contents of this article, join the Bioentrepreneur forum on Nature Network:
http://network.nature.com/groups/bioentrepreneur/forum/topics
correspondence
PeptideClassifier for protein inference and targeted quantitative proteomics

To the Editor: Direct protein profiling offers unique insights beyond those afforded by transcriptomics or genomics technologies. These include information about the abundance level, posttranslational modifications and interaction partners of proteins. Shotgun proteomics1 is the method of choice for the comprehensive analysis of complex protein mixtures, and extensive proteome coverage has recently been reported even for eukaryotic model organisms2,3. However, shotgun proteomics also faces significant challenges, such as the protein inference problem4.

[Figure 1: Schematic overview of where PeptideClassifier maps onto a shotgun proteomics workflow and selected applications. Proteins are extracted and digested with a protease (typically trypsin) before further separation of the peptide mixture, ionization, selection of precursor ions, fragmentation and recording of fragment ion spectra in a mass spectrometer. PeptideClassifier takes peptides assigned by the PSM process as input, ideally further processed with a probabilistic or other scoring scheme. Because gene-model information is included in the classification process, several different peptide classes with distinct information content can be reported (see main text). For deterministic protein inference, a minimal list of protein identifications can be generated for peptides above a user-defined threshold (for more details, see Supplementary Fig. 1). Other major applications include the information content–based selection of peptides for targeted quantitative proteomics workflows (based on experimental data or an in silico precomputed index for all peptides) and integration with transcriptomics data (not shown).]

Protein inference refers to the process of deducing the proteins that were originally present in a sample on the basis of the experimentally identified peptides. Because significant amounts of shared peptides—peptides that could be derived from several proteins—introduce ambiguity, protein inference can be tricky
and error prone. Furthermore, the error rate at the protein level is typically substantially higher than that at the peptide level4,5. Informatics solutions that provide accurate and reproducible results are thus needed to minimize the propagation of errors in the literature and in data repositories, and to allow readers to critically evaluate the conclusions of papers6,7. To address this issue, we have recently devised a novel, deterministic peptide classification and protein inference scheme8. This approach is the first to take into account the gene model–protein sequence–protein identifier relationships. Each peptide sequence is classified according to its information content with respect to protein sequences and gene models (Fig. 1). This allows shared peptides to be further distinguished depending on whether the implied proteins could be encoded either by the same or by distinct gene models. Here we announce the release of the modular software tool PeptideClassifier (folders containing the code for PeptideClassifier as well as some AuxiliaryScripts can be downloaded on
the Nature Biotechnology website, or from http://www.mop.uzh.ch/software.html) and illustrate its general applicability for both eukaryotes and prokaryotes, and its value for applications beyond protein inference (Supplementary Table 1). These include integration with transcriptomics data and information content–based selection of peptides for targeted quantitative proteomics studies (Fig. 1). PeptideClassifier can classify shotgun proteomics data from any organism, provided that a clear relationship exists between the gene model, its encoded protein sequences and their identifiers. Several reference databases (such as FlyBase, Wormbase, TAIR, ENSEMBL and RefSeq) fulfill this requirement.

PeptideClassifier carries out several steps (Supplementary Fig. 1): first, it analyzes protein sequence redundancies and generates an identifiable proteome index; second, it parses the database search result files; third, it classifies the experimentally identified peptides into six evidence classes with different information contents (see below); fourth, it infers a minimal list of protein identifications per evidence class; and finally, it can report a minimal set of protein identifications that would explain the remaining ambiguous peptides, following the Occam’s Razor approach5. In its current implementation, PeptideClassifier can work with the output of common database search engines or, alternatively, with a list of confident peptide identifications provided by a user.

The classification and protein inference approach is generic: on the basis of their different gene structures, for prokaryotes we report three peptide evidence classes (classes 1a, 3a and 3b), and for eukaryotes, to capture potential alternative splice isoforms, we consider three additional evidence classes (classes 1b, 2a and 2b) (Fig. 2).

[Figure 2: Overview of the distinct peptide evidence classes of our classification scheme for eukaryotes and prokaryotes. We distinguish six peptide evidence classes for eukaryotes; owing to the lack of splice variants, classes 1b, 2a and 2b do not apply to prokaryotes. The 5′ or 3′ untranslated regions (UTRs) are relevant for assigning class-1b identifications, and sets of two gene models can encode an identical protein sequence. The ability of the respective peptide evidence classes to distinguish protein sequences, annotated protein isoforms and genes is summarized below; the implications for major applications are indicated in Supplementary Table 1.

Class | Protein sequence(s) | Protein isoform(s) | Gene(s)
1a | Unambiguous | Unambiguous | Unambiguous
1b | Unambiguous | Ambiguous | Unambiguous
2a | Ambiguous | Ambiguous | Unambiguous
2b | Ambiguous | Ambiguous | Unambiguous
3a | Unambiguous | Ambiguous | Ambiguous
3b | Ambiguous | Ambiguous | Ambiguous]

Class 1a peptides unambiguously identify a single unique protein sequence. Class 1b peptides also unambiguously identify one unique protein sequence, but this sequence could be derived from distinct splice isoform transcripts of a gene model that, although identical in the coding sequence, differ in the 5′ or 3′ untranslated region, or in both regions. We extend the original classification8 to further distinguish class 2 peptides into
those peptides that identify a proper subset (class 2a) versus those that imply all protein sequences encoded by a gene model (class 2b). Finally, class 3a peptides unambiguously identify a protein sequence that can be encoded by several gene models from distinct loci. Such cases, which include histones or the products of duplicated genes in prokaryotes, are typically very rare. In contrast, class 3b peptides are derived from different protein sequences encoded by gene models from distinct loci. They have the least information content but can account for a large percentage of the experimental data8.

The conceptually simple extension of integrating the gene model distinguishes our solution from other common protein inference tools, such as ProteinProphet5, IsoformResolver9, Scaffold10 or IDPicker11. Similar to these tools, PeptideClassifier addresses protein inference using the assigned peptides, but it does not try to improve the peptide-spectrum matching (PSM) process (Fig. 1). Notably, for protein inference, our deterministic method considers only peptides above a user-defined threshold, and not lower-scoring peptides that could provide additional evidence for certain protein identifications, the default approach adopted by ProteinProphet5 and Scaffold10. The deterministic approach is therefore very stringent. Similar to IDPicker11, Scaffold10 and other solutions, the output of two different database search algorithms could in principle be classified and integrated to achieve additional stringency.

One example of the use of PeptideClassifier concerns the reporting of reference data sets in proteomics, where the error rate should be as minimal as possible. For peptides of class 1a, 1b and 3a, a minimal list of nonoverlapping, unambiguous protein sequence identifications can be generated (Fig. 2).
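As a rough illustration of the classification logic (this is not PeptideClassifier's actual code or API; the data model below is our simplification, representing each database entry as a gene/isoform/sequence triple), the six evidence classes could be assigned as follows:

```python
def classify_peptide(matches, gene_isoforms):
    """Hedged sketch of the six evidence classes described in the text.

    matches: set of (gene, isoform, sequence) tuples the peptide maps to.
    gene_isoforms: dict mapping each gene model to its full set of
    annotated isoform identifiers. All names are illustrative.
    """
    genes = {g for g, _, _ in matches}
    sequences = {s for _, _, s in matches}
    isoforms = {(g, i) for g, i, _ in matches}
    if len(sequences) == 1:
        if len(genes) > 1:
            return "3a"  # one sequence, several gene loci (e.g. histones)
        # one gene: unique isoform (1a) vs. several isoforms with
        # identical coding sequence differing only in UTRs (1b)
        return "1a" if len(isoforms) == 1 else "1b"
    if len(genes) == 1:
        gene = next(iter(genes))
        hit_isoforms = {i for _, i, _ in matches}
        # proper subset of the gene's isoforms (2a) vs. all of them (2b)
        return "2a" if hit_isoforms < gene_isoforms[gene] else "2b"
    return "3b"  # several sequences from distinct loci
```

For prokaryotes, only classes 1a, 3a and 3b can occur, since each gene encodes a single protein.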
For cases in which the peptide evidence cannot distinguish between several possible protein sequences that are encoded either by the same gene model (classes 2a and 2b) or by different gene models (class 3b) (Fig. 2), a minimal list of ambiguous protein identifications can be generated that reports the inherent ambiguities (for more details, see Supplementary Fig. 1). Using a target-decoy database search strategy12, one can estimate the spectrum-level false discovery rate (FDR) for a selected peptide confidence threshold. Because the FDR is much higher for proteins identified by a single hit, one option would be to exclude them. Existing guidelines for protein identification requiring two distinct peptides6 have limited the number of false-positive protein
identifications reported in the literature. Alternatively, by opting to manually validate all single-hit identifications with an information-rich peptide, a user could reduce the overall protein FDR while keeping valid single hits. We have shown that single hits passing manual evaluation (only 35% of all single hits) are enriched in short and low-abundance proteins8, which, by definition, will contribute fewer observable peptides. The rejected single hits accounted for around 90% of the incorrect PSMs estimated to be present in the data set on the basis of target-decoy database search results. Removing them greatly reduces the actual FDR in the data set at the peptide level and even more so at the protein level. Ideally, one would be able to rely on a scoring scheme; the solution by Gupta and Pevzner13 may represent one valuable resource. We suggest that the guidelines for protein identification6 be extended to consider the peptide information content.

The classification facilitates seamless integration with transcriptomics data. We have demonstrated this for current transcriptomics platforms, which predominantly report results at the gene-model level. However, to allow more fine-grained integration with data from exon-based array platforms or RNA-Seq and to take advantage of their potential to distinguish splice variants, we have further subdivided class 2 peptides: class 2a peptides imply a proper subset of distinct protein sequences encoded by one gene model, whereas class 2b peptides imply all encoded protein sequences. In combination, class 1a, 1b and 2a peptides can thus be informative in identifying and distinguishing different splice isoforms. Because a substantial part of the continuous updates to eukaryotic reference protein databases represents splice variants (Supplementary Table 2), we expect that such a classification will become increasingly valuable.
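The spectrum-level target-decoy FDR estimate discussed above can be illustrated with a minimal sketch. The estimator below (decoy hits divided by target hits above a score threshold) is one common variant; concatenated-search protocols estimate the FDR as twice the decoy count over all accepted hits instead. The function name and data layout are hypothetical:

```python
def spectrum_fdr(psms, threshold):
    """Estimate the spectrum-level FDR above a score threshold.

    psms: iterable of (score, is_decoy) peptide-spectrum matches from a
    target-decoy database search. Returns decoys/targets, a simple
    estimator of the fraction of accepted target hits that are false.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

Raising the threshold trades sensitivity for a lower estimated FDR, which is the trade-off the stringent deterministic approach makes explicit.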
In addition, shared peptides may lead to inaccurate protein quantification in semiquantitative spectral counting applications; the distinction of several peptide evidence classes provided by PeptideClassifier can help prevent this. Finally, PeptideClassifier can assist in selecting the most relevant peptides for targeted quantitative proteomics approaches using multiple reaction monitoring. Applications can range from selecting proteotypic peptides from existing experimental proteome catalogs14 to supporting larger projects like the Human Proteome Detection and Quantitation project15, which aims to identify expression evidence
for all human gene models. In the first phase, peptides specific for a gene model, but not for a specific splice isoform or modified protein, are required. Thus, a classification that displays the in silico precomputed information content of each peptide could help researchers select the best candidates for a specific use case, both at the splice-variant level and at the gene-model level. We detail the steps for generating a proteome-wide precomputed peptide information content index and its advantages for this use case in Supplementary Table 3.

Accurate protein identification and quantification are of key interest for the proteomics field. Our classification scheme, which is, to our knowledge, the first to consider gene model–protein sequence–protein identifier relationships, can help to minimize potential protein inference errors. PeptideClassifier displays all ambiguities, enabling a researcher to further examine candidates of specific interest and to distinguish or even remove protein-level ambiguities by integrating transcriptomics or other data sets. Its applications for data integration and information content–based selection of peptides for targeted quantitative proteomics are expected to find widespread use.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We thank C. Panse for contributing a first XML parser to extract information from database search engine output files, E. Brunner and G. Hausmann for feedback on the manuscript and K. Basler, U. Grossniklaus, R. Aebersold, M. Hengartner and J. Jiricny for continued support of the Quantitative Model Organism Proteomics bioinformatics core group. E.Q. and C.H.A. are members of the Quantitative Model Organism Proteomics Initiative, which is supported by the University Research Priority Program Systems Biology/Functional Genomics of the University of Zurich.

AUTHOR CONTRIBUTIONS
E.Q. wrote the software code and documentation and generated the figures; C.H.A.
originally devised the peptide classification scheme and wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Ermir Qeli & Christian H Ahrens
Quantitative Model Organism Proteomics, Institute of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland. Correspondence should be addressed to C.H.A. ([email protected]).

1. Washburn, M.P., Wolters, D. & Yates, J.R. III. Nat. Biotechnol. 19, 242–247 (2001).
2. Brunner, E. et al. Nat. Biotechnol. 25, 576–583 (2007).
3. de Godoy, L.M. et al. Nature 455, 1251–1254 (2008).
4. Nesvizhskii, A.I. & Aebersold, R. Mol. Cell. Proteomics 4, 1419–1440 (2005).
5. Nesvizhskii, A.I., Keller, A., Kolker, E. & Aebersold, R. Anal. Chem. 75, 4646–4658 (2003).
6. Carr, S. et al. Mol. Cell. Proteomics 3, 531–533 (2004).
7. Nesvizhskii, A.I., Vitek, O. & Aebersold, R. Nat. Methods 4, 787–797 (2007).
8. Grobei, M.A. et al. Genome Res. 19, 1786–1800 (2009).
9. Resing, K.A. et al. Anal. Chem. 76, 3556–3568 (2004).
10. Searle, B.C., Turner, M. & Nesvizhskii, A.I. J. Proteome Res. 7, 245–253 (2008).
11. Ma, Z.Q. et al. J. Proteome Res. 8, 3872–3881 (2009).
12. Elias, J.E. & Gygi, S.P. Nat. Methods 4, 207–214 (2007).
13. Gupta, N. & Pevzner, P.A. J. Proteome Res. 8, 4173–4181 (2009).
14. Ahrens, C.H., Brunner, E., Hafen, E., Aebersold, R. & Basler, K. Fly 1, 182–186 (2007).
15. Anderson, N.L. et al. Mol. Cell. Proteomics 8, 883–886 (2009).
Minimum information about a protein affinity reagent (MIAPAR)

To the Editor: We wish to alert your readers to MIAPAR, the minimum information about a protein affinity reagent. This proposal, developed within the community, is an important first step in formalizing standards for reporting the production and properties of protein binding reagents, such as antibodies, that are developed and sold for the identification and detection of specific proteins in biological samples. It defines a checklist of required information, intended for use by producers of affinity reagents, quality-control laboratories, users and databases (Supplementary Table 1). We envision that both commercial and freely available affinity reagents, as well as published studies using these reagents, could include a MIAPAR-compliant document describing the product's properties with every available binding partner. This would enable the user or reader to make a fully informed evaluation of the validity of conclusions drawn using the reagent (Fig. 1). Supplementary Table 2 shows an example of a MIAPAR-compliant document, which could be derived from the information supplied in a single publication using the workflow summarized in Supplementary Figure 1.

Affinity reagents serve various roles in experimental studies. These include protein sample identification and detection; protein capture for isolation, purification and quantification; and functional studies. The choice of an applicable molecular tool is conditioned by the experimental objectives and the chosen approaches and methods. This has led to a widening of the range of molecules being used as affinity reagents (Table 1 and ref. 1). The best established are 'natural' polyclonal and monoclonal
antibodies; however, an expanding range of recombinant constructs is now available, including single-chain variable fragments (scFvs), single-domain antibody fragments and diabodies. More recently, alternative affinity reagents have been developed whose biophysical properties present advantages in specific applications. They include protein scaffolds, such as fibronectin, lipocalins and ankyrin and armadillo repeat domains, and nucleic acid aptamers. These reagents are used in a growing range of experimental methods, including enzyme-linked immunosorbent assay (ELISA), western blotting, immunohistochemistry, affinity chromatography and immunoprecipitation (Table 2).

At the same time, the systematic characterization of complete proteomes has led to an increase in the scale on which affinity reagents are produced. Several ambitious projects aim to develop systematic affinity-reagent collections. In Europe, they include the EU ProteomeBinders consortium1, the Human Proteome Resource and Human Protein Atlas2 and the Antibody Factory3. In the United States, the National Cancer Institute (Bethesda, MD) has initiated the Clinical Proteomic Reagents Resource within the Clinical Proteomic Technologies Initiative for Cancer4. Globally, the Human Proteome Organization (HUPO) Human Antibody Initiative, which embraces many of these activities, aims to promote and facilitate the use of antibodies for proteomics research (http://www.hupo.org/research/hai), and the HUPO Proteomics Standards Initiative has developed PSI-PAR, a global community standard format for the representation and exchange of protein affinity-reagent data5.
With the broadening availability of tools and methods, researchers have to define the most efficient binder applicable to the method and approach they have selected. These applications are carried out under different experimental conditions, which affect the choice of affinity reagent. For example, binders can be either in solution or immobilized on a solid phase, and target proteins may be present either in a native, conformationally folded form or in a denatured state. To compare affinity reagents and decide upon the most appropriate one, users need comprehensive information regarding each reagent.

Currently, multiple sources of information exist, including commercial catalogs of antibodies, portals centralizing affinity-reagent properties from various sources and experimental results published in the literature describing the successful use of a binder in a specific application. Large-scale production initiatives also add other sources, such as validation and quality-control results from production centers and independent quality assessment laboratories (see, for example, the Antibodypedia portal; http://www.antibodypedia.org/). Even so, the available information may be incomplete; for example, the identification of a protein belonging to a particular family using a given antibody may be reported with no information concerning the assessment of possible cross-reactivity of the antibody with other family members. Existing information may also be biased by unsubstantiated reports from a commercial producer. Furthermore, data may appear contradictory at first glance, owing to a lack of precision in target or sample descriptions.

The purpose of MIAPAR is to permit the reliable identification of affinity reagent–target–application triples. A binder is designed and produced for the detection of a particular target protein or peptide, often within a complex mixture.
For maximum benefit of potential users, reporting of data about such a protein binder must describe (or reference) both its intended target and its qualities as a molecular tool. Ideally, such a description should include: (i) affinity reagent (and target) production processes, which may influence the characteristics of the binder and permit the unambiguous identification of the molecules; (ii) properties of the reagent as a binding tool, including its specificity, affinity, binding kinetics and cross-reactivity; (iii) the use of the reagent in applications (that is, compatibility with experimental
techniques and methods); and (iv) links to standardized protocols or experimental records that support the production process, the qualities of the binder as a tool and the claimed applications. MIAPAR-compliant descriptions need to be kept up to date and relevant to the batch of material being made available. This may require a new document with every batch in the case of potentially variable reagents, such as polyclonal antibodies.

The underlying principle of MIAPAR is similar to that of other reporting guidelines developed as part of the HUPO Proteomics Standards Initiative (HUPO-PSI)6. Required information is structured so as to allow entry into databases and to enable useful querying and automated data analysis. This structure is designed to achieve comprehensive coverage and clarity. To provide unambiguous reports, MIAPAR recommends the use of standard naming conventions, such as database accession numbers and controlled vocabularies, to describe entities and processes. Other important criteria in MIAPAR are sufficiency, meaning that a reader should be able to understand and evaluate the conclusions and their experimental corroboration, interpret the validity of the project and its outcome, and perform comparisons with similar projects; and practicality, meaning that the guidelines should not be so burdensome as to prohibit their widespread use. The objective is not to describe in detail experimental results that will typically be recorded in databases or laboratory information management systems; nor is MIAPAR intended as a substitute for production protocols and procedures that are documented elsewhere, and its minimal information will not be sufficient to reproduce binder and target production or synthesis. Finally, the guidelines are not expected to be static.
They have been assembled through consultations with a large number of experts and will evolve according to community requirements in the context of a rapidly developing technological framework. The MIAPAR document displayed on the HUPO-PSI website describes the most up-to-date version of the standard (http://www.psidev.info/index.php?q=node/281); the content at the time of this publication can be found in the Supplementary Note.

MIAPAR is designed to be used for the reporting of several processes. The first is the production of new affinity reagents. This can be part of a large-scale activity
[Figure 1 graphic: the affinity reagent–related issues of coverage (~24,000 proteins), redundancy and quality in the pool of antibodies and other binder molecules are set against a MIAPAR report, which documents the binder (identification, production, characteristics), the target (identification, production, characteristics) and the binding (binding properties, experimental evidence). For binder users, this assists binder selection through evidence-based choice; for binder producers, it provides documentation in catalogs or databases, complements research publications, allows information sharing through feedback and allows positioning through binder comparison.]
Figure 1 The scope of MIAPAR. MIAPAR-compliant reports will enable users to make informed choices when selecting from catalogs, databases or publications the binder best suited to a particular application.
performed by academic or commercial producers or by systematic initiatives. In this case, a MIAPAR-compliant document could be used in the producer’s catalog or in public databases and repositories to describe accurately and unambiguously
the qualities of such reagents as molecular tools. Alternatively, a laboratory may produce one specific affinity reagent, either to develop a new production process or to meet research goals when there is no suitable commercial binder. In such a case,
Table 1 Affinity-reagent types (affinity reagent category: examples)

Immunoglobulin: full-length antibody (monoclonal or polyclonal); antibody fragment (e.g., Fab, scFv and related constructs including minibodies, diabodies, single VH or VL domains or nanobodies).
Protein scaffold: fibronectin; ankyrin repeat; armadillo repeat; lipocalin (anticalin); affibody.
Peptide ligand: natural peptide; synthetic peptide; peptidomimetic.
Nucleic acid aptamer: DNA aptamer; RNA aptamer.
Small chemical entities: natural product (secondary metabolite); synthetic product.
Table 2 Assay types and associated reagent states (assay class / assay type: affinity reagent state; target state)

Gels and blots / immunoblot (western blot): in solution; denatured.
Purification / affinity chromatography: bound to solid phase; in solution, native folding.
Purification / immunoprecipitation: in solution; in solution, native folding.
Staining / immunohistochemistry: in solution; fixed (cross-linked).
Staining / live cell imaging: in solution; native folding.
Sorting and counting / fluorescence activated cell sorting: in solution; membrane bound, native folding.
Sorting and counting / magnetic cell sorting: in solution; membrane bound, native folding.
Assays / radioimmunoassay: capture binder in solution, detection in solution; native folding (sometimes denatured).
Assays / sandwich ELISA-type: capture binder on solid phase, detection in solution; native folding (sometimes denatured).
Assays / competitive ELISA-type: various configurations; in solution, native folding.
Assays / affinity determination (SPR, QCM, etc.): in solution or bound to surface; bound to surface or in solution.
Arrays / protein arrays: no binder; bait bound to surface, prey in solution.
Arrays / antibody arrays: capture on solid phase; in solution, native folding.
Arrays / antibody arrays with sandwich: capture on solid phase, detection in solution with other binders; in solution, native folding.
Arrays / reverse phase arrays: in solution; surface immobilized.
Bead assays / single bead assays: capture on solid phase, bound to bead, detection in solution; in solution, native folding.
Bead assays / multiplex bead assays: capture on solid phase, bound to bead, detection in solution with other binders; in solution, native folding.
Therapeutics / tumor therapy, tumor targeting: administered to mammalia; cell surface receptor, native folding.
Therapeutics / tumor therapy, toxin neutralization: administered to mammalia; native folding.
the MIAPAR document can complement the scientific publication describing the binder and provide a checklist for the author to work with during manuscript preparation. As reagents pass through quality-control procedures, an initial MIAPAR document could be updated with the corresponding reagent quality reports produced by laboratories charged with independent characterization and evaluation of available affinity reagents. Finally, when the binder is used in a specific experiment, such as protein identification in tissue samples, a reference to the corresponding MIAPAR document in the paper reporting the experiment would allow unique identification of the binder used and a clear understanding of both the strengths and the limitations of that protein identification. This process could also lead to an update of the MIAPAR document with the report of a successful experimental use of the binder in a particular application.

Whereas MIAPAR provides a list of descriptive items to document a binder uniquely and unambiguously, it does not define the terms to be used to fill in the descriptions. Use of database accession numbers, controlled vocabularies and
ontologies for describing entities, processes and conditions is strongly recommended for MIAPAR documents. Regarding molecules, they may be identified by a database accession number from a public database, such as UniProtKB (http://www. uniprot.org) for proteins and Ensembl (http://www.ensembl.org) or Entrez Gene (http://www.ncbi.nlm.nih.gov/gene/) for genes. The PSI-PAR controlled vocabulary under development (see below) provides a list of recommended databases and unified names for these resources. A number of controlled vocabularies are available in the Open Biomedical Ontologies Foundry (http://www.obofoundry.org/)7 and may be used to describe proteins, tissues, diseases and molecular interactions, including protein affinity interactions. A controlled vocabulary is currently being developed (PAR) to cover specifically protein affinity reagents, including terms not described in existing controlled vocabularies5. This is based on the molecular interactions vocabulary (MI) maintained as part of the HUPO-PSI. A draft version is available online through the European Bionformatics Institute ontology lookup service (http://www.ebi.ac.uk/ontologylookup/browse.do?ontName=PAR)8. The
ontology may also be downloaded from the HUPO-PSI website (http://www.psidev.info/index.php?q=node/281#cv). The use of a structured format and ontology to describe experiments and reagents has already aided the development of tools for selecting epitopes to raise affinity reagents9.

The MIAPAR guidelines have been developed within the affinity-reagent community in close collaboration with the HUPO-PSI work group on molecular interactions. As a standard for the representation of affinity reagent–target interactions, MIAPAR extends the MIMIx guidelines for molecular interactions10 with specific principles and practices appropriate for affinity reagents and their target molecules. As a standard to describe molecular tools, MIAPAR complements MIMIx with further characterization of the molecules involved, their method of production and their binding properties, and it further documents the use of the binders in experimental applications. Within MIAPAR, information regarding experiments is limited to that which is essential for documenting the properties of the binder as a molecular tool. When required, more complete descriptions should be provided using
other relevant guidelines; for instance, the immunohistochemical application in our example MIAPAR document (Supplementary Table 2) could be described more fully using the 'minimum information specification for in situ hybridization and immunohistochemistry experiments' (MISFISHIE) guidelines11. The Minimum Information for Biological and Biomedical Investigations project12 is working to manage all such guidelines through a central repository of standards, providing a single entry point for users of guidelines and ensuring that these standards are complementary and nonoverlapping.

MIAPAR has been developed to facilitate the sharing of data about affinity reagents within the scientific community. It does not dictate a specific format for reporting information but rather provides a checklist of the information that should be included somewhere within such a report. It is also a first stage toward the design of a data model and information infrastructure for the affinity-reagents field. In particular, an XML exchange format based on PSI-MI XML 2.5 (refs. 6,13) and an associated controlled vocabulary are now available5, and MIAPAR-compliant data map to the PSI-PAR XML schema. Plans have also been made to adapt the IntAct14 database to support the management of affinity-reagent data. The current MIAPAR guidelines serve as a basis for the design of a more complete knowledge model to be used for information exploitation and inference.

We recognize that these reporting guidelines are addressed to a somewhat different audience than most, in that the majority of available reagents, particularly antibodies, are produced and sold by commercial companies. It is hoped that researchers will use these guidelines as leverage to request that companies supply MIAPAR-compliant data with each purchase, thus providing clear and consistent information about the quality of binding agents.
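As a rough illustration of how a MIAPAR-style checklist could be checked mechanically, the sketch below encodes the three report sections shown in Figure 1 (binder, target and binding) and flags missing items. The section and item names paraphrase the figure and are illustrative stand-ins, not the actual MIAPAR checklist items or the PSI-PAR schema:

```python
# Hypothetical section/item names paraphrasing Figure 1; the normative
# checklist is MIAPAR itself (Supplementary Table 1 and the HUPO-PSI website).
REQUIRED = {
    "binder": ["identification", "production", "characteristics"],
    "target": ["identification", "production", "characteristics"],
    "binding": ["binding_properties", "experimental_evidence"],
}

def missing_items(report):
    """List the (section, item) pairs absent from a candidate report.

    report maps section names to dicts of item -> value; any required
    item not present in its section is flagged.
    """
    return sorted(
        (section, item)
        for section, items in REQUIRED.items()
        for item in items
        if item not in report.get(section, {})
    )
```

A producer's database, or a journal submission system, could run such a completeness check before a reagent description is published or attached to a paper.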
Although it is difficult to see how this could be anything other than a voluntary agreement, we hope that once this commitment is made by a critical mass of manufacturers, both commercial and nonprofit, it will become standard practice. We anticipate that MIAPAR will be updated as other binder types, production methods
and experimental applications of affinity reagents emerge. There is still considerable scope for discussion of which characteristics of binders should be documented to support their efficient use in a wide range of experimental settings. Suggestions from the community are encouraged and will be collected and published on the HUPO-PSI website (http://www.psidev.info/index.php?q=node/281). We encourage binder producers and users to promote compliance with MIAPAR in the interests of the entire community.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
Work on MIAPAR was supported in part by the EU FP6 ProteomeBinders Infrastructure Coordination Action (contract 026008) and the EU FP7 Biobanking and Biomolecular Resources Infrastructure BBMRI (grant agreement 212111).

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.
Julie Bourbeillon1,26, Sandra Orchard2,26, Itai Benhar3, Carl Borrebaeck4, Antoine de Daruvar1,5, Stefan Dübel6, Ronald Frank7, Frank Gibson8, David Gloriam2,9, Niall Haslam10, Tara Hiltker11, Ian Humphrey-Smith12, Michael Hust6, David Juncker13, Manfred Koegl14, Zoltán Konthur15, Bernhard Korn14, Sylvia Krobitsch15, Serge Muyldermans16, Per-Åke Nygren17, Sandrine Palcy1,5, Bojan Polic18, Henry Rodriguez11, Alan Sawyer19, Martin Schlapshy20, Michael Snyder21, Oda Stoevesandt22, Michael J Taussig22, Markus Templin23, Matthias Uhlen24, Silvere van der Maarel25, Christer Wingren4, Henning Hermjakob2 & David Sherman1

1INRIA Bordeaux–Sud-Ouest, MAGNOME project team, Talence, France. 2European Molecular Biology Laboratory–European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK. 3Department of Molecular Microbiology and Biotechnology, Tel-Aviv University, Ramat Aviv, Israel. 4Department of Immunotechnology, Lund University, Lund, Sweden. 5Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France. 6Technische Universität Braunschweig, Institute of Biochemistry and Biotechnology,
D-38106 Braunschweig, Germany. 7Helmholtz Center for Infection Research, Braunschweig, Germany. 8AbCam, Cambridge, UK. 9Medicinal Chemistry, Pharmaceutical Faculty, Copenhagen University, Copenhagen, Denmark. 10Complex and Adaptive Systems Laboratory, University College, Dublin, Ireland. 11Clinical Proteomic Technologies for Cancer, National Cancer Institute, Bethesda, Maryland, USA. 12Deomed Limited, Newcastle-upon-Tyne, UK. 13Biomedical Engineering Department, McGill University and Genome Quebec Innovation Centre, McGill University, Montreal, Canada. 14German Cancer Research Center, Heidelberg, Germany. 15Max Planck Institute for Molecular Genetics, Berlin, Germany. 16Department of Molecular and Cellular Interactions, Vrije Universiteit Brussel, Brussels, Belgium. 17Royal Institute of Technology, AlbaNova University Center, Stockholm, Sweden. 18Medical Faculty University of Rijeka, Rijeka, Croatia. 19European Molecular Biology Laboratory Monoclonal Core Facility, Monterotondo Scalo, Italy. 20Technische Universität München, Munich, Germany. 21Stanford University School of Medicine, Department of Genetics, Stanford, California, USA. 22Babraham Bioscience Technologies, Babraham, Cambridge, UK. 23Natural and Medical Science Institute, University of Tübingen, Tübingen, Germany. 24Royal Institute of Technology, AlbaNova University Center, Stockholm, Sweden. 25Universiteit Leiden, Leiden, The Netherlands. 26These authors contributed equally to this work. Correspondence should be addressed to S.O. ([email protected]).

1. Taussig, M.J. et al. Nat. Methods 4, 13–17 (2007).
2. Ponten, F., Jirström, K. & Uhlen, M. J. Pathol. 216, 387–393 (2008).
3. Mersmann, M. et al. New Biotechnol. 27, 118–128 (2010).
4. Tao, F. Expert Rev. Proteomics 5, 17–20 (2008).
5. Gloriam, D. et al. Mol. Cell. Proteomics 9, 1–10 (2010).
6. Taylor, C. et al. OMICS 10, 145–151 (2006).
7. Smith, B. et al. Nat. Biotechnol. 25, 1251–1255 (2007).
8. Cote, R.G., Jones, P., Martens, L., Apweiler, R. & Hermjakob, H. Nucleic Acids Res. 36, 372–376 (2008).
9. Haslam, N. & Gibson, T. EpiC: a resource for integrating information and analyses to enable selection of epitopes for antibody based experiments. in Data Integration in the Life Sciences (eds. Paton, N.W., Missier, P. & Hedeler, C.) 173–181 (Springer, Berlin and Heidelberg, Germany, 2009).
10. Orchard, S. et al. Nat. Biotechnol. 25, 894–898 (2007).
11. Deutsch, E.W. et al. Nat. Biotechnol. 26, 305–312 (2008).
12. Taylor, C.F. et al. Nat. Biotechnol. 26, 889–896 (2008).
13. Kerrien, S. et al. BMC Biol. 5, 44–54 (2007).
14. Aranda, B. et al. Nucleic Acids Res. 38, 525–531 (2010).
Guidelines for reporting the use of column chromatography in proteomics

To the Editor: We wish to announce the column chromatography module (MIAPE-CC) of the minimum information about a proteomics experiment (MIAPE) guidelines1, specifying the minimum information that should be provided when reporting the use of column chromatography in a proteomics experiment (Box 1). MIAPE-CC constitutes a further component of the MIAPE documentation system, developed by proteomics researchers working under the aegis of the Human Proteome Organisation's Proteomics Standards Initiative (HUPO-PSI; http://www.psidev.info/). Prior modules for mass spectrometry and gel electrophoresis have already been described in Nature Biotechnology2–4.

MIAPE-CC covers the use of columns for protein or peptide separation, with a view to supporting the sharing of best practices, the validation and discovery of results and the sharing of experimental data sets. For a full discussion of the principles underpinning this specification, please refer to the MIAPE 'Principles' document1. Specifically, the CC module covers the configuration of a column, the selection of a suitable mobile phase, the gradients employed during the column run, the collection of fractions and the associated detector readings. The guidelines request a brief description of the sample, sample processing before chromatography and the injection procedures. They do not address subsequent protein identification, chromatographic performance assessment procedures or the mechanisms by which data are captured, transported and stored. Note that where multidimensional chromatography is used, the module should be adhered to for each dimension, with specific fractions from one column being used as the input sample for another. The full specification of the MIAPE-CC module is provided as Supplementary Table 1, and the most recent version can be obtained through the HUPO-PSI website.
Note that subsequent versions of this document may have an altered scope, as will almost certainly be the case for all the MIAPE modules. To contribute, or to track progress in order to remain 'MIAPE compliant', browse the HUPO-PSI website (http://www.psidev.info/miape/).

Note: Supplementary information is available on the Nature Biotechnology website.
Box 1 Contents snapshot for MIAPE-CC
The full MIAPE-CC document is divided into two parts: an introduction providing background for the module and an overview of its content, then a full list of items to be reported. The MIAPE-CC guidelines themselves are subdivided as follows:
• General features, such as analyst details, description of the sample, sample preparation and the injection procedure.
• Description of the column(s) used: product details and physical characteristics including the stationary phase, and the chromatography system used for the separation.
• Mobile phase: the concentrations of each of the mobile phase constituents.
• Properties of the column run (time, gradient (with reference to the mobile phases described in section 3), flow rate and temperature).
• Pre- and post-run processes, such as equilibration, calibration or washing.
• Column outputs: chromatogram; details of fractions collected.
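Purely as an illustration (the field names below are our own shorthand, not part of the specification; the normative item list is given in Supplementary Table 1), a MIAPE-CC report covering the Box 1 headings could be captured as a simple structured record and checked for completeness:

```python
# Hypothetical sketch of a MIAPE-CC report as a plain record.
# All field names and values are illustrative only; consult the
# MIAPE-CC specification for the normative list of items.

miape_cc_report = {
    "general": {
        "analyst": "A. Researcher",
        "sample": "tryptic digest of yeast whole-cell lysate",
        "injection": "10 uL via autosampler",
    },
    "column": {
        "product": "C18 reversed-phase, 75 um x 15 cm",
        "stationary_phase": "C18, 3 um particles",
        "system": "nano-LC",
    },
    "mobile_phase": {
        "A": "0.1% formic acid in water",
        "B": "0.1% formic acid in acetonitrile",
    },
    "run": {"time_min": 90, "gradient": "5-35% B", "flow_nl_min": 300,
            "temperature_c": 40},
    "pre_post_run": ["equilibration", "wash"],
    "outputs": {"chromatogram": "run01.cdf", "fractions_collected": 12},
}

# The six Box 1 section headings, as top-level keys of the record.
REQUIRED = {"general", "column", "mobile_phase", "run",
            "pre_post_run", "outputs"}

def missing_sections(report):
    """Return, sorted, the Box 1 section headings absent from a report."""
    return sorted(REQUIRED - report.keys())

print(missing_sections(miape_cc_report))  # → []
```

A record like this is one of many possible serializations; MIAPE-CC itself deliberately specifies what must be reported, not the format in which it is transferred or stored.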
COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
Andrew R Jones1, Kathleen Carroll2, David Knight3, Kirsty MacLellan4, Paula J Domann5, Cristina Legido-Quigley6, Lihua Huang7, Lance Smallshaw8, Hamid Mirzaei9 , James Shofstahl10 & Norman W Paton11 1Department of Comparative Molecular Medicine, School of Veterinary Science, The University of Liverpool, Liverpool, UK. 2Manchester Centre for Integrative Systems Biology, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, UK. 3Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, UK. 4National Institute for Biological Standards and
Control, Blanche Lane, South Mimms, UK. 5LGC Ltd., Teddington, Middlesex, UK. 6PSD, School of Biomedical and Health Sciences, King’s College London, London, UK. 7Bioproduct Research and Development, Lilly Research Laboratories, Lilly Technology Centre, Indianapolis, Indiana, USA. 8Lilly UK, Speke, Liverpool, UK. 9Institute for Systems Biology, Seattle, Washington, USA. 10Thermo Fisher Scientific, Inc., San Jose, California, USA. 11School of Computer Science, University of Manchester, Oxford Road, Manchester, UK. (
[email protected]). 1. Taylor, C.F. et al. Nat. Biotechnol. 25, 887–893 (2007). 2. Taylor, C.F. et al. Nat. Biotechnol. 26, 860–861 (2008). 3. Binz, P.-A. et al. Nat. Biotechnol. 26, 862 (2008). 4. Gibson, F. et al. Nat. Biotechnol. 26, 863–864 (2008).
Guidelines for reporting the use of capillary electrophoresis in proteomics To the Editor: We wish to announce the capillary electrophoresis module (MIAPE-CE) of the minimum information about a proteomics experiment (MIAPE) guidelines1, specifying the minimum information that should be provided when reporting the use of capillary electrophoresis in a proteomics experiment (Box 1). The MIAPE-CE module is the result of a coordinated effort by a consortium
of capillary electrophoresis researchers working in the proteomics field and constitutes an additional part of the MIAPE documentation system established by the Human Proteome Organisation’s Proteomics Standards Initiative (HUPO-PSI; http://www.psidev.info/). MIAPE modules for mass spectrometry and gel electrophoresis have already been described in previous issues of Nature Biotechnology2–4.
volume 28 number 7 july 2010 nature biotechnology
correspondence
Box 1 Contents snapshot for MIAPE-CE
The full MIAPE-CE document is divided into two parts: an introduction, providing background and an overview of the content, and the full list of items to be reported. The MIAPE-CE guidelines themselves are subdivided as follows:
1. General features: the overall type and aim of the experiment.
2. Sample details and method-specific sample preparation.
3. Equipment used, in terms of the instrumentation, software and capillary, with a description of type and manufacturer along with any subsequent modifications.
4. Run process: the steps followed in each experiment and all the parameters associated with them. For example, capillary and sample temperatures, auxiliary data channels, time of data collection, step name/purpose, step length/order, pressures, voltages, geometries, flush solution and electrolyte compositions.
5. Detection: type, wavelengths/mass range, data collection rate, whether direct or indirect, and detector calibration requirements.
6. Electropherogram data processing.
Capillary electrophoresis comprises a broad family of techniques, for all of which the subtleties of operation are the key to obtaining robust and reliable results. It is therefore necessary to specify that a significant degree of descriptive detail be captured for the equipment deployed, its manner of use, the sample analyzed and the data processing performed. The MIAPE-CE guidelines provide a checklist of the information that should be provided when describing a capillary electrophoresis experiment (Supplementary Table 1). Providing the information requested by MIAPE-CE enables improved corroboration of results by enhancing the comparability of data, whether they are to be submitted to a public repository or reported in a scientific publication (e.g., in a ‘materials and methods’ section). MIAPE-CE does not specify the format in which to transfer data, or the structure of any repository or document. Nor does it require a description of the preparation of the sample (excepting directly assay-related preparation) or the ‘fate’ of the analyzed sample beyond the process of detection. Items falling outside the scope of this module may be captured in complementary modules. These guidelines will evolve as circumstances dictate. The most recent version of MIAPE-CE is now available (http://www.psidev.info/miape/ce/) and the content is replicated here as supplementary information (Supplementary Table 1). To contribute, or to track progress so as to remain ‘MIAPE compliant’, browse the HUPO-PSI website (http://www.psidev.info/miape/). Note: Supplementary information is available on the Nature Biotechnology website.
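The run-process items requested above (capillary geometry, applied voltage, migration times) are precisely the quantities a reader needs to recompute derived values such as apparent electrophoretic mobility. A minimal sketch, using the standard CE relation and invented illustrative numbers:

```python
def apparent_mobility(l_detector_cm, l_total_cm, voltage_v, t_migration_s):
    """Apparent electrophoretic mobility (cm^2 V^-1 s^-1) from the
    standard CE relation mu_app = (L_d * L_t) / (V * t):
    capillary length to the detector, total capillary length,
    applied voltage and observed migration time."""
    return (l_detector_cm * l_total_cm) / (voltage_v * t_migration_s)

# Illustrative run: 50 cm to the detector, 60 cm total, 20 kV,
# peak migrating at 5 min (300 s).
mu = apparent_mobility(50.0, 60.0, 20000.0, 300.0)
print(f"{mu:.1e} cm^2/(V s)")  # → 5.0e-04 cm^2/(V s)
```

Without the reported geometry and voltage, a migration time alone cannot be compared across laboratories; this is the comparability that the MIAPE-CE checklist is intended to secure.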
COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
Paula J Domann1, Satoko Akashi2, Coral Barbas3, Lihua Huang4, Wendy Lau5, Cristina Legido-Quigley6, Stephen McClean7, Christian Neusüß8, David Perrett9, Milena Quaglia1, Erdmann Rapp10, Lance Smallshaw11, Norman W Smith6, W Franklin Smyth7 & Chris F Taylor12
1LGC, Teddington, Middlesex, UK.
2International Graduate School of Arts and
Sciences, Yokohama City University, Tsurumi-ku, Yokohama, Kanagawa, Japan. 3Facultad de Farmacia, Universidad San Pablo-CEU, Campus Montepríncipe, Boadilla del Monte, Madrid, Spain. 4Bioproduct Research and Development, Lilly Research Laboratories, Lilly Technology Centre, Indianapolis, Indiana, USA. 5Department of Protein Analytical Chemistry, Genentech Inc., South San Francisco, California, USA. 6Pharmaceutical Sciences Research Division, King’s College London, London, UK. 7School of Biomedical Sciences, University of Ulster, Coleraine, Co. Londonderry, UK. 8Aalen University, Aalen, Germany. 9William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK. 10Max-Planck-Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany. 11Lilly UK, Speke, Liverpool, UK. 12European Bioinformatics Institute, Hinxton, UK (
[email protected]). 1. Taylor, C.F. et al. Nat. Biotechnol. 25, 887–893 (2007). 2. Taylor, C.F. et al. Nat. Biotechnol. 26, 860–861 (2008). 3. Binz, P.-A. et al. Nat. Biotechnol. 26, 862 (2008). 4. Gibson, F. et al. Nat. Biotechnol. 26, 863–864 (2008).
Guidelines for reporting the use of gel image informatics in proteomics To the Editor: We present the gel informatics module (MIAPE-GI) of the minimum information about a proteomics experiment (MIAPE) guidelines1. MIAPE-GI—a component of the MIAPE documentation system developed by the Human Proteome Organisation’s Proteomics Standards Initiative (HUPO-PSI; http://www.psidev.info/)—results from a coordinated effort by practitioners of gel informatics and representatives of appropriate software vendors, in consultation with the wider proteomics community. Previous MIAPE modules for mass spectrometry and gel electrophoresis have already been described in Nature Biotechnology2–4. The MIAPE-GI guidelines cover the processing of images derived from two-dimensional gel electrophoresis to detect and quantify features, for example, relating to distinct proteins. The guidelines
describe the relationships between (sets of) features on different images established through analyses or known to exist prior to the experiment (such as standards), and the stable location at which data have been deposited (Box 1). These guidelines were developed with a view to supporting the sharing of best practice, validation of results, discovery of results and sharing of experimental data sets. For a full discussion of the principles underlying this specification, please refer to the MIAPE ‘Principles’ document1. For MIAPE modules to work well together, their scope must be tightly constrained. Therefore, the MIAPE-GI guidelines do not cover the preparation and running of a gel, nor do they cover image capture; those areas are the province of the MIAPE gel electrophoresis document (MIAPE-GE4). Items outside the scope of this module may be addressed in later
Box 1 Contents snapshot for MIAPE-GI
The full MIAPE-GI document is divided into two parts: an introduction providing background and an overview of the content, and a full list of the items to be reported. The guidelines have been designed to cope with different types of workflows, as performed by particular software packages. As such, a number of items are optional if they refer to a specific procedure not employed by the software used. The MIAPE-GI guidelines themselves are subdivided as follows:
• General features describing the type of electrophoresis performed, the source images for analysis and the analysis software used.
• The gel analysis design with respect to replicates, groupings and standards used.
• Image preparation steps before bioinformatics analysis, such as scaling, resizing or cropping.
• Image processing, such as image alignment, performed by bioinformatics software.
• Data extraction, including feature detection, feature matching and feature quantification (if performed).
• Data analyses performed, for example, extracting features with significant differential expression.
• Results of data analysis, including feature locations, matches and relative quantities where appropriate.
versions or by complementary modules, such as MIAPE-GE, which can be obtained from the MIAPE web page (http://www.psidev.info/miape/). As is the case for all MIAPE modules, this specification does not recommend a particular format in which to transfer data nor the structure of any related repository or document. These guidelines will evolve as circumstances dictate. The most recent version of MIAPE-GI is available from the HUPO-PSI website and the content is replicated here in Supplementary Table 1. To contribute, or to track progress so as to remain ‘MIAPE compliant’, browse the HUPO-PSI website (http://www.psidev.info/miape/). Note: Supplementary information is available on the Nature Biotechnology website. COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
Christine Hoogland1, Martin O’Gorman2, Philippe Bogard2, Frank Gibson3, Matthias Berth4, Simon J Cockell5, Andreas Ekefjärd6, Ola Forsstrom-Olsson6, Anna Kapferer6, Mattias Nilsson6, Salvador Martínez-Bartolomé7, Juan Pablo Albar7, Sira Echevarría-Zomeño8, Montserrat Martínez-Gomariz9, Johann Joets10, Pierre-Alain Binz11, Chris F Taylor12, Andrew Dowsey13 & Andrew R Jones14 1Swiss
Institute of Bioinformatics, Proteome Informatics Group, Geneva, Switzerland. 2Nonlinear Dynamics, Cuthbert House, All Saints, Newcastle upon Tyne, UK. 3School
of Computing Science, Newcastle University, Newcastle upon Tyne, UK. 4Decodon, GmbH W, Greifswald, Germany. 5Bioinformatics
Support Unit, Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK. 6Ludesi AB, Malmö, Sweden. 7ProteoRed, National Center for Biotechnology-CSIC, Cantoblanco, Madrid, Spain. 8Agricultural and Plant Biochemistry and Proteomics Research Group, Department of Biochemistry and Molecular Biology, University of Córdoba, Córdoba, Spain. 9ProteoRed, Proteomic Facility, Universidad Complutense de Madrid-Parque Científico de Madrid, Madrid, Spain. 10Institut National de la Reserche Agronomique, Gif-sur-Yvette, France. 11Swiss Institute of Bioinformatics and GeneBio SA, Geneva, Switzerland. 12EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. 13Institute of Biomedical Engineering, Imperial College London, London, UK. 14Department of Comparative Molecular Medicine, School of Veterinary Science, University of Liverpool, Liverpool, UK (
[email protected]). 1. Taylor, C.F. et al. Nat. Biotechnol. 25, 887–893 (2007). 2. Taylor, C.F. et al. Nat. Biotechnol. 26, 860–861 (2008). 3. Binz, P.-A. et al. Nat. Biotechnol. 26, 862 (2008). 4. Gibson, F. et al. Nat. Biotechnol. 26, 863–864 (2008).
The 20-year environmental safety record of GM trees To the Editor: In a commentary last May, Strauss et al.1 pointed out that opposition to genetically modified (GM) organisms has recently intensified around GM trees and that recommendations of the Conference of the Parties (COP) to the Convention on Biological Diversity (CBD) have encouraged regulatory impediments to undertaking field research. We concur with Strauss et al. that the CBD appears to be increasingly targeted by activist groups whose opinions are in stark contrast to the scientific consensus and indeed the opinions of most respected scientific and environmental organizations worldwide. Strauss et al. call for more science-based (case-by-case) evaluation of the value and environmental safety of GM trees, which
requires field trials. However, the regulatory impediments being erected by governments around the world, with the full corroboration of the COP, are making such testing so costly and Byzantine that it is now almost impossible to undertake field trials on GM trees in most countries. Here we summarize the key published evidence relating to the main environmental concerns surrounding the release of GM trees (Box 1). On the basis of our findings, we urge the COP to consider the opportunity costs for environmental and social benefits, and not just risks, in its deliberations on field trials and releases. A very large amount of performance and safety data related to GM crops and trees has now been gathered since field trials were first initiated in 1988 (ref. 2). Our search in publicly accessible databases worldwide
Box 1 Commercially successful GM trees
Few GM tree species have as yet been deployed commercially. Two notable exceptions are the following: Bacillus thuringiensis toxin (Bt)-expressing poplar trees in China; and papaya trees expressing the viral coat protein gene of papaya ringspot virus (PRSV) in Hawaii. Approximately 1.4 million Bt poplars have been planted in China on an area of ~300–500 hectares, along with conventionally bred varieties to provide refugia to avoid the development of Bt resistance in insects. The trees are grown in an area where economic deployment of poplar was previously impossible due to high insect pressure. The GM trees have been successfully established and have resisted insect attack. The oldest trees in the field are now 15 years old (Minsheng Wang, personal communication). No harm to the environment has been reported. Experiences with GM papaya trees also illustrate multiple benefits15. The Hawaiian papaya industry faced serious threats in 1992 when PRSV was detected in plantations, and production dropped from 55 million pounds to 26 million pounds in 1998. In 2001, 3 years after the release of PRSV-resistant GM papaya plants, production was up to 40 million pounds. As an additional benefit, the GM papaya actually enabled the economic production of non-GM papaya in the same area because the GM trees kept infestation rates in the area well below economically problematic levels.
reveals >700 field trials with GM trees (including forest trees, fruit trees and woody perennials). None of them has reported any substantive harm to biodiversity, human health or the environment. In the following paragraphs, we summarize our main findings as they relate to ecological impact, the stability of transgene expression over time, the effectiveness of transgene containment and the status of nontarget organisms on leaves, stem and in soil. Field trials with GM poplars (Populus sp.) with modified lignin composition were among the first to include potential ecological impacts on the environment as goals. In this case, the poplars were engineered to express antisense transgenes that reduced the expression of the lignin biosynthesis genes cinnamyl alcohol dehydrogenase or caffeic acid/5-hydroxyferulic acid O-methyltransferase. Field trials of these trees, conducted in the UK3,4, were regularly inspected for alterations in growth and development, as well as for damage caused by insects, including ladybirds, ants, aphids, copper beetles, earwigs, shield bugs, froghoppers, caterpillars, spiders and fungi. No differences were observed comparing the wild-type and GM trees3,4. In addition, after termination of two trials in the UK and France5, analysis of the levels of carbon, nitrogen and microbial biomass as well as of the soil microbial population revealed no consistent differences between plots with wild-type trees and plots with GM trees. In fact, the only significant differences in these parameters were observed between the soil of the field trial and the soil taken under the grass just <4 meters away
from the field trial, indicating that the influence of the different vegetation types is considerably larger than the variation induced by the genetic modification5. Although decomposition assays revealed that the roots of lignin-modified trees do compost slightly faster than those of wild-type trees, this is in agreement with the role of lignin in resistance to biodegradation and the expectations of the researchers. Thus, no unexpected ecological impacts could be attributed to the GM trees; instead, differences in soil characteristics and microbial biomass were caused by environmental variation4,5. With regard to the ability of transgenes to be stably expressed over many years, studies of GM poplars over 3 to 8 years have found no evidence for loss of transgene expression in the field6–8. Nearly all of the instability observed occurs during in vitro production and propagation. This suggests that once GM trees pass in vitro and early field screens for stability, their traits remain highly stable6–8. The same also applies to RNA interference (RNAi)-based gene suppression traits, which have been shown to be stable and reproducible over multiple years and independent of outdoor temperature. This suggests that gene suppression continues to occur, even during early bud development and leaf senescence9. In two other field trials, poplar trees resistant to glufosinate by virtue of expression of phosphinothricin acetyltransferase10 and poplars carrying the Agrobacterium rhizogenes rolC gene have also been confirmed to stably express the transgenes after selection of GM lines10,11. Unpublished data from trials in New Zealand of transgenic pine, which express
the antibiotic resistance gene encoding neomycin phosphotransferase (nptII), also provide further corroboration of transgene stability in the field (C. Walter, unpublished data). Transgenic containment traits for mitigating gene flow have also been effective when tested in the field. Studies with male transgenic poplar trees containing a male-sterility gene showed that several transgenic events had very low or undetectable levels of pollen production, which persisted over several years12. This suggests that containment genes can be highly efficient and stable. The Strauss laboratory has produced ~1,000 transgenic events in poplar with advanced forms of sterility genes (via RNAi or dominant-negative mutations) with the intent of similar field studies. Finally, data on the status of nontarget organisms on leaves, stem and in the soil surrounding GM trees also indicate that the traits tested thus far are comparable to wild type. A comparison of wild-type and rolC transgenic poplars in German field trials revealed no differences in the status of nontarget phytopathogenic fungi on leaves and stems, or evidence of differences in carbohydrate and hormonal metabolism in the transgenic trees (M. Fladung, unpublished data). These studies have also systematically investigated the possibility of horizontal gene transfer to mycorrhizal fungi. Transgenic poplars carrying a fungal-specific promoter controlling the Streptomyces hygroscopicus bar gene were planted in the field to assess if horizontal gene transfer to the mycorrhizal fungi living in association with the transgenic trees occurred. Subsequently, large screening programs were initiated to identify putative phosphinothricin herbicide (Basta)-resistant mycorrhizal fungi.
Although the results remain unpublished, the investigators running the trials have communicated that even though >100,000 mycorrhizal fungi were isolated from roots of the transgenic trees, there was no indication of a horizontal gene transfer event10 (M. Fladung and U. Nehls, unpublished data). Nontarget effects have also been studied in transgenic pines. In experiments conducted in New Zealand using radiata
pine (Pinus radiata) genetically modified with nptII and genes related to reproductive development, the impacts on invertebrates and soil microbial populations were assessed over a period of 2 years (on trees that had been grown in the field for up to 9 years; personal communication). When the composition and abundance of invertebrate populations usually present on non-GM radiata pine were compared with those on GM pines, no differences were found other than seasonal differences, and invertebrate species and numbers were unchanged13. Feeding studies with GM needles revealed no impact of transgenic material on fertility or fecundity of the invertebrates. Microbial populations living in association with, or close to, the roots of trees were characterized using an approach capturing the culturable and nonculturable fractions of microbes. Although seasonal differences were observed in population structures, no significant differences between GM and unmodified trees were found (C. Walter, unpublished data). These experiments again show that variation caused by environmental factors is much more pronounced than variation induced by the genetic modifications studied. Decisions on whether or not to use GM (or conventionally bred) organisms should be based on a scientific evaluation of possible risks associated with a particular new trait and the degree of novelty of the genes encoding it. However, it is also important to keep in mind the significant environmental benefits that such organisms could provide. The negative effects of the creeping regulatory burdens are becoming progressively more obvious as GM methods cannot be effectively employed despite the growing anthropogenic threats to native forests, the urgent needs for new biofuels and biomaterials, the already substantial impacts of climate change on forest health and the growing demand for forest products14,15. And all of this in the face of pressing demands for increased forest conservation. 
Given these grave challenges, among which are serious threats to the very survival and basic productivity of native and planted forests, we need to put hypothetical residual risks of GM in context. In our view, they appear very modest indeed.
Sooner or later, the COP should recognize the huge opportunity costs its current recommendations impose on GM technology. When it meets in Nagoya, Japan, in October, the COP should urgently take note of the scientific evidence on the biosafety of GM traits that have been tested in the field so far and reconsider the regulatory and political hurdles that currently make meaningful field tests of GM trees almost impossible. The strong concerns about all GM plants and trees, initially expressed more than 20 years ago, are no longer justified. They are obviated by the long record of safety obtained from hundreds of field trials with several transgenic traits and by the urgent societal and environmental problems for which the technology could be one additional, valuable tool. Therefore, we recommend that the COP seriously consider the endorsement of policies that actively promote, rather than retard, further field testing of GM trees. COMPETING FINANCIAL INTERESTS The authors declare that they have no competing financial interests.
Christian Walter1, Matthias Fladung2 & Wout Boerjan3 1Scion Biomaterials, Rotorua, New Zealand. 2vTI, Institute for Forest Genetics, D-22927
Grosshansdorf, Germany. 3Department of Plant Systems Biology, VIB, and the Department of Plant Biotechnology and Genetics, Ghent University, Technologiepark 927, 9052 Gent, Belgium. C.W. (
[email protected]), M.F. (
[email protected]), W.B. (
[email protected]). 1. Strauss, S. et al. Nat. Biotechnol. 27, 519–527 (2009). 2. Sweet, J. Environ. Biosafety Res. 8, 161–181 (2009). 3. Pilate, G. et al. Nat. Biotechnol. 20, 607–612 (2002). 4. Halpin, C. et al. Tree Genet. Genomes 3, 101–110 (2007). 5. Hopkins, D.W. et al. Nat. Biotechnol. 27, 168–169 (2007). 6. Li, J. et al. Plant Biotechnol. J. 6, 887–896 (2008). 7. Li, J. et al. Transgenic Res. 17, 676–694 (2008). 8. Li, J. et al. Tree Physiol. 29, 299–312 (2009). 9. Li, J. et al. West. J. Appl. For. 23, 89–93 (2008). 10. Hoenicka, H. & Fladung, M. Trees 20, 131–144 (2006). 11. Kumar, S. & Fladung, M. Planta 213, 731–740 (2001). 12. Brunner, A. et al. Tree Genet. Genomes 3, 75–100 (2007). 13. Schnitzler, F.R. et al. Environ. Entomol. (in the press). 14. Fenning, T. et al. Nat. Biotechnol. 26, 615–617 (2008). 15. Fenning, T. & Gershenson, J. Trends Biotechnol. 20, 291–295 (2002). 16. Ferreira, S.A. et al. Plant Dis. 86, 101–105 (2002).
commentary
The pros and cons of peptide-centric proteomics Mark W Duncan, Ruedi Aebersold & Richard M Caprioli
Recommendations on how best to exploit the strengths of peptide-centric proteomics and avoid its pitfalls.
Peptide-centric approaches—sometimes referred to as shotgun strategies or bottom-up proteomics—are now widely adopted as a means of identifying proteins present in biological mixtures. Such approaches involve the sequence-specific cleavage of a complex, protein-containing sample to create a mixture of peptides of much greater complexity. The underlying assumption of this strategy is that proteins in the original sample can be identified by means of mass spectrometry (MS)-mediated identification of their constituent peptides. The type of instrumentation commonly used in this analysis is liquid chromatography (LC) and electrospray ionization tandem mass spectrometry (MS/MS), but another approach gaining popularity is matrix-assisted laser desorption ionization (MALDI) MS. Although peptide-centric strategies are capable of generating impressive amounts of information, the assumptions and limitations inherent in their use are sometimes underappreciated and frequently unstated. This can lead to overinterpretation of the resulting data and even misleading or false conclusions. In this article, we consider both the positive attributes and limitations of peptide-centric strategies. We also raise some cautionary notes and make some recommendations specifically for end users.
Mark W. Duncan is in the Division of Endocrinology, Metabolism and Diabetes, School of Medicine, University of Colorado Denver, Aurora, Colorado, USA and the Obesity Research Center, College of Medicine, King Saud University, Riyadh, Saudi Arabia. Ruedi Aebersold is at ETH Zurich, Institute of Molecular Systems Biology, Zurich, Switzerland. Richard M. Caprioli is in the Department of Biochemistry and at the Mass Spectrometry Research Center, Vanderbilt University, Nashville, Tennessee, USA. email: [email protected]
Rationale for peptide-centric approaches
Although the exact number of human protein products remains unknown, it extends far beyond the estimated 20,000 or so protein-coding genes in the genome. Consideration of the likely number of transcriptional variants predicts >100,000 coded proteins1. However, the main source of protein complexity is the ubiquitous incorporation of >200 post-translational modifications, including phosphorylation and glycosylation2. Consequently, genes frequently serve as the predecessors of multiple structurally distinct products and even minor structural changes can alter protein function. All variants of bottom-up proteomics begin with site-specific cleavage of a protein mixture to generate an even more complex mixture of peptides. Although in some respects this is apparently counterintuitive, the cleavage step generates products that are easier to identify by MS. Specifically, LC separates peptides better than proteins; most proteins generate some soluble peptides under conditions compatible with ionization (even if the parent protein itself is poorly soluble); peptides fragment more effectively in a tandem mass spectrometer, yielding spectra that can be sequenced; and peptides are detected by a mass spectrometer at substantially lower levels than the parent proteins from which they were derived. The peptides are then fractionated by LC and analyzed by MS/MS. Each experimentally determined MS/MS spectrum is ‘matched’ with a database of simulated MS/MS spectra generated by in silico digestion of protein sequences either entered directly or extrapolated from DNA sequences. The degree of matching between each experimental and theoretical mass spectrum
is assigned a score, and the peptide sequence in the database with the best score, above some predetermined threshold, is generally assumed to be correct. Typically, if the threshold is not met, no assignment is made. Once a tandem mass spectrum is assigned a peptide sequence, the database(s) of known proteins is searched to define the antecedent protein(s) incorporating it. The overall process is represented in Figure 1. This is a powerful strategy, and there are few practical alternatives. For example, Edman analysis would require isolation and purification of each individual protein followed by exhaustive residue-by-residue sequencing. The process would be time-consuming, costly and complex to the extent that it is impractical—if not impossible—at the proteome-wide level. Similarly, sequence inference by de novo interpretation of fragment ion spectra or by means of sequence tags would not be compatible with the tens of thousands of fragment ion spectra generated per hour with modern mass spectrometers. Peptide-centric proteomics combined with automated sequence database searching therefore offers a practical alternative that typically identifies 1,000–2,000 proteins in a biological sample, or perhaps up to 4,000–8,000 proteins in cases in which complex proteomes are extensively fractionated and the peptides in each are exhaustively sequenced (e.g., ref. 3). The approach does, however, have intrinsic limitations relating to the loss of intact protein information and an inability to decipher the combinatorial aspects of protein modifications. Other limitations arise because of the expeditious and sometimes inappropriate use of the tool, but these are neither insurmountable nor fundamental. A peptide-centric strategy can also be leveraged to quantify individual proteins within a mixture. Stable isotope-labeled peptides
COMMENTARY
Figure 1 General approach used by peptide-centric MS technologies for the identification of proteins in complex mixtures. [Schematic: protein → peptides (protease fragments) → MS/MS spectra of peptides; protein database → peptides predicted from proteolysis → in silico MS/MS patterns from theoretical peptides; matching the observed and theoretical spectra yields identified peptides/proteins.] After proteolysis of a protein or complex mixture of proteins, the spectra associated with protease fragments are matched with spectra generated in silico using information obtained from protein databases.
can be added for targeted quantification of a specific protein; this facilitates relative and absolute quantification with high precision. Alternatively, label-free methods (e.g., spectral counting and ion current measurement) can conveniently provide differential (or comparative) estimates of peptide levels with intermediate precision. Although comparisons are sometimes made between ‘shotgun DNA’ sequencing and ‘shotgun proteomics’, the approaches share few similarities. In shotgun sequencing, DNA is randomly shredded into multiple small segments, which are then sequenced by the chain termination method. The process is repeated several times over, and reassembly is based on finding similarities between overlapping reads (that is, ragged ends) from the same segments of the original DNA molecule. These overlapping fragments are progressively merged into longer continuous sequences (contigs). By contrast, in shotgun proteomics, fragments are typically generated by site-specific proteolysis: there are no ragged ends and no overlapping fragments. Underlying assumptions and caveats Peptide-centric proteomics is predicated on several underlying assumptions: first, when a protein is cleaved by a specific protease (or other reagent), it will reproducibly and predictably generate a relatively small number of peptides; second, determining the sequences of a small subset of these peptides is sufficient to define the antecedent protein; third, the association between a small subset of peptides and their predecessors holds for either a purified single protein or a complex protein mixture (that is, several peptides from each protein are
sufficient to identify multiple antecedents); and finally, the protein databases are populated with all proteins and their variants. These assumptions do not always hold true. End users of proteomics data should be aware of at least these confounding issues if they are to use this powerful methodology appropriately and properly interpret the data it generates. Unanticipated cleavages, chemical by-products and the nature and/or number of peptides generated. A critical but contentious assumption is that cleavage by a specific protease (or other reagent) is reproducible and generates a manageable and anticipated set of peptides. As an example, tryptic digestion of a typical protein of molecular mass 50,000 is expected to yield ~50 tryptic peptides. Therefore, one might expect a proteome comprising 5,000 proteins to yield a conservative estimate of >250,000 peptides. Although some argue that trypsin makes few or no mistakes and that the products of digestion are exclusively those predicted by applying the tryptic rules, the number of proteolytic products is considerably higher than expected. This results largely from unanticipated cleavage products, side-products of the reduction and/or alkylation steps, deamidation and oxidation of methionines4. Because these artifacts occur at low levels relative to the major products, they do not present a problem when the sample comprises a few proteins at near equimolar concentrations. However, they become major confounders in complex samples where the protein abundances span multiple orders of magnitude. Here, the minor by-products of major components generate more intense signals than the major products of minor proteins.
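For illustration, the idealized tryptic rules invoked above (cleavage after Lys or Arg, but not before Pro) can be applied in silico in a few lines. The sequence, function name and missed-cleavage handling below are our own simplifications for this sketch, not part of any published tool:

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """In-silico tryptic digest: cleave after K or R, but not before P.

    A hypothetical helper for illustration; returns fully tryptic peptides,
    optionally extended with products carrying up to n missed cleavages.
    """
    # Zero-width split after K/R when the next residue is not P
    peptides = [p for p in re.split(r'(?<=[KR])(?!P)', sequence) if p]
    out = list(peptides)
    # Model missed cleavages by joining runs of adjacent tryptic fragments
    for n in range(1, missed_cleavages + 1):
        for i in range(len(peptides) - n):
            out.append(''.join(peptides[i:i + n + 1]))
    return out

# Toy sequence: five fully tryptic products are predicted
seq = "MKWVTFISLLFLFSSAYSRGVFRRDAHK"
print(tryptic_digest(seq))
# ['MK', 'WVTFISLLFLFSSAYSR', 'GVFR', 'R', 'DAHK']
```

Even this toy digest shows how quickly the search space grows: allowing a single missed cleavage (`missed_cleavages=1`) adds four more candidates (e.g., 'GVFRR'), and real digests contain semi-tryptic and chemically modified by-products that this idealized sketch deliberately omits.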
In addition, the mass spectrometer itself can introduce additional peptide forms through gas phase chemistry or in-source decay. This increased sample complexity complicates the analytical exercise, explains the high proportion of unmatched spectra and accounts for some of the difficulties in identifying a larger fraction of the proteins expected in complex samples such as blood plasma. The vast majority of peptide-centric applications incorporate trypsin and search for tryptic peptides, but other enzymes or cleavage agents can be employed. However, searching for nontryptic peptides introduces other complications, some similar to those mentioned above. Limitations of peptide-matching algorithms. Although enormous effort has been devoted to developing algorithms to match ‘real’ MS/MS spectra to those generated in silico from database entries, the strategy has limitations. It is important to underscore that spectra are not interpreted, but are simply matched; the approach therefore fails if it is not used in conjunction with an extensive, error-free database. Hundreds of thousands of MS/MS spectra are typically generated during the analysis of a single sample by peptide-centric proteomics, and in an automated manner these are matched against tryptic peptides generated in silico from entries in the relevant database. High-quality spectra derived from unmodified peptides that are selected without interference from other precursor ions are frequently matched, but many more spectra remain unmatched and unassigned.
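The matching (rather than interpretation) step can be illustrated with a toy scorer: singly charged b- and y-ion m/z values are generated for each candidate sequence, and observed peaks are counted as matched within a tolerance. The residue masses are approximate, the 'observed' spectrum is synthetic and the scoring is deliberately naive—real search engines use far more sophisticated statistics—but the sketch shows why matching succeeds only when the correct sequence is in the database:

```python
# Approximate monoisotopic residue masses (Da) for the residues used below
RES = {'G': 57.0215, 'A': 71.0371, 'S': 87.0320, 'V': 99.0684,
       'L': 113.0841, 'K': 128.0950, 'R': 156.1011}
PROTON, WATER = 1.0073, 18.0106

def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for a peptide."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(RES[a] for a in peptide[:i]) + PROTON)          # b ion
        ions.append(sum(RES[a] for a in peptide[i:]) + WATER + PROTON)  # y ion
    return ions

def match_score(observed, peptide, tol=0.02):
    """Count observed peaks within tol Da of any theoretical ion.

    Spectra are matched, not interpreted: the score is just peak overlap.
    """
    theo = fragment_ions(peptide)
    return sum(any(abs(o - t) <= tol for t in theo) for o in observed)

# Synthetic 'observed' spectrum built from the true peptide's ions
observed = fragment_ions('GASVLK')
candidates = ['GASVLK', 'GASVLR', 'AVSGLK']
best = max(candidates, key=lambda p: match_score(observed, p))
print(best)  # 'GASVLK' outscores the shuffled and near-homologous decoys
```

Note that the wrong candidates still match several peaks (shared prefixes and shared residue compositions), which is precisely why a best match above threshold is not, by itself, proof of identity.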
Instrument enhancements, notably high-accuracy precursor ion measurement, improve the fraction of assigned spectra, but an MS/MS spectrum may remain unidentified for several reasons: first, the spectrum is of poor quality and/or the fragment ions are uninformative; second, the fragmented precursor is not a peptide; third, the peptides are modified in a way that is unaccounted for by the search algorithm; fourth, the peptide is not present in the database searched; or, finally, multiple precursor ions are selected in a particular precursor ion window and concurrently fragmented, leading to complex composite spectra. Importantly, spectra derived from novel peptides or those incorporating residues modified by processes such as oxidation, reduction, nitration and phosphorylation frequently go unmatched. Failure to identify modifications is of special significance and occurs for several reasons. Although investigators can opt to include modifications in their search strategy, most resist the temptation to turn on all
or a large number of post-translational modification options because the extraordinarily large search space markedly increases both search time and the number of false-positive identifications. In addition, the physical characteristics of the modified residues (their mass and/or ionization efficiency) can work against their detection. For example, although ~60% of human proteins are reportedly glycosylated, glycosylated tryptic fragments are often ‘invisible’ on mass analysis because they have low ionization efficiencies, because heterogeneous glycosylation distributes the total ion current over numerous molecular entities and/or because the ions appear beyond the m/z (mass/charge) range typically scanned. Stoichiometry can also confound attempts to identify modifications. If only a fraction of the total population of a specific polypeptide is modified (e.g., by phosphorylation) then on proteolysis and analysis, the modified form is missed or obscured by the more abundant (unmodified) peptides. In short, modified peptides are typically underrepresented in the data set and therefore, so too are their modified antecedent proteins5–8. If, however, the focus is on identifying specific modifications in pure proteins or a subset of the proteome, the peptide-centric approach offers advantages because the modification’s influence on mass is more evident at the peptide level than at the protein level. Said another way, small changes in mass at multiple sites are difficult to detect and define at the protein level, but defining their nature and number is easier and more accurate when performed on peptides. Incomplete databases. Another underlying assumption is that proteomics databases are complete and contain all protein structures and their variants found in the sample of interest. This is rarely, if ever, the case. Many variants have not yet been characterized and documented.
Furthermore, there are many sequence databases available, each with deficiencies and errors that influence the outcome of a search, with no consensus regarding which database should be used, or the minimum requirements for definitive identification of the antecedent protein and its modified forms. Clearly, matching strategies can only be as good as the database(s) they search. For example, if an organism’s genome and proteome are poorly defined, even high-quality spectra derived from it go unmatched, or worse, mismatched. Search tools always provide a ‘best match’ between the experimental and hypothetical data. Even so, the challenge is to objectively assess the quality of the match
and decide whether it is real. Tools are available to do this, and additional approaches are being developed. Specifically, the discrimination between true- and false-positive peptide-to-spectrum matches is usually attempted using statistical mixture models that combine an array of factors into a single discriminant score9 or by decoy strategies10. Problems arise even with an extensively populated database. If only one peptide entry fits the experimental data, this is no guarantee of correct assignment, and when multiple database entries fit the experimental data equally or nearly equally well, selecting one over another is subjective. The protein inference problem. A set of peptides may be degenerate and shared by multiple proteins. Consequently, determining a unique protein precursor is often impossible, regardless of the quality of the analytical work. This fundamental limitation has been discussed in detail11. Because proteins are cleaved to peptides in the first steps of peptide-centric analysis, there is no straightforward way to restore the link between proteolytic products and their unique antecedents. Consequently, erroneous assignments and misleading conclusions can follow. There are, however, approaches around this problem. For example, a nearest-neighbor analysis could be performed on a time-course study of fragments released in the protease digestion, but this markedly increases the workload and reduces throughput12. Extrapolation in the absence of any data. It is important to acknowledge that often only a fraction of the peptides making up the full amino acid backbone of a protein are recovered and identified, and that these alone are used to define the protein. Where there are gaps—and gaps can constitute most of the sequence—the missing amino acids are ‘filled in’ by assuming them to be exactly as prescribed in the database entry. For low-abundance proteins, fewer peptides are recovered and assigned. Consequently, more extrapolation is required.
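The extent of such extrapolation is easy to make visible: sequence coverage can be computed directly from the identified peptides. The following sketch (toy sequence and peptides of our own devising) marks observed residues and renders a simple text bar in the spirit of the coverage displays discussed later in this article:

```python
def sequence_coverage(protein, peptides):
    """Fraction of a protein sequence covered by identified peptides,
    plus a text bar (covered residues = '#', unobserved gaps = '.')."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:               # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    bar = ''.join('#' if c else '.' for c in covered)
    return sum(covered) / len(protein), bar

# Toy protein with two identified peptides
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
frac, bar = sequence_coverage(protein, ["TAYIAK", "QLEER"])
print(f"{frac:.0%}", bar)
```

In this toy case only a third of the sequence is observed; the remaining two thirds—the '.' positions—are filled in from the database entry alone.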
But to assume the absence of modifications and/or mutations in the unmatched regions is dangerous, given that the modifications themselves adversely influence the chance of detecting them. For example, a single point amino acid mutation within a tryptic peptide will stymie a match, as will the presence of most post-translational modifications. Throughput and pooling of samples. Peptide-centric analysis of a single sample can take many hours, sometimes much longer, and therefore practical sample throughput is limited. Consequently, studies aimed at defining
heterogeneous populations are frequently underpowered. Pooling or sub-pooling multiple samples helps in part, but masks individual variation and averages out the proteomic data. Qualitative applications Peptide-centric methods are frequently used to ‘define’ the components of a biological sample. The value of this exercise is questionable, given that (i) exactly what is present is rarely established without ambiguity, (ii) comprehensive coverage requires repeat analysis and (iii) what is not found is determined more by analytical performance than by the characteristics of the sample itself. In many instances where a protein is reportedly identified, the data are open to alternative interpretations. As a consequence, erroneous assignments increasingly populate the biological and clinical literature5–8. Parsimony and simplicity should dictate interpretation of the findings and all alternative structures should be considered plausible. Conclusions regarding the nature, number and/or relative amounts of the in vivo antecedent are to some extent always speculative; prudent, hypothesis-driven verification of each candidate is recommended if it is to form the basis of an important claim. Compounding the problem of identifying the constituents of a complex sample is the fact that the tryptic digest contains many thousands of peptides. Even an extended LC run is inadequate to resolve all of these. Consequently, a deluge of products continuously elute into the mass spectrometer. Under these conditions, even the most advanced data-dependent scanning algorithms cannot decipher all the components in a single run. Therefore, on reanalysis, novel findings are returned, along with redundant identifications. The instrument’s attributes, sample complexity and user-defined operating parameters are key to determining the degree of redundancy. Nonetheless, significant variations in the data are usually evident, even for the same sample run on the same instrument on the same day.
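Run-to-run agreement of this kind is straightforward to quantify once identification lists are in hand. The sketch below (toy replicate runs and protein IDs of our own devising) simply tallies how many proteins are seen in every run versus only once—the statistic reported in the multi-laboratory comparisons cited below:

```python
from collections import Counter

def identification_overlap(runs):
    """Tally protein identifications across replicate runs.

    runs: list of sets of protein IDs, one set per replicate analysis.
    Returns (total distinct proteins, found in every run, found only once).
    """
    counts = Counter(p for run in runs for p in set(run))
    total = len(counts)
    in_all = sum(1 for c in counts.values() if c == len(runs))
    once = sum(1 for c in counts.values() if c == 1)
    return total, in_all, once

# Three toy replicate runs of the same sample
runs = [{'P1', 'P2', 'P3'}, {'P1', 'P3', 'P4'}, {'P1', 'P5'}]
total, in_all, once = identification_overlap(runs)
print(total, in_all, once)  # 5 proteins seen; 1 in all runs; 3 only once
```

Even these toy numbers mirror the published pattern: the union of identifications grows with each run, while the reproducible core remains a small fraction of it.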
Correspondingly, vast differences are evident when different instruments and operating conditions are used for the same sample. Repeated analysis is therefore essential to gain a comprehensive depiction of the components of most samples. At least two publications demonstrate the extent of this problem. When six research groups used multidimensional protein identification technology (MudPIT) to analyze a protein extract of 10,000 human cells, of the 1,757 nonhomologous proteins found, only 52 (3%) were found by all groups; 1,109 (63%) were found once only13. In another study, at
least three replicate analyses of gel-separated proteins were required to obtain a stable set of peptides and proteins14. Ongoing instrument and instrument control advances ameliorate, but do not eliminate, this problem15. Using peptide-centric data to draw conclusions about what is absent from a sample is imprudent. Specifically, end users should not place undue emphasis on the apparent absence of a protein (or protein isoform) from the list of identified candidates. The absence of evidence for the presence of a peptide should not be construed as evidence for the absence of the peptide. A specific target might not be identified because it was (i) present at or below the detection limit, (ii) poorly recovered or unstable under the workup conditions, (iii) not matched by the algorithm or (iv) missing from the database. Proteins defined by interrupted start and stop codons and those with modified residues are also typically underrepresented. For all the reasons cited above, making comparisons between different samples or across several studies is similarly ill-advised. Quantitative applications Determining differences in protein levels between two or more sample populations is among the most important of all tasks in proteomics, especially for applications related to biomarker discovery and use. Despite the potential of peptide-centric approaches for these applications, this task is often poorly executed16. Any quantitative analysis using peptide-centric approaches should be thoroughly validated before its routine application. Depending on the intended application, a single-analyte assay is typically validated by determining most if not all of the following performance parameters before use: accuracy, precision (often considered at three levels: repeatability, intermediate precision and reproducibility), specificity, limit of detection, limit of quantification, linearity and range, ruggedness and robustness.
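Of these parameters, precision is the easiest to estimate from data already in hand. As a minimal sketch (toy peak areas in arbitrary units; the function names and the threshold of three times the replicate CV are illustrative choices of ours, not a standard), one can ask whether an observed between-sample difference exceeds the spread seen across technical replicates:

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation (%) across technical replicates."""
    return 100.0 * stdev(values) / mean(values)

def change_exceeds_noise(a_reps, b_reps, k=3.0):
    """Crude check: is the percent difference between conditions larger
    than k times the worse replicate CV? A sketch only; real studies
    need proper statistics and biological, not just technical, replicates.
    """
    cv = max(cv_percent(a_reps), cv_percent(b_reps))
    diff = 100.0 * abs(mean(a_reps) - mean(b_reps)) / mean(a_reps)
    return diff > k * cv

# Replicate peak areas for one peptide in two samples (arbitrary units)
a = [1000, 1040, 980]    # ~3% CV
b = [1500, 1460, 1550]   # ~50% apparent increase
print(change_exceeds_noise(a, b))  # True: change well above replicate noise
```

The point of even so crude a check is the one made in the text: without some measure of precision, a measured difference cannot be distinguished from the imprecision of the method itself.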
Although defining all of these parameters is an enormous, if not impractical, task when attempting to quantify thousands of components, it is misguided to abandon rigor altogether and rely on a method that is either unvalidated or poorly validated. At least two of these performance parameters—repeatability and specificity—deserve careful consideration; without these, the method should not be considered quantitative. Precision is the extent to which repeated measures of a series of samples agree. Many factors influence precision. These include, but are not limited to, the instrument, the environment (e.g., temperature and humidity),
the source of reagents, the operator, the matrix, inconsistencies in working practices, irreproducibility of sample handling steps, analyte concentration, instrument parameters and performance parameters (e.g., LC column life). With peptide-centric approaches, sample manipulation is commonly practiced to cut deeper into the proteome. However, this can introduce variability and compromise precision. In addition, variations in the sample matrix or minor perturbations within the mass spectrometer (e.g., pressure and temperature fluctuations, or the presence or levels of co-eluting species) can alter the ionization process and in turn affect the signal intensity. Intensity comparisons are therefore compromised at a fundamental level. Methods that rely on comparing results between different analytical runs are clearly the most susceptible to these factors. A measure of precision is therefore essential if experimental findings are to be put into context; that is, to assess whether a measured difference is real or simply relates to imprecision in the method itself. Assessment of precision requires multiple determinations, ideally of several different samples containing a range of concentrations of the target analyte(s). Unfortunately, these tests are rarely performed. At a minimum, precision should be determined at various concentrations for a subset of analytes measured in one or more test samples and data from these studies aid in assessing the method and the validity of subsequent findings. Technical replicates (that is, repeat analyses of the same sample) are therefore essential to assess whether a change is real or an artifact of the analytical method itself. The second key performance parameter is specificity. Although they differ in particular details, all peptide-centric quantitative approaches are based on analysis of peptides derived from parent proteins. 
Proteotypic peptides—the (small) subset of predicted peptides that are repeatedly and consistently identified from a protein in a mixture—are typically used in this setting. However, for the same reasons that protein identification is problematic when based on one or a few peptides, accurate quantification based on one or a few peptides also has inherent risks. A single peptide only defines a segment of a protein and modifications elsewhere in the molecule are not telegraphed to this entity. Quantification based on a peptide that is common to multiple related forms leads to an overestimate of the amount of any single variant. Similarly, quantification based on a unique peptide fails to ‘recognize’ and quantify closely related forms, even if they are significantly more abundant. Precise and accurate quantification of a specific protein
variant is therefore achievable only when the targeted peptide is derived from a single precursor protein17. Because the aforementioned parameters are rarely determined, uncertainties are associated with most quantitative proteomic data. This is especially so at low protein levels where the probability of an erroneous protein assignment increases, and precision and accuracy decline. It is therefore important to stress that if the quantitative method itself is not rigorously validated, measured differences are of questionable significance and they should be independently verified. Available ‘quantitative’ approaches have recently been reviewed18 and they fall into two main categories: label-free approaches and those involving the use of stable isotope labels. Label-free approaches. Label-free quantification is increasingly popular because it is fast, cost-effective and relatively uncomplicated. There are two main label-free strategies, both involving digestion with a protease to give a peptide mixture that is subsequently analyzed by LC-MS or LC-MS/MS. The first strategy, spectral counting, compares the number of identified MS/MS spectra from the same protein across multiple LC (or LC/LC)-MS/MS runs. The assumption is that increasing protein abundance increases protein-sequence coverage, the number of unique peptides identified and the number of identified total MS/MS spectra (spectral count). Although relative protein abundance is correlated with sequence coverage, peptide number and spectral count, the correlation is only strong (r2 = 0.9997) with the last of these (spectral count) and extends over a dynamic range of approximately two orders of magnitude19. On this basis, Liu et al.19 have concluded that spectral counting is a simple and reliable approach to relative protein quantification. 
We note, however, that their data were obtained from an idealized sample set comprising standard proteins spiked into a fixed matrix consisting of proteins that were resolubilized after precipitation of a total yeast cell extract. The behavior of real-world samples is likely to be far from ideal. Even so, a modification of the spectral counting strategy—absolute protein expression (APEX) profiling—has also been reported recently20. In this approach, the measurement of absolute (rather than relative) protein concentration per cell is made possible by the application of several correction factors. Ion-current (chromatographic peak intensity) measurement is an alternative strategy to spectral counting, based on the observation that the measured ion current increases
with increasing concentrations of an injected peptide. In practice, LC-MS analysis of a mixture of peptides is performed and the ion current (either peak height or area) is recorded over the appropriate retention intervals21,22. Although the relationship between amount and ion current holds for standard samples of limited complexity, in practice, measuring differences in protein abundances in complex biological samples is problematic. Multiple factors influence the measured ion current from run to run for the same sample, and additional factors come into play when comparisons are made between samples. For example, precision can be compromised by subtle variations in sample preparation, injection volume, retention time and co-eluting species, as well as temperature and pressure fluctuations within the mass spectrometer. The simplicity of implementing label-free approaches makes them attractive. However, precision is suboptimal, complications are common and findings are uncertain. For example, in a 2009 Association of Biomolecular Resource Facilities (ABRF) study23, data generated from digests of parallel lanes of gel-separated proteins were supplied to several groups. The task was to ‘identify’ the proteins in the sample and determine which were elevated or reduced in intensity relative to the adjacent lane. Notably, there was no agreement among participants in the study, and no evidence that either approach—whether based on spectral counting or intensity—could reliably address the quantitative question at hand23. Labeled approaches. Labeled strategies offer the significant advantage that samples are combined after labeling and analyzed in a single run. Consequently, precision is markedly improved, albeit at the expense of the time, cost and complexity of the analysis. Although labeled approaches are routinely adopted, rarely are their performance characteristics evaluated and the data generated are thus of questionable validity.
There is an abundance of approaches available for ‘discovery’ applications, the most important of which use isobaric or differentially isotopically labeled reagents18. Of special interest, however, is isotope dilution and absolute quantification of specific proteins as a precise and accurate quantitative strategy for multiple proteins in complex biological samples. Cost-effective, precise and accurate analysis is possible provided that the investigator is mindful of the caveats mentioned earlier, and careful attention is paid to the selection of the peptides monitored, the possibility of incomplete digestion and the influence of protein modifications.

Figure 2 Representation of sequence coverage for a protein identified using a peptide-centric approach. The full sequence of the protein is represented by a large rectangle of fixed dimensions, with the N terminus on the left and the C terminus on the right. The filled (purple) sections show the relative portion of the entire sequence that was measured and used to identify the protein.

A major advantage of this approach when compared with enzyme-linked immunosorbent assay (ELISA) is that there is no requirement for immunological reagents. Consequently, assays offering excellent performance characteristics can be developed quickly and cost-effectively. In fact, significant progress has been achieved recently in the rapid generation of selected reaction monitoring assays24 and in making collections of these assays publicly accessible25. General recommendations After 15 years of intensive effort and substantial financial investment, some profess that proteomics has not made the progress anticipated or promised. We suggest part of the problem is the unrealistic expectations of some and the indiscriminate application of the tools by others. Proteomics is a complex endeavor, and the available tools are not yet sufficiently refined. Therefore, with the strengths and weaknesses of peptide-centric approaches in mind, we offer the following recommendations for consideration. Investigators should detail the data in support of any protein assignment. Researchers should routinely show the sequence coverage for each identified protein. For example, in Figure 2 the full sequence of the protein is represented by a rectangle of fixed dimensions, with the N terminus on the left and the C terminus on the right. The filled sections show the portion of the entire sequence used to make the assignment. End users of these findings can then fully appreciate the portion of the experimental data used in support of the assigned structure and the portions of the sequence extrapolated without any supporting data. Peptide-centric technologies reduce protein characterization to the peptide level and in some settings this extrapolation is reasonable; in others, it is misleading. Our concern is that without explicit statements of exactly what was found, what assumptions
were made and alternative explanations for the data, the literature will be rife with errors. We also favor the use of transparent, open source tools. If the tools used are both fully described and generally accessible, the operating parameters are sufficiently detailed and the raw data are available, others can independently perform the analysis and confirm or extend the conclusions. Investigators should offer all alternative explanations that fit their experimental data. Proteins, especially those from large families with extensive sequence homology, produce many identical peptides after proteolytic digestion. Such degenerate peptides cannot be unambiguously linked to a single protein sequence unless there is additional, conclusive evidence to permit an informed selection at the protein level. In a peptide-centric study, investigators should acknowledge all possible protein families in the nonredundant database because none of these can be favored or disregarded over another11. Knowing which protein form is present is important because different isoforms are sometimes organ- or disease-specific and may have different biochemical characteristics. Investigators should describe what principles have been used to infer the identity of the proteins from the identified peptides (e.g., parsimony, expansive set or no tools used) and should consider highlighting the proteotypic peptides. Whether or not a peptide is proteotypic can be extracted from databases such as PeptideAtlas (http://www.peptideatlas.org/) or computed on the fly. Investigators should specify the number of unassigned, high-quality spectra associated with each study. The high percentage of unmatched spectra has received inadequate attention. Increased emphasis should be placed on (i) generating high-quality, accurate mass MS/MS data, (ii) developing de novo sequencing tools, (iii) refining approaches for homology matching and (iv) assigning all data.
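The parsimony principle mentioned above can be sketched as a greedy minimal set cover over the peptide-to-protein map. The data structures and names here are hypothetical simplifications of ours, and real inference tools additionally report groups of indistinguishable proteins rather than silently choosing one:

```python
def parsimonious_proteins(peptide_hits):
    """Greedy minimal set of proteins explaining all identified peptides.

    peptide_hits: dict mapping peptide -> set of candidate proteins
    (degenerate peptides map to several). A sketch of the parsimony
    principle only; ties are broken alphabetically for determinism.
    """
    remaining = set(peptide_hits)
    chosen = []
    while remaining:
        # Count how many still-unexplained peptides each protein covers
        coverage = {}
        for pep in remaining:
            for prot in peptide_hits[pep]:
                coverage[prot] = coverage.get(prot, 0) + 1
        best = max(sorted(coverage), key=lambda p: coverage[p])
        chosen.append(best)
        remaining -= {p for p in remaining if best in peptide_hits[p]}
    return chosen

# Two homologous isoforms share a degenerate peptide; one unique peptide
# distinguishes them, so parsimony keeps a single antecedent
hits = {'SHAREDPEP': {'isoform_A', 'isoform_B'},
        'UNIQUEPEP': {'isoform_A'}}
print(parsimonious_proteins(hits))  # ['isoform_A']
```

Note what the sketch discards: isoform_B is not disproved, merely unneeded—exactly the ambiguity that, per the recommendation above, should be reported rather than hidden.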
Although the absolute number of matched peptides is often impressive, there is a tendency to focus on these findings while ignoring the remaining data. Investigators should commit to interpreting the majority, if not all, of the acquired data because end users are likely to be especially interested in cognate proteins containing amino acid polymorphisms, post-translational modifications or spliced variants that were not anticipated. Unfortunately, however, these findings are typically underrepresented in the assigned data. Similarly, modifications with low stoichiometry
(e.g., phosphorylation, oxidation and nitration) are routinely missed because the modified peptides are in low abundance, modified residues are not identified and/or the modification is labile. Peptide-centric approaches typically find unmodified, high-abundance proteins; low-abundance modified proteins are underrepresented. This bias influences results and inaccurate or false conclusions follow. For these reasons, we believe that all the raw data sets in their entirety should be made freely available at the time of publication along with a detailed description of the tools used to generate the conclusions. The frequency with which spectra are unassigned points to a serious limitation with our existing methods. Some powerful de novo sequencing tools have been developed, but they are not routinely employed. More attention should be directed toward these strategies. Peptide-centric methods should be applied as hypothesis-generating tools. Although peptide-centric data are rarely definitive, they can be garnered relatively quickly and cost-effectively. The strength of these methods is that they provide a wealth of information that can subsequently be addressed in targeted, hypothesis-driven studies. The data from these studies are of limited value in isolation and orthogonal verification of key findings is essential. Ongoing development of technology. Further development of alternative experimental strategies for the practical and comprehensive analysis of proteomes is critical. None of the existing approaches is optimal and additional high-throughput, cost-effective strategies that can better define and
precisely quantify intact proteins and their variants are of particular importance. Conclusions Current efforts to explore deeper into the proteome with much greater speed and specificity in a variety of biological samples are certainly to be applauded, especially given the formidable complexity of the proteome. Nonetheless, the solution to unraveling it lies in the application of advanced analytical technologies. However, advances in technologies, especially those with a focus on speed and high throughput, are always associated with sacrifices. For example, evolving proteomic methods may be fast, but the speed comes at the price of coverage and quantitative precision. Minimal, readily accessible data are used to identify proteins, and the parts of the sequence that are not determined are assumed; similarly, we draw quantitative conclusions without validation. These compromises allow rapid throughput and sometimes facilitate biological advances, but at times they can hamper our progress and confound our understanding. Legitimate exploratory applications of peptide-centric approaches acknowledge ambiguities in the interpretation of the data and aim to stimulate novel hypothesis generation. In addition, targeted peptide-centric approaches (e.g., protein quantification based on proteotypic peptides) provide investigators with powerful and practical tools for testing hypotheses already under consideration. The current state of proteomics is such that there are exciting opportunities for the development of new analytical methods. In the meantime, however, peptide-centric approaches are powerful contrivances provided we bear in mind the limitations of our data, the assumptions we have made, and the fitness of our findings for any intended purpose.
ACKNOWLEDGMENTS
We thank R. Nelson, A. Yergey and I. Krull for their insightful and constructive comments on early drafts of this manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
volume 28 number 7 JULY 2010 nature biotechnology
FEATURE
Proteomics retrenches Peter Mitchell
© 2010 Nature America, Inc. All rights reserved.
Improvements in technology are making proteomics research less descriptive and more analytic, but the field has yet to deliver on its aspirations.

Peter Mitchell is a freelance writer based in London.

Ten years ago, proteomics research began moving from a purely qualitative mode—compiling long lists of proteins present in a biological sample—to quantitative methods. The predictions made then, that mass spectrometry (MS)-based ‘shotgun proteomics’ would become ever more sensitive, have been borne out, and quantitative studies have become almost routine. The debate has now moved on to whether this apparent progress has actually delivered anything, or whether MS-based proteomics needs to take a further step, or even change direction, before it can deliver clinically useful results.

Biomarker malaise
By and large, the search for protein biomarkers—proteins that can indicate the presence of disease or how an individual is responding to therapy—has failed. Some say that it should not even have been undertaken in the way it was. Countless millions of dollars have been thrown at the problem, yet the biomarkers discovered by proteomics researchers have turned out to be so nonspecific as to be next to useless, far from the ‘holy grail’ envisaged some ten to fifteen years ago. “Biomarkers have been the biggest disappointment of the decade, probably because proteomics’ role in their discovery was overhyped,” says John Yates, director of the Proteomic Mass Spectrometry Lab at the Scripps Research Institute (La Jolla, CA, USA). One difficulty has been the large dynamic range: the fact that protein abundance in biological fluids—particularly plasma, a favorite specimen for early biomarker discovery work—spans some ten orders of magnitude (Fig. 1). “The serum proteomics debacle led to the realization that you can’t discover markers that are
low abundance by doing discovery in serum or plasma,” says Daniel Liebler of Vanderbilt University (Nashville, TN, USA). Another reason for the billion-dollar biomarker fiasco is the lack of validation, suggests Bernhard Kuster, chair of Proteomics and Bioanalytics at Technische Universitaet Muenchen (Freising, Germany). “I am sick of seeing papers proving that a known biomarker is a marker for yet another disease,” he says. “All it means is that the biomarkers discovered so far are mainly the same proteins that pop up in all kinds of diseases, indicating that the organism is under some kind of stress but not distinguishing between diseases. Various calgranulin proteins, for example, have been identified as serum biomarkers for everything from inflammatory arthritis to squamous cell carcinoma.” This is old news, say biomarker researchers, who claim the field is now advancing owing to a concerted effort to control sources of variability and to define standard operating procedures for the discovery and verification of biomarkers. Variability in sample processing, problems with the instrumentation (both separation technology and MS systems) and problems with data analysis all contributed to the difficulties, according to Steven Carr of the Broad Institute of MIT and Harvard (Cambridge, MA, USA). “Today, technical variability is greatly reduced owing to improvements in all of the above,” he says. “Early approaches to biomarker discovery, where the number of analytes was large and the number of samples analyzed small, were a recipe for a high false discovery rate.” Another obstacle has been the diversity among study participants diagnosed with a given disease and the lack of clear methods for defining clinical phenotypes so that samples can be classified consistently, which is vital to correlating the expression level of a protein with the presence of disease. The lack
of it confounds the statistical analysis. Higher sample throughput is needed, a problem that has yet to be solved with identity-based MS. “You can analyze larger numbers of samples, but only if you limit the amount of sample fractionation prior to MS analysis, which in turn limits the depth of coverage of the proteome,” says Carr. However, he points out that some emerging technologies, such as ion mobility, may enable higher throughput and greater specificity with equivalent or higher sensitivity than current methods. Still, some remain skeptical. “Saying this is going to alienate a lot of people,” says Kuster prophetically, “but these ten years of work and billions of dollars have been largely unsuccessful. We have to come back to charting protein–protein and protein–small molecule interactions and signaling pathways at the cellular and molecular level.” Quantitative methods have their limits, he warns. “They only measure what is present, albeit more accurately,” he says. “That still leaves the conceptual problem of linking cause and effect, and to solve that, we have to get away from examination of body fluids and design experiments around hypotheses about a particular type of cell or tissue.” Unfortunately, that puts even more sensitivity demands on proteomics technology, because the total amount of protein obtained from a localized sample will be much smaller.

Multiplexing through microarrays
Another seeming failure is the protein microarray chip. Like many proteomics approaches, the idea was borrowed from genomics as a method of performing thousands of experiments in parallel. “That hasn’t happened because there were very basic difficulties that could not be overcome,” says Matthias Mann of the Max Planck Institute for Biochemistry (Martinsried, Germany). Two kinds of protein arrays have been tried. In ‘capture arrays’, probes—in most cases,
[Figure 1: plot residue omitted. The y axis shows relative concentration spanning ten orders of magnitude; the x axis lists individual plasma proteins from serum albumin down to interleukin-12; shaded bands mark the abundance ranges accessible by MRM alone in 10 nl plasma, MRM-SISCAPA in 10–100 μl plasma, and MRM-SISCAPA with larger samples.]
Figure 1 The dynamic range of plasma proteins. Using various MS approaches, the entire dynamic range of plasma proteins can be approached. Color bands indicate the abundance strata accessible with the different MS approaches. The colored symbols indicate which technology was employed in measuring each protein. Values are taken from the literature. SISCAPA, stable isotope standards and capture by anti-peptide antibodies. (Source: Leigh Anderson, The Plasma Proteome Institute, Washington, DC, USA; modified from Mol. Cell. Proteomics 1, 845–847, 2002.)
antibodies, but they could also be aptamers or artificial scaffolds—are prebound to a chip, and the sample is then applied to all of them at once to search for reactions. The difficulty here is in creating monospecific reagents to eliminate off-target interactions. Without that degree of specificity, the wide dynamic range of proteins in the sample solution triggers far too many side reactions. Only those classes of proteins for which banks of specific antibodies exist, such as cytokines, have enjoyed commercial success. A second model is the ‘reverse protein microarray’, where hundreds to thousands of sample proteins are expressed in active form and then bound to the chip for testing. This approach has proven problematic as well, because of the time and cost of purifying hundreds or thousands of proteins. Equally vexing is the problem of preserving protein activity through the manufacturing process. In addition, when arraying proteins, batch-to-batch variability is a problem. Invitrogen (Carlsbad, CA, USA), which offers human protein microarrays with more than 9,000 human proteins arrayed on ultrathin nitrocellulose, has so far gotten around the manufacturing problem
by scaling up their protein production capability such that a single lot can support thousands of samples, which exceeds their customers’ needs, according to Niroshan Ramachandran, manager of R&D in the company’s Protein Technologies division. Joshua LaBaer, director of Personalized Diagnostics at Arizona State University’s BioDesign Institute (Tempe), has solved some of these problems by synthesizing proteins on a surface, on what he calls programmable self-assembling arrays. Complementary DNA clones for several hundred proteins are adhered to a surface by means of an epitope tag engineered onto the end of the proteins, over which an in vitro synthesizing system is laid. The advantage here is that the proteins are all “fresh,” says LaBaer, made within an hour of one another. In addition, no purification is needed, and the range of protein concentrations is much tighter (within an order of magnitude) than with protein spotting, which by and large yields arrays that reflect the concentrations of the proteins in the solution from which they are purified. LaBaer has used this technology to isolate autoantibodies from subjects with ankylosing spondylitis1 and p53 autoantibodies from those with ovarian cancer2. The technology for manufacturing self-assembling protein arrays has been licensed, but the details have not been disclosed.

Going quantitative
So whereas genomics has become highly parallel, proteomics still works in a sequential fashion. It has relied on MS for progress, and it hasn’t been entirely disappointed. Deep sequencing by MS has taken over the field, says Mann. Quantification on a large scale has been a main theme of dynamic proteome studies. The original ‘shotgun’ technique involved digesting proteins; chemically or enzymatically labeling them with isotope tags (the two most common methods being isotope-coded affinity tags, or ICAT, and isobaric tags for relative and absolute quantitation, or iTRAQ; see Mallick & Kuster, pp. 695–709, and their Supplementary Techniques online); injecting them into a mass analyzer; and identifying and quantifying them by matching the resulting fragmentation spectra to known protein spectra held in public databases.
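Matching spectra against a database presupposes an in silico tryptic digest of every database protein. Below is a minimal sketch of the standard cleavage rule (cut after Lys or Arg, except when the next residue is Pro); the function name and the example sequence are illustrative only, not taken from any particular search engine:

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Cleave a protein sequence at tryptic sites: after K or R,
    but not when the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR" and sequence[i + 1 : i + 2] != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

# Hypothetical protein fragment; each resulting peptide ends in K or R
print(tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRRDTHK"))
```

The search engine then compares the measured fragmentation spectrum of each observed peptide against theoretical spectra computed for these predicted peptides.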
More recently, chemical labeling has been replaced by metabolic labeling—the so-called SILAC method (stable isotope labeling with amino acids in cell culture) popularized by Mann’s group. Most cell lines, including those derived from animals, can be labeled with a heavy stable isotope, allowing very good quantitative studies. These labeling methods require sophisticated tandem MS instruments and are not trivial to use. When properly set up, however, they can quantify changes in the proteome, sometimes even in time-resolved fashion. They are also being developed for such specialized applications as protein imaging. The gold-standard mass spectrometer is generally agreed to be the Orbitrap made by Thermo Scientific (Waltham, MA, USA). It traps injected ions in an electric field, causing them to orbit a central electrode in rings determined by their mass and charge. The field also causes ions to oscillate along the central electrode’s axis, at a frequency that depends only on their mass/charge ratio and not on the ion velocity. This makes the instrument a very sensitive mass analyzer, with a mass accuracy of 1 to 5 p.p.m. and a dynamic range of around 5,000. Ruedi Aebersold of the Institute of Molecular Systems Biology (Zurich) says the substantial progress made in instrumentation, coupled with improvements in separation schemes, database searching and data validation tools, has led to many new biological insights. “The number of proteins credibly identified in a shotgun study in the year 2000 was maybe 100 to 300,” he says. “Now the state of the art would be 4,000 to 5,000 proteins, or even more.” The technique has been applied successfully to, for example, studying biological processes in organelles and measuring cell responses to stimuli or viral infection, says Aebersold. Moreover, there is still a great deal of room for progress in MS technology, according to Kuster.
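A mass accuracy of 1 to 5 p.p.m. simply means that the measured mass-to-charge ratio deviates from its theoretical value by a few parts per million. A quick sketch of the calculation, with hypothetical m/z values:

```python
def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (p.p.m.)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

# Hypothetical peptide ion: at ~3 p.p.m. error, a reading near m/z 785.84
# deviates from the theoretical value by only a few thousandths of a unit.
print(mass_error_ppm(785.8445, 785.8421))  # about 3 p.p.m.
```

Such small errors are what allow a peptide identification to be pinned to one candidate sequence rather than many.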
Three properties of a mass spec determine its performance in proteomics applications: ion injection efficiency, cycling speed and detector sensitivity. “The detectors are exquisitely sensitive already: they can already detect a single ion, and you cannot improve on that,” says Kuster. But the process of getting the ions to the detector can be improved. Most mass specs use electrospray ionization to inject the ions. Although its efficiency has been improved tenfold in recent years, it still loses at least 99% of ions on their way to the detector. Moreover, mass specs cannot blank out irrelevant molecules; whatever is sprayed into the machine is what comes out at the detector, so most of the ion current is not peptides but
Figure 2 Missing the mark. In the Bell et al. study4, 20 proteins were sent to 27 laboratories. Heatmaps for each of the 20 proteins, generated from the centralized analysis of the raw data from all 27 laboratories, reveal the frequency of observation of a given peptide and its position in the sequence. Heatmaps indicate the frequency of tandem mass spectra assigned to tryptic peptides (red), with peptides of mass 1,250 ± 5 Da indicated in blue. (Reprinted from ref. 4 with permission.)
‘dirt’. This generates a noise level that is much higher than the sensitivity of the MS. “Solving these two problems and thus improving the signal/noise ratio will be the way forward for the next round of instruments,” says Kuster. “There is still a factor of 20 to 100 of improved sensitivity that could be harvested, so mass spec technology will be the driver for many years to come.” Another limiting parameter is the cycling rate. Current instruments run at 10 Hz (that is, ten spectra per second). Speeding this up will allow experimenters to improve the measurement depth—vital, given the wide dynamic range, when the aim is to quantify all the proteins present. Another trend in MS methods, according to Yates, is a shift from stable isotope labeling to label-free methods. Many labs now prefer label-free methods because they are much cheaper, and easier to perform, than SILAC. The two main methods produce proxy data that correlate well with protein abundances in complex samples. One measures the peak intensities of peptide ions, the limitation here being the purity of the peak. “Getting a clean peak and aligning the peaks can be difficult,” says Yates. The other method uses spectral counting, which counts the number of tandem MS spectra assigned to each protein, the number of spectra for each peptide or protein being proportional to the amount of
protein in the sample—that is, the frequency with which the peptide of interest has been sequenced by the MS. The main drawback of this method is the difficulty of measuring small changes in the quantity of low-abundance proteins, which are often masked by sampling error. However, the method has an excellent linear dynamic range of about three orders of magnitude, which isotopic labeling such as SILAC doesn’t approach. “Label-free methods have proven to be very robust and reliable, in the hands of folks who have enough observations for the data to be meaningful,” says John Bergeron, a proteomics expert at McGill University (Montreal). The main source of error in all MS methods, compared with the radioimmunoassay gold standard, is irreproducibility when the protein is degraded into peptides; sometimes the so-called proteotypic peptide will not survive the process. It is clear that both labeled and unlabeled MS analyses will continue to have their uses. “Stable isotope labeling has a place; it does give you higher quality data at the analysis end,” says Yates. And with labeling methods such as iTRAQ, the labels are introduced so late in the process that the experiment can be performed much faster than with earlier labeling methods. Even so, it is a lot more challenging technically than label-free techniques, and also prone to systematic errors.
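The spectral-counting idea reduces to a tally per protein, often normalized by protein length so that long proteins do not dominate (an NSAF-style normalization). A minimal sketch, with invented spectrum assignments and protein lengths:

```python
from collections import Counter

def spectral_counts(assignments: list[str]) -> Counter:
    """Tally how many tandem MS spectra were assigned to each protein."""
    return Counter(assignments)

def nsaf(counts: Counter, lengths: dict[str, int]) -> dict[str, float]:
    """NSAF-style normalization: spectral count divided by protein length,
    rescaled so the abundance factors sum to 1."""
    saf = {p: counts[p] / lengths[p] for p in counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}

# Invented assignments from one run: albumin dominates the spectra
run = ["ALBU"] * 40 + ["TRFE"] * 12 + ["CRP"] * 2
counts = spectral_counts(run)
print(counts.most_common())
print(nsaf(counts, {"ALBU": 609, "TRFE": 698, "CRP": 224}))
```

The sampling-error caveat in the text is visible here: a protein observed in only two spectra can easily gain or lose a count from run to run, swamping any real change.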
Figure 3 The relationship between tandem MS (MS/MS) and multiple reaction monitoring (MRM). In both approaches, liquid chromatography delivers ionized peptides to a mass spectrometer according to the chemical properties of the peptide. (a) In MS/MS, the instrument scans the mass-to-charge ratio (m/z) of all peaks (black) and selects the most abundant (red) for fragmentation. It measures the m/z of the resulting fragments (blue). (b) In MRM, only certain m/z values are chosen for fragmentation and only specific fragment ions are selected and reported. Blue bars represent the multiple fragmentation spectra; the green bars are the ones actually measured. (Reprinted from ref. 7 with permission.)
Making the irreproducible reproducible
Uneasy rumblings about the validity of proteomics analyses have persisted for several years. Partly this is due to some poor experimental work in the pioneering years of large-scale protein identification, according to Mann. “Many of the early landmark papers in the last 5–10 years…were obtained on low-resolution instruments and without proper statistical analysis. We now know that a large proportion of the identifications obtained from such projects were in fact false positives… Much fuzzy thinking and bad data have unfortunately found their way into the literature.”3 As an example he cites the fact that peptide lists at the time contained a large proportion of nontryptic peptides, whereas trypsin is now known to be highly sequence-specific, at least in proteomics experiments. Yates agrees but ascribes many of the early problems to the use of SELDI (surface-enhanced laser desorption ionization, a technique for preparing protein mixtures for MS analysis). “It was a very poor analytical technique with significant reproducibility issues,” he says. “Some high-profile papers were later shown to be invalid, which tainted the whole field for a while.” But the reproducibility bombshell really exploded under proteomics in June 2009, when Bergeron and co-workers published a study4 suggesting that most proteomics labs had little idea what they were doing. The researchers sent standardized samples containing 20 known proteins to 27 labs for proteomics analysis. Each protein contained one or more unique tryptic peptides, which should have shown up in MS analysis. Disturbingly, only 7 of the 27 labs initially reported all 20 proteins correctly, and only one saw all the proteotypic peptides (Fig. 2). Yet
when the McGill group collected and analyzed the raw MS data from all the labs, they found that all the proteins and most of the peptides had indeed been detected in all 27 labs but had just not been interpreted correctly. So what went wrong at these labs? “The message of [this] study is that the technology delivers high quality MS data, irrespective of instrumental method,” says Bergeron’s coauthor Tommy Nilsson, also at McGill. “It was the human element that failed. From the smallest and most insignificant labs to the largest, they could not successfully report what they found.” Much of the reproducibility controversy is rooted in the fundamentally stochastic nature of MS-based shotgun proteomics. The technology still struggles with the task of dealing with a highly complex sample such as a whole cell or tissue lysate containing hundreds of thousands of peptides with a wide range of concentrations, some so rare that they do not occur in every spectrum obtained. The standard answer to this is to fractionate the sample so that only relatively few peptides are present and to look at the analytes contained in each fraction. Even so, not all ions detected in a precursor ion sweep are selected for fragmentation, owing to the random nature of the sampling; this has contributed to proteomics’ poor reputation for reproducibility. “There is some randomness in how the instruments collect their data, so if you run a sample twice you see only about 70% overlap between the samples,” Yates explains. “But if you understand the technique, that’s an expected finding—the instrument is under-sampling because it can’t sample fast enough.” The issue is further complicated by the fact that MS instruments preferentially
sample some peptides, whereas they treat others totally randomly. As MS throughput increases, this is becoming less of a problem; the experimenter simply has to repeat the analysis, perhaps 7–10 times, until virtually every peptide present has been observed and the results of all subsequent runs have a very high overlap with the data already obtained. “When people say proteomics is not reproducible they are just being dismissive because they don’t really understand the technology or the external design required to use the technology,” says Yates. More recent work done by CPTAC (Clinical Proteomics Technology Assessment for Cancer, a multidisciplinary network of proteomics researchers that is part of the US National Cancer Institute’s Clinical Proteomics for Cancer program) has defined a set of performance standards for identifying the sources of variability, and has created a standard yeast proteome, available to the community through the National Institute of Standards and Technology, with which investigators can benchmark their own performance5.

Targeted proteomics
The reproducibility problems, along with certain other limitations of shotgun MS proteomics, have led researchers to take an entirely different approach. “We and many others now believe the answer is to target particular molecules instead of doing random sampling of the whole proteome,” says Aebersold. In this ‘selected reaction monitoring’ (SRM) method (also referred to as ‘multiple reaction monitoring’, or MRM), the researchers first decide which proteins they want to observe—typically those involved in a certain interaction or signaling process—and then measure them accurately with very little experimental and computational
overhead and oversampling (Fig. 3). This can be done automatically using a type of tandem mass spec known as a triple quadrupole MS (TQMS), which is able to filter for a target list of up to 500 peptides. Only peptides on this list get through the first stage of the mass analyzer. They then enter a collision cell, where they are fragmented; the fragments then enter another mass analyzer that monitors for one or more user-defined fragment ions. TQMS instruments are relatively slow and, until the advent of SRM, had not been so popular in discovery methods, says Aebersold. “But once it knows what to select, then it is very competitive because every selection is a hit,” he says. And like other types of MS, he says, TQMS instruments are increasing in sophistication and throughput—though he adds that many who use SRM do not share his preference. The important point is that the principle, not the instrument used, closely reflects the way biologists really work. In drug discovery and the search for biomarkers in clinical samples, he says, SRM will often be more effective than the traditional discovery method: “It is a technique to discover how proteins interact, but it will not discover new proteins.” Even dedicated shotgun practitioners like Yates make occasional use of SRM in validation and more focused studies. Over the next five years, he says, SRM will replace western blots as the standard in validation studies: “They should be as sensitive as western blots, while having the advantage of being very specific as well as faster.” Developing and testing a western blot assay to quantify peptides can take three months, whereas with SRM it can be done in a couple of days. SRM assays are also just as reproducible as western blots; drug companies, which need coefficients of variation (CV) of less than 10%, use them all the time for metabolism studies, says Yates. The reproducibility of SRM was demonstrated in a recent study by a collaborating group of MS labs6.
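The two-stage filtering performed by a TQMS (the first quadrupole passes only precursors on the target list; the third reports only the listed fragment ions) can be caricatured in a few lines of code. All m/z values, tolerances and the transition list below are hypothetical:

```python
def srm_filter(scans, transitions, tol=0.5):
    """Caricature of SRM on a triple quadrupole: Q1 passes a precursor
    only if its m/z matches a targeted transition (within tol); Q3 then
    reports only the targeted fragment ions for that precursor."""
    reported = []
    for precursor_mz, fragment_mzs in scans:
        for target_prec, target_frags in transitions.items():
            if abs(precursor_mz - target_prec) <= tol:  # Q1 filter
                for frag in fragment_mzs:
                    if any(abs(frag - t) <= tol for t in target_frags):  # Q3
                        reported.append((target_prec, frag))
    return reported

# Invented transition list: one precursor with two signature fragment ions
transitions = {523.8: {625.3, 738.4}}
scans = [(523.9, [147.1, 625.4, 990.2]),  # targeted precursor
         (612.3, [625.3, 700.0])]         # off-target precursor: rejected at Q1
print(srm_filter(scans, transitions))     # [(523.8, 625.4)]
```

Because everything not on the target list is discarded at Q1, nearly all instrument time is spent on the peptides of interest, which is why "every selection is a hit."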
The methodology was similar to that of the earlier study, but instead of shotgun MS/MS, here the labs used MRM combined with stable-isotope dilution (SID) to continuously monitor selected ‘transitions’ (that is, peptide fragmentation events that produce ‘signature ions’ specific to the protein of interest). The study coordinators prepared plasma samples spiked with known concentrations of seven different proteins and sent them to the eight participating labs for SID-MRM-MS analysis. Afterwards, the labs’ findings were compared and found to be reasonably consistent: interlaboratory variation of the quantitative measurements for nine of ten peptides ranged from 10% to 23%.
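The interlaboratory variation quoted here is a coefficient of variation: the standard deviation of the labs' measurements divided by their mean, expressed as a percentage. A minimal sketch with invented measurements from eight labs:

```python
import statistics

def cv_percent(values: list[float]) -> float:
    """Coefficient of variation: sample standard deviation as a
    percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Invented: one spiked peptide quantified (fmol/µl) by eight labs
lab_values = [10.2, 9.8, 11.5, 10.9, 9.4, 10.1, 12.0, 10.6]
print(f"interlaboratory CV = {cv_percent(lab_values):.1f}%")
```

A CV of 10% thus means the labs' values scatter around the mean with a standard deviation of a tenth of that mean.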
Although this is not as good as the coefficient of variation generally claimed for clinical assays (typically 10–15% or less), it is good enough to verify candidate biomarkers present at more than ~2–6 µg/ml in plasma, say the researchers. Ultimately, they say, these SID-MRM-MS assays may replace some clinical immunoassays, especially those that are not very specific. There is, however, a question over whether a 2 µg/ml detection limit is good enough for biomarker validation work. In a critique of this study, two molecular diagnostics experts note that only 10% of ovarian cancer plasma markers discovered so far are present at concentrations above this limit7. Moreover, marker abundance in presymptomatic ovarian cancer may be 200 times lower than that, so validating an early-detection marker using SID-MRM-MS would require routine sensitivity at least at the low nanogram per milliliter level—while maintaining CVs of <10%. Clearly this is going to be a challenge.

Database shortcomings
Sequence databases—used to make decisions about whether a protein has been detected in a spectrum—are still a weak point of the whole proteomics enterprise, according to Aebersold. “There are still errors and inconsistencies in these databases,” he says. “There is confusion about annotation, with some databases having several names for the same sequence, or different names for the same protein depending on whether it does or doesn’t have the [N-terminal] initiator methionine, or for sequences that differ by only one amino acid.” Yates too is unhappy about duplication, redundancy and errors in the public protein sequence databases but believes the issue is being addressed. “The European Bioinformatics Institute (EBI) (Cambridge, UK), which maintains some of the most-used databases, is not a small organization and it got a large chunk of the proteomics stimulus funding from the US, so I have to assume they are well enough funded to solve this,” he says.
EBI’s Protein and Nucleotide Database Group under Rolf Apweiler is under constant pressure from researchers to improve its game, he says. But McGill’s Tommy Nilsson still has doubts. “They have had every opportunity to do this with the amount of funding they have received for a decade or more,” he says. “It is very surprising that they have not realized that their databases are of such low quality and in such a bad state that they cannot be used for biomarker discovery.” EBI’s problem is that it is struggling to cope with explosive growth in the volume
of proteomics data generated by research in recent years. “Next-generation sequencing has dramatically increased the amount of nucleotide sequence submitted to the databases,” says Apweiler. “It will lead in the next few years to at least a hundred times as much nucleotide sequence–derived protein sequence as currently available.” He stresses that the principal sequencing centers and sequence databases will weather this data explosion only if they collaborate much more closely than they do now. They are trying to improve the consistency and quality of database annotations, ultimately aiming for a standard set of protein-coding gene annotations across all major sequence databases. This, says Apweiler, will “ease the pain of combining data from different databases” and allow bioinformatics users to combine data from all of them. Researchers at the Institute for Systems Biology (Seattle) are applying SRM to create an atlas of the human proteome that will have representatives of all human proteins, according to Rob Moritz, principal investigator on a grant obtained with stimulus funds to develop this resource. So far, the atlas comprises ~13,000 of the 20,332 human proteins; the remaining ~7,300 will be derived from prediction algorithms. According to Moritz, targeted approaches are more efficient in time, cost and computational needs than the stochastic, nontargeted ones used to create proteome databases.

Post-translational modifications
Much of the protein interaction work has been directed at post-translational modifications, which of course cannot be predicted from genomic information alone. And with the advent of new techniques, such as electron transfer dissociation, which preserves modifications better than collision dissociation because of differences in the manner of fragmentation, the analysis of ‘difficult’ post-translational modifications has improved significantly, says Yates.
Proteomics studies of phosphorylation-triggered signaling are where most of the effort is going. The discipline of ‘phosphoproteomics’ has turned out to be far wider in scope than ever imagined: the complete phosphoproteome may comprise more than 100,000 sites, says Mann. Acetylation, methylation and glycosylation are also becoming amenable to study, although monitoring the last of these modifications remains problematic. “Large-scale proteomics-based signaling studies will fundamentally change our understanding of signaling networks,” he says. Mann’s group at the Max Planck Institute has used proteomics to investigate histone
acetylation, known to affect transcriptional activity and thus gene expression in living cells8. “Through MS we can see that thousands of proteins are modified through acetylation of binding sites.” This could be important in drug development, he says, as acetylation is also implicated in activating the tumor suppressor p53, and histone deacetylase (HDAC) inhibitors are now being investigated in clinical studies for cancer. Mann also notes that, although until very recently MS could only measure peptide ion mass, the very latest instruments can also measure fragments at very high accuracy. “That helps pinpoint the post-translational modification because we can sequence the whole peptide from its fragmentation spectrum,” he says. Even very unusual modifications can be identified, such as SUMOylation, which has not been studied much because it was difficult to determine by MS. Now, says Mann, “it is a breeze.”

Proteomics meets biology
Thus, the detection of protein-protein interactions and signaling pathways is being completely transformed by MS approaches. “Proteomics technology has continued to shine in terms of biological discovery,” says Yates. “We can dig deeper with profiling experiments, and some work that doesn’t get talked about much, like identifying modifications and protein complexes.” Yates’ group is using quantitative approaches to get at the mechanism of the pathology in cystic fibrosis and how the cystic fibrosis transmembrane conductance regulator (CFTR) protein gets targeted for destruction. Kuster believes that proteomics needs to become more focused and hypothesis-driven. “If you try to see everything, you may be seeing mainly biological noise, making it hard to find the needle in the haystack,” says Kuster.
“Focusing on fewer things helps link cause and effect in biological processes, increasing the chances of finding something meaningful.” This approach might start, for example, with a subset of proteins from a particular organelle, or with enzyme classes that share a specific type of action (so-called activity-based proteomics). Chemical proteomics is moving forward. Some workers are using chemical probes to examine the role of proteases in cell homeostasis
and apoptosis. Others are concentrating on phosphoproteomics—looking selectively for phosphorylated proteins as evidence for upstream activated kinases and their substrates and cytokines. “This is especially important as many cancers and inflammatory diseases seem to be driven by deregulated [signaling] pathways, with cascades of kinases working on substrates and eventually changing DNA expression profiles,” says Kuster.

And drug discovery
Considering the billions of dollars poured into proteomics research in the past decade, it is striking that not a single commercial molecule has emerged from it. “The output has been as close to zero as you can come,” says McGill’s Nilssen. “We have achieved nothing substantial, that’s the bottom line.” However, most experts consider proteomics not a tool of drug discovery per se; its value is in preparing the ground. “It’s true that proteomics can help identify targets, but the drug discovery people were already awash with targets; that wasn’t their problem,” says Yates. Where proteomics can help more than any other technology, he says, “is to validate targets and identify the function of these proteins, which has traditionally been a slow process. Hypothesis-driven proteomics studies can help investigate biological processes and cellular mechanisms, and that will eventually translate into better drug discovery.” “Proteomics gives the system-wide, cellular view, rather than probing for specific targets or substrates,” agrees Mann. “It could explain why a drug works in the first place, or why it has side effects.” He notes that companies such as Cellzome (Heidelberg, Germany) have used proteomics to identify kinase inhibitor specificity. “With proteomics we can see which of the many proteins the small molecules are binding to.” Vassilios Papadopoulos, a researcher at McGill who investigates cholesterol pathways in mitochondria, backs the importance of phosphoproteomics.
“Because of it we have begun to understand kinases and phosphorylation. Several kinase inhibitors are in trials and several companies are focusing on this area that did not exist five or six years ago.” And nonspecific HDAC inhibitors, which are already in the clinic for treating several cancers, are also being tackled. In a 2008 study,
researchers at Merck Research Laboratories in Rahway, New Jersey, used differential MS to define post-translational modifications of histones that are affected by particular HDAC inhibitors9. This approach can guide the development of more selective inhibitors.

Achieving closure
Another holy grail, of course, is to map the entire human proteome and offer it as a resource for the global community. John Bergeron thinks it is doable: “We can flesh out the human proteome as a coordinated effort by the global community with the technology we have now,” he says. He points to recent advances at the Swiss Institute of Bioinformatics, a confederation of research groups, which now has a highly accurate database of the 20,300-plus human protein-coding genes that can form the basis of the effort. Mann too believes success is getting very close. He points to the publication in 2008 of a paper describing the use of SILAC to perform the first ever complete proteome characterization in budding yeast10. “That had been thought impossible in principle, but it is possible and now we are close to it for human cells.” Mammalian proteomes are much more complex than yeast’s, with proteins existing in different splice forms or moonlighting (assembling) with other proteins in myriad ways. Each mammalian cell line expresses proteins from more than 10,000 genes. The current depth of proteomics analysis is about 7,000 proteins. “We are coming into the endgame now,” he says. “Soon we will be able to quantify whole proteomes, and measure expression changes of every protein. If we can do that, it will have a huge impact.”

1. Wright, C. et al. Mol. Cell. Proteomics, published online, doi:10.1074/mcp.M900384-MCP200 (1 February 2010).
2. Anderson, K.S. et al. Cancer Epidemiol. Biomarkers Prev. 19, 859–868 (2010).
3. Mann, M. Nat. Methods 6, 717–719 (2009).
4. Bell, A.W. et al. Nat. Methods 6, 423–430 (2009).
5. Paulovich, A.G. et al. Mol. Cell. Proteomics 9, 242–254 (2010).
6. Addona, T.A. et al. Nat. Biotechnol. 27, 633–641 (2009).
7. McIntosh, M. & Fitzgibbon, M. Nat. Biotechnol. 27, 622–623 (2009).
8. Zielinska, D.F. et al. Cell 141, 897–907 (2010).
9. Lee, A.Y.H. et al. J. Proteome Res. 7, 5177–5186 (2008).
10. de Godoy, L.M. et al. Nature 455, 1251–1254 (2008).
patents
Intellectual property, technology transfer and manufacture of low-cost HPV vaccines in India Swathi Padmanabhan, Tahir Amin, Bhaven Sampat, Robert Cook-Deegan & Subhashini Chandrasekharan
An empirical study of the impact of patenting and licensing on regional manufacturing of human papilloma virus vaccines to help improve vaccine affordability and access.
Cervical cancer, the leading cause of female cancer mortality worldwide, disproportionately affects women in low- and middle-income countries (LMCs). Four-fifths of the nearly 275,000 annual cervical cancer–related deaths occur in LMCs, where routine gynecological screening is minimal or absent1,2. Two new prophylactic vaccines, Gardasil from Merck (Whitehouse Station, New Jersey, USA) and Cervarix from GlaxoSmithKline (GSK; London), have proven effective in preventing human papilloma virus (HPV)–induced cervical lesions and some sequelae. Both vaccines are composed of HPV-L1 major capsid antigen virus-like particles (VLPs), and both prevent persistent infection by HPV-16 and HPV-18, which cause nearly 70% of cervical cancers3. Gardasil also contains L1 antigens from HPV strains 6 and 11, which are associated with genital warts. Costing at least $300 for the three-dose regimen, Gardasil is one of the most expensive vaccines introduced so far4. Its private-market price exceeds $500 in several developed and developing countries5, a price few can afford in most LMCs1,6,7. Price discrimination by pharmaceutical companies could improve vaccine access in LMCs. Merck introduced Gardasil in India at about $171 for the three-dose regimen8.

Swathi Padmanabhan, Robert Cook-Deegan and Subhashini Chandrasekharan are at the Center for Genome Ethics, Law & Policy, Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina, USA; Tahir Amin is at the Initiative for Medicines Access and Knowledge, New York, New York, USA; and Bhaven Sampat is at the Mailman School of Public Health, Columbia University, New York, New York, USA. e-mail: [email protected]
Although such pricing enables middle-class access in some emerging economies, the vaccine remains unaffordable for most low- and middle-income populations in LMCs. Prices must fall below $2 per dose to make broad access possible in low-income populations, especially in countries where gross domestic product per capita is below $1,000 (refs. 9,10). It is unlikely that Merck or GSK can reduce vaccine prices to match these affordability targets because of the high production costs associated with their vaccines11. Company donations can also improve access12,13. Merck has donated three million doses of Gardasil to the Program for Appropriate Technology in Health (PATH) for demonstration trials14. Its Gardasil Access Program aims to extend this support to eight LMCs15. However, reliance on pharmaceutical company donations alone is unsustainable. Alternatively, donor-aided vaccine purchase can significantly increase vaccine access by facilitating distribution of highly discounted vaccines in eligible LMCs16. The Global Alliance for Vaccines and Immunization (GAVI) recently prioritized HPV vaccines17. However, owing to a $4 billion deficit and existing financial commitments to other vaccines, it might be unable to finance HPV vaccines18. Donors often face tradeoffs, and high prices limit the quantities of vaccines or treatments they can subsidize19. Vaccine manufacturing in LMCs can also reduce prices. Over the past decade, manufacturers in India, Cuba, China and Brazil have demonstrated their capacity to produce low-cost vaccines that meet international quality standards. They primarily serve low-income markets, supplying 64% of childhood vaccines procured by UNICEF and 43% of vaccines procured by GAVI20,21. Recombinant hepatitis B (HBV) vaccines illustrate the potential impact of such manufacturers in
improving vaccine access in LMCs. When introduced in the early 1980s, the HBV vaccine was priced at $50–80 per dose, one of the most expensive prophylactic vaccines at that time21. However, developing country vaccine manufacturers (DCVMs), using alternate expression platforms suitable for low-cost development, successfully brought inexpensive HBV vaccines to market in the 1990s. The ensuing competition reduced market prices to less than $0.30 per dose21,22. Procurement costs for large vaccination programs consequently decreased, allowing wider access to the vaccine in LMCs21. DCVMs are therefore potentially important suppliers of low-cost HPV vaccines. For successful production, DCVMs require access to relevant technology, which can be protected by intellectual property (IP) rights. Although DCVMs have faced few patent barriers so far, changes adopted by developing countries to comply with the World Trade Organization’s Agreement on Trade-Related Aspects of Intellectual Property Rights might create new obstacles for vaccine development23. Before the agreement, many LMCs did not award product patents for biopharmaceuticals, including vaccines. Now, however, DCVMs must consider international pharmaceutical companies’ product patent rights on vaccines and related technologies. Patents granted in developing countries might constrain the ability of manufacturers to develop vaccines and/or to sell vaccines in local and international LMC markets (Fig. 1), thereby reducing their development incentives. Two recent reports have suggested that IP might be a barrier for DCVMs interested in developing HPV vaccines24,25. However, publicly available information about the patenting and licensing of HPV vaccine technologies in developing countries is minimal.
[Figure 1 decision tree: Do patents exist in the country of the regional manufacturer? If no: do these patents exist in the importing country? If no, the manufacturing country is free to make and sell vaccines in local and international markets; if yes, it is free to make the vaccine and sell to countries in which the granted patents are not infringed. If yes: can the patents be worked around? If no, the manufacturing country has no freedom to make and sell the vaccine without infringing patents; if yes: do these patents exist in the importing country? If no, the manufacturing country is free to make and sell a vaccine developed using alternate processes or with alternate compositions; if yes, it may do so as long as the patents are not infringed in the importing country.]
Figure 1 Impact of patents on manufacturing, sale, and/or export of ‘bio-similar’ HPV vaccines by developing country vaccine manufacturers. Patents existing in the country of the manufacturer that claim the ‘composition of matter’ of necessary antigens (e.g., nucleic acid and/or amino acid sequences of the L1 proteins) or the vaccine itself (e.g., VLPs made of L1 antigens) would prevent DCVMs from developing and selling vaccines in their country without a license from the patent owner. If, however, patents only claim specific methods or processes for producing the vaccine, manufacturers may have freedom to operate if they use alternate processes to ‘work around’ those patents. Patents granted in jurisdictions outside the manufacturing country—especially in potential export markets—might also affect DCVM vaccine development plans. Manufacturers exporting vaccines to these countries would infringe patents if their vaccines embodied the compositions or processes protected by such patents. However, if vaccines had different formulations or were developed by using alternate processes, they could still be sold in the importing country.
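The decision logic of Figure 1 can be sketched as a small function (an illustrative rendering of the flowchart; the parameter names and return strings are ours, and this is a simplification, not legal analysis):

```python
def freedom_to_operate(patents_in_manufacturing_country: bool,
                       patents_can_be_worked_around: bool,
                       patents_in_importing_country: bool) -> str:
    """Walk the Figure 1 decision tree for a developing country
    vaccine manufacturer (DCVM). Illustrative only."""
    if not patents_in_manufacturing_country:
        if not patents_in_importing_country:
            return "free to make and sell in local and international markets"
        return "free to make; sell only where granted patents are not infringed"
    if not patents_can_be_worked_around:
        return "no freedom to make and sell without infringing patents"
    if not patents_in_importing_country:
        return "free to make and sell using alternate processes or compositions"
    return ("free to sell an alternate-process vaccine as long as patents "
            "are not infringed in the importing country")

# Example: no patents at home or in the export market.
print(freedom_to_operate(False, False, False))
```

The tree asks at most three questions, and the work-around branch only matters once a domestic patent exists, which matches the flowchart's layout.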
We therefore systematically investigated the extent to which patents are a barrier to producing HPV vaccines in LMCs, focusing on India for several reasons. India bears nearly 25% of the global cervical cancer burden26,27, and its growing middle class is a potentially large private market for HPV vaccines. Including this vaccine in national immunization programs that target low-income populations would further expand the market, creating strong incentives for local manufacturing of inexpensive alternatives. In addition, several Indian manufacturers who are interested in developing HPV vaccines have concerns about potential patent impediments to such efforts.

The HPV vaccine patent landscape
First-generation prophylactic vaccines. Technologies that enabled the development of L1-VLP–based vaccines originated at the US National Institutes of Health’s (NIH) National Cancer Institute (NCI), the University
of Rochester (Rochester, New York, USA), Georgetown University (Washington, DC) and the University of Queensland (Brisbane, Queensland, Australia) (Table 1). MedImmune (Gaithersburg, Maryland, USA), Merck and GSK developed these technologies further and performed safety and efficacy clinical testing to bring the vaccines to market28. The IP landscape for HPV vaccines is complex, with 81 US patents granted so far, linked to 86 specific Patent Cooperation Treaty (PCT) applications. Eighteen entities—ten of which are nonprofit—own these US patents. Nonprofit organizations own 20 of the 81 US patents, for-profits own 55, and for-profit and nonprofit entities jointly own 6 (Supplementary Table 1). Merck owns the most patents (24), followed by GSK and the US Government (arising from the NCI), who own 8 patents each. As of December 2008, 19 of the 86 international PCT applications were filed in India
(Table 2). The universities and the NIH have not sought patent protection in India for the technologies underlying L1-VLP vaccines. However, Merck and GSK have applied for patents on HPV vaccine compositions; GSK alone has filed 13 of these applications. The Indian Intellectual Property Office (IPO) has awarded six patents: four to GSK and one each to Wyeth Holdings (Wayne, New Jersey, USA) and the University of Cape Town (Cape Town, South Africa). Although the determination of patent scope is complicated and sometimes the subject of costly litigation, we offer our preliminary analysis of the patent claims, based on our understanding of these technologies and discussions with researchers who developed first-generation vaccines. Patent 203333, awarded to GSK, claims compositions of a prophylactic vaccine that contains VLPs composed of L1 antigens from HPV-16, 18, 31 and 45. This is detailed in Claim 1, the first independent claim, which technically confers the broadest scope of protection and reads, “A vaccine composition comprising virus like particles containing L1 proteins or functional L1 protein derivatives from human papilloma virus 16, human papilloma virus 18, human papilloma virus 33 and human papilloma virus 45 genotypes wherein the antibody response generated by the vaccine is at a level similar to that for each human papilloma virus, virus like particle formulated alone.” Our analysis suggests that only a vaccine containing L1-VLPs from all four HPV strains mentioned in Claim 1 would directly infringe the patent. Therefore, Indian manufacturers are probably free to develop a bivalent HPV vaccine containing L1-VLPs for HPV-16 and HPV-18 only, or a quadrivalent vaccine containing any combination of one, two or three of these four strains in addition to other, unclaimed oncogenic strains.
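The all-four-strains reading of Claim 1 amounts to simple set logic: under this view, a composition claim that recites several required components is only directly infringed by a product containing every one of them. A sketch (the infringement test is our illustrative simplification, not a legal analysis; strain labels follow the quoted claim text):

```python
# Illustrative "all recited elements" reading of a composition claim:
# a product directly infringes only if it contains every component the
# claim recites. Simplification for exposition, not legal advice.

def directly_infringes(claimed: frozenset, product: frozenset) -> bool:
    return claimed.issubset(product)

# Strains recited in the quoted Claim 1 text.
claim_1 = frozenset({"HPV-16", "HPV-18", "HPV-33", "HPV-45"})

bivalent = frozenset({"HPV-16", "HPV-18"})
print(directly_infringes(claim_1, bivalent))  # False: two recited strains missing
```

A vaccine adding further unclaimed strains to all four recited ones would still fall under the claim, since `issubset` only checks that the recited strains are present.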
Patent 209780, also awarded to GSK, claims a vaccine composition comprising L1-VLPs for HPV-16 and HPV-18 and an adjuvant containing aluminum hydroxide and 3-O-desacyl-4′-monophosphoryl lipid A (3dMPL). Claim 1 specifically reads, “A vaccine comprising a human papillomavirus 16 L1 virus like particle, a human papillomavirus 18 L1 virus like particle, aluminum hydroxide, and 3dMPL.” Furthermore, Claim 4 reads, “The vaccine consisting of an HPV 16 L1 VLP, an HPV 18 L1 VLP, aluminum hydroxide, and 3dMPL.” However, our analysis suggests that a bivalent (HPV-16, -18 L1-VLP) prophylactic vaccine developed by an Indian manufacturer would not infringe this patent if formulated with a different adjuvant. Additional patents awarded to GSK (Table 2) claim nucleotide sequences of HPV early antigens (214047) and compositions of combination vaccines containing
HPV L1 antigens (202425) and other antigens. These too are unlikely to constrain Indian vaccine manufacturers developing Gardasil or Cervarix ‘biosimilars’. The University of Cape Town patent claims methods to produce HPV-16 L1-VLPs in tobacco plants and their use in a vaccine composition. However, plant-based expression has so far failed to yield high amounts of purified HPV-16 VLPs11, limiting the commercial viability of this technology. Patent 220842, awarded to Wyeth, covers polypeptides of the HPV early antigens E6 and E7, which are likely to be used in therapeutic cervical cancer vaccine compositions but are less relevant to L1-VLP–based prophylactic vaccines. We found no patents on HPV-16 and HPV-18 L1 nucleic acid sequences filed or awarded in India. However, Merck has four pending patent applications claiming L1 nucleic acid sequences of HPV subtypes 31, 45, 52 and 58, optimized for expression in several yeast strains. Because the IPO can choose to significantly narrow the scope of, or deny, some claims during examination, it is difficult to assess whether Merck’s applications will affect vaccine development in India.

Second-generation prophylactic vaccines. Currently marketed first-generation vaccines are costly to produce because they use expensive expression systems to produce the L1 antigens. Moreover, both miss several oncogenic HPV strains that are present in India and other LMCs29, which is particularly problematic when countries lack the screening programs needed to detect cancers not prevented by current vaccines. Researchers at the NCI and Johns Hopkins University (Baltimore, Maryland, USA) have developed an L2 (minor capsid antigen)–based vaccine. This approach would protect against infection by all oncogenic strains and would eliminate the costs of increasing the valency of L1-based vaccines11. NCI and Johns Hopkins have partnered with Shantha Biotechnics (Hyderabad, India) to commercialize this candidate.
They jointly filed Indian patent application 6219/DELNP/2007 (Table 2) with the explicit rationale of preserving freedom to operate and market exclusivity for Indian partners (J.T. Schiller & M. Schmilovich, NCI, NIH; personal communication). Shantha has signed a Cooperative Research and Development Agreement with the NIH, gaining access to biological materials such as codon-optimized plasmids as well as the expertise and training necessary to develop this vaccine. Shantha has nonexclusively licensed this technology
from Johns Hopkins30. Using an Escherichia coli expression system to purify L2 antigenic peptides, Shantha hopes to lower development costs, thereby making it possible to reduce the price of vaccines significantly30 (A. Khar & R. Chaganti, Shantha Biotechnics; personal communication). Indian patent application 131/CHENP/2007 also bears on second-generation vaccine development and is based on research performed at the University of Lausanne (Lausanne, Switzerland). Denise Nardelli-Haefliger and colleagues showed that recombinant clones of attenuated Salmonella enterica (serovar Typhi and Typhimurium) strains expressing
HPV-16 and HPV-18 L1 antigens can induce a strong immune response11,31. This technology would allow oral or mucosal immunization against HPV-16 and HPV-18 infection. The lower development and implementation costs associated with this oral vaccine make it highly suitable for use in LMCs. To maximize the potential benefits of this technology to LMCs, Nardelli-Haefliger and colleagues assigned ownership of the enabling IP to Indian Immunologicals (Hyderabad) (D. Nardelli-Haefliger, personal communication). Indian Immunologicals has a memorandum of understanding with Lausanne and has received biological materials, know-how and training. It
Table 1 Timeline of patenting and licensing of HPV L1-VLP–based prophylactic vaccines

July 19, 1991: Frazer et al. (Queensland) file international patent application in Australia
June 25, 1992: Schlegel et al. (Georgetown) file patent application in US
September 3, 1992: Schiller and Lowy et al. (NCI, NIH) file patent application in US
March 9, 1993: Rose et al. (Rochester) file patent application in US
February 1995: University of Queensland’s commercial arm UniQuest licenses HPV vaccine technology to CSL (Melbourne)
October 5, 1995: MedImmune acquires exclusive license to HPV vaccine technology from University of Rochester
1995: Merck licenses HPV vaccine technology from CSL
June 26, 1996: MedImmune in-licenses key HPV IP from German Cancer Research Center
January 7, 1997: NCI nonexclusively licenses HPV vaccine technology to MedImmune
June 24, 1997: USPTO declares initial interference
December 1997: NCI nonexclusively licenses HPV vaccine technology to Merck
December 11, 1997: MedImmune and SmithKline Beecham form worldwide HPV vaccine alliance
January 16, 1998: MedImmune finalizes vaccine agreement with SmithKline Beecham
October 24, 2001: USPTO declares patent interferences 104,771 (Rose vs. Lowy), 104,772 (Rose vs. Schlegel), 104,773 (Rose vs. Frazer), 104,774 (Lowy vs. Schlegel), 104,775 (Lowy vs. Frazer) and 104,776 (Schlegel vs. Frazer)
February 2005: Merck and GSK enter cross-license agreement for HPV patents
May 2005: NCI’s nonexclusive licenses convert to co-exclusive licenses
September 20, 2005: USPTO Board of Interference announces decision and awards priority to Schlegel et al.
December 29, 2005: Frazer et al. appeal USPTO decision; case docketed in CAFC
August 20, 2007: CAFC reverses USPTO decision and awards priority to Frazer et al.
Technologies underlying the L1-VLP based prophylactic vaccines emerged from research conducted at the University of Rochester, the NCI, Georgetown University and the University of Queensland. The NCI initially nonexclusively licensed the technology to MedImmune and Merck. MedImmune also acquired worldwide exclusive rights to IP from Georgetown University and the University of Rochester. The University of Queensland licensed its patents to CSL, which in turn licensed the technology exclusively to Merck. GSK eventually acquired exclusive rights to MedImmune’s entire IP portfolio for HPV vaccine development. Owing to a first-to-invent system in the United States, patent interference proceedings were triggered at the USPTO when claims overlapped from different patent applications filed by four different groups of inventors. The interference proceedings involved various L1-antigen HPV-related claims. Six two-way patent interferences between the four parties continued for nearly a decade, presumably at significant cost to the institutions or their primary licensees, and were partially resolved in 2005. Given the uncertainty surrounding the ownership of enabling vaccine technologies and the possibility of mutually blocking exclusive rights (that is, neither firm could be sure its products would not infringe on patent rights held by the other), Merck and GSK cross-licensed their respective IP holdings in 2005 to ensure unfettered access to these technologies. They consequently secured their market position in the United States and Europe and other OECD nations such as Canada and Japan. As part of the financial settlement of the patent interference, the nonexclusive licenses awarded by NCI, NIH to MedImmune and Merck were converted to coexclusive licenses, thus allowing both GSK and Merck access to this IP. Merck brought Gardasil to market in the United States in 2006 and Cervarix was introduced in the United Kingdom in June 2008. 
USPTO, US Patent and Trademark Office; CAFC, US Court of Appeals for the Federal Circuit.
has also filed international patent applications (Table 2) but will not seek patent protection in Organisation for Economic Co-operation and Development (OECD) markets (R. Sriraman, D. Thiagarajan & K. Kumar, Indian Immunologicals; personal communication). With assured access to essential patents and know-how, Indian Immunologicals has strong incentives to invest in the development
of an oral HPV vaccine. Both the Shantha and Indian Immunologicals vaccines are currently in the preclinical phase. Shantha projects a 2015 market entry at an initial price of $15 per dose32. Both manufacturers believe, however, that prices will drop further as vaccine adoption increases, eventually reaching around $1–2 per dose, which would make broad access feasible.
Serum Institute of India (Pune) and Bharat Biotech (Hyderabad) are both also developing L1 VLP-based vaccines. Serum’s candidate will probably be a bivalent HPV-16 and HPV-18 vaccine. The company will seek a nonexclusive license from the NIH for cell lines optimized for high expression of L1 antigens and will nonexclusively license the Hansenula polymorpha expression platform
Table 2  Patent landscape for HPV vaccines in India
(Each entry lists the assignee/applicant; PCT application no. (international publication no.); Indian application no.; publication date (application date in India); granted Indian patent no. (date of grant) or status; expiry date where granted; and a summary of claims.)

GlaxoSmithKline Biologicals. PCT/EP2005/006461 (WO 05/123125); Indian application 3436/KOLNP/2006; published 6/15/2007 (filed 11/20/2006); pending. Claims: an immunogenic vaccine composition containing VLPs and/or capsomeres of HPV-16, HPV-18 and at least one other HPV genotype.

GlaxoSmithKline Biologicals. PCT/EP2003/02826 (WO 03/077942); Indian application 1351/KOLNP/2004; published 12/30/2005 (filed 9/13/2004); granted as Indian patent 203333 (4/13/2007); expires 9/13/2024. Claims: an L1-VLP–based vaccine composition containing VLPs of HPV-16, 18, 31 and 45; the vaccine composition further comprising complete or immunologically active fragments of HPV early antigens E1–E8; the vaccine composition further comprising antigens of other STDs, including HIV, HSV and Chlamydia.

Glaxo Group. PCT/GB2001/03290 (WO 02/08435); Indian applications 67/MUMNP/2003, published 2/4/2005 (filed 1/16/2003), granted as Indian patent 214047 (1/24/2008), expires 1/16/2023; and 1561/MUMNP/2007 (divisional of 67/MUMNP/2003), published 11/9/2007 (filed 9/27/2007), pending. Claims: a synthetic polynucleotide sequence, analog or fragment, codon-optimized for E. coli and encoding mutated amino acid sequences of HPV early antigens E1 and E2 for HPV types/subtypes selected from HPV strains 1–4, 6, 7, 10, 11, 16, 18, 26–29, 31, 33, 35, 39, 49, 51, 52, 56, 59, 62 and 68; a p7313Plc backbone–based expression vector capable of driving expression of the claimed nucleotide sequences in bacterial cells.

GlaxoSmithKline Biologicals. PCT/EP2006/003918 (WO 06/114312); Indian application 3957/KOLNP/2007; published 6/20/2008 (filed 10/15/2007); pending. Claims: L1 proteins of HPV-31, 45 and 52; a method to boost the immune response to an HPV-16, HPV-18 vaccine by including L1 proteins of other HPV subtypes in the claimed composition.

SmithKline Beecham Biologicals. PCT/EP2000/08784 (WO 01/0117551); Indian applications 1471/CHENP/2003, published 11/25/2005 (filed 9/17/2003), granted as Indian patent 209780 (9/6/2007), expires 9/17/2023; and IN/PCT/2002/336/CHE, publication N/A (filed 3/5/2002), granted as Indian patent 202425 (4/13/2007), expires 3/5/2022. Claims: a vaccine composition comprising HPV-16 L1 VLPs, HPV-18 L1 VLPs, aluminum hydroxide and 3dMPL; a vaccine composition for treating or preventing HPV and HSV infections comprising the HSV gD2 antigen, an HPV-6, 11, 16 or 18 L1 antigen and an adjuvant that stimulates the TH1 response.

GlaxoSmithKline Biologicals. PCT/EP2003/014562 (WO 04/056389); Indian application 1108/KOLNP/2005; published 7/21/2006 (filed 6/9/2005); pending. Claims: use of a vaccine composition comprising HPV-16 and HPV-18 VLPs to prevent infection by other oncogenic types of HPV, excluding HPV-16 and HPV-18.

Glaxo Group. PCT/EP2003/011158 (WO 04/031222); Indian application 506/KOLNP/2005; published 6/9/2003 (filed 3/24/2005); pending. Claims: nucleotide sequences of HPV polypeptides E1 or E2 from oncogenic HPV subtypes; an expression vector with the codon-optimized polynucleotide sequence; a pharmaceutical composition comprising the polynucleotides or a vector encoding the nucleotide sequence.

GlaxoSmithKline Biologicals. PCT/EP2002/04966 (WO 02/087614); Indian application 1336/KOLNP/2003; published 1/13/2006 (filed 10/16/2003); pending. Claims: a vaccine composition comprising (i) at least one HIV antigen and either one or both of (ii) at least one HSV antigen and (iii) at least one HPV antigen selected from L1, L2, E6 and E7, or a combination thereof.
from Rhein Biotech (Düsseldorf, Germany). Serum anticipates market entry three to four years after project initiation (S. Singh, Serum Institute; personal communication). In addition to the L1-VLP vaccine, Bharat scientists are exploring a chimeric L2-HPV vaccine. They plan to express an L2–HBV small surface antigen fusion protein in Pichia pastoris to produce VLPs containing HPV L2 antigens at high density. Because the HBV surface antigen spontaneously assembles into VLPs, Bharat hopes to reduce the price of the vaccine by circumventing the high costs of purifying and assembling VLPs. Bharat filed a provisional patent application for this vaccine in India last year. Despite developing this technology in-house, Bharat might seek a nonexclusive license from the NIH for the cell lines used in neutralizing assays (S. Kandaswamy, Bharat Biotech; personal communication).

Freedom to operate

Despite considerable patenting activity, our analysis suggests that IP will not preclude
Table 2  Patent landscape for HPV vaccines in India (continued)

SmithKline Beecham Biologicals. PCT/EP1998/05285 (WO 99/10375); Indian application 1903/MAS/1998; published 3/4/2005 (filed 8/24/1998); pending. Claims: HPV-16 or HPV-18 E6 or E7 protein in fusion with Hib lipoprotein D, NS1 (or a fragment thereof) from influenza virus, or LYTA (or a fragment thereof) from Streptococcus pneumoniae.

SmithKline Beecham Biologicals. PCT/EP2000/08728 (WO 01/17550); Indian application IN/PCT2002/335/CHE; published 3/4/2005 (filed 3/5/2002); pending. Claims: a multivalent combination vaccine including HPV (L1, L2, E6, E7), EBV (gp350), HBV (sAg), hepatitis A (HM-175 strain), HSV-2 (gD), VZV (gpI), HCMV (gB685, pp65) and Toxoplasma gondii (SAG1 or TG34) antigens.

SmithKline Beecham Biologicals. PCT/EP1998/08563 (WO 99/33868); Indian application IN/PCT2000/116/CHE; published 3/4/2005 (filed 6/13/2000); pending. Claims: a vaccine composition of HPV E6 or E7 proteins, or fusions of these antigens with others including Hib lipoprotein D, influenza virus NS1 and LYTA of S. pneumoniae.

Merck & Co. PCT/EP2004/008677 (WO 04/084831); Indian application 4036/DELNP/2005; published 8/31/2007 (filed 9/8/2005); pending. Claims: a codon-optimized nucleic acid sequence encoding HPV-31 L1, optimized for expression in yeast strains including Saccharomyces cerevisiae and Pichia pastoris; a vector and host expressing the claimed nucleic acid; VLPs of recombinant HPV-31 L1 or L2 proteins, or combinations, produced in yeast; methods for producing VLPs in yeast using the above nucleotide sequences.

Merck & Co. PCT/US2005/009199 (WO 05/097821); Indian application 5998/DELNP/2006; published 8/24/2007 (filed 10/16/2006); pending. Claims: a codon-optimized nucleic acid encoding HPV-52 L1 for expression in yeast, including S. cerevisiae and P. pastoris; VLPs of HPV-52 L1 or L2, or combinations, produced in yeast; a method of producing such VLPs in yeast.

Merck & Co. PCT/US2004/037372 (WO 05/047315); Indian application 2930/DELNP/2006; published 8/10/2007 (filed 5/22/2006); pending. Claims: a codon-optimized nucleic acid encoding HPV-58 L1 for expression in yeast (S. cerevisiae, Hansenula polymorpha, P. pastoris, Kluyveromyces fragilis, Kluyveromyces lactis, Saccharomyces pombe); VLPs of HPV-58 L1 or L2, or combinations; a method of producing such VLPs in yeast.

Indian Immunologicals. PCT/IB2005/001725 (WO 05/123762); Indian application 131/CHENP/2007; published 8/24/2007 (filed 1/12/2007); pending. Claims: an HPV-16 L1 nucleic acid sequence codon-optimized for expression in prokaryotic organisms (E. coli, Shigella, Lactobacillus, Mycobacteria, Listeria, or attenuated Salmonella strains S. enterica serovar Typhimurium and S. typhi); an attenuated microbial strain expressing codon-optimized HPV capsid protein from HPV-16, 18, 31 or 45; a method for improving the immunogenicity of a prokaryotic microorganism, specifically Salmonella, against HPV-16.
Table 2  Patent landscape for HPV vaccines in India (continued)

University of Cape Town, South Africa. PCT/IB2002/03531 (WO 03/018623); Indian application 00831/DELNP/2004; published 4/27/2007 (filed 3/31/2004); granted as Indian patent 221817 (7/7/2008); expires 3/31/2024. Claims: modified nucleotide sequences encoding HPV-16 and HPV-11 L1 proteins for producing VLPs in plant cells, where the plant is Nicotiana benthamiana; VLPs produced by this method for use in a vaccine to treat or prevent HPV infections in humans.

Wyeth Holdings. PCT/US2003/031726 (WO 04/030636); Indian application 505/KOLNP/2005; published 2/24/2006 (filed 3/24/2005); granted as Indian patent 220842 (6/6/2008); expires 3/24/2025. Claims: a fusion polypeptide comprising HPV E6 and E7 antigen polypeptides, where the E6 antigen has mutations at amino acids 63 or 106 and the E7 antigen has mutations at amino acids 24, 26 or 91; a nucleotide sequence encoding the above polypeptide.

Active Biotech. PCT/SE2000/001808 (WO 01/023422); Indian application IN/PCT/2002/00438/CHE; published 3/4/2005 (filed 3/21/2002); pending. Claims: a carrier for introducing HPV major capsid L1 protein that has been intentionally modified to remove the major type-specific epitope(s) that cause production of neutralizing antibodies, and which raises a protective immune response cross-reactive toward two or more of the L1 proteins of HPV-16, 18, 31 and 45.

Government of the US (NIH); Johns Hopkins University. PCT/US2006/003601 (WO 06/083984); Indian application 6219/DELNP/2007; published 8/31/2007 (filed 8/9/2007); pending. Claims: a method for inducing broadly cross-neutralizing antibodies against cutaneous and mucosal HPV types in humans by administering immunogenic N-terminal peptides of the L2 protein.

The Delphion patent database was searched for HPV vaccine–related patents and patent applications published on or before December 31, 2008, using 'inventor' and 'assignee' names. These results were supplemented with searches of other databases (Derwent Patent Index and the World Intellectual Property Organization patent database) to find corresponding international applications filed under the PCT and national-phase information. Indian patent filings were identified using two freely available resources, the BigPatents42 and IPO databases. As neither of the Indian electronic databases included patent claims, we obtained certified hard copies of all granted patents from the four Indian patent offices for claims analysis. For pending Indian applications, we analyzed the claims published in the corresponding PCT filings. However, there might be pending patent applications not yet published by the IPO that therefore could not be analyzed. We collected information about the licensing status of patents and applications from US Securities and Exchange Commission filings and, where necessary, directly from patent owners. We identified other legal and/or technological barriers through interviews with researchers who developed first- and second-generation HPV vaccines and technologies relevant to vaccine development in India. We also interviewed researchers and business leaders at four Indian companies—Shantha Biotechnics, Indian Immunologicals, Bharat Biotech and Serum Institute of India—that are developing HPV vaccines. STDs, sexually transmitted diseases; HSV, herpes simplex virus; Hib, Haemophilus influenzae type b; EBV, Epstein-Barr virus; HBV (sAg), hepatitis B surface antigen; HCMV, human cytomegalovirus.
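The expiry dates listed in Table 2 follow the standard 20-year term counted from the Indian application date. A minimal sketch checking that pattern against the granted patents in the table (the 20-year rule and the `expiry` helper are illustrative assumptions, not part of the article's analysis):

```python
from datetime import date

def expiry(filed: date, term_years: int = 20) -> date:
    # Standard patent term: 20 years from the filing date.
    # (A simple year shift suffices here; none of the Table 2 dates fall on Feb 29.)
    return filed.replace(year=filed.year + term_years)

# (application date, listed expiry date) pairs transcribed from Table 2
granted = [
    (date(2004, 9, 13), date(2024, 9, 13)),  # Indian patent 203333
    (date(2003, 1, 16), date(2023, 1, 16)),  # Indian patent 214047
    (date(2004, 3, 31), date(2024, 3, 31)),  # Indian patent 221817
    (date(2005, 3, 24), date(2025, 3, 24)),  # Indian patent 220842
]
for filed, listed in granted:
    assert expiry(filed) == listed
print("all listed expiry dates match the 20-year rule")
```

This is only a consistency check on the table's transcribed dates; actual term calculations can be affected by national-phase particulars.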
manufacturing of first-generation L1-VLP–based vaccines unless they are identical in formulation or strain coverage to the compositions claimed in granted Indian patents. We cannot, however, make this claim definitively, owing to uncertainties in interpreting claims and in the fate of pending applications. The claims analysis we present is therefore not legal advice but rather a starting point for independent freedom-to-operate (FTO) analyses by interested parties. Although there are several patents and pending applications on promising second-generation technologies (e.g., L2-based vaccines or oral L1 vaccines), they so far appear to preserve freedom to operate for DCVMs.

IP transparency

Recent studies suggest that the lack of IP transparency could be an important impediment for DCVMs exploring new vaccine candidates33. The lack of patent claim information in publicly available Indian patent databases made our own research slow and
expensive. Our experience mirrored that of Serum (Y. Dalvi, Serum Institute; personal communication) and Bharat. Indeed, Bharat's R&D was delayed owing to uncertainty about the status of patent protection for HPV antigens in India (S. Kandaswamy, Bharat Biotech; personal communication). Moreover, many countries in Africa, Latin America and Southeast Asia—potential markets for HPV vaccines—lack online patent databases, making it very difficult to determine in which LMCs patents are pending or granted. More importantly, LMC companies generally lack the substantial financial and human resources necessary to perform FTO analyses using the proprietary databases available in developed countries. Shantha, Indian Immunologicals and Bharat often rely on their researchers to conduct in-house patent searches (R. Chaganti, A. Khar, Shantha; R. Sriraman, D. Thiagarajan, K. Kumar, Indian Immunologicals; S. Kandaswamy, Bharat; personal communication). The WHO
Initiative for Vaccine Research or other agencies could help coordinate FTO and patent-landscaping services to advise regional manufacturers on potential IP barriers to HPV vaccine development. The creation of resources to map and update the IP landscape for novel HPV vaccines, perhaps developed in partnership with the DCVM network, could facilitate regional manufacturing efforts.

Potential roles for universities and funders

Universities and nonprofit research institutions exploring new HPV vaccines can expedite access to technology in LMCs. The IP management practices of academic institutions, which are the primary generators and gatekeepers of IP for vaccine technologies, will greatly affect regional vaccine manufacturing. The Lausanne–Indian Immunologicals partnership, for example, harnesses the capacity of a DCVM to commercialize a vaccine candidate with potentially high public
health impact in LMCs despite little commercial interest in OECD countries. The NCI–Johns Hopkins–Shantha partnership to commercialize L2-based vaccine technology further illustrates how IP management can create a pathway for product access in low-income markets. University licensing terms are generally not publicly available, except when the parties choose to disclose them voluntarily. This has precluded a definitive analysis of whether Rochester, Queensland and Georgetown preserved freedom to use these technologies, or subsequent improvements, in LMCs when negotiating exclusive licenses with Merck and MedImmune. However, the licensing of the vaccine technologies underlying Gardasil and Cervarix does not conform to recent university technology transfer practice guidelines designed to maximize benefit for the global poor34,35. This is understandable because the licenses in question were crafted in the 1980s, before these guidelines were developed. In addition, limited recombinant vaccine production capacity in LMCs rendered humanitarian licensing largely unnecessary at the time. The inaccessibility of HPV vaccines, however, illustrates why the recently recommended practice guidelines deserve attention, especially as new technologies for prophylactic or therapeutic vaccines emerge. Moving forward, universities and other nonprofit research institutions should adopt IP management strategies that preserve options for DCVMs. Preferred practices include default nonexclusive licensing; exclusive licenses with geographic fields of restriction (to ensure that LMC companies have FTO); retaining rights to sublicense to regional manufacturers, nonprofit organizations and/or public-private partnerships; humanitarian-use clauses for patented technologies and products; and 'White Knight' clauses to ensure vaccine affordability36,37. Universities can also help to promote IP transparency. Secrecy surrounding licensing exacerbates uncertainties in FTO analyses.
Publicly available licensing information can prevent regional manufacturers from wasting time and money on technologies that are blocked by patents and licenses. More importantly, illuminating unblocked pathways can create incentives to commercialize vaccines that are of little interest to OECD manufacturers. Universities that own upstream technologies can promote transparency by requiring disclosure of licensing terms as a condition of the license itself. Alternatively, sponsors of university research can make transparency a condition of funding by stipulating (i) disclosure of what geographic regions and fields
of use exclusive licenses cover or (ii) publication of licensing contracts. Regional manufacturers could readily identify technologies available for licensing, and potential partners for vaccine development, if a central portal or electronic clearinghouse of all HPV vaccine technologies were created.

Technology transfer: beyond patents

Although this study focuses on patents, researchers and DCVMs affirm that additional know-how is also crucial for developing new vaccines23. Even when technologies are in the public domain or are available for licensing, vaccine development requires considerable expertise38. Universities and other nonprofit entities can address this need by creating collaborative technology transfer partnerships modeled, for example, on the NIH Rotavirus Technology Transfer program39,40. Transfer of three second-generation HPV vaccine technologies to Indian companies potentially increases the likelihood of producing a vaccine better suited to LMCs than current vaccines. Oral, needle-free delivery of HPV vaccines, for example, might reduce the risk of infection by other sexually transmitted agents such as HIV and HSV, eliminate multiple healthcare visits, and increase patient compliance in resource-poor regions if doses can be administered at home as reconstituted oral drops. Collaborative partnerships with DCVMs might also produce vaccines designed from the outset to meet the specific implementation requirements of resource-poor regions, such as heat-stable formulations and single-dose or combination vaccines. Finally, market competition from one or more second-generation vaccines will probably reduce prices.

Conclusions

Experience with introducing new vaccines suggests that 20 years could pass before women in LMCs gain access to HPV vaccines41. Meanwhile, every five-year delay in vaccine introduction could result in nearly 1.5 to 2 million more HPV-related deaths1.
The prevention of these fatalities will require vaccines that entail fewer doses, minimize interactions with healthcare professionals and are suitable for delivery in resource-poor settings. Although these challenges are formidable independent of price, improved access to HPV vaccines will also depend on a reduction in vaccine prices. Regional manufacturers can accomplish this by lowering production costs and by developing vaccines tailored for resource-poor settings. Furthermore, increased DCVM competition can lower prices. Our patent landscape
suggests that patents on first-generation vaccines do not seriously inhibit the development efforts of DCVMs. Regional manufacturers, national governments and international agencies should treat this as an opportunity and take the steps necessary to make low-cost vaccine production possible. Academic research institutions, from which most HPV vaccine technologies emerged, can play an important role in supporting regional manufacturing. Their technology transfer practices can open new channels for regional manufacturing while ensuring that licensing does not block pathways to low-cost regional manufacturing of existing vaccines. Improving access to know-how and creating IP transparency can further facilitate regional manufacturing. By participating in technology transfer partnerships and adopting favorable IP management practices, universities can expedite access to new generations of life-saving HPV vaccines and increase the public health impact of these vaccines in LMCs.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
We thank all the interviewees for their voluntary participation in this study, and J.T. Schiller and M. Angrist for helpful discussion and comments. S.P. was supported by travel awards from the Alice M. Baldwin Scholars program at Duke University, the Janet B. Chiang grant from the Asian and Pacific Studies Institute at Duke University, the Dannenberg Awards, and the Stay In Focus grant from the Focus Program and the Public Policy Studies Department at Duke University. S.C. and R.C.-D. gratefully acknowledge the support of the National Human Genome Research Institute and the Department of Energy (CEER grant P50 HG003391, Duke University, Center of Excellence for ELSI Research). S.C. and R.C.-D. also received a grant from the Charles M. Josiah Trent Foundation that supported travel for S.C. T.A. gratefully acknowledges the support of the Echoing Green Fellowship.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Agosti, J.M. et al. N. Engl. J. Med. 356, 1908–1910 (2007).
2. Gakidou, E. et al. PLoS Med. 5, e132 (2008).
3. Lowy, D.R. et al. J. Clin. Invest. 116, 1167–1173 (2006).
4. CDC Vaccine Price List (accessed 20 January 2010).
5. Anonymous. Making cervical cancer vaccines widely available in developing countries: cost and financing issues (International AIDS Vaccine Initiative, New York, and PATH, Seattle, March 2008; accessed 20 January 2010).
6. Anonymous. HPV vaccine adoption in developing countries: cost and financing issues (International AIDS Vaccine Initiative, New York, and PATH, Seattle, March 2008; accessed 20 January 2010).
7. Kaiser, J. Science 320, 860 (2009).
8. Anonymous. India's first vaccine for cervical cancer launched. Thaindian News 20 October 2008 (www.thaindian.com/newsportal/health/indias-first-vaccine-for-cervical-cancer-launched_100109462.html; accessed 20 January 2010).
9. Goldie, S.J. et al. Reprod. Health Matters 16, 86–96 (2008).
10. Diaz, M. et al. Br. J. Cancer 99, 230–238 (2008).
11. Schiller, J.T. et al. Vaccine 24, 147–153 (2006).
12. Anonymous. Donating more than one million vaccines to help Nicaraguan babies (Merck, Whitehouse Station, New Jersey, 2009, 2010; accessed 20 January 2010).
13. Anonymous. GSK rotavirus vaccine donation to Good Shepherd Social Welfare Services Babies Home (Good Shepherd Sisters Taiwan, 14 December 2006; accessed 20 January 2010).
14. Harner-Jay, C. et al. J. Pharm. Sci. X, 1–4 (2008).
15. Anonymous. Gardasil Access Program (accessed 20 January 2010).
16. Batson, A. et al. Vaccine 24, 219–225 (2006).
17. Ghandi, G. Update on GAVI Alliance: renewed strategic vision and governance (UNICEF, Copenhagen, 10–11 December 2008; accessed 20 January 2010).
18. Anonymous. All girls deserve protection from cervical cancer (GAVI Alliance, Geneva, 4 February 2009; accessed 20 January 2010).
19. Lydon, P. et al. Vaccine 26, 6706–6716 (2008).
20. Jodar, L. & Clemens, J.D. in The Grand Challenge for the Future: Vaccines for Poverty-Related Diseases from Bench to Field (eds. Kaufmann, S.H.E. & Lambert, P.H.) 55–73 (Birkhäuser, Basel, 2005).
21. Anonymous. GAVI Alliance Progress Report (GAVI Alliance, Geneva, 2007; accessed 20 January 2010).
22. Anonymous. Global progress toward universal childhood hepatitis B vaccination. MMWR Weekly, 12 September 2003 (accessed 20 January 2010).
23. Milstien, J.B. et al. Health Aff. 25, 1061–1069 (2006).
24. Maybarduk, P. & Rimmington, S. Am. J. Law Med. 35, 323–350 (2009).
25. Outterson, K. et al. Health Aff. 27, 130–139 (2008).
26. Sankaranarayanan, R. et al. Vaccine 26, M43–M52 (2008).
27. Ferlay, J. et al. GLOBOCAN 2002: cancer incidence, mortality and prevalence worldwide. IARC CancerBase No. 5, version 2.0 (IARC Press, Lyon, France, 2004).
28. Inglis, S. et al. Vaccine 24, 99–105 (2006).
29. Sankaranarayanan, R. et al. Vaccine 26S, M43–M52 (2008).
30. Jagu, S. et al. J. Natl. Cancer Inst. 101, 782–792 (2009).
31. Fraillery, D. et al. Clin. Vaccine Immunol. 14, 1285–1295 (2007).
32. Sinha, G.P. Shantha developing $15 cervical cancer vaccine. The Economic Times 16 October 2007 (accessed 20 January 2010).
33. Milstien, J.B. et al. Vaccine 25, 7610–7619 (2007).
34. Chokshi, D.A. et al. J. Am. Med. Assoc. 298, 1934–1936 (2007).
35. Chaifetz, S. et al. Global. Health 3, 1 (2007).
36. Mahoney, R.T. et al. Vaccine 22, 786–792 (2004).
37. Salicrup, L.A. et al. Biotechnol. Adv. 24, 69–79 (2006).
38. Batson, A. Health Aff. 27, 140–142 (2008).
39. MIHR/PIPRA. IP Handbook of Best Practices. Rotavirus vaccine: NIH Office of Technology Transfer case study (accessed 20 January 2010).
40. Salicrup, L.A. Nat. Biotechnol. 25, 976–977 (2007).
41. Kane, M.A. et al. Vaccine 24, 132–139 (2006).
42. BigPatents India (accessed 20 January 2010).
Recent patent applications in stem cells

WO 2010060266. Inventor: Zhou M. Assignee: Zensun Shanghai Science & Technology Ltd. (Shanghai). Priority: 11/28/2008; published: 6/3/2010. A method of inducing myocardiogenesis or treating a cardiac muscle disorder involving contacting a mammalian cell with neuregulin. The method induces in vivo and in vitro differentiation of embryonic stem cells into cells of a myocardial lineage.

WO 2010059738. Inventors: Freed WJ, Vazin T. Assignee: US Department of Health & Human Services (Washington, DC). Priority: 11/18/2008; published: 5/27/2010. A method of culturing stem cells to produce dopaminergic neurons for treating neurological disorders, involving generating embryoid bodies from stem cells and culturing the embryoid bodies in the presence of, e.g., stromal cell–derived factor.

WO 2010059965. Inventors: Carlson ME, Conboy IM, Conboy M. Assignee: Regents of the University of California (Oakland, CA, USA). Priority: 11/24/2008; published: 5/27/2010. A method for enhancing the regenerative potential of a cell, including a stem cell, involving recalibrating transforming growth factor (TGF)-beta/pSmad and Notch signaling intensities in the cell.

WO 2010060031. Inventors: Faleck H, Hariri RJ, Zeitlin A. Assignee: Anthrogenesis (Warren, NJ, USA). Priority: 11/21/2008; published: 5/27/2010. A method of treating an individual having a disease, disorder or condition of the lung comprising administering to the individual an amount of placental stem cells, where the amount causes a detectable improvement in one or more symptoms of the disease, disorder or condition.

US 20100129907. Inventors: Amit M, Itskovitz-Eldor J. Assignee: Technion Research & Development Foundation (Technion City, Israel). Priority: 10/7/2002; published: 5/27/2010. A cell culture comprising stem cells and human foreskin cells capable of maintaining the stem cells in an undifferentiated state during co-culture; useful for proliferating stem cells.

US 20100129910. Inventors: Crooks G, Evseenko D. Assignee: Crooks G, Evseenko D. Priority: 7/2/2008; published: 5/27/2010. A composition comprising a culture medium containing laminin and nidogen, where the medium is capable of inducing formation of embryoid bodies; useful for promoting aggregation of embryonic stem cells.

WO 2010057965. Inventors: Gonzalez Galvez B, Rodriguez Cimadevilla JC. Assignee: Projech Science to Technology (Madrid). Priority: 11/20/2008; published: 5/27/2010. A new, isolated, myometrium-derived mesenchymal stem cell population useful as a medicament for the treatment of tissue degenerative conditions, e.g., skeletal muscle degeneration and cardiac tissue degeneration.

CN 101709288. Inventors: Guo Z, Jin J, Ju X, Wang H, Wang J, Zhao Y. Assignee: The Institute of Radiology & Radiation Medicine of the Academy of Military Medical Science of China (Beijing). Priority: 9/29/2009; published: 5/19/2010. A method of separating mesenchymal stem/progenitor cells from marrow comprising (i) culturing the marrow cells in a mesenchymal stem cell (MSC) culture system and collecting the adherent cells, and (ii) performing a second round of adherent culture on the nonadherent marrow cells in the MSC culture system and collecting the adherent cells from that round. The method is simple and convenient, has high separation and amplification efficiency, and produces highly purified mesenchymal stem/progenitor cells.

CN 101709289. Inventors: Chen X, Ouyang H, Song X, Yin Z. Assignee: Zhejiang University (Hangzhou, China). Priority: 12/15/2009; published: 5/19/2010. A method of inducing a pluripotent stem cell to become a mesenchymal stem cell (MSC) comprising cultivating the pluripotent stem cell on cell culture medium I, subculturing on cell culture medium II and cultivating on cell culture medium III until monoclonal cells grow, to obtain the MSC.

KR 2010051195. Inventors: Kang KS, Park JR. Assignee: Seoul National University Research & Development Business Foundation (Seoul). Priority: 11/7/2008; published: 5/17/2010. A composition for preventing obesity and suppressing differentiation of mesenchymal stem cells into adipocytes, comprising an inhibitor of Dickkopf 1 (Dkk1) or secreted Frizzled-related protein 4 (sFRP4).

KR 2010047742. Inventors: Lee JH, Lee JS, Park D. Assignee: Bio Spectrum (Gunpo-si, Korea). Priority: 10/29/2008; published: 5/10/2010. A composition useful for ameliorating vitiligo, in which a specific differentiation-inducing factor is applied and stem cells are differentiated into melanocytes. By inducing melanocyte differentiation, the composition may also help treat other skin conditions, including darkening of the skin.

Source: Thomson Scientific Search Service. The status of each application is slightly different from country to country. For further details, contact Thomson Scientific, 1800 Diagonal Road, Suite 250, Alexandria, Virginia 22314, USA. Tel: 1 (800) 337-9368 (http://www.thomson.com/scientific).
news and views
Paring down signaling complexity

Kevin A Janes
A new method for characterizing signaling responses to pairs of agonists predicts how cells react to higher-order combinations of external stimuli. Cells process a multitude of external stimuli through receptor and adaptor proteins that converge on a core set of signal-transduction pathways1. How are such complex inputs converted into signaling outputs? In this issue, Chatterjee et al.2 describe an experimentally tractable method for predicting signaling responses to the enormous number of stimulus ‘cocktails’ that a cell might encounter. Their test case is calcium signaling in platelets, but the approach holds promise for modeling signal integration in any cell. If every signaling output were unique to an associated receptor, one could simply characterize signaling responses to individual stimuli and then use linear superposition to infer the global response to any stimulus combination. Of course, cross-talk in signaling is more the rule than the exception, and so merely focusing on individual inputs would overlook key network properties3. Nevertheless, cross-talk does not require one to test all stimulus combinations to reveal important features of signal transduction. Previous studies have shown that simple pairwise stimulation of cells can provide unique information about the activation states of a signaling network3–5. But it has remained unclear whether pairwise stimulation captures the major features of signal integration or whether higher-order combinations of three or more stimuli are necessary. Cells almost certainly encounter complex stimulus combinations in their in vivo microenvironment. However, as the number of stimuli increases beyond two, the number of possible combinations quickly explodes, and the chance that a cell will see any particular higher-order combination becomes exceedingly small. Triple-stimulus combinations Kevin A. Janes is in the Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, USA. 
e-mail: [email protected]
(e.g., A + B + C) would likely be too rare to evolve their own specific signal-integration mechanisms beyond those provided by subsets of two stimuli (e.g., A + B, B + C or A + C). The work by Chatterjee et al.2 and two other recent reports6,7 provide experimental support for this theoretical argument. Using different approaches, the three groups found that higher-order stimulus combinations could be accurately predicted from pairwise information. As emphasized in one study6, the practical implications of this finding are substantial, because it suggests that it is sufficient to test fewer stimulus cocktails experimentally. With existing high-throughput approaches8 one should easily be able to quantify signaling responses to all pairwise combinations of 20–40 ligands that a cell type is most likely to encounter. Pairwise measurements could then enable data-driven predictions of any higherorder ligand combination as the need arises. These predictions could be incorporated into tissue-level models composed of multiple cell types and stimuli that vary in time and space. A comprehensive and predictive model of signal integration would also provide a means to search for cocktails that amplify desired signaling outputs or dampen undesirable ones. The method of Chatterjee et al.2, called pairwise agonist scanning (PAS), systematically exploits the power of pairwise measurements to make network-specific predictions and diagnoses. The authors focused on the calcium response of platelets to agonists of blood clotting, or thrombosis. Understanding the environmental factors and intrinsic network properties that control platelet activation is of considerable clinical interest because too much or too little clotting can be fatal. Furthermore, as a model for signal transduction, the platelet is ideal—no nucleus, a few thousand mRNA species and a collection of signaling proteins dedicated to one major function.
nature biotechnology volume 28 number 7 JULY 2010
Platelet agonists mobilize calcium from intracellular stores to drive thrombosis. Calcium second-messenger dynamics are readily monitored by sensitive fluorescent dyes whose spectral properties change with the concentration of intracellular calcium. Chatterjee et al.2 established a cell-based assay of intracellular calcium concentration with one such dye and used it to define the half-maximal effective concentration (EC50) for six orthogonal platelet agonists. They then devised a high-throughput method for measuring the intracellular calcium concentration profile of a platelet suspension for 4 minutes after stimulation with all possible pairwise combinations of the six agonists at 0.1×, 1× or 10× their EC50 in a single 384-well plate (Fig. 1). Notably, the resulting 135 stimulus pairs fall just below the point where the combinatorics become unwieldy—for instance, it would take four times as many measurements to assay all stimulus triples.

In total, PAS experiments yielded tens of thousands of intracellular calcium concentration measurements for each platelet sample. The authors then extracted pairwise information from these data by defining a synergy score for the time-integrated intracellular calcium concentration response to each agonist combination. This score captures synergy, antagonism and additivity in a single metric and compresses each experiment into a 135-element vector that describes the synergy ‘signature’ of a platelet population.

Remarkably, when the authors applied PAS to platelets from ten human donors, they found that synergy signatures could reproducibly organize donors into subpopulations with greater fidelity than previous studies had reported. Such patient separation based on platelet phenotype has long been desired for predicting adverse reactions to coagulant, anticoagulant and thrombolytic drugs. To translate this work further toward patient prognosis, it will be important
news and views

[Figure 1 schematic: agonist pairs (A–F, at several doses) applied to platelets → measured [Ca2+] time courses → train neural-network model → predicted [Ca2+] responses to sequential and higher-order agonist combinations.]
Figure 1 Combining pairwise agonist scanning (PAS) with neural-network modeling to capture a complex stimulus landscape. Chatterjee et al.2 used PAS to measure the dynamic intracellular calcium response of platelets to all possible pairs of six agonists added at different doses (agonists A–F, where smaller labels indicate lower doses). PAS data were then used to train a neural-network model of intracellular calcium responses. The model predicted sequential (blue) and higher-order agonist combinations (red), suggesting that most of the overall stimulus landscape had been captured with pairwise data from PAS.
to connect PAS-based discrimination with key clinical variables or platelet aggregopathies.

PAS signatures may be reproducible and donor specific, but do they truly capture the higher-order signal-transduction properties of cells? To compile the PAS data in a format that enables predictions for new cocktails of agonists, Chatterjee et al.2 used a history-dependent form of neural-network modeling9 (Fig. 1). With reasonably few fitted parameters, they built a model that predicts the platelet intracellular calcium concentration time course for an arbitrary combination and timing of the six selected agonists. When challenged with new cocktails added together or sequentially, the model performed surprisingly well. Overall, the authors observed a strong correlation between predicted and measured synergy scores, although predicted intracellular calcium concentration responses occasionally did not match experimental measurements when several agonists were added together at 10× EC50. Importantly, the predictions were achieved using <4% of the data that would be required to measure the six-agonist stimulus space exhaustively. The efficiency of PAS would become even more dramatic if more agonists were included in the stimulus panel.

The utility of PAS should extend well beyond the particular network studied in this paper. For example, with proper signaling readouts, PAS could be used in pharmaceutical development to investigate the primary hits from a small-molecule screen. PAS in the presence and absence of a primary compound would quickly reveal contexts in which the drug has perturbed the normal signal-transduction machinery. Overall, PAS will be most successful in situations where the convergent intracellular pathways are established and where time-dependent signaling readouts are easily obtained for hundreds to thousands of samples.

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Miller-Jensen, K., Janes, K.A., Brugge, J.S. & Lauffenburger, D.A. Nature 448, 604–608 (2007).
2. Chatterjee, M.S., Purvis, J.E., Brass, L.F. & Diamond, S.L. Nat. Biotechnol. 28, 727–732 (2010).
3. Gaudet, S. et al. Mol. Cell. Proteomics 4, 1569–1590 (2005).
4. Janes, K.A. et al. Science 310, 1646–1653 (2005).
5. Natarajan, M. et al. Nat. Cell Biol. 8, 571–580 (2006).
6. Geva-Zatorsky, N. et al. Cell 140, 643–651 (2010).
7. Hsueh, R.C. et al. Sci. Signal. 2, ra22 (2009).
8. Albeck, J.G. et al. Nat. Rev. Mol. Cell Biol. 7, 803–812 (2006).
9. Krogh, A. Nat. Biotechnol. 26, 195–197 (2008).
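As a footnote to the numbers quoted above, the combinatorics of the PAS design can be checked directly; treating each agonist as absent or present at one of the three tested doses, and counting single-agonist doses alongside the pairs, are interpretive assumptions rather than details taken from the paper:

```python
from math import comb

n_agonists, n_doses = 6, 3  # six agonists, each at 0.1x, 1x or 10x EC50

# All pairwise conditions: choose 2 of 6 agonists, each at one of 3 doses
pairs = comb(n_agonists, 2) * n_doses ** 2      # 15 * 9 = 135

# All triple conditions would take four times as many measurements
triples = comb(n_agonists, 3) * n_doses ** 3    # 20 * 27 = 540

# Exhaustive space: each agonist absent or at one of 3 doses (4 levels)
exhaustive = (n_doses + 1) ** n_agonists        # 4**6 = 4096

# Pairs plus single-agonist doses cover <4% of the exhaustive space
measured = pairs + n_agonists * n_doses         # 135 + 18 = 153
fraction = measured / exhaustive                # ~0.037
```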
Inhibitors for E3 ubiquitin ligases

John R Lydeard & J Wade Harper

John R. Lydeard and J. Wade Harper are in the Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA. e-mail: [email protected]

Two studies show that specific cullin-RING E3 ubiquitin ligases can be targeted with small molecules.

The ubiquitin-proteasome system for protein degradation regulates a wide array of cellular processes, and its misregulation has been linked to numerous human pathologies. General inhibitors of the proteasome and of the cullin-RING subfamily of E3 ubiquitin ligases that confer specificity to the ubiquitin machinery have shown promise against cancer in clinical and preclinical studies1, triggering a search for more discriminating drugs that might have greater efficacy and safety for particular indications. As described in this issue, Orlicky et al.2 and Aghajan et al.3 have now identified the first specific inhibitors of individual cullin-RING E3 ubiquitin ligases. Although these studies were carried out in budding yeast, they raise the exciting possibility of developing drugs targeting components of the human cullin-RING ligase system that were previously considered ‘undruggable’. Moreover, Orlicky et al.2 reveal what may emerge as a general strategy for allosteric modulation of the WD40-repeat class of proteins.

Cullin-RING ligases are one of the largest families of E3 ubiquitin ligases and provide specificity to E1-E2-E3 ubiquitin transfer cascades. The typical cullin-RING ligase—a Skp1-Cul1-F-box protein (SCF) ligase—is composed of a central cullin scaffold that binds the RING domain–containing protein Rbx1 through its C terminus and the Skp1–F-box protein substrate receptor module through its N terminus4 (Fig. 1a). Rbx1 recruits the ubiquitin-charged E2-conjugating enzyme Cdc34, which ubiquitinates substrates bound to the F-box protein. SCFs are thought to target hundreds of different substrates for ubiquitination through variants of the F-box protein, ~70 of which are present in the human genome. F-box proteins typically bind substrates through leucine-rich or WD40 repeats located near their C terminus, often in a phosphorylation-dependent manner. For example, through the WD40 repeats of Cdc4, SCFCdc4 recognizes the Cdk1 inhibitor Sic1 once it is phosphorylated within multiple Cdc4 phosphodegron motifs5,6. A degron is a small motif that is sufficient for recruitment of an E3 ligase and ligation of ubiquitin to the substrate. The human Cdc4 ortholog Fbw7 is a tumor suppressor that targets several key cell cycle and transcriptional control proteins containing conserved Cdc4 phosphodegron motifs7,8.

From a biological perspective, SCF complexes hold significant potential for therapeutic intervention, as they are linked to many disease processes. Nevertheless, they have proven challenging to target, in large part because the
[Figure 1 schematic: (a) the SCFCdc4 complex, with Cul1, Rbx1, Skp1, the Cdc4 WD40 domain, ubiquitin-charged Cdc34 and a phosphodegron-bearing substrate; (b) the eight-bladed Cdc4 β-propeller with SCF-I2 bound between blades 5 and 6, near H631, Y574 and L634.]
Figure 1 Allosteric inhibition of SCFCdc4 by SCF-I2. (a) Schematic of the SCFCdc4 complex, with the phosphodegron (red) bound to the surface of the Cdc4 WD40 β-propeller. UB, ubiquitin. (b) Schematic of the Cdc4 β-propeller (blue) and associated Cdc4 phosphodegron (red) showing the orientations of His631 (H631), Tyr574 (Y574) and Leu634 (L634), whose positions are altered upon binding of SCF-I2. ‘+’ indicates the position of arginine residues in Cdc4 that bind the phosphodegron. Rotation of H631 provides a pocket for SCF-I2 binding at the interface between propeller blades 5 and 6. This leads to rotations of the side chains of Y574 and L634, disrupting interactions between these residues and hydrophobic residues in the phosphodegron. Small changes in the position of positively charged arginine residues in the phosphodegron binding site may also contribute to loss of interaction between Cdc4 and the phosphodegron. Disruption of the interactions between Leu-2 in the phosphodegron (red squiggle) and Y574 and L634 in Cdc4 constitutes the primary allosteric structural change induced by SCF-I2 binding to Cdc4.
catalytic machinery of the cullin system and, particularly, its E2-conjugating enzyme Cdc34, lacks the deep binding pockets characteristic of conventional drug targets. Moreover, inhibition of the core machinery, such as E2 itself, would be less specific than inhibition of a single SCF complex. An alternative possibility would be to specifically target substrate binding sites within particular F-box proteins. However, crystallographic studies of Cdc4 and Fbw7 bound to peptide degrons have revealed that much of the interaction occurs on an exposed surface of the WD40 propeller structure, again making the identification of direct competitive inhibitors difficult5–7.

Using a fluorescence polarization assay that measures displacement of a Cdc4 phosphodegron–containing peptide from Cdc4, Orlicky et al.2 screened for small molecules that inhibit budding yeast SCFCdc4. Among the top hits was a biplanar dicarboxylic acid compound named ‘SCF-I2’. Importantly, SCF-I2 inhibited binding and ubiquitination of the SCFCdc4 substrates Sic1 and Far1 in vitro, as expected based on its ability to block phosphodegron binding, but had no effect on the activity of a related WD40-containing SCF complex, SCFMet30.

Given the assay employed for inhibitor identification, it would be logical to assume that SCF-I2 simply binds to the phosphodegron binding site on Cdc4, thereby blocking substrate binding. However, analysis of the SCF-I2/Cdc4 co-crystal structure2 provided a major surprise, revealing that SCF-I2 acts as an allosteric inhibitor rather than a direct competitive inhibitor (Fig. 1b). Cdc4 contains eight WD40 repeat motifs, which form a canonical β-propeller structure bearing arginine and hydrophobic residues that interact with phosphothreonine and hydrophobic residues on the phosphodegron motif, such as Leu-Leu-phosphoThr-Pro.
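The displacement logic behind such a polarization screen can be sketched as follows; the polarization values, control names and hit threshold here are hypothetical placeholders for illustration, not parameters from the study:

```python
def fraction_displaced(fp_sample, fp_bound, fp_free):
    """Fraction of probe peptide displaced from the receptor, estimated
    from fluorescence polarization (FP) readings.
    fp_bound: FP with the probe fully bound (e.g., vehicle-only control)
    fp_free:  FP of the free probe (e.g., no-receptor control)
    """
    return (fp_bound - fp_sample) / (fp_bound - fp_free)

def call_hits(readings, fp_bound=200.0, fp_free=50.0, threshold=0.5):
    """Flag compounds whose wells show more than `threshold` displacement."""
    return {name: fraction_displaced(fp, fp_bound, fp_free)
            for name, fp in readings.items()
            if fraction_displaced(fp, fp_bound, fp_free) > threshold}

# Hypothetical plate readings (millipolarization units) for two compounds
hits = call_hits({"compound_A": 80.0, "compound_B": 195.0})
```

A well whose polarization drops toward the free-probe value indicates that the compound has displaced the peptide, which is how a phosphodegron-displacing molecule such as SCF-I2 would surface among the top hits.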
In the co-crystal structure, SCF-I2 is inserted into a pocket formed by several otherwise buried hydrophobic residues between blades 5 and 6 on the lateral edge of the Cdc4 β-propeller. Interestingly, this binding site does not exist as such in the apo-Cdc4 structure but is formed by movement of multiple structural elements in the vicinity of the inhibitor binding site, particularly His631 (Fig. 1b). These conformational changes are propagated to the vicinity of the phosphodegron binding site itself, 25 Å from the SCF-I2 binding site. Structural rearrangements of key phosphodegron-binding residues—including Tyr574 and Leu634, which form the hydrophobic binding site for residues such as the leucine at position –2 in the phosphodegron relative to the phosphothreonine—fill the void left by SCF-I2–induced rearrangements (Fig. 1b). Loss of this interaction with the phosphodegron sequence, coupled with a further disruption in the hydrogen-bonding architecture of arginine residues involved in phosphothreonine binding (including Arg572), appears to be sufficient to greatly reduce phosphodegron binding to Cdc4. Mutations in the SCF-I2 binding site, as well as mutations that mimic SCF-I2–induced changes in the phosphodegron binding site, are consistent with the structural studies and reveal that relatively small changes in the structure of the phosphodegron binding site can have a large impact on affinity.

SCF-I2 is not an effective inhibitor of Fbw7 (ref. 2). This is not surprising, given that Cdc4 and Fbw7 are only 31% identical (51% similar) in the WD40 repeat domain, and key residues at the SCF-I2–Cdc4 interface are not conserved in Fbw7. Thus, even for evolutionarily and functionally related F-box proteins, significant specificity may be achievable.
Chemical genetics screens in yeast provide another approach for discovering small-molecule inhibitors of interest in clinical drug development. The fungal product rapamycin is an inhibitor of the target of rapamycin (TOR) protein kinase, which coordinates nutrient signaling and protein synthesis in eukaryotes and is being investigated for cancer therapy. The identification of small molecules that enhance the cell cycle–arrest activity of rapamycin might improve its therapeutic utility. Using a budding yeast platform and chemical libraries, Aghajan et al.3 identified a series of small-molecule enhancers of rapamycin (SMERs). One of these, SMER3, induced a transcriptional profile reminiscent of the methionine-biosynthesis response, a pathway that is controlled by the SCFMet30 complex. SCFMet30 keeps the Met4 transcription factor in an inactive ubiquitinated form in the presence of methionine. When methionine is absent, SCFMet30 activity is restrained and Met4 promotes the methionine biosynthesis transcriptional program.

The methionine biosynthesis transcriptional signature induced by SMER3 led Aghajan et al.3 to examine whether SMER3 might directly inhibit SCFMet30. In vivo, SMER3 led to a loss of Met4 ubiquitination, and in vitro, SMER3 inhibited SCFMet30-dependent Met4 ubiquitination but not SCFCdc4-dependent Sic1 ubiquitination. Although the precise mechanism by which SMER3 inhibits SCFMet30 is unknown, biochemical studies indicate that SMER3 leads to disassembly of the Skp1-Met30 interaction in vivo without affecting the interaction of Skp1 with other F-box proteins. Precisely how disassembly occurs is unclear, although biophysical and partial proteolysis experiments suggest subtle alterations in the Skp1-Met30 complex in the presence of SMER3. Further
work is required to understand how inhibition of SCFMet30 and TOR conspire to arrest the cell cycle.

Together, the studies of Orlicky et al.2 and Aghajan et al.3 make a case for more in-depth, systematic attempts to target SCFs specifically and cullin-RING ligases generally, moving beyond the specialized example of MDM2, a non-cullin RING E3 ligase of p53. MDM2 has been successfully inhibited by the small-molecule nutlins but is not a member of a large family of E3 ligases8. Although there are 7 cullins and ~200 cullin-RING ligase adaptor proteins in humans, suggesting a seemingly endless number of possible targets, only a small number of cullin-RING ligases have thus far been implicated as oncogenes (e.g., SCFSkp2 (ref. 8)). Biochemical screens of the type performed by Orlicky et al.2 are potentially applicable to any cullin-RING ligase adaptor for which a peptide substrate (degron) can be identified, and this approach could be used for virtually any class of substrate-binding domain found in cullin-RING ligases. The ability to target single members of the cullin-RING ligase family would be expected to cause fewer side effects than general blockade of the entire class of E3 ligases or the proteasome itself.

Perhaps more importantly, the work of Orlicky et al.2 strongly suggests that WD40 proteins themselves might be generally targetable. The structural transition seen upon binding of SCF-I2 to Cdc4 is remarkably similar to the transition observed upon binding of phosducin to the WD40 propeller of transducin-β, the β-subunit of the trimeric transducin-αβγ GTPase complex. In this case, phosducin-dependent opening between blades 5 and 6 of the propeller allows for insertion of the hydrophobic farnesyl moiety of the transducin-γ subunit to form the trimeric αβγ complex2.
If blade interfaces in WD40 proteins can generally serve as sites for small-molecule binding, possibly reflecting structural changes that normally occur during engagement of their in vivo targets, then a large segment of the WD40-containing proteome (with >250 members in humans, including several dozen cullin-RING ligase adaptors) could become candidate drug targets. WD40 domains are among the most abundant protein interaction domains encoded in the human genome, and WD40-containing proteins play important roles in numerous signaling networks implicated in disease. Thus, a general strategy for targeting specific WD40 proteins would open up a large and important class of proteins to therapeutic intervention.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
1. Soucy, T.A. et al. Nature 458, 732–736 (2009).
2. Orlicky, S. et al. Nat. Biotechnol. 28, 733–737 (2010).
3. Aghajan, M. et al. Nat. Biotechnol. 28, 738–742 (2010).
4. Petroski, M.D. & Deshaies, R.J. Nat. Rev. Mol. Cell Biol. 6, 9–20 (2005).
5. Nash, P. et al. Nature 414, 514–521 (2001).
6. Hao, B., Oehlmann, S., Sowa, M.E., Harper, J.W. & Pavletich, N.P. Mol. Cell 26, 131–143 (2007).
7. Welcker, M. & Clurman, B.E. Nat. Rev. Cancer 8, 83–93 (2008).
8. Nalepa, G., Rolfe, M. & Harper, J.W. Nat. Rev. Drug Discov. 5, 596–613 (2006).
Systematic phenotyping of mouse mutants

Wolfgang Wurst & Martin Hrabe de Angelis

Wolfgang Wurst and Martin Hrabe de Angelis are at the Helmholtz Zentrum München, Munich/Neuherberg, Germany. e-mail: [email protected] or [email protected]

Comprehensive phenotypic screening of knockout mice highlights the pleiotropic functions of secreted and transmembrane proteins.

Deciphering the functions of each mammalian gene is one of the most important challenges that the biomedical research community faces in the next few decades. Although advances in systematic mutagenesis technologies and phenotyping approaches now make it feasible to define physiological roles for all mammalian genes, determining all of the phenotypes associated with a single mutation will likely remain a major bottleneck for many years. In this issue, Tang et al.1 report the generation of 472 mouse knockout lines, each disrupted in the expression of a different secreted or transmembrane protein. They then use a broad phenotypic screen to establish the effects of each gene deletion on a wide range of physiological traits (Fig. 1). The findings support the value of comprehensive, systematic phenotyping to unravel new and unexpected functions for mammalian genes, and they provide leads for more detailed mechanistic studies and the identification of therapeutic targets.

The mouse plays an essential role in genetic research; any mutation, ranging from large chromosomal deletions to point mutations, can be generated with single-nucleotide precision. Thus far, most efforts to phenotype mutant mice have been conducted on a small scale and have concentrated on the specific biological processes that the investigators who generated the mouse in question happened to be interested in. As a consequence, many interesting effects of gene deletions are likely to have been missed.

Tang et al.1 contribute to the overall goal of a complete functional annotation of the mouse genome by phenotyping their library of knockout animals using 85 assays that span immunology, metabolism, bone metabolism, cardiology, oncology, growth, ophthalmology, neurobiology, reproduction, viability and embryonic lethality. The 472 secreted and transmembrane proteins they study were chosen based on their association with human diseases, their expression profiles and their likely suitability as drug targets. Interestingly, 419 (89%) of the mutants tested in this pilot study exhibited measurable phenotypic alterations in one (150 mutants; 32%) or two or more (269 mutants; 57%) organ systems. It should be noted, however, that this pilot study was conducted only on a mixed genetic background, with a limited number of animals tested per line (eight mutants and eight wild type, of both sexes). Considering that not all organ systems were analyzed to the same extent and that no in-depth secondary phenotyping was performed, this outcome clearly shows that mutations in a single gene usually affect several biological systems and functions, corroborating the often neglected concept of pleiotropy in gene function.

Surprisingly, there is a trend, albeit not a significant correlation, suggesting an association between tissue-specific gene expression and organ-specific phenotypes, even when secreted proteins that are likely to have effects at distal sites were excluded from the analysis. This may indicate systemic and tissue-specific, expression-independent effects of single-gene mutations. Nonetheless, it seems possible that the lack of significance might result either from inadequate statistical power in this study or from phenotypes that are not directly caused by changes in mRNA expression levels but instead by cell-cell interactions and secondary effects in downstream pathways affected by the knockout mutation.
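The reported breakdown of phenotype-positive lines is internally consistent, as a quick arithmetic check shows:

```python
total_lines = 472     # knockout lines tested in the pilot study
one_system = 150      # mutants with alterations in exactly one organ system
multi_system = 269    # mutants with alterations in two or more organ systems

affected = one_system + multi_system  # 419 lines with any phenotype

# Percentages as reported, rounded to whole numbers
pct_affected = round(100 * affected / total_lines)   # 89
pct_one = round(100 * one_system / total_lines)      # 32
pct_multi = round(100 * multi_system / total_lines)  # 57
```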
[Figure 1 schematic: selection of genes to knock out (transmembrane and secreted proteins of potential relevance to human diseases) → comprehensive phenotyping (85 validated assays in a defined order, spanning organ systems from neurology to immunology) → a resource of phenotypes of individual genes (encyclopedic database).]
Figure 1 The comprehensive screen performed by Tang et al.1 scored phenotypes of mice, each of which had one of 472 genes encoding membrane and secreted proteins knocked out, to reveal the involvement of the deleted gene in the function of different organ systems. The genes were selected based on their homology to human genes implicated in disease, their expression profiles and the likelihood that their products could be targeted by drugs. The gene-phenotype associations and the publicly available mouse strains provide a valuable resource for more in-depth secondary phenotyping and mechanistic studies.
Alternatively, expression patterns might have been missed owing to low abundance or to transient expression at stages not analyzed.

Although we do not wish to diminish the magnitude of the efforts and accomplishments of Tang et al.1, characterization of 472 mouse lines is but a first step toward comprehensive analysis of the ~20,000 mammalian genes, each with one or more alleles, and the hundreds of phenotypic parameters that may be measured in cohorts of male and female mice in different inbred strains. Large-scale systematic mutagenesis efforts are underway to generate null and conditional mutations for every coding gene of the mouse genome in the framework of the International Mouse Knockout Consortium (http://www.knockoutmouse.org/)2, and will likely proceed for many years to come. The even larger challenge of linking phenotypes with individual mutations will be left to the International Mouse Phenotyping Consortium (IMPC)3. This group, which includes several genomics centers in Asia, Australia, Europe and North America, comprises several so-called ‘mouse clinics’ that have been established to perform broad systemic phenotyping of mouse mutants4. Despite the encouraging investment in these structures, their capacity is not sufficient to analyze all of the mutant mouse lines that will be produced in the next decade5. To ensure significant progress, their throughput will need to be increased considerably, from analyzing hundreds of mutant lines to phenotyping thousands of mutant mouse lines per year. Although automation may help to increase capacity, additional and larger phenotyping centers with long-term funding are needed.

Quality management and global consensus on common standards will be key to the success of large-scale phenotyping programs. All assays must be reproducible
even if the experiments are performed in different laboratories. The ‘order of test’ is important in this respect—for example, the results of behavioral tests will be much altered if mice have undergone stressful situations before the test. Phenotyping is far more complex and sensitive to experimental variation than most broadly comparable ventures, such as sequencing a genome. Deposition of the appropriate metadata (e.g., light/dark cycle, diet, water, temperature, humidity, time of test performance and even the name of the person who performed the test) can be critical to ensuring reproducibility. Factors such as differences between laboratories in the hygienic status of animals may also need to be considered, as infectious diseases or the status of the gut flora might influence the outcome of many assays.

As part of the IMPC efforts, the European Union Mouse Research for Public Health and Industrial Applications (EUMORPHIA) consortium has developed the first standard set of phenotyping protocols, validated across several laboratories engaged in such phenotyping programs. A limited selection of these phenotyping protocols forms a coherent sequence of tests to fully characterize a mutant mouse line. This collection of assays is known as EMPReSS (European Mouse Phenotyping Resource for Standardized Screens). All protocols are publicly available (http://www.eumorphia.org/), and the EMPReSS pipeline is currently used in the large-scale phenotyping program EUMODIC (The European Mouse Disease Clinic, http://www.eumodic.org/).

The assays used by Tang et al.1 cover only part of these phenotyping areas, and the assays are not identical to those currently used by the mouse clinics involved in the IMPC. Nevertheless, as the authors report many interesting phenotypic alterations in the knockouts they study, it would be very
interesting to screen the mouse lines for the more comprehensive set of more than 500 parameters assayed in the mouse clinics, to establish which phenotypes are missed in a screen like the one performed by Tang et al.1 and how much insight is added by additional, more labor-intensive assays. Such an experiment might have an impact on the future design of the IMPC screening pipeline.

In translating the findings of large-scale phenotypic screens to clinical applications, it is important to bear in mind that many human diseases are multifactorial and thus difficult to model in laboratory mice. As such diseases are often induced by combinations of genetic susceptibility, environmental factors and aging, extensive challenge tests would need to be included in the phenotyping screens to investigate the influence of, for example, nutrition, environmental factors, psychological stress, infections, exercise and age6. Given that even basic phenotyping poses an enormous challenge, sophisticated challenge tests can only be performed for preselected mutants.

Compiling an encyclopedic account of the functions of all mouse genes will dramatically alter our understanding of the biology of mammalian life and undoubtedly accelerate progress in biomedical research. The focused study of secreted and transmembrane proteins by Tang et al.1 is an important milestone in this ongoing challenge.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Tang, T. et al. Nat. Biotechnol. 28, 749–755 (2010).
2. The International Mouse Knockout Consortium. Cell 128, 9–13 (2007).
3. Abbott, A. Nature 465, 410 (2010).
4. Gailus-Durner, V. et al. Nat. Methods 2, 403–404 (2005).
5. Brown, S.D., Hancock, J.M. & Gates, H. PLoS Genet. 2, e118 (2006).
6. Beckers, J., Wurst, W. & Hrabe de Angelis, M. Nat. Rev. Genet. 10, 371–380 (2009).
Splicing by cell type

Mauricio A Arias, Shengdong Ke & Lawrence A Chasin
Mauricio A. Arias, Shengdong Ke and Lawrence A. Chasin are in the Department of Biological Sciences, Columbia University, New York, New York, USA. e-mail: [email protected]

A comprehensive study identifies sequence features that predict tissue-specific alternative splicing.

The rules governing exon splicing in different cell types to generate protein diversity are complex and apparently manifold. In a recent paper in Nature, Barash et al.1 have applied machine learning to high-throughput splicing data to identify combinations of sequence features that can be analyzed to predict tissue-specific alternative splicing patterns. By using a multitude of features to describe an RNA molecule and focusing on cell-specific splicing decisions, the authors have provided a much richer picture of the code underlying alternative splicing than has been achieved previously.

In contrast to transcription and translation, in which the flow of information from DNA to pre-mRNA and from mRNA to protein is governed by simple codes, the processing of pre-mRNA to mRNA is less straightforward. Extracting exons from pre-mRNA and splicing them together to create mRNA requires, first and foremost, a mechanism for distinguishing exons from introns. Intron recognition always takes place during the splicing event itself, which is catalyzed by the large spliceosomal machinery comprising five RNA molecules and >100 proteins. In contrast, exon recognition is thought to occur before the splicing reaction. The main evidence for this is that disruption of an individual splice site most often leads to the entire exon being skipped.

How early exon recognition takes place is not well understood. The sequences immediately surrounding the splice sites themselves do not contain enough information to demarcate the borders of exons. Several lines of evidence have shown that additional information exists in short, degenerate sequence motifs that lie both within and outside the exons. These genetic elements have been shown to interact with specific RNA-binding proteins to either enhance or silence splicing, but the underlying mechanisms have remained elusive. The composition, location and function of these sequence elements have been called the ‘splicing code’2–5.

Deciphering the splicing code is more complicated than analyzing the linear arrangement of these sequence elements, for several reasons. First, RNA can fold into intricate three-dimensional structures, driven mostly by base pairing between different regions of the molecule. The availability of a pre-mRNA sequence to bind an RNA-binding protein therefore depends on its structure. Pre-mRNA structure itself could also play a direct role in splicing. Second, because splicing can take place while RNA is being transcribed, it can be influenced by the transcription complex, which may act as a conduit for the delivery of gene-specific splicing factors, and/or by pausing of transcription to allow a splice site to be recognized6. Third, chromatin structure is emerging as a possible modulating factor in splicing (e.g., refs. 7 and 8). Thus, the splicing code can involve DNA sequences as well as RNA.

The situation is even more complicated because the splicing code can produce multiple outcomes in a given cell type and can be interpreted differently in different cellular environments. The result is alternative splicing, with the same gene giving rise to multiple mRNA isoforms and their corresponding protein isoforms. Although most exons are spliced constitutively—that is, included with near 100% efficiency in all mature mRNA molecules produced in all tissues examined—a large minority are alternatively spliced, such that almost all mammalian genes undergo some alternative splicing. Alternative splicing can generate a proteome that is much larger than the transcriptome, thereby explaining the relative complexity of higher organisms without much of a difference in genome size. Tissue-specific alternative splicing adds another layer to the splicing code, with differences between tissues presumably mediated by different repertoires or levels of splicing factors or chromatin structures. The code for tissue-specific alternative splicing may be part and parcel of the general code or distinct from it, or the two may overlap.
The study of Barash et al.1 tackles the tissue-specific splicing code through a collaboration between computational and experimental researchers. The authors’ strategy was to reveal the elements of the code by associating the presence of sequence ‘features’ with splicing outcomes (Fig. 1). The latter, determined by high-throughput microarray measurements of mRNA levels, comprised 3,665 alternatively spliced exons in 27 mouse cells and
tissues. The complexity of the problem was then reduced in two ways. First, the 27 samples were grouped into four tissue categories (CNS, muscle, digestion and the embryo) for comparison. Second, relative percent inclusion levels were made discrete as three probabilities: increased, decreased or unchanged inclusion in a particular tissue compared to a baseline. A machine learning algorithm was developed to discover which features were associated with increased or decreased exon inclusion in each tissue category. The algorithm was tested against exons not used for training for its ability to predict increased or decreased relative inclusion levels in pairwise comparisons of different tissue categories. An accuracy of ~90% was achieved, attesting to the validity of the method.

The collection of sequence features is perhaps the heart of this study. The authors compiled a list of 1,014 diverse features using data in the literature and their own intuition. Most of the features were based on oligomeric sequences discovered in various types of experiments—for example, sets of predicted and validated hexamer sequences from statistical analysis of the transcriptome, ligand sequences for splicing factors and positional weight matrices for sequences derived by functional selection. But the feature list also included the density of all possible base trimers, dimers and even single bases. RNA structure was taken into account as predicted single-strandedness around regions such as the splice sites. Splice site scores, the creation of premature stop codons, frame shifts, exon length and evolutionary conservation were also included. In addition, the features were considered separately for seven different regions: the alternatively spliced exon and 300 nt of its intronic flanks plus the upstream and downstream exons and their proximal intronic flanks. These last four regions can be located thousands of nucleotides away from the exon in question.
The separate consideration of these seven regions multiplies the number of features tracked. Whereas tissue-specific splicing motifs have been discovered by genomic analysis in the past (e.g., ref. 9), this study stands out for its comprehensiveness and its inclusion of distant locations. About 200 of the original 1,014 features proved to be useful in predicting alternative
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Barash, Y. et al. Nature 465, 53–59 (2010).
2. Wang, Z. & Burge, C.B. RNA 14, 802–813 (2008).
3. Chasin, L.A. Adv. Exp. Med. Biol. 623, 85–106 (2007).
splicing. This filtered list includes confirmatory assignments for binding sites of the polypyrimidine tract–binding protein and the Nova splicing factor, for example, but it also suggests unexpected roles for the density of many short sequences and, intriguingly, for sequences residing in the far-flung adjacent exon regions. Importantly, in a post-processing step, the authors could identify many pairs of features that significantly co-occurred, suggestive of specific molecular interactions.

Overall, the results provide a list of players whose roles can now be followed up with mechanistic studies. The list also allows an exploration of the effect on splicing of single-nucleotide polymorphisms that disrupt important features, a direction that could prove relevant to human disease. Even at this early stage, the authors were able to come up with evidence for increased gene expression in embryonic stem cells through the exclusion of alternatively spliced ‘killer’ exons that reduce mRNA levels in adult tissue. Furthermore, the method itself can be applied to understand codes for processes other than splicing.

Although this comprehensive study represents an important advance, there is more to be done. An improved code would provide quantitative predictions of exon inclusion rather than just directionality. Additional wet validation experiments to test the importance of features would allow conclusions based on statistics to be accepted with confidence. The use of RNA-seq data to measure exon inclusion should improve the accuracy of the code. Finally, tissue-specific levels of RNA-binding proteins, RNA-binding-protein occupancy and nucleosome position and modification may provide additional useful information.

The strategy of Barash et al.1 was not aimed at determining a general code for exon definition but rather a code for alternative splicing—the difference in the splicing of a given exon in two different environments.
Although there may be differences in how alternative exons are defined10, it would be surprising if many of the features identified here do not turn out to reflect basic mechanisms in splice site recognition. Indeed, the comparison of two different states (tissues) can help pinpoint such factors. Perhaps the most important message from this work is that each exon does not march to the beat of a different drummer, but is spliced through a complex but knowable system based on a large but definable set of features.
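The feature-association strategy described above can be caricatured in a few lines of code. The sketch below is not the authors’ actual machine learning algorithm (a Bayesian network trained on 1,014 real features over seven regions); it is a toy naive-Bayes-style scorer using invented feature names (nova_site_downstream and so on), meant only to illustrate how co-occurrence of sequence features with splicing outcomes can drive prediction.

```python
from collections import defaultdict
from math import log

# Toy training data: each exon is a set of sequence features (hypothetical
# names) paired with its splicing outcome in one tissue comparison.
training = [
    ({"nova_site_downstream", "high_A_density"}, "increased"),
    ({"nova_site_downstream", "weak_5ss"},       "increased"),
    ({"ptb_site_upstream", "high_A_density"},    "decreased"),
    ({"ptb_site_upstream", "weak_5ss"},          "decreased"),
]

def train(data, alpha=1.0):
    """Per-feature log-probabilities for each outcome (add-alpha smoothing)."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    features = set()
    for feats, label in data:
        totals[label] += 1
        for f in feats:
            counts[label][f] += 1
            features.add(f)
    return {label: {f: log((counts[label][f] + alpha) / (n + 2 * alpha))
                    for f in features}
            for label, n in totals.items()}

def predict(weights, feats):
    """Outcome whose learned feature weights best explain this exon."""
    return max(weights, key=lambda lbl: sum(weights[lbl].get(f, 0.0) for f in feats))

model = train(training)
print(predict(model, {"nova_site_downstream"}))  # increased
print(predict(model, {"ptb_site_upstream"}))     # decreased
```

A real implementation would also model the third outcome (no change) and the region each feature occurs in, as Barash et al.1 do.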
Figure 1 Scheme for associating RNA sequence features with splicing outcomes. Barash et al.1 used >1,000 diverse sequence features; the examples shown here were chosen to illustrate their diversity. Each feature was also defined by the region in which it occurs, where the alternatively spliced exon is shown in red. Exon inclusion data were originally measured in 27 mouse tissues or cell lines using microarrays and then consolidated into four tissue types: C, central nervous system; M, striated and cardiac muscle; D, digestion-related tissues; E, embryonic tissue and stem cells (darker shades represent higher exon inclusion levels). A machine learning algorithm was devised to associate particular features with particular splicing outcomes, the latter categorized as increased exon inclusion, increased exon exclusion or no difference between two tissue types. After training on a set of ~3,000 exons, the algorithm could reliably predict these splicing outcomes in a set of test exons.

4. Fu, X.D. Cell 119, 736–738 (2004).
5. Trifonov, E.N. Comput. Appl. Biosci. 12, 423–429 (1996).
6. Munoz, M.J., de la Mata, M. & Kornblihtt, A.R. Trends Biochem. Sci. (2010). doi:10.1016/j.tibs.2010.03.010.
7. Tilgner, H. et al. Nat. Struct. Mol. Biol. 16, 996–1001 (2009).
8. Luco, R.F. et al. Science 327, 996–1000 (2010).
9. Das, D. et al. Nucleic Acids Res. 35, 4845–4857 (2007).
10. Xue, Y. et al. Mol. Cell 36, 996–1006 (2009).
A synthetic DNA transplant
Mitsuhiro Itaya

Mitsuhiro Itaya is at the Laboratory of Genome Design Biology, Institute for Advanced Biosciences, Keio University, Yamagata, Japan. e-mail: [email protected]

The complete set of tools needed to synthesize a functional genome and transplant it into a mycoplasma cell opens up the possibility of mixing and matching natural and synthetic DNA to make genomes with new capabilities.

The recent creation of a new bacterium Mycoplasma mycoides JCVI-syn1.0 from an artificially constructed genome represents a technical tour de force. The accomplishment, described in a paper by Gibson et al.1 of the J. Craig Venter Institute (JCVI; Rockville, MD, USA) published in Science, is the culmination of over a decade of effort to create a cell with an artificial genome. Although creation of a self-replicating cell using a computer as the starting point represents an important breakthrough for synthetic biology, several
key details of the transplantation protocol remain to be established. Moreover, gaps in our knowledge of genome biology and the expense of producing whole genomes synthetically will likely limit wide adoption of the approach for the foreseeable future.

The synthetic biology group at JCVI has developed and released several basic methods2–4 that together have made up incremental steps toward the ultimate aim of creating a synthetic genome that can then be transplanted into a recipient (so-called chassis) organism. In their present paper, Gibson et al.1 now combine these methods and successfully apply them to design a particular mycoplasma strain that never existed before. The methods essentially comprise three major parts, as illustrated in
Figure 1: writing genome sequences, assembling DNA fragments provided by de novo synthesis and delivering the assembled genome to the chassis for selection.

In terms of designing and writing genome sequences, we remain largely constrained to those natural viral and bacterial genome templates that the sequencing projects have deciphered. Even with emerging multidisciplinary approaches in synthetic biology5, however, we are still far from being able to design complex circuits of genes that we can predict will be functional in cells, let alone writing from scratch the blueprint for an entire genome nucleotide sequence of 1,000 genes. Thus, for the time being, most work will likely continue to use existing genomes as the starting point, with efforts exploring the extent of gene additions or deletions that can be tolerated to produce new functionality without compromising viability. In the study by Gibson et al.1, 14 of the genes in the M. mycoides subsp. capri genome (on four of the >1,000 synthesized DNA fragments) were deleted and ‘watermarked’ with another 5,000-plus base pairs.

In terms of de novo DNA synthesis, two fronts have played an important role in facilitating the efforts of the JCVI group and other synthetic biologists. First, substantial cost reductions in state-of-the-art nucleotide chemical synthesis technologies have made the creation of 5- to 10-kilobase segments of DNA economically feasible. Prices still prohibit the majority of research groups from undertaking projects as ambitious as that of the JCVI group, but increasing commoditization of oligonucleotide synthesis has already been predicted over the coming years6. Second, the increasing performance and power of DNA sequencing has greatly improved our ability to correct errors in synthesized sequences5. This has been pivotal in ensuring the fidelity of the final synthetic genome sequence.
Because of the constraints on the length of DNA segments (~10 kb) that can be created by chemical synthesis alone, the JCVI group started with >1,000 1,080-base sequences covering the entire M. mycoides genome. Each of these sequences, propagated in Escherichia coli, had an 80-base overlap with its neighbors to ensure assembly in the correct order. They then turned to a familiar workhorse of the molecular biologist—baker’s yeast, Saccharomyces cerevisiae2—to assemble the DNA fragments into larger molecules. Familiarity with S. cerevisiae as a tractable recombinant DNA host enabled Gibson et al.1 to first stitch together 10,000-base sequences, then 100,000-base sequences and finally the complete 1.08-Mbp circular genome2.
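The overlap-guided, hierarchical assembly just described can be illustrated with a short sketch. This is only a string-merging analogy, not the yeast recombination chemistry: fragments sharing a fixed-length overlap (80 bases in the actual study; a smaller toy value here) are merged pairwise in rounds, mirroring the progression from cassettes to 10-kb and 100-kb intermediates to the full genome. The mock genome and fragment sizes are invented for illustration.

```python
OVERLAP = 8  # the JCVI cassettes shared 80-base overlaps; toy value here

def merge_pair(left, right, k=OVERLAP):
    """Join two fragments whose flanking k bases agree."""
    assert left[-k:] == right[:k], "fragments do not overlap as expected"
    return left + right[k:]

def assemble(fragments, k=OVERLAP):
    """Merge an ordered list of overlapping fragments in pairwise rounds,
    mimicking hierarchical assembly of ever-larger intermediates."""
    while len(fragments) > 1:
        merged = [merge_pair(fragments[i], fragments[i + 1], k)
                  for i in range(0, len(fragments) - 1, 2)]
        if len(fragments) % 2:  # an odd fragment carries over to the next round
            merged.append(fragments[-1])
        fragments = merged
    return fragments[0]

# Cut a mock genome into fragments overlapping by OVERLAP bases,
# then reassemble and verify the round trip.
genome = "ACGT" * 50
step = 40
frags = [genome[i:i + step + OVERLAP]
         for i in range(0, len(genome) - OVERLAP, step)]
assert assemble(frags) == genome
```

Note that this sketch assembles a linear molecule; closing the circle of a circular genome, as in the real protocol, would require one final overlap between the last and first fragments.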
The final and key step achieved in the latest paper is the delivery of the naked DNA genome constructed in yeast to an appropriate cell container or chassis. In their previous work3, the same group had alighted upon Mycoplasma capricolum, a species related to M. mycoides, as their preferred chassis organism. To ensure successful transition of the synthetic genome, they used a M. capricolum recipient strain containing an inactivated restriction enzyme gene (MCAP0050) and forced exchange with the existing natural genome by what they call genome transplantation3. A critical aspect of achieving successful transplantation relates to additional in vitro methylation of the synthetic genome using M. capricolum extracts, followed by deproteinization. Appropriate methylation of the genome assembled in yeast prevents its digestion by restriction enzymes in the chassis.

Ultimately, the unified protocol described in Figure 1 borrows some additional important tricks from recombinant DNA technology; indeed, M. mycoides JCVI-syn1.0 can be described as a genetically engineered microbe. But there are also some important differences between the Gibson et al.1 protocol and the type of DNA manipulations carried out in standard molecular biology protocols. One important difference is the size of the DNA molecules being handled. Until recently, it was not possible to manipulate several hundred kilobases of DNA—let alone a 0.5- to 1.1-million base mycoplasma genome—in the test tube. DNA fragments are vulnerable to physical shearing and highly prone to breaking at random sites. Damage caused by shearing is relatively small for DNA fragments smaller than 10 kb—the size of most constructs handled in traditional recombinant technology. Conversely, the manipulation of DNA molecules >500 kb in size is hampered by unavoidable fragmentation4, even though approaches involving immobilization of DNA in agarose gels have made handling of DNA fragments longer than 500 kb fairly routine in many laboratories.
The successful chemical synthesis and assembly of megabase-size-range DNA molecules and their transplantation into a bacterial cell now opens up several applications5. The nearest term application is to design and create mycoplasma genomes that possess additional genes or gene clusters within the sequence. Combinations of genes within such genomes would be instantly testable for action of the inserted genes. Thus, the work of the JCVI group should facilitate advances in the understanding of mycoplasma biology and genetics. In the longer term, if this synthetic biology approach can ultimately be extended to other organisms—limitations on the size
Figure 1 Simplified protocol used to produce Mycoplasma mycoides JCVI-syn1.0 using a chassis derived from M. capricolum and a 1.08-Mbp variant of the M. mycoides genome designed to carry distinguishable ‘watermark’ sequences. The steps are: design of the M. mycoides genome; de novo chemical synthesis of DNA oligonucleotides spanning the entire genome; assembly of synthetic intermediates in E. coli; complete genome assembly in S. cerevisiae; and genome transplantation to M. capricolum. Note that the step involving assembly of the contiguous fragment intermediates in E. coli can be bypassed by direct assembly in S. cerevisiae mediated by the yeast’s genetic repair and recombination systems.
of the genome that can be synthesized and/or assembled notwithstanding—the approach might also facilitate genetic analysis of otherwise intractable systems. At present, it is unclear whether the mycoplasma-based chassis will also be amenable as a recipient of other synthesized genomes unrelated to mycoplasma species. In this respect, it is noteworthy that the Mycoplasma genitalium system, which was the original focus of the JCVI group2, has not yet been reported to successfully reboot in the M. capricolum system. Thus, although Gibson et al.1 have provided a proof of concept, many questions remain; for example, will the approach extend to more robust bacterial species with larger genomes and more complex restriction enzyme systems? And ultimately, will it be possible to dispense with bacterial chassis and employ instead synthetic chassis based on membrane vesicles? One immediate way in which the current system could have value is to investigate the concept of the mycoplasma genome
as a minimal genome with a minimal set of genes7. Growth of M. genitalium (which has a 0.58-Mb genome) is much slower than that of M. mycoides (with a 1.08-Mb genome) or other mycoplasmas with larger genomes. The present protocol described by Gibson et al.1 thus provides a useful system to understand which smaller sets of genes, in different combinations, are essential for growth. If genes from other bacteria can be codon optimized to work in M. capricolum, ultimately the approach may also prove useful in assessing the functions of nonculturable microorganisms that are abundant in nature8, for which we have growing sequence information. Indeed, one day it may be possible to use the system to study whole or partial plant or mammalian chromosomes. Conversely, rather than investigating minimal synthetic genomes, it should also be possible to add new genes to existing ‘natural’ genomes to design enlarged bacterial genomes. This raises a fundamental question of what is the largest size possible for a circular bacterial genome. In the Gibson et al.1 protocol, the size of the synthesized genome is dependent on the largest molecule that
yeast can handle. Work in our group9 has begun to explore this in Bacillus subtilis. We have created a hybrid ‘Cyanobacillus’ that stably possesses a 7.7-Mb genome through the addition of the Synechocystis genome (3.5 Mb) to B. subtilis (4.2 Mb)9. Addition of another genome (5.0 Mb) to our Cyanobacillus strain could potentially produce bacterial cells with genomes of 12.7 Mb—larger than the genome of yeast (12.5 Mb), which has one of the smallest genomes of the eukaryotes for which full genome sequences are available. Ultimately, the importance of this breakthrough in synthetic biology will depend on further reductions in the cost of oligonucleotide synthesis, extensions in the size of artificial DNA molecules that can be constructed and demonstration that the principles described by Gibson et al.1 for mycoplasmas can be applied more widely to other bacterial systems (e.g., Escherichia coli) more familiar to the biology and biotech research communities. Unlike the recent advance in which induced pluripotent stem cells were created from a small set of transcription factors10—a breakthrough which was almost immediately widely
adopted across the research community—only a handful of laboratories around the world currently have the expertise and resources to carry out the kinds of experiments described by the JCVI group. The question is—with only a few groups around the world capable of working on this technology—how large a gap needs to be bridged between the mycoplasma genome described by Gibson et al.1 and the many other genomes of biological interest?

COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.

1. Gibson, D. et al. Science published online, doi:10.1126/science.1190719 (20 May 2010).
2. Gibson, D. et al. Science 319, 1215–1220 (2008).
3. Lartigue, C. et al. Science 317, 632–638 (2007).
4. Gibson, D. et al. Nat. Methods 6, 343–345 (2009).
5. Carr, P.A. & Church, G.M. Nat. Biotechnol. 27, 1151–1162 (2009).
6. Carlson, R. Nat. Biotechnol. 27, 1091–1094 (2009).
7. Glass, J. et al. Proc. Natl. Acad. Sci. USA 103, 425–430 (2006).
8. Colwell, R.R. & Grimes, D.J. (eds.) Nonculturable Microorganisms in the Environment (ASM Press, Washington, DC, 2000).
9. Itaya, M., Tsuge, K., Koizumi, M. & Fujita, K. Proc. Natl. Acad. Sci. USA 102, 15971–15976 (2005).
10. Takahashi, K. & Yamanaka, S. Cell 126, 663–676 (2006).
Antibiotic leads challenge conventional wisdom

Two recent papers in Science1,2 provide surprising twists to the conventional views on how members of two extensively studied classes of molecules exert their effects. Whereas Schneider et al.1 reveal a new mechanism of action for a subset of defensins, Wyatt et al.2 show that certain nonribosomal peptides, a group of secondary metabolites most commonly regarded as antibiotics, might in fact be promising drug targets. Defensins are a family of short antibiotic peptides conserved across the fungal, animal and plant kingdoms3. Whereas most defensins are thought to nonspecifically disintegrate bacterial membranes due to their amphipathic structures, Schneider et al.1 show that the fungal defensin plectasin instead targets cell wall biosynthesis by sequestering the Lipid II precursor of the
bacterial cell wall. At least four other defensins from fungi and invertebrates also inhibit the processing of Lipid II. Plectasin or improved plectasin derivatives have previously been shown to be effective against multidrug-resistant strains of Gram-positive bacteria, including methicillin-resistant Staphylococcus aureus. Remarkably, the antibiotic vancomycin—one of the few remaining drugs in our arsenal to treat multidrug-resistant Gram-positive infections—also binds and inhibits the processing of Lipid II. But fortunately, the authors observe no cross-resistance between vancomycin and plectasin and speculate that the distinct binding sites of the two molecules make the emergence of cross-resistance unlikely. Identification of a molecular target of plectasin may allow the rational design
of improved variants and suggests that more rigorous scrutiny of the mechanisms of other defensins is warranted.

Nonribosomal peptides are a major class of bacterial secondary metabolites including—most famously—penicillin. Wyatt et al.2 study the function of a nonribosomal peptide synthetase gene cluster that is conserved universally across Staphylococcus aureus strains, with orthologs in other pathogenic staphylococci. Although the products of the synthetase, two cyclic dipeptides named aureusimines A and B, are not required for growth, the expression of virulence factors is greatly reduced in their absence. Staphylococcus aureus strains
without the nonribosomal peptide synthetase gene cause much milder infections in mice and are unable to colonize spleen, liver and heart. It remains to be seen whether investigation of the functions of other nonribosomal peptides might find similarly promising drug targets.

Markus Elsner

1. Schneider, T. et al. Science 328, 1168–1172 (2010).
2. Wyatt, X. et al. Science published online, doi:10.1126/science.1188888 (3 June 2010).
3. Ganz, T. Nat. Rev. Immunol. 3, 710–720 (2003).
research highlights
HIV-host interaction inhibitor

Whereas most antiviral drugs target viral enzymes, such as proteases, integrases or reverse transcriptases, the necessity of host co-factors in viral infection and replication means that the latter also offer targets for drug development. Christ et al. rationally designed an inhibitor that disrupts the binding of the HIV integrase to the LEDGF/p75 transcriptional coactivator, which mediates chromatin binding of the integrase. Using structural information, the authors performed an in silico screen of 200,000 compounds, and selected and experimentally optimized the most promising hits. Their lead compound efficiently inhibited viral replication in vitro, but only moderately affected the catalytic activity of the integrase. Co-crystals corroborated binding of the inhibitor to the LEDGF/p75 binding pocket in integrase. No inhibition of the binding of LEDGF/p75 to its cellular targets was observed, consistent with the lack of overt toxicity in cell culture. The molecule did not show significant cross-resistance with any anti-HIV drugs tested, including integrase inhibitors. Virus strains resistant to the new antiviral molecule retained susceptibility to azidothymidine (AZT) and the integrase inhibitor raltegravir, as expected from the different modes of action. (Nat. Chem. Biol. 6, 442–448, 2010) ME
Soil metagenome fuels discovery

Many microbes in the soil cannot be cultured in the laboratory, which means that their genes have not been experimentally tested for useful functions. Sommer et al. bypass the culturing step by creating libraries of 40- to 50-kb DNA fragments directly from DNA isolated from soil samples. Instead of sequencing the DNA fragments, which is the route taken by traditional ‘metagenomics’ studies, Sommer et al. introduce them into Escherichia coli and screen the modified microbes for beneficial traits conferred by genes encoded by the foreign DNA. The researchers use this approach to identify three genes that confer resistance to the toxic by-products syringaldehyde and 2-furoic acid, which are generated during the conversion of biomass to fuels. In contrast to existing approaches for microbial engineering that involve optimizing a microbe’s own genes or adding genes from existing libraries of well-characterized genetic ‘parts’, this approach, based on screening of metagenomic libraries, provides a means of rapidly identifying completely new genes with desirable functions. (Mol. Syst. Biol. 6, 360, 2010) CM

Written by Kathy Aschheim, Laura DeFrancesco, Markus Elsner, Peter Hare & Craig Mak
Rapidly turning over histones

Chromatin assembly and reassembly are essential in regulating gene expression and DNA replication, but a facile method for measuring turnover of chromatin-associated proteins has not been available. Deal et al. now describe a technique for doing this, dubbed CATCH-IT for covalent attachment of tags to capture histones and identify turnover. Cells are pulsed with a methionine analog, azidohomoalanine, which can be tagged with biotin by means of an addition reaction with a thiol group. Subsequent passage of isolated and labeled histones on streptavidin affinity columns enables the readout of genome-wide DNA sequences bound up in the newly synthesized histones using tiling arrays. Pulse-chase experiments show that turnover rates are dependent on gene expression levels and further reveal that epigenetic regulatory elements and replication origins are associated with rapid turnover of histones. The researchers measure histone half-lives on the order of 1 to 1.5 hours, far shorter than the cell cycle (~20 h). The fact that histones associated with epigenetically regulated genes are turned over more rapidly than the cell divides suggests that at least some histone modifications may not be preserved throughout cell division. This brings into question their role in maintaining epigenetic marks. (Science 328, 1161–1164, 2010) LD
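The gap between the measured half-lives and the cell cycle is worth quantifying. The back-of-envelope calculation below is not from the paper; it simply applies exponential decay to the numbers quoted above to show how little of an original histone population survives one ~20-h cycle.

```python
def surviving_fraction(half_life_h, elapsed_h):
    """Fraction of originally deposited histones remaining after elapsed_h,
    assuming simple exponential turnover with the given half-life."""
    return 0.5 ** (elapsed_h / half_life_h)

cell_cycle = 20.0  # hours, as quoted above
for t_half in (1.0, 1.5):
    frac = surviving_fraction(t_half, cell_cycle)
    print(f"t1/2 = {t_half} h: {frac:.1e} of histones survive one cycle")
```

With these half-lives, fewer than one histone in ten thousand deposited at the start of a cycle remains at its end, which is why preservation of histone-borne marks across division is called into question.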
Recellularized liver grafts

The frequency of liver transplantation, the only effective treatment for hepatic failure, is limited not only by the scarcity of organ donations but also by the large number of donated livers that are unsuitable for transplantation. Uygun et al. report compelling progress towards taking full advantage of these otherwise discarded organs. In a refinement of an approach used to engineer replacement hearts, they flush cells out of the structural extracellular matrix of the liver, retaining the three-dimensional structure of the organ and its complex microvasculature. They then repopulate the intricate structural framework with hepatocytes, using portal vein perfusion recirculation. The rejuvenated tissue functions for up to 10 days in culture, as reflected in assays of albumin secretion, urea synthesis and expression of cytochrome P450. Grafts connected to the circulation of live rats support normal liver activity for several hours. Although reconstructing a fully functional liver from the scaffold left by decellularization will require inclusion of the nonparenchymal cells (e.g., sinusoidal endothelial cells, stellate cells, biliary epithelial cells and Kupffer cells), the report provides a strong foundation for efforts to extend the technology to victims of liver disease, which annually claims ~27,000 lives in the United States alone. (Nat. Med., published online 13 June 2010; doi:10.1038/nm.2170) PH
Antimalaria compound libraries

Malaria research has received a fresh infusion of ideas with the publication of two large screens for compounds that kill Plasmodium falciparum, the most deadly of the five Plasmodium species known to cause malaria in humans. The two reports are noteworthy not only for the large number of hits identified, some of which may lead to new antimalarial drugs, but for the authors’ decisions to make their chemical libraries public so as to accelerate drug development by the entire malaria scientific community. Although drug cocktails based on artemisinin now provide effective first-line therapy for malaria around the world, the emergence of resistance to these and previous drugs requires continued research into novel antiparasitic strategies. Of particular interest, many of the hits discovered in the two screens correspond to new targets, including Plasmodium kinases. (Nature 465, 305–310, 311–315, 2010) KA
commentary
Cloud computing and the DNA data race Michael C Schatz, Ben Langmead & Steven L Salzberg
Given the accumulation of DNA sequence data sets at ever-faster rates, what are the key factors you should consider when using distributed and multicore computing systems for analysis?
Michael C. Schatz and Steven L. Salzberg are at the Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA; Ben Langmead is at the Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA. e-mail: [email protected]

In the race between DNA sequencing throughput and computer speed, sequencing is winning by a mile. Sequencing throughput has recently been improving at a rate of about fivefold per year1, whereas computer performance generally follows ‘Moore’s Law’, doubling only every 18 or 24 months2. As this gap widens, the question of how to design higher-throughput analysis pipelines becomes crucial. If analysis throughput does not turn the corner, research projects will continually stall until analyses catch up. How do we close the gap? One option is to invent algorithms that make better use of a fixed amount of computing power. Unfortunately, algorithmic breakthroughs of this kind, like scientific breakthroughs, are difficult to plan or foresee. A more practical option is to develop methods that make better use of multiple computers and processors, whose most recent manifestation is ‘cloud computing’.

Parallel computing
When many computer processors work together in parallel, a software program can often finish in significantly less time. Such types of parallel computing have existed for decades in various forms3–5. Cloud computing is a model in which users access computational resources from a vendor over the Internet1, such as from the commercial Amazon Elastic Compute Cloud (http://aws.amazon.com/ec2/) or the academic US Department of Energy Magellan Cloud
Table 1 Bioinformatics cloud resources

Applications
CloudBLAST24: Scalable BLAST in the cloud (http://www.acis.ufl.edu/~ammatsun/mediawiki-1.4.5/index.php/CloudBLAST_Project)
CloudBurst13: Highly sensitive short-read mapping (http://cloudburst-bio.sf.net)
Cloud RSD19: Reciprocal smallest distance ortholog detection (http://roundup.hms.harvard.edu)
Contrail: De novo assembly of large genomes (http://contrail-bio.sf.net)
Crossbow16: Alignment and SNP genotyping (http://bowtie-bio.sf.net/crossbow/)
Myrna (B.L., K. Hansen and J. Leek, unpublished data): Differential expression analysis of mRNA-seq (http://bowtie-bio.sf.net/myrna/)
Quake (D.R. Kelley, M.C.S. and S.L.S., unpublished data): Quality-guided correction of short reads (http://github.com/davek44/error_correction/)

Analysis environments and data sets
AWS Public Data: Cloud copies of Ensembl, GenBank, 1000 Genomes and other data (http://aws.amazon.com/publicdatasets/)
CLoVR: Genome and metagenome annotation and analysis (http://clover.igs.umaryland.edu)
Cloud BioLinux: Genome assembly and alignment (http://www.cloudbiolinux.com/)
Galaxy20: Platform for interactive large-scale genome analysis (http://galaxy.psu.edu)
(http://magellan.alcf.anl.gov/). The user can then apply the computers to any task, such as serving websites—or even running computationally intensive parallel bioinformatics pipelines. Vendors benefit from vast economies of scale6, allowing them to set fees that are competitive with what users would otherwise have spent building an equivalent facility and potentially saving all the ongoing costs incurred by a facility that consumes space, electricity, cooling and staff support. Finally, because the pool of resources available ‘in the cloud’ is so large, customers have substantial leeway to elastically grow and shrink their allocations.
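The widening gap between sequencing and computing can be made concrete with a back-of-the-envelope calculation, using the growth rates quoted above (roughly fivefold per year for sequencing throughput, a doubling every 18 months for computer performance). The figures below are illustrative of the trend, not precise forecasts.

```python
# Sketch: how fast does the gap between sequencing throughput and
# computer performance widen, given the growth rates quoted above?

def growth(fold: float, period_years: float, years: float) -> float:
    """Multiplicative growth after `years`, given `fold` growth per `period_years`."""
    return fold ** (years / period_years)

def sequencing_vs_compute(years: float) -> float:
    """Ratio of sequencing growth (5x per year) to compute growth (2x per 1.5 years)."""
    return growth(5.0, 1.0, years) / growth(2.0, 1.5, years)

# After 3 years, sequencing throughput has grown 125-fold while compute
# has grown only 4-fold, leaving a ~31x larger per-processor workload.
print(sequencing_vs_compute(3.0))  # prints 31.25
```

The exponent mismatch means the gap compounds: every year, the analysis burden per unit of compute grows by another factor of roughly 2.5, which is why algorithmic and parallelization gains both matter.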
Cloud computing is not a panacea: it poses problems for developers and users of cloud software, requires large data transfers over precious low-bandwidth Internet uplinks, raises new privacy and security issues and is an inefficient solution for some types of problems. On balance, though, cloud computing is an increasingly valuable tool for processing large data sets, and it is already used by the US federal government (https://apps.gov/), pharmaceutical7 and Internet companies8, as well as scientific labs9 and bioinformatics services (http://dnanexus.com/, http://www.spiralgenetics.com/). Furthermore, several bioinformatics applications and resources
commentary
Figure 1 Map-shuffle-scan framework used by Crossbow. Users begin by uploading sequencing reads into the cloud storage. Hadoop, running on a cluster of virtual machines in the cloud, then maps the unaligned reads to the reference genome using many parallel instances of Bowtie. Next, Hadoop automatically shuffles the alignments into sorted bins determined by chromosome region. Finally, many parallel instances of SOAPsnp scan the sorted alignments in each bin. The final output is a stream of SNP calls stored within the cloud that can be downloaded back to the user’s local computer.
have been developed specifically to address the challenges of working with the very large volumes of data generated by second-generation sequencing technology (Table 1).

MapReduce and genomics
Parallel programs run atop a parallel ‘framework’, or collection of auxiliary software code, to enable efficient, fault-tolerant parallel computation without making the software developer’s job too difficult. The Message Passing Interface framework3, for example, gives a programmer ample power to craft parallel programs, but it requires relatively complicated software development. Batch processing systems, such as Condor4, are very effective for running many independent computations in parallel but are not expressive enough for more complicated parallel algorithms. In between, the MapReduce framework10 is efficient for many (although not all) programs. It makes programming simpler by automatically handling duties such as job scheduling, fault tolerance and distributed aggregation. MapReduce was originally developed at Google (Mountain View, CA, USA) to streamline analyses of very large collections of web pages. Google’s implementation is proprietary,
but Hadoop (http://hadoop.apache.org/) is a popular open-source implementation of the MapReduce framework that is maintained by the Apache Software Foundation. Programs based on Hadoop or MapReduce comprise a series of parallel computational steps (Map and Reduce), interspersed with aggregation steps (Shuffle). Despite its simplicity, MapReduce has been successfully applied to many large-scale analyses within and outside of DNA sequence analysis11–15. In a genomics context, MapReduce is particularly well suited for common ‘map-shuffle-scan’ pipelines (Fig. 1) that use the following paradigm:

1. Map: many sequencing reads are mapped to the reference genome in parallel on multiple machines.
2. Shuffle: the sequence alignments are aggregated so that all alignments on the same chromosome or locus are grouped together and sorted by position.
3. Scan: the sorted alignments are scanned to identify biological events, such as polymorphisms or differential expression, within each region.
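The three steps above can be mimicked on a single machine in a few lines of ordinary Python. The sketch below is purely illustrative: the reads, positions and majority-vote SNP call are stand-ins for a real aligner (such as Bowtie) and a real Bayesian caller (such as SOAPsnp), but the map, shuffle and scan structure is exactly what Hadoop distributes across a cluster.

```python
from collections import defaultdict

# Toy input: each read carries a known alignment position and observed base
# (a real pipeline would compute the position with an aligner in the Map step).
reads = [("chr1", 10, "A"), ("chr1", 10, "A"), ("chr1", 10, "G"),
         ("chr1", 42, "T"), ("chr1", 42, "T")]
reference = {("chr1", 10): "G", ("chr1", 42): "T"}

# 1. Map: emit (position, base) key-value pairs; in Hadoop, many mappers
# do this in parallel on different subsets of the reads.
mapped = [((chrom, pos), base) for chrom, pos, base in reads]

# 2. Shuffle: the framework groups values by key and sorts the keys,
# so each locus ends up in its own bin.
bins = defaultdict(list)
for key, base in mapped:
    bins[key].append(base)

# 3. Scan: each reducer examines one sorted bin and emits candidate SNPs
# (here a naive majority vote against the reference base).
snps = {}
for key in sorted(bins):
    consensus = max(set(bins[key]), key=bins[key].count)
    if consensus != reference[key]:
        snps[key] = consensus

print(snps)  # the A majority at chr1:10 disagrees with the reference G
```

Because the Map and Scan steps touch each bin independently, adding machines speeds them up almost linearly; the Shuffle is the only step that requires global coordination, and the framework handles it.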
For example, the Crossbow16 genotyping program leverages the Hadoop implementation of MapReduce to launch many copies of the short-read aligner Bowtie17 in parallel. After Bowtie has aligned the reads (which may number in the billions for a human resequencing project) to the reference genome, Hadoop automatically sorts and aggregates the alignments by chromosomal region. It then launches many parallel instances of the Bayesian single-nucleotide polymorphism (SNP) caller SOAPsnp18 to accurately call SNPs from the alignments. In our benchmark test on the Amazon (Seattle) cloud, Crossbow genotyped a human sample comprising 2.7 billion reads in ~4 h, including the time required for uploading the raw data, for a total cost of $85 (ref. 16). Programs with abundant parallelism tend to scale well to larger clusters; that is, increasing the number of processors proportionally decreases the running time, less any additional overhead or nonparallel components. Several comparative genomics pipelines have been shown to scale well using Hadoop (B.L., K. Hansen & J. Leek, unpublished data; refs. 13,16,19), but not all genomics software is likely to follow suit. Hadoop, and cloud computing in general, tends to reward ‘loosely coupled’ programs where processors work independently for long periods and rarely coordinate with each other. But some algorithms are inherently ‘tightly coupled’, requiring substantial coordination and making them less amenable to cloud computing. That being said, PageRank14 (Google’s algorithm for ranking web pages) and Contrail (a large-scale genome assembler; M.C.S., D.D. Sommer, D.R. Kelley & M. Pop, unpublished data) are examples of relatively tightly coupled algorithms that have successfully been adapted to MapReduce in the cloud.

Cloud computing obstacles
To run a cloud program over a large data set, the input must first be deposited in a cloud resource.
Depending on data size and network speed, transfers to and from the cloud can pose a substantial barrier. Some institutions and repositories connect to the Internet via high-speed backbones, such as Internet2 and JANET, but each potential user should assess whether their data-generation schedule is compatible with transfer speeds achievable in practice. A reasonable alternative is to physically ship hard drives to the cloud vendor (http://aws.amazon.com/importexport/). Another obstacle is usability. The rental process is complicated by technical questions of geographic zones, instance types and which software image the user plans to run. Fortunately, efforts such as the Galaxy project20
and Amazon’s Elastic MapReduce service (http://aws.amazon.com/elasticmapreduce/) enhance usability by allowing customers to launch and manage resources and analyses through a point-and-click web interface. Data security and privacy are also concerns. Whether storing and processing data in the cloud is more or less secure than doing so locally is a complicated question, depending as much on local policy as on cloud policy. That said, regulators and institutional review boards are still adapting to this trend, and local computation is still the safer choice when privacy mandates apply. An important exception is the Health Insurance Portability and Accountability Act (HIPAA); several HIPAA-compliant companies already operate cloud-based services21. Finally, cloud computing often requires redesigning applications for parallel frameworks like Hadoop. This takes expertise and time. A mitigating factor is that Hadoop’s ‘streaming mode’ allows existing nonparallel tools to be used as computational steps. For instance, Crossbow uses the noncloud programs Bowtie and SOAPsnp, albeit with some small changes to format intermediate data for the Hadoop framework. New parallel programming frameworks, such as DryadLINQ22 and Pregel23, can also help in some cases by providing richer programming abstractions. But for problems where the underlying parallelism is sufficiently complex, researchers may have to develop sophisticated new algorithms.

Recommendations
With biological data sets accumulating at ever-faster rates, it is better to prepare for distributed and multicore computing sooner rather than later. The cloud provides a vast, flexible
source of computing power at a competitive cost, potentially allowing researchers to analyze ever-growing sequencing databases while relieving them of the burden of maintaining large computing facilities. However, the cloud requires large, possibly network-clogging data transfers, it can be challenging to use and it isn’t suitable for all types of analysis tasks. For any research group considering the use of cloud computing for large-scale DNA sequence analysis, we recommend a few concrete steps. First, verify that your DNA sequence data will not overwhelm your network connection, taking into account expected upgrades for any sequencing instruments. Second, determine whether cloud computing is compatible with any privacy or security requirements associated with your research. Third, determine whether necessary software tools exist and can run efficiently in a cloud context. Is new software needed, or can existing software be adapted to a parallel framework? Consider the time and expertise required. Fourth, consider cost: what is the total cost of each alternative? And finally, consider the alternative: is it justifiable to build and maintain, or otherwise gain access to, a sufficiently powerful noncloud computing resource? If these prerequisites are met, then computing in the cloud can be a viable option to keep pace with the enormous data streams produced by the newest DNA sequencing instruments.

ACKNOWLEDGMENTS
The authors were supported in part by US National Science Foundation grant IIS-0844494 and by US National Institutes of Health grant R01-LM006845.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Stein, L.D. Genome Biol. 11, 207 (2010).
2. Moore, G.E. Electronics 38, 4–7 (1965).
3. Dongarra, J.J., Otto, S.W., Snir, M. & Walker, D. Commun. Assoc. Comput. Machinery 39, 84–90 (1996).
4. Litzkow, M., Livny, M. & Mutka, M. in Proceedings of the 8th International Conference of Distributed Computing Systems 104–111 (IEEE, Washington, DC, 1988).
5. Dagum, L. & Menon, R. IEEE Comput. Sci. Eng. 5, 46–55 (1998).
6. Markoff, J. & Hansell, S. Hiding in plain sight, Google seeks more power. New York Times (14 June 2006).
7. Foley, J. Eli Lilly on what’s next in cloud computing. Plug Into the Cloud <http://www.informationweek.com/cloud-computing/blog/archives/2009/01/whats_next_in_t.html> (14 January 2009).
8. Netflix selects Amazon web services to power mission-critical technology infrastructure. Amazon.com (7 May 2010).
9. AWS case study: Harvard Medical School. Amazon Web Services.
10. Dean, J. & Ghemawat, S. Commun. Assoc. Comput. Machinery 51, 107–113 (2008).
11. Lin, J. & Dyer, C. Synthesis Lectures on Human Language Technologies 3, 1–177 (2010).
12. Chu, C.-T. et al. Adv. Neural Inf. Process. Syst. 19, 281–288 (2007).
13. Schatz, M.C. Bioinformatics 25, 1363–1369 (2009).
14. Brin, S. & Page, L. Comput. Netw. ISDN Syst. 30, 107–117 (1998).
15. Matthews, S.J. & Williams, T.L. BMC Bioinformatics 11 Suppl 1, S15 (2010).
16. Langmead, B., Schatz, M.C., Lin, J., Pop, M. & Salzberg, S.L. Genome Biol. 10, R134 (2009).
17. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Genome Biol. 10, R25 (2009).
18. Li, R. et al. Genome Res. 19, 1124–1132 (2009).
19. Wall, D. et al. BMC Bioinformatics 11, 259 (2010).
20. Giardine, B. et al. Genome Res. 15, 1451–1455 (2005).
21. Anonymous. Creating HIPAA-compliant medical data applications with AWS. Amazon Web Services (April 2009).
22. Yu, Y. et al. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. in Symposium on Operating System Design and Implementation (OSDI) (San Diego, California, 8–10 December 2008).
23. Malewicz, G. et al. in PODC ’09: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing 6 (ACM, 2009).
24. Matsunaga, A., Tsugawa, M. & Fortes, J. in Proceedings of the IEEE Fourth International Conference on eScience 222–229 (IEEE, Washington, DC, 2008).
perspective
Proteomics: a pragmatic perspective
Parag Mallick1,2 & Bernhard Kuster3,4

The evolution of mass spectrometry–based proteomic technologies has advanced our understanding of the complex and dynamic nature of proteomes while concurrently revealing that no ‘one-size-fits-all’ proteomic strategy can be used to address all biological questions. Whereas some techniques, such as those for analyzing protein complexes, have matured and are broadly applied with great success, others, such as global quantitative protein expression profiling for biomarker discovery, are still confined to a few expert laboratories. In this Perspective, we attempt to distill the wide array of conceivable proteomic approaches into a compact canon of techniques suited to asking and answering specific types of biological questions. By discussing the relationship between the complexity of a biological sample and the difficulty of implementing the appropriate analysis approach, we contrast areas of proteomics broadly usable today with those that require significant technical and conceptual development. We hope to provide nonexperts with a guide for calibrating expectations of what can realistically be learned from a proteomics experiment and for gauging the planning and execution effort. We further provide a detailed supplement explaining the most common techniques in proteomics.

Proteomics1 provides a complementary approach to genomics technologies by en masse interrogation of biological phenomena on the protein level. Two transforming technologies have been critical to the recent, rapid advance of proteomics: first, the emergence of new strategies for peptide sequencing using mass spectrometry (MS), including the development of soft ionization techniques, such as electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI); and second, the concurrent miniaturization and automation of liquid chromatography.
1University of Southern California Center for Applied Molecular Medicine, Departments of Medicine and Biomedical Engineering, Los Angeles, California, USA. 2Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California, USA. 3Chair of Proteomics and Bioanalytics, Technische Universität München, Freising-Weihenstephan, Germany. 4Center for Integrated Protein Science Munich, Munich, Germany. Correspondence should be addressed to P.M. ([email protected]) or B.K. ([email protected]). Published online 9 July 2010; doi:10.1038/nbt.1658

Together these technologies enable the measurement and identification of peptides at a rate of thousands of sequences per day2,3 with better than femtomole sensitivity (10−15 mol, or subnanogram)4 in complex biological samples. Early excitement about the potential for proteomics (Supplementary Glossary) to transform biological inquiry has been tempered by the discovery that the enormous molecular complexity and the dynamic nature of proteomes (Supplementary Glossary) pose much larger
hurdles than encountered for either genome or transcriptome studies. In particular, issues related to splice variants, post-translational modifications (PTMs), dynamic ranges (Supplementary Glossary) of copy numbers spanning ten orders of magnitude, protein stability, transient protein associations and dependence on cell type or physiological state have limited our technical ability to characterize proteomes comprehensively and reproducibly in a reasonable time5. Despite the hurdles, after 15 years of evolution, proteomic technologies have significantly affected the life sciences and are an integral part of biological research endeavors (Supplementary Fig. 1). At present, the field of proteomics spans diverse research topics, ranging from protein expression profiling to analyzing signaling pathways to developing protein biomarker assay systems. It is important to note that within each area, distinct scientific questions are being asked and, therefore, distinct proteomic approaches may have to be applied; these approaches vary widely in their versatility, technical maturity, difficulty and expense. Consequently, we must recognize that some biological questions are much harder to answer by proteomics than others. Here, we review biologically directed MS-based proteomics, focusing on which parts are routinely working, which applications are emerging and promising, and which paradigms still require significant future investment in technology development and study design.

Getting organized
The catalog of proteomics experiments contains a wide diversity of techniques and approaches. In this section, we clarify the naming of these approaches. Proteomics experiments are foremost divided by objective into either discovery or assay (Fig. 1). Both objectives have strong scientific rationale, but they come with very different study requirements and technical challenges.
Proteomic assay experiments typically seek to quantify a small, predefined set of proteins or peptides, whereas discovery experiments aim to analyze larger, ‘unbiased’ sets of proteins (see Supplementary Techniques for a deeper discussion of ‘unbiased’ proteomics). A typical example of an assay experiment would be the measurement of the levels of cardiac troponins in human plasma samples6. Such experiments are often called ‘targeted’, ‘restrictive’ or ‘directed’ proteomics studies, and the analytical approach must typically address challenges such as data variation and sample throughput. Within discovery proteomics, we distinguish among comprehensive, broad-scale and focused approaches because these distinctions have a large influence on how a biological question is approached technically. Comprehensive approaches are typically qualitative in nature and aimed at enumerating as many components of a biological system as possible. For example, the Human Proteome Organization (HUPO) Plasma Proteome Project (PPP) aims to identify every possible protein and peptide in human plasma. Such experiments can span years and require input from many labs7. In contrast, broad-scale experiments attempt to globally or selectively sample a large, but not necessarily complete, fraction of the expressed proteome (for example, the phospho-proteome) and are commonly used as profiling tools to measure qualitative and quantitative changes in a system in response to perturbation or differences in genetic background8,9. The identification of several thousand proteins or phospho-peptides10 may also require days to weeks of data acquisition and analysis time but can be shouldered by any well-funded laboratory. Focused approaches, such as the identification of proteins present in a mammalian protein complex, restrict their scope from the start by copurifying relatively few interacting proteins. The challenge in these experiments is not complexity or dynamic range but the related challenges of either the detection sensitivity or the large-scale sample generation required to measure interaction partners, which may be of extremely low cellular abundance11,12.

Figure 1 Conceptual organization of proteomic experiments. We broadly divide the objectives of proteomics into discovery and assay experiments. The scope of these experiments can range from very narrow (few proteins) to comprehensive (all proteins). A small set of examples is shown here, along with the technology used to study them.

Many, but not all, conceivable biological questions can be approached through proteomic experiments. In Figure 2, we contrast the technical expertise required to implement and execute a proteomic inquiry with the sample complexity (that is, the complexity of the biological system being interrogated). Simply put, experiments at the upper left of the chart are straightforward; those at the bottom right are difficult or under development. This chart is critical for understanding the effort involved in planning and conducting a study using proteomics and for setting realistic expectations on likely results. Success in a proteomic study is enabled and confined by the biological system (for example, do the cells actually respond to stimulus?), the study design (for example, are all the appropriate controls and statistics in place?), the available technology (for example, does it deliver the required proteome coverage, sensitivity and accuracy (Supplementary Glossary)) and, finally, the ability to perform hypothesis-driven follow-up experiments required to transform proteomic information into biological knowledge. Shortcomings in any of these areas will significantly impair success, and clearly, expectations must be measured against what the study can actually accomplish. If, for example, the purpose of an experiment is to identify the components of a protein complex, it is unreasonable to expect that the analysis will also uncover the phosphorylation status of all proteins and their stoichiometries (Supplementary Glossary) at the same time. The ability to conduct a successful and substantial proteomic study also heavily depends on the local or regional research infrastructure environment. Core facilities have been established in many places to give scientists access to mainstream proteomic technologies and applications (for example, protein identification). Even so, more sophisticated applications requiring specialized technologies or particular practical expertise (for example, top-down sequencing of intact proteins or ion mobility measurements of glycosylated protein isoforms) may only be available through collaboration with expert laboratories. In our view, much more effort needs to be expended in helping biologists understand proteomic technologies (and in helping technologists to understand more of the biology) so that the right experiment can be designed, meaningful conclusions can be drawn from the data, and the appropriate follow-up experiments can be initiated. Despite significant investments in people and infrastructure over the past decade, access to the technology and special expertise still constitutes a substantial bottleneck.

In this Perspective, we place biologically motivated proteomics in context by detailing components of each of the columns in Figure 2. As a comprehensive treatment of each topic is not possible, some topics are thoroughly discussed and the others only mentioned briefly. It is beyond the scope of this Perspective to cover aspects of structural biology that are often discussed in the context of proteomics. Instead, the interested reader may refer to reviews published on this topic13,14. The guiding thoughts within each section of this article are the following: given a biological question, what are the specific challenges and which proteomic methods may be able to address them; what methods are still experimental, but may emerge over the next decade; and what are reasonable expectations for the outcomes of a given experiment? A technical supplement to this Perspective (Supplementary Techniques) briefly explains the core proteomic technologies listed in Figure 3. In addition, definitions of important proteomics and MS terms (Supplementary Glossary), technical details of protein identification by MS (Box 1), and frequently asked questions (Table 1) provide more clarity and simplify reading. In Figure 4, we give a concrete example of a quantitative proteomics workflow drawing on elements from Figure 3.

Protein analysis
The classic tasks of characterizing the size, identity, presence of PTMs and purity of a single protein isolated from natural or recombinant sources draw on decades of experience in protein chemistry and are broadly accessible to scientists through core facilities or commercial service providers. Many of the tools developed for protein characterization are also frequently used on a broader scale in proteomic workflows. Thus, although previously described as ‘protein characterization’, some protein characterization techniques are now referred to
as proteomics. We do not cover this area in detail, but instead touch on key points that also apply to later sections. In protein characterization, what can and cannot be done depends primarily on technical factors, such as available sample amounts, purity, solubility and stability of the material. Using modern mass spectrometers (Supplementary Glossary), the mass of an intact protein can be determined with an accuracy (Supplementary Glossary) of better than 0.01% and can often be used to confirm the integrity of the isolated protein. MS can also be used to assess the purity of a protein preparation, as contaminating proteins can be detected at <5% abundance. This is important in the production of therapeutic proteins and in preparation of samples for structural studies by nuclear magnetic resonance (NMR) or X-ray crystallography. Very large (say, >150 kDa) and/or poorly soluble proteins can present a challenge because the detection efficiency of mass spectrometers rapidly degrades with increasing mass and the presence of detergents and salts can suppress the mass spectrometric signal or interfere with chromatography. In such cases, the identity of a protein can be confirmed by sequencing proteolytic fragments either by MS or by classical Edman degradation. Albeit far less sensitive than MS, the latter approach offers a simple route to determination of the sequence of the protein’s N terminus. The presence and sites of PTMs on a single protein can also be generally analyzed by MS-based proteomics because many of the >200 described PTMs alter the mass of a protein in a predictable fashion15. Even so, robust protocols are as yet available for relatively few low molecular weight PTMs, such as phosphorylation, acetylation and methylation16. Protein oxidation can also be readily detected by MS, but it is generally impossible to distinguish a biologically important oxidation event from an experimental artifact.
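The predictable mass shifts mentioned above are what make PTM detection by MS tractable. The monoisotopic shifts used below are standard values; the peptide mass is an arbitrary illustration, not data from the text.

```python
# Sketch: how common low-molecular-weight PTMs shift a peptide's
# monoisotopic mass. Shift values (in daltons) are standard; the
# 2,000 Da peptide below is a hypothetical example.
PTM_SHIFTS_DA = {
    "phosphorylation": 79.9663,  # +HPO3
    "acetylation": 42.0106,      # +C2H2O
    "methylation": 14.0157,      # +CH2
}

def modified_mass(peptide_mass_da: float, ptm: str, count: int = 1) -> float:
    """Predicted monoisotopic mass after adding `count` copies of a PTM."""
    return peptide_mass_da + count * PTM_SHIFTS_DA[ptm]

# At 0.01% mass accuracy, a 2,000 Da peptide is measured to within
# +/-0.2 Da, which is ample to resolve any of these shifts.
print(round(modified_mass(2000.0, "phosphorylation"), 4))  # 2079.9663
```

The same arithmetic explains why some modifications are harder than others: a shift of 14 Da (methylation) can be confused with single amino acid substitutions of similar mass, whereas the 80 Da phospho shift is more distinctive.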
Important PTMs such as ubiquitinylation17 and glycosylation18 are difficult to analyze, even on an isolated protein, because the modification may exist in multiple or combinatorial numbers and can lead to molecular branching of the otherwise linear protein sequence. This may require the application of a more specialized MS platform, such as electron transfer dissociation (ETD) and infrared multiphoton dissociation (IRMPD). Further challenges can arise from the necessity to cover the entire protein sequence to ensure that no potential site has been missed. This can often be addressed by using several alternative proteases to generate complementary protein fragments for analysis by MS, but a significant proportion of all proteins seem to be completely refractory to any of the tried approaches. Determining the stoichiometry (Supplementary Glossary) of PTM at a given site is still challenging—even for a single isolated protein. The physicochemical properties of the modified and unmodified proteins or peptides are often vastly different, so that there is no unambiguous direct way to measure stoichiometry. Instead, one often must resort to indirect measures—for example, by chemically or enzymatically removing the PTM from the protein or peptide and then comparing the quantities of the unmodified peptide or protein before and after the transformation19–21. An alternative method for this purpose is the use of stable isotope (Supplementary Glossary) labeling with exogenously introduced analytical standards of precisely known quantities (absolute quantification, or AQUA22). Such standards have so far been generated for only very few PTMs (notably phosphorylation23) and, for economic reasons, are now mostly used to address specific questions rather than on a broad scale. A more fundamental factor that affects our ability to determine the quantity and stoichiometry of a PTM is the common occurrence of PTM microheterogeneity at a single site. 
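The indirect stoichiometry measurement described above, comparing the unmodified peptide's signal before and after enzymatically stripping the PTM, reduces to simple arithmetic. The sketch below shows only that arithmetic with hypothetical intensities; real measurements come from MS peak areas and, as noted above, need analytical standards to correct for systematic bias.

```python
def ptm_stoichiometry(unmodified_before: float, unmodified_after: float) -> float:
    """
    Estimate the fraction of a protein carrying a PTM at a site from the
    unmodified peptide's signal before and after the PTM is chemically or
    enzymatically removed. After removal, every copy of the peptide
    contributes to the unmodified signal, so the 'after' value is the total.
    """
    if unmodified_after <= 0:
        raise ValueError("post-removal signal must be positive")
    return 1.0 - unmodified_before / unmodified_after

# Hypothetical intensities: 30 units of unmodified peptide before
# phosphatase treatment and 100 after imply ~70% site occupancy.
print(ptm_stoichiometry(30.0, 100.0))  # prints 0.7
```

The fragility of the approach is visible in the formula: any loss of peptide between the two measurements changes the denominator and directly biases the occupancy estimate, which is why spiked-in standards are recommended.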
An extreme example is human erythrocyte CD59, which carries more than 120 different asparagine-linked oligosaccharides at a single site24. The analytical task of PTM analysis becomes more complex still when multiple types of modifications are present at the same site or different
sites of the protein. A prominent example is the extensive modification of the N-terminal tail of histones by acetylation, methylation and phosphorylation. Using highly specialized MS methods, including ETD and proton transfer reactions (PTR), 74 isoforms of histone H4 have been isolated from differentiating human embryonic stem cells and subsequently characterized25. However, these approaches are not yet routinely available in core facilities. Generating comprehensive and quantitative information on protein modifications is a significant undertaking requiring several experimental approaches, significant amounts of pure starting material (mid-microgram range), special expertise and time. It should therefore only be undertaken if some functional hypothesis can be formulated or these data are required by regulatory agencies. A fundamental issue with the quantitative analysis of multiple PTMs present on a protein is that it is almost impossible to separate all existing protein isoforms (top-down proteomics; Supplementary Glossary), but this is required to estimate the amount of each isoform relative to the total protein amount. Electrophoretic and chromatographic methods in conjunction with high-resolution MS may resolve a substantial number of isoforms26, but even then, identifying the site and stoichiometry of modification remains difficult. In practice, quantitative PTM analysis is mostly performed at the peptide level (bottom-up proteomics; Supplementary Glossary). Here, special care must be exercised because variations in protein digestion, peptide recovery and peptide detection may distort the quantification results, and measurement of total protein is often difficult. We therefore recommend using analytical protein and peptide standards whenever possible, to account for systematic bias, and confining the analysis to one PTM at a time27.
MS-based peptide sequencing can also be used to detect proteins resulting from splice variants and single-nucleotide polymorphisms28. This type of study has rarely been done systematically owing to the requirement for 100% sequence coverage and the difficulty of detection of low-abundance isoforms. With the advent of next-generation DNA sequencing techniques29, we expect proteomics to play a lesser role in this area in the future.

Analysis of protein complexes
It is by now widely accepted that proteins exert their cellular functions as part of multiprotein complexes30. In the analysis of protein complexes, the contribution of proteomics has been nothing short of phenomenal. Since the groundbreaking mass spectrometric identification of the components of the yeast spliceosome in 1997 (ref. 31), the analysis of protein complexes has uncovered countless important specific as well as global biological phenomena. As quantitative MS methods, such as SILAC (stable isotope labeling by amino acids in cell culture32; Supplementary Glossary), have been perfected, proteomics has provided a powerful means to distinguish true interactors from abundant contaminants33. Although proteomics has been very successful at determining the composition of complexes, the detailed study of binary protein interactions is still surprisingly difficult by proteomic methods. In part, this results from the general challenge of purifying protein pairs in the presence of other interacting proteins. In vitro surface plasmon resonance or chemical crosslinking experiments are often used, but these techniques suffer from the need for significant quantities of pure protein. As a result, binary protein interactions are still mostly identified by the yeast two-hybrid system, which can be readily automated to enable systematic studies of transient protein-protein interactions34,35.
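The SILAC-based distinction between true interactors and abundant contaminants mentioned above boils down to a ratio test: specific partners are enriched in the labeled bait pull-down, whereas background proteins appear equally in bait and control channels. A minimal sketch of that filtering logic follows; the protein names and the twofold cutoff are illustrative choices, not values from the text.

```python
# Sketch of SILAC-style interactor filtering: proteins whose heavy (bait
# pull-down) to light (control pull-down) intensity ratio exceeds a cutoff
# are called specific; ratios near 1 indicate background binding.
# Protein names and the 2.0-fold cutoff are hypothetical.

def specific_interactors(ratios: dict, fold_cutoff: float = 2.0) -> list:
    """Return proteins enriched in the bait pull-down, sorted by ratio."""
    hits = [(protein, r) for protein, r in ratios.items() if r >= fold_cutoff]
    return [protein for protein, _ in sorted(hits, key=lambda x: -x[1])]

heavy_to_light = {
    "BaitPartnerA": 8.5,   # strongly co-enriched with the bait
    "BaitPartnerB": 3.1,
    "Keratin": 1.0,        # classic contaminant, equal in both channels
    "Tubulin": 1.2,
}

print(specific_interactors(heavy_to_light))  # ['BaitPartnerA', 'BaitPartnerB']
```

The appeal of the ratio test is that it does not require contaminants to be rare: even highly abundant background proteins are excluded because they bind the affinity matrix equally in both conditions.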
The yeast two-hybrid system is not without issues, however, as the interaction of two exogenous proteins in a yeast nucleus can lead to various artifacts. In the analysis of the molecular composition of protein complexes, proteomics has several advantages. First, affinity purification typically yields moderately complex protein mixtures, a situation that
[Figure 2 grid: rows of increasing technical expertise versus columns of increasing biological complexity (protein, cell culture, animal model, human), spanning the application areas protein analysis, protein complexes, protein networks, cell culture models, translational studies and population proteomics. Individual entries range from purity and identity, pairwise interactions and interaction screening through PTM discovery, analysis, stoichiometry and characterization, complex and network composition, dynamics and stoichiometry, expression profiling, marker discovery and verification, genetic variation, splicing and SNPs, pathway crosstalk, patient stratification, biofluid and biopsy composition, spatial organization and MALDI imaging.]
Figure 2 Applications of proteomic technologies. For the purpose of organizing the field of proteomics, it is instructive to compare and contrast the many conceivable applications on the basis of the complexity of the biological context versus the technical difficulty of implementing the appropriate technology. Each cell in the table shows an application of proteomics that is discussed in the main text.
is ideally matched by the capabilities of MS to identify proteins in mixtures. Second, interacting proteins can be purified under near physiological conditions from endogenous sources or from cell lines, limiting artifacts. Third, functionally important protein modifications, such as phosphorylation or acetylation, can often be determined in the same context. Finally, with few exceptions, 5–20 proteins are generally present in complexes and can usually be identified by LC-MS/MS after either a solution digest or a one-dimensional sodium dodecyl sulfate (SDS) gel. Protein complexes can be purified in several ways36,37. One approach is to attach an affinity tag to the protein of interest, express it in a cell line and purify the interacting partners by virtue of the tag. The advantage of using tagged proteins is that the tag can be systematically applied to any number of proteins in a particular pathway, including proteins discovered to interact with a certain bait protein. To allow validation of the components found to be in the complex, a reciprocal tagging experiment can be performed. A newly identified interactor is tagged and in turn used for the purification of the same complex. If the same proteins are identified, the interactions are valid. As proteins may be part of more than one complex, results from this type of experiment depend on the abundance of the respective complexes. Disadvantages are that the tag modifies the protein, which may alter its activity. Issues may also arise from overexpression of the tagged protein, but this can often be overcome by tagging the endogenous gene locus38,39 so that the endogenous promoter drives protein expression. The use of antibodies or other protein binders does not suffer from these shortcomings, as they purify the endogenous complex. High-quality antibodies are, however, available only for a limited set of proteins. 
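The reciprocal-tagging validation described above amounts to checking a symmetry condition over purification results: a bait–prey pair is supported when each partner, used as the tagged bait, pulls down the other. A toy sketch (the protein names are illustrative, not data from any cited study):

```python
def reciprocal_support(pulldowns):
    """Given a dict mapping each tagged bait to the set of proteins
    identified in its purification, return the bait-prey pairs that are
    reciprocally supported (the prey, when tagged, pulls down the bait)."""
    supported = set()
    for bait, preys in pulldowns.items():
        for prey in preys:
            if prey in pulldowns and bait in pulldowns[prey]:
                supported.add(frozenset((bait, prey)))
    return supported

# hypothetical purifications; Hsp70 stands in for an abundant contaminant
runs = {
    "Cdc4": {"Skp1", "Cdc53", "Hsp70"},
    "Skp1": {"Cdc4", "Cdc53"},
    "Cdc53": {"Cdc4", "Skp1"},
}
pairs = reciprocal_support(runs)
# Cdc4-Skp1, Cdc4-Cdc53 and Skp1-Cdc53 are reciprocal; Hsp70 is not
```

Note that, as the text cautions, reciprocal support only validates the physical co-purification; proteins belonging to several complexes will still give abundance-dependent results.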
The biochemical approach aside, the ability to identify interacting proteins by MS depends on two main factors: the abundance of the protein complex and the affinity with which interacting proteins are held together. As modern mass spectrometers offer attomole sensitivity, the former issue can be overcome when sufficient quantities of starting material are used. The latter is harder to address, as the time required to perform an affinity purification biases the results toward submicromolar interactions. In vivo crosslinking with low concentrations of formaldehyde has been used to stabilize transient
interactions before purification40, but there are not enough examples in the literature to validate this approach as a generic strategy. Because not all the proteins identified in the types of experiments mentioned above are genuine interactors, validation experiments at different levels are required. A common biochemical approach is to use coimmunoprecipitation of wild-type proteins at basal expression levels. Although coimmunoprecipitation is an independent approach, it suffers from the same issues of abundance and affinity. If the suspected interactor is nonspecifically copurified with a target protein, it will be detected by both coimmunoprecipitation and MS. A recent and elegant biochemical validation approach is a method called protein correlation profiling, in which the quantity of suspected interactors is compared across the different steps of the complex purification scheme41. Only those proteins that show an identical purification profile are genuine members of a complex, whereas all other proteins are (abundant) contaminants. As noted above, a reciprocal tagging experiment may also be used for validation. A common cell-biological approach is then to show cellular colocalization of the interacting proteins. Of course, none of these types of experiments demonstrates biological significance; this may come from experiments showing that the interaction takes place in vivo and is functional. Although the identification of members of stable protein complexes of low cellular abundance is fairly routine, the analysis of PTMs at the protein complex level is possible but difficult42. Variations in biological conditions may lead to changes in the composition, PTM status and activity of protein complexes. To capture this dynamic behavior, the respective biological and proteomic experiments must be modified, and several controls must be performed to ensure that the data can be meaningfully interpreted. 
First, it must be demonstrated that the biological system from which the proteomic sample is derived actually responds to the stimulus with the expected kinetics, dose responses or other appropriate criteria (as would be the case for any biologically motivated proteomic experiment). Second, a quantitative MS technique should be used so that the observed changes can be statistically measured rather than assessed by intuition. And third, functional assays should be in place to validate the observed changes. As with static protein complexes, one should only expect to identify
relatively stable protein interactions, as the time scale of the experiment generally does not permit the identification of transient interactions. Maybe not surprisingly, the dynamics of individual protein complexes are not often studied by proteomic approaches43; other biochemical and cell biological techniques are often more suitable for this purpose once the proteomic experiment has established the protein components of a complex.

One fundamental aspect of protein complex architecture is the stoichiometry of its constituents. Experiments to determine stoichiometry are technically very challenging and often require combinations of biophysical and proteomic approaches44,45. For stable protein complexes, gel filtration or centrifugation techniques can give indications of stoichiometry, but the larger the complex gets, the harder the data become to interpret. Proteomic techniques are only beginning to be used to determine stoichiometry, but, given the sensitivity of MS, we anticipate that proteomics will be important in these types of analyses in the future. In the few published examples, stable isotope or fluorescently labeled reference standards of precisely known quantities have been used to determine the quantities of members of protein complexes46–48. The most rigorous controls must be used for this type of study because bias must be avoided in purification steps in order to arrive at meaningful numbers. Intact mass measurements of isolated protein complexes will be of utility, but very few laboratories now have the technical capability to perform these experiments49,50.

The spatial organization of proteins in a complex is also of interest. Given that typical protein complexes are made up of up to 5–20 members51, each protein in the supramolecular structure cannot physically contact all the other proteins. Supramolecular structure determination typically is the domain of biophysical techniques such as X-ray crystallography, NMR and cryo-electron microscopy. Proteomic approaches have not yet been prominent but may contribute in the future, given the comparatively small sample needed for MS. The general idea is to crosslink the complex and then to sequence the crosslinked peptides by MS to establish the nearest-neighbor relationships. Although conceptually simple, this is technically very demanding. Chemical crosslinking heavily modifies the proteins and may change the integrity of the complex. In addition, the yields of the crosslinking reactions are typically very low. Finally, the sequencing and identification of crosslinked peptides by MS is nontrivial because crosslinking generates branched peptides. Tandem mass spectra of such peptides often contain information about both of the sequences, but most database search algorithms are unable to process this information because they only consider the linear peptide sequences deposited in a protein sequence database. As a result of all these complications, the examples in the literature are mostly confined to binary protein interactions or very small protein complexes52,53.

Box 1 Protein identification in mixtures by MS
Broadly, there are two strategies for protein identification in mixtures: first, mapping strategies that rely predominantly on the accurate mass, retention time, or both to infer the composition of a mixture; and second, tandem MS approaches, now the most common (for greater detail, see Supplementary Techniques). MSn refers to sequential MS/MS experiments, where n is the number of MS/MS experiments. For MSn approaches, peptides are first selected for fragmentation (in either a targeted or a data-directed manner) inside the mass spectrometer and then are fragmented by one of several methods (e.g., collision-induced dissociation (CID) or electron capture dissociation (ECD)); the mass spectrum of the peptide fragments is then recorded. It is most common to perform this step only once (that is, conventional MS/MS); however, some studies have shown value in multiple isolation and fragmentation steps (that is, MSn). Typically, the most intense ions are selected for fragmentation. Dynamic exclusion (Supplementary Glossary) and targeted inclusion lists are used to broaden the range of selected species.

Once ions have been selected and fragmented, three strategies are used to assign a peptide to the ion. The first is database searching (Supplementary Glossary). In this strategy, peptides are generated by an in silico digest of a proteome database and then a theoretical mass spectrum is predicted for each peptide. The theoretical spectrum is compared with the experimental spectrum and a peptide identity is inferred on the basis of the best match between the theoretical spectrum and the observed spectrum. In the second approach, de novo sequencing (Supplementary Glossary), peptide sequences are read out directly from fragment ion spectra. In hybrid techniques, short stretches of the peptides are sequenced, and then the rest of the spectrum is matched to existing data.

Though fragmentation-based methods are generally successful, there are several limitations. As noted in the main text, the largest limitation is the small number of peptides selected for sequencing. Many instruments are able to sequence only a subset of the hundreds of peaks present in each mass spectrum. In addition, relatively few peptides with fragmentation spectra give rise to high-confidence identifications. This low percentage can be attributed to several experimental and computational factors. Computationally, matching techniques are most successful with unmodified tryptic peptides. The inclusion of more modifications greatly increases the false discovery rate, and the larger size of the sample space complicates identification. In addition, gas phase chemistry or ion source effects can fragment or modify peptides. Finally, for the inference of protein identifications from peptide identifications, there is the issue that not all peptides are unique for a single protein, as close sequence homologs or proteins with similar domains can contain the same peptide sequence (so-called shared peptides). From this so-called peptide inference problem follows the requirement to ascertain whether protein identifications are made on the basis of unique or shared peptides. If only shared peptides are identified, a protein group rather than a single protein has been identified.

Analysis of protein pathways and networks
The next level of cellular organization is provided by pathways and networks in which proteins and protein complexes relay signals from the extracellular space into the cell or distribute information within a cell and its compartments. Much of what was said about protein complexes also applies to networks; however, many more proteins are involved in networks than in typical protein complexes. Charting a physical network is technically fairly straightforward, and analyzing dynamic behavior in a global sense by MS has become more doable as quantitative MS methods become more widely available. However, the functional validation of identified proteins is by no means trivial, as cross-talk between pathways can often render the results somewhat ambiguous.

Proteomic technologies have enabled the systematic charting of cellular pathways and networks in several model organisms54–56. In fact, two reports on large-scale protein interaction screens in yeast are among the five most highly cited papers in proteomics so far51,57. Technically, such interaction screens take advantage of affinity tagging of proteins using genetic or molecular biology techniques and the speed and sensitivity of MS. Use of affinity tags rather than antibodies
to purify network components means that the strategy is generic (that is, it can in principle be applied to any protein). Tags, such as the Flag peptide (DYKDDDDK or MDYKDDDDK), hemagglutinin, streptavidin, green fluorescent protein (GFP) and TAP (tandem affinity purification: a fusion cassette encoding calmodulin-binding peptide, a tobacco etch virus protease cleavage site and Protein A), and combinations thereof, have been used effectively. GFP is attractive because it enables both the monitoring of protein localization and complex purification. Although not technically demanding, systematic mapping of protein networks on a large or genome-wide scale requires significant technical resources. Thousands of samples must be analyzed by MS to produce a mostly static picture of the physical organization of cells into protein networks. Even larger numbers of samples will be required to capture the dynamic nature of protein networks or to extend analysis to different cell types. This means that genome-wide interaction studies can likely only be undertaken by substantially funded academic consortia or companies. Proteomics has been important in identifying the component parts of smaller networks from all corners of biology. In the design of a proteomics experiment to evaluate a network, consideration should be given to the choice of initial bait proteins. Tagging scaffolding proteins or transcription factors has yielded particularly rich network coverage, whereas tagging of enzymes often results in disappointment because their interactions are generally too transient or too weak to be observed by proteomic methods. Thus, proteomic charting of networks typically provides a physical rather than functional
view of a network. Because of the multitude of possible interactions within and between complexes, as well as the fact that many proteins present in a network have generic cellular function (say, maintaining cell homeostasis), the interpretation of network mapping data needs to be carefully controlled. The extent to which such controls may have to be applied is illustrated by a study in which the tumor necrosis factor-α (TNF-α)–nuclear factor-κB (NF-κB) pathway was mapped in human embryonic kidney (HEK293) cells using 32 TAP-tagged proteins11. The initial interaction map constructed from the mass spectrometric analysis of some 250 affinity purifications contained 680 proteins, only 130 of which were not identified in a counter-screen of 250 unrelated TAP purifications. This means that, even for relatively small protein networks, relatively large-scale proteomic analyses may be required for informed selection of new proteins for functional validation. Network mapping is most effective if carried out in a stepwise fashion in which one starts from proteins of well described biology to identify a small number of interaction partners that can be validated using functional assays established for the system under study. In mapping protein interaction networks and pathways, one soon realizes that the pathways are interconnected at many different levels58. Such cross-talk is of great biological importance, as it offers a means to generate functional redundancy, diversity and compensating mechanisms should parts of a pathway become unavailable. To identify pathway cross-talk systematically, one would again start out from a well known protein interaction hub and map protein interactions in its
Table 1 Frequently posed questions in MS-based proteomics
How do I prepare my sample for MS analysis?
High amounts of salts and detergents must be removed before MS analysis. There are many ways of accomplishing this, including protein precipitation, SDS-PAGE and ultrafiltration or dialysis. If in doubt, ask your analytical collaborator.
How much protein do I need for protein identification or quantification?
You can expect to identify and quantify:
1. 10s to 100s of proteins from nanograms of total protein
2. 100s to 1,000s of proteins from micrograms of total protein
3. 1,000 to 10,000 proteins from milligrams of total protein
Results strongly depend on the complexity and dynamic expression range of samples. Typically, one-tenth as many proteins are identified from serum as from cell lines or tissues.
How much protein do I need for PTM analysis?
Systematic PTM analysis of a single protein requires microgram amounts of a reasonably pure protein. Proteome-wide shotgun (Supplementary Glossary) PTM analysis requires milligram amounts of protein. For very rare modifications, other requirements may apply.
What protein coverage can I expect to achieve?
This depends on (i) the complexity of the mixture, (ii) the amount of protein in the mixture and (iii) the MS/MS selection and dynamic exclusion criteria (Supplementary Glossary). There is a rough correlation between protein coverage and protein abundance; however, even for simple mixtures or for the most abundant proteins, it is rare to observe >60% coverage unless specific efforts are taken (for example, multiple digestion protocols) to increase coverage. In complex mixture experiments, many low-abundance proteins will be identified by only a single unique peptide.
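Protein (sequence) coverage itself is simple bookkeeping: the fraction of residues matched by at least one identified peptide, counting overlaps once. A sketch with a toy sequence and hypothetical peptide identifications:

```python
def sequence_coverage(protein_seq, peptides):
    """Fraction of residues in protein_seq covered by at least one
    identified peptide; overlapping peptides are counted once."""
    covered = [False] * len(protein_seq)
    for pep in peptides:
        start = protein_seq.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return sum(covered) / len(protein_seq)

seq = "MKWVTFISLLFLFSSAYSRGVFRR"  # 24 residues, toy sequence
cov = sequence_coverage(seq, ["MKWVTFISLL", "SSAYSR"])
# 16 of 24 residues covered -> ~0.67
```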
What proteome coverage can I expect to achieve?
This depends on (i) the amount of protein used for the analysis and (ii) the degree of proteome fractionation. Coverage of 500–1,000 proteins may be achieved by direct LC-MS/MS of proteome digests. Coverage of 1,000–3,000 proteins requires at least one dimension of proteome fractionation on the peptide or protein level (for example, protein fractionation by one-dimensional SDS-PAGE followed by LC-MS/MS, or peptide fractionation by in-solution isoelectric focusing followed by LC-MS/MS). Coverage of >3,000 proteins usually requires multiple dimensions of fractionation on protein and/or peptide level. Note that typically, one-tenth as many proteins are identified from serum as from cell lines or tissues.
Which identifications can I trust?
Three general quality criteria (or combinations) can be applied:
1. Calculation of a global false discovery rate (FDR) for the list of identified proteins. FDRs of <1% are generally accepted. FDRs give information about the general quality of a data set. Most protein identification software packages provide FDR calculation tools.
2. Calculation of the probability that matching a tandem MS spectrum to a peptide sequence is a random event. Random matches of <1%–5% are generally accepted. Peptide probabilities give a quality assessment for each identified protein. Not all protein identification software can perform this probability calculation.
3. For publication in some journals, at least two peptide identifications are required. This is an ad hoc criterion and says very little about data quality.
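The global FDR in point 1 is commonly estimated by the target–decoy strategy: spectra are searched against the real database concatenated with reversed ("decoy") sequences, and the decoy hit rate estimates the false-positive rate among target hits. A minimal sketch of the arithmetic (the hit list and threshold are hypothetical):

```python
def decoy_fdr(ids):
    """Estimate the global FDR of an identification list by the
    target-decoy strategy: FDR ~ (decoy hits) / (target hits).
    `ids` is a list of (identifier, is_decoy) pairs above a score cutoff."""
    decoys = sum(1 for _, is_decoy in ids if is_decoy)
    targets = len(ids) - decoys
    return decoys / targets if targets else 0.0

# 198 target hits and 2 decoy hits survive the score threshold
hits = [(f"target_{i}", False) for i in range(198)]
hits += [("decoy_1", True), ("decoy_2", True)]
fdr = decoy_fdr(hits)  # 2/198, i.e., ~1% FDR
```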
How does the protein identification list correlate with protein amount?
As a rule of thumb, the abundance of a protein correlates with the number of tandem MS spectra that identify the peptides belonging to a protein. Proteins at the top of the list are therefore generally more abundant than proteins further down on the list. This is a very crude correlation as the relationship between detection efficiencies of different peptides in a proteomic workflow is complex and not well understood. Although it is fairly safe to compare the same protein across different experiments, it is more dangerous to make comparisons of different proteins in the same experiment.
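This rule of thumb is the basis of spectral counting. One published refinement, the normalized spectral abundance factor (NSAF), divides each protein's spectral count by its length before normalizing across the run, since longer proteins yield more peptides. A sketch with illustrative counts (protein lengths are approximate canonical lengths):

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: spectral count divided by
    protein length, then normalized so all values in the run sum to 1."""
    raw = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(raw.values())
    return {p: v / total for p, v in raw.items()}

counts = {"ACTB": 120, "GAPDH": 80, "TP53": 4}      # spectra per protein
lengths = {"ACTB": 375, "GAPDH": 335, "TP53": 393}  # residues
abundance = nsaf(counts, lengths)
# ACTB ranks above GAPDH, which ranks far above TP53
```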
Table 1 Frequently posed questions in MS-based proteomics (continued)
Where do I cut the list of identified proteins?
Physical presence of a protein may be judged by the criteria described above for protein identification. This does not automatically mean relevance for the experiment performed, as many of the identified proteins may be contaminants, either endogenous (for example, abundant housekeeping proteins) or exogenous (for example, keratins from human skin).
Which quantification approach should I choose?
This strongly depends on the experiment. Simple guides are the following:
1. Metabolic labeling (for example, SILAC, 15N) is best for small changes (10–50%) and cell culture systems.
2. Peptide labeling (for example, iTRAQ, TMT, dimethylation) is best for moderate changes (50%–200%), primary tissue protein sources and multiplex experiments (for example, time courses, dose responses).
3. Label-free methods using the MS detector response (for example, extracted ion chromatograms (Supplementary Glossary)) are best for moderate changes (20%–200%) and for comparison of many highly similar experiments.
4. Label-free methods using spectrum counts are best for large changes (>100%) and for comparison of many highly similar experiments.
5. Single or multiple reaction monitoring (SRM or MRM) in conjunction with spiked synthetic standards (AQUA) is best for determining the absolute quantity of a protein in a complex biological matrix (for example, serum).
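For the labeling approaches above, a protein-level ratio is commonly summarized as the median of its peptide-level heavy/light ratios, which damps the effect of outlier peptides (for example, a peptide carrying an unaccounted modification). A sketch with hypothetical intensities:

```python
from statistics import median

def protein_ratio(peptide_pairs):
    """Protein-level heavy/light ratio as the median of peptide-level
    ratios; the median damps outlier peptides."""
    return median(heavy / light for heavy, light in peptide_pairs)

# (heavy, light) intensity pairs for one protein; the last pair is an outlier
peptides = [(2.1e6, 1.0e6), (1.9e6, 1.0e6), (2.0e6, 0.9e6), (8.0e6, 1.0e6)]
ratio = protein_ratio(peptides)  # ~2.16, i.e., roughly 2-fold up in 'heavy'
```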
What fold change can I trust in quantitative experiments?
Any observed change should bear a statistical measure of variance to define the changes that can be trusted. This may be computed for every protein on the basis of the number of available data points (for example, number of peptides per protein, amplitude of MS response, technical and biological replicates). Several free and commercial software packages have become available, but many proteomics laboratories still struggle with quantification statistics.
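A minimal illustration of attaching a variance measure to a fold change, using replicate intensities of one protein and a Welch-style t statistic (values hypothetical; real pipelines use more sophisticated models and multiple-testing correction):

```python
from math import log2, sqrt
from statistics import mean, stdev

def fold_change_stats(control, treated):
    """Log2 fold change plus a Welch-style t statistic across replicate
    intensity measurements of one protein."""
    fc = log2(mean(treated) / mean(control))
    se = sqrt(stdev(control) ** 2 / len(control)
              + stdev(treated) ** 2 / len(treated))
    t = (mean(treated) - mean(control)) / se
    return fc, t

ctrl = [1.00e6, 1.10e6, 0.95e6]  # three biological replicates
trt = [2.05e6, 1.95e6, 2.10e6]
fc, t = fold_change_stats(ctrl, trt)  # fc = 1.0, i.e., a 2-fold increase
```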
How reproducible are the results for protein identification?
Generally, reproducibility is a function of the complexity of a protein mixture and the number of upstream sample handling steps. For simple protein mixtures and short workflows (for example, immunoprecipitations), reproducibility should generally be better than 80%. For whole proteome analysis or complex proteome fractionation schemes, reproducibility may vary greatly, from 40 to 70%. It should be stressed that not identifying or quantifying a peptide or protein does not necessarily mean that the peptide or protein is not present in a mixture.
How reproducible are the results for protein quantification?
As for protein identification, sample complexity greatly affects reproducibility. Stable isotope labeling methods generally reproduce within 5%–25%, whereas spectrum counting typically shows larger variance.
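Identification reproducibility between replicate runs is often reported as the fraction of shared identifications; one common convention normalizes to the smaller run (others use the union, i.e., a Jaccard index). A sketch with hypothetical accession sets:

```python
def overlap(run_a, run_b):
    """Fraction of identifications shared between two replicate runs,
    normalized to the smaller run."""
    shared = len(run_a & run_b)
    return shared / min(len(run_a), len(run_b))

# two replicate runs of 1,000 proteins each, 750 in common
rep1 = {"P%04d" % i for i in range(1000)}
rep2 = {"P%04d" % i for i in range(250, 1250)}
r = overlap(rep1, rep2)  # 0.75, i.e., 75% reproducibility
```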
How long will it take to get the results?
This depends largely on whether the work is done with a core facility or with a research lab. The following turnaround times from sample submission to data reporting are typical for core facilities and research labs:
1. 5–10 working days for simple protein identification and quantification
2. 4–6 weeks for quantitative protein expression profiling
3. 2–6 months for PTM analysis
How much will this cost?
Proteomic analysis is not yet a commodity. Costs vary depending on the collaborator. For commercial and academic service providers, the costs scale with the requirements of time of personnel, cost for reagents and equipment and overheads. Typical figures would be as follows:
1. $50–200 for simple protein identification
2. $500–2,000 for simple PTM analysis
3. $5,000–15,000 for complex PTM analysis
4. $1,000–2,000 for quantitative protein expression profiling
close vicinity, rather than choosing biologically unconnected ‘islands’. Technically, analysis of pathway cross-talk is no more demanding than mapping of protein interactions within a confined network. Even so, validation issues become more acute. For example, confirming the specificity of individual or even relatively few protein–protein interactions becomes a large-scale experiment because of the numbers of candidate proteins. In addition, the under-representation of enzymes in protein interaction studies makes direct functional validation of potential cross-talk events much more difficult. As a result, study of pathway cross-talk may be best approached by a battery of cell biological assays in combination with loss-of-function approaches, such as RNA interference, rather than by proteomics. Clearly, pathways are dynamic, both in their physical makeup and their functional activity, although most published proteomics studies so far have provided static views. Going forward, the quantitative analysis of protein pathways and networks must include perturbation or stimulation experiments to learn about proteins moving in and out of complexes, changes in activation status, and the behavior of the network in general (rather than that of a single protein)59,60. Quantitative proteomic technology has advanced to enable a fairly accurate assessment of the relative changes between different cellular states. Although MS-based approaches are very successful in discovering the members of the network, measuring their dynamic behavior under a multitude of different conditions may be better served by normal or reversed protein arrays, owing to their inherent throughput61–64. Obviously, however, the effort involved in creating global or themed protein arrays
is a significant up-front investment, and a protein array strictly speaking does not measure interactions occurring in a cell. As mentioned before, the activity of a signaling pathway or network is often regulated by PTMs, and the techniques of PTM analysis can also be applied in the context of network analysis. Clearly, the comprehensive and simultaneous measurement of PTMs on many proteins is technically difficult, and the regulation mechanisms may be complex, so that analysis of PTM levels may not suffice to describe the behavior of the network or pathway. Still, MS today allows identification of thousands of phosphorylation sites in a quantitative manner and, as such, has made important contributions to our present knowledge of signaling pathways65. Nevertheless, our recommendation for a pathway-wide PTM study would be to focus on one particular PTM at a time and complement proteomic techniques with PTM-specific antibodies if available. Proteomic measurements are an important part of systems biology (Supplementary Glossary) data pipelines. The challenge here is to provide robust quantitative information so that mathematical models of the behavior of pathways and networks can be developed. So far, most proteomic studies have provided data on relative changes in protein abundance or PTM status in response to some form of biological perturbation. Although this often suffices to describe a pathway phenomenologically, information about the absolute numbers of molecules involved in a process is often required to compute a predictable outcome. Proteomics technologies based on MS are not now able to deliver such information routinely, even for one single
pathway, let alone for the flux of information between pathways. But for focused applications (say, a small protein network), targeted analytical approaches such as the multiple reaction monitoring (MRM) technique hold considerable promise for the future66.

Cell culture models
In vitro prokaryotic67 and eukaryotic68,69 systems have been widely used to ask questions about the fundamental composition of proteomes and subproteomes (for example, phospho-proteome, mitochondrial proteome or cell-surface proteome) and how those proteomes are altered by genetic changes (for example, deletions or mutations), cell growth (for example, differentiation or cell state transition) or an intervention (for example, growth factor stimulation or drug treatment). Technically, qualitative protein expression profiling for thousands of proteins is no longer particularly difficult. The three principal challenges faced in system-scale analyses are sample purity, complexity and dynamic range. Consequently, most profiling approaches aim to address all these in some shape or form. Sample purity is affected by contamination from other proteomes. For instance, the bovine or horse proteome from sera used in cell culture media may complicate secretome studies of human cell culture systems. Sample complexity refers to the number of different species within a sample being analyzed. Dynamic range refers to the range of protein concentrations from the least to the most abundant within a sample. Lastly, as very few protocols actually select for proteins, mixtures may contain a significant percentage of lipid, nucleic acid or small molecule contaminants that interfere with protein profiling. One very common approach to reducing a sample’s protein and peptide complexity is fractionation, such as by chromatographic methods. There are several key considerations when using chromatographic methods to partition a mixture before MS analysis. First is sample abundance.
If this is severely limited, it may not be possible to use chromatographic methods. Next is analysis time. Chromatographic separation techniques can turn one sample into many and thus significantly increase analysis time and analysis cost. For reference, in a typical study of a cell lysate sample, ~400 proteins based on ~1,000 sequence-unique peptides (Box 1) can be confidently identified with a false discovery rate <5% within a 1–2 h gradient. Unfortunately, the relationship between number of chromatographic fractions and number of identified proteins is not linear70. For example, a typical 20-fraction experiment (requiring days rather than hours of instrument time) is likely to identify on the order of 3,000 proteins instead of the expected 8,000 (20 fractions with 400 proteins per fraction). This is because some analytes fall below the limit of detection of an instrument, but we may also be approaching the limit of expressed proteins in a biological system at a given time. Generally, chromatographic approaches can be applied at either the intact protein or peptide level, and it is not yet clear which fractionation strategy gives the best proteome coverage. A benefit of protein level fractionation (by one- or two-dimensional gels or column chromatography) is that proteins are separated both by mass and by other characteristics, which may distinguish among different protein isoforms. For example, glycosylated versions of a given protein will frequently segregate to different fractions than the parent protein. Another advantage of protein-level methods is the potential reduction in local dynamic range of a sample. However, many chromatographic separation techniques work better at the peptide level, providing better reproducibility and resolution. 
As a result, combinations of protein (for example, SDS-polyacrylamide gel electrophoresis; SDS-PAGE) and peptide separations (for example, the multidimensional protein identification technology, MUDPIT (Supplementary Glossary), which uses reverse-phase and strong cation exchange (SCX) columns in tandem, or
peptide isoelectric focusing or hydrophilic interaction chromatography (HILIC) approaches) are frequently used to reduce proteome complexity and maximize proteome coverage71. Another possible explanation for the limitations imposed by sample complexity and dynamic range is ion suppression, a phenomenon wherein some analytes literally interfere with the ionization of other analytes so that they cannot be detected by the mass spectrometer, even though they are physically present. If one considers the process of ionization as containing a fixed amount of charge to be distributed, and that charge is distributed as a function of both abundance and ionizability, then high-abundance peptides that ionize well take most of the available charge, leaving little for the remaining analytes. Probably more important, chemical noise present in a sample (for example, salts, detergents and solvent clusters) often limits dynamic range. The signal of low-abundance species (peptides) may be too weak to exceed the sometimes large signal of the chemical noise; or, conversely, highly abundant ions may saturate some detectors. Despite these issues, recent profiling approaches based on multidimensional chromatography and MS have demonstrated the ability to identify >5,000 proteins expressed over four orders of magnitude of cellular abundance in cellular proteomes72,73, a tenfold increase over what was possible only five years ago74. In addition to proteome coverage, another key consideration in profiling experiments is protein coverage. At first blush, it would seem that if a protein were present in a sample, all of its peptides should be readily observable. For several reasons that we have discussed elsewhere75,76, this is not the case. Instead, in a typical proteomics experiment, only a single peptide is observed for many proteins; the median protein is identified by observation of only three peptides.
This not only limits confidence in many of the identified proteins but also in their quantification (discussed further below). One can usually improve protein coverage by simplifying the composition of the mixture (for example, by the fractionation approaches described above). Alternatively, it has been shown that repeatedly analyzing the same mixture can improve coverage, at the expense of measurement time. Such resampling not only increases a protein’s peptide coverage but can also allow identification of 30% or more additional proteins77. A marked improvement can also be obtained by using multiple proteases with different cleavage specificities78.

Analysis of PTMs in complex mixtures
In vitro cell culture models are also attractive systems for the broad-scale discovery of PTMs because one can subject cultured cells to a set of biologically well-controlled experiments. However, large-scale PTM expeditions require specialized upstream chromatography approaches that enrich or select for the PTM under investigation, highly sensitive mass spectrometers and sophisticated downstream software tools for the assignment of the modified peptide and the exact site of the modification. Specialized laboratories have identified thousands of phosphorylation sites in many model organisms, ranging from yeast to worms and flies to plants and mammals10,79–83. Such studies produce a rough PTM signature of a biological system rather than of an individual protein. To obtain a full picture of the phosphorylation status of a particular protein or group of proteins, more focused experiments have proven successful. For instance, immunoprecipitation of phosphotyrosine-containing proteins or peptides has led to interesting discoveries in diverse applications84,85. Despite substantial advances in this area, however, several issues remain for global PTM discovery.
It remains difficult to study glycosylation (owing to heterogeneity of oligosaccharide types and structures), ubiquitinylation (owing to branching of the protein) and very transient PTM events.
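At its core, the software assignment of a PTM rests on recognizing a characteristic mass shift. The sketch below illustrates this logic for phosphorylation; the peptide masses and tolerance are hypothetical, and real search engines score candidate sites against fragment spectra rather than comparing two precursor masses:

```python
PHOSPHO_SHIFT = 79.96633  # monoisotopic mass of HPO3 in Da, added by phosphorylation

def is_phospho_pair(mass_unmodified, mass_modified, tol_da=0.01):
    """True if the mass difference between two measured peptide masses
    matches a single phosphorylation within the given tolerance."""
    return abs((mass_modified - mass_unmodified) - PHOSPHO_SHIFT) <= tol_da

print(is_phospho_pair(1234.567, 1314.533))  # shift of ~79.966 Da: consistent with phospho
print(is_phospho_pair(1234.567, 1250.562))  # shift of ~15.995 Da (oxidation): not phospho
```

The same mass-shift reasoning, extended to fragment ions, is what allows tools to localize the exact modified residue, which is why highly accurate mass measurement is so valuable for PTM work.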
perspective
Figure 3 Technologies for proteomics. This figure depicts the proteomic workflow from sample extraction to protein quantification. For each step in the workflow, the text boxes give examples of commonly used techniques, many of which may be combined in any one study. All featured techniques are discussed in detail in the Supplementary Techniques. Further details related to the terms database searching, de novo sequencing, peptide mass fingerprinting, electrospray ionization and matrix-assisted laser desorption/ionization can be found in the Supplementary Glossary. FACS, fluorescence-activated cell sorting; 1D, one-dimensional; 2D, two-dimensional.
[Figure 3 text boxes, by workflow step:
Sample extraction: biopsy; biofluid; laser-capture microdissection; cell sorting (FACS); primary cell culture; stable cell line culture; free-flow electrophoresis; gradient centrifugation.
Protein fractionation: 1D and 2D gel electrophoresis; isoelectric focusing; capillary electrophoresis; column chromatography; immunoprecipitation; pulldowns with tagged proteins; cell surface labeling; active site labeling; affinity depletion; phosphoflow; glycocapture.
Peptide fractionation: ion-pairing reversed phase (RP-HPLC); isoelectric focusing (IEF); strong cation exchange (SCX); weak anion exchange (WAX); hydrophilic interaction (HILIC); immobilized metal affinity (IMAC); titanium dioxide, zirconium dioxide; lectin affinity chromatography; immunoprecipitation; biotinylation; fractional diagonal chromatography.
Mass spectrometry: electrospray ionization (ESI); matrix-assisted laser desorption/ionization (MALDI); time-of-flight MS (TOF); ion trap MS; quadrupole MS; Orbitrap MS; Fourier-transform ion cyclotron MS (FT-ICR); liquid chromatography MS (LC-MS); imaging MS; ion mobility MS; tandem mass spectrometry (MSn); collision-induced dissociation (CID); electron-transfer dissociation (ETD); electron-capture dissociation (ECD); post-source decay (PSD).
Protein identification: database searching; de novo sequencing; peptide mass fingerprinting (PMF); accurate mass and time tag (AMT); software: Mascot, Sequest, X!Tandem, OMSSA, Phenyx, Spectrum Mill, PEAKS, PepNovo, InsPecT, PTM Score, A-Score, ModifiComb.
Protein quantification: metabolic labeling (SILAC, 15N); chemical protein labeling (ICPL); chemical peptide labeling (ICAT, cICAT, iTRAQ, TMT, methylation, esterification); enzymatic peptide labeling (18O); absolute quantification (AQUA, QconCAT); label-free (spectrum counting, emPAI, APEX, XICs, expression); single/multiple reaction monitoring (SRM, MRM); software: Express, Pepper, MSQuant, MaxQuant, itracker, TPP, CPAS, TOPP, ProteoWizard.]
Analysis of organellar protein compositions
A logical extension of mapping the proteome of cells is the analysis of the protein complement of organelles or other large cellular structures. Organellar proteomics86,87 links individual proteins to the functional context of a particular organelle (for example, drug receptors at the cell surface, transport mechanisms by vesicles or cell fate decisions at mitochondria). These experiments obviously depend on isolation of a particular organelle before identification of its protein constituents. The methods for the isolation of many organelles (mainly based on the sedimentation characteristics during centrifugation) are quite well established, and a combination of enzyme assays, western blotting and electron microscopy can be used to assess the enrichment and integrity of a preparation. Still, the field is plagued by controversies over whether or not certain proteins are genuine constituents of organelles or mere
‘contaminants’. The use of stable isotope labeling and quantitative MS can offer insight here, as hundreds of proteins can be followed throughout the purification scheme and only those proteins showing the same quantitative behavior during purification are part of the same cellular structure41,88. Likewise, targeting the cell-surface proteome by protein chemistries specific for certain structures (for example, glycosylation) in combination with quantitative MS has led to determination of the proteomic content of this important compartment89–91.

In vitro quantitative expression profiling
Although MS has been very successfully used in the analysis of proteins in complex mixtures, these studies have so far been dominated by qualitative results. This situation is slowly changing as quantitative measurement methods are becoming more widely available.
Quantitative expression profiling aims not just to identify the components of a proteome but also to compare two or more distinct proteomes to identify proteins with altered expression levels or post-translational forms in response to a given stimulus. Broadly, there are two primary approaches92: so-called label-free quantification methods and those that use stable isotope labeling of proteins or peptides. The former is attractive because one can in principle perform comparisons across many samples. The strength of the latter is its superior accuracy of quantification, albeit only for a small number of samples (up to eight). Each quantification approach compares the peptide signals observed in samples prepared under different conditions (for example, cells undergoing normal growth compared with cells treated with a therapeutic agent). Historically, proteomics has been most successful at relative quantification: determination of a ratio between a protein’s concentration in one sample versus that in another. Absolute protein quantification approaches do exist, but they typically require the time-consuming and costly development of reference materials and assay conditions for each of the proteins of interest. The simplest quantification techniques are the spectral counting (Supplementary Glossary) approaches (one variant of label-free quantification), which infer the abundance of a protein using the number of distinct peptides observed and/or the number of times a peptide from a protein is sequenced in an experiment. These approaches rely on the empirical observation that peptides from more abundant proteins are more likely to be sequenced and identified than peptides from less abundant proteins. Recently, counting approaches have demonstrated a dynamic range approaching six orders of magnitude93, but several experimental conditions, including the selection criteria for picking a peptide for sequencing, can skew data.
For example, MS acquisition regimes using ‘inclusion lists’, wherein only peptides from a predetermined list are sequenced (for example, in experiments probing a particular set of proteins from a pathway), are incompatible with counting approaches. In addition, if multiple ‘dynamic exclusion’ criteria (criteria by which peptides are excluded from further sequencing after they have been selected once by the mass spectrometer) are used, data sets may not be readily comparable. Moreover, digestion artifacts and the variability of peptide ionization can make data unreliable. Furthermore, if only a few peptides are observed for a given protein (as is commonly the case), quantification accuracy decreases significantly. As with all label-free approaches, variation in sample handling can affect the reliability of estimates of protein relative abundance. Counting approaches are not exceptionally sensitive to small changes in abundance and cannot provide information about the change in abundance of a peptide relative to a protein, as frequently arises from truncation or modification of a protein. When using the spectral counting technique, results can be computed in any of several ways. The simplest reports the average of ratios94, and using an intensity threshold can help to minimize the noise-based bias95. More reliable results are achieved when computing the ratios on the basis of the intensity-weighted average or on the sum of all the observed spectra94,96 or when using linear regression analysis97. An alternative label-free quantification technique compares the mass spectrometric intensity of each peptide in each of the experiments98. Peak intensity is a more direct measure of abundance than is the count of peptide identifications and thus offers some advantages (for example, linearity and accuracy).
Unfortunately, this is as yet beyond the reach of most laboratories owing to the stringent requirements for MS quality assurance measures, as well as a lack of sophisticated software that can normalize for experimental variables introduced by peptide chromatographic drift between experiments. As more effort is put into building these software tools99–106, this form of label-free quantification can be expected to become much more widely used.
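The counting-based summaries described above differ in how they weight individual peptides. A minimal sketch with made-up spectral counts (the peptide names are placeholders) shows that the average of per-peptide ratios and the ratio of summed counts can disagree noticeably when low-count peptides are present:

```python
# Hypothetical spectral counts for one protein's peptides in two conditions.
counts_a = {"pep1": 12, "pep2": 3, "pep3": 1}
counts_b = {"pep1": 6,  "pep2": 3, "pep3": 2}

# Average of per-peptide ratios: noisy low-count peptides weigh heavily.
ratios = [counts_a[p] / counts_b[p] for p in counts_a]
average_of_ratios = sum(ratios) / len(ratios)

# Ratio of summed counts: dominated by well-observed peptides, usually more robust.
ratio_of_sums = sum(counts_a.values()) / sum(counts_b.values())

print(round(average_of_ratios, 2), round(ratio_of_sums, 2))  # 1.17 vs 1.45
```

Here the single-count peptide drags the average of ratios down, which is one reason intensity-weighted or summed-count estimators tend to be preferred.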
Use of isotopic labels, wherein samples are labeled either biosynthetically (as in SILAC) or, after isolation, chemically (as in isotope coded affinity tag (iCAT) or isobaric tags for relative and absolute quantification (iTRAQ) approaches) to create populations of peptides that are either isotopically light or isotopically heavy, provides more reliable quantification than label-free methods. When light and heavy samples are mixed and then measured in a mass spectrometer, the ratios of the intensities of the ions with slightly different masses, but the same chemical properties, can reliably be used for determining relative quantities. The addition of the label allows mixing of samples originating under different conditions for simultaneous analysis. When samples are mixed early in the workflow (that is, before a separation step), little bias is introduced during sample processing, resulting in high reproducibility. Therefore, methods that incorporate the stable isotope label at the protein level have generally higher reproducibility than those that introduce it at the peptide level. Label-based approaches have been shown to have excellent resolving power for quantifying small differences in protein abundance if combined with the appropriate mass spectrometer. For instance, the SILAC technique works best when using instruments with high resolving power, whereas the AQUA technique benefits from the large dynamic detection range of instruments capable of performing single or multiple reaction monitoring (SRM or MRM) experiments (discussed in more detail in ref. 107). As for protein identification, the dynamic range of protein quantification is often limited by the presence of chemical noise and the complexity of the analyzed peptide mixture. In practice, the linear dynamic range of quantification is often limited to 10- to 20-fold. Several factors have to be considered when performing quantitative experiments. 
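The ratio computation that these considerations protect is itself simple: a protein's relative abundance is estimated from the intensity ratios of its light/heavy peptide pairs, commonly summarized by a robust statistic such as the median. A sketch with hypothetical MS1 intensities:

```python
import statistics

# Hypothetical (light, heavy) MS1 intensities for three peptide pairs
# belonging to one protein in a SILAC-style experiment.
peptide_pairs = [(4.2e6, 2.0e6), (9.8e5, 5.1e5), (3.1e6, 1.6e6)]

peptide_ratios = [light / heavy for light, heavy in peptide_pairs]
protein_ratio = statistics.median(peptide_ratios)  # robust to a single aberrant pair

print(round(protein_ratio, 2))  # roughly 2-fold higher in the light sample
```

Because light and heavy peptides are chemically identical and measured in the same run, this ratio is insensitive to many sources of run-to-run variation that plague label-free comparisons.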
When choosing a stable isotope label, it must be determined whether the label alters the physicochemical properties of a peptide. For example, there is minimal impact when using 13C, 15N or 18O labeling108, but deuterium labeling can be problematic because labeled and unlabeled peptides often differ in their retention time in reversed-phase high-performance liquid chromatography109. If retention times of labeled and unlabeled peptides differ, an extra signal integration step must be used to correct for this. The spectral quality also greatly affects accuracy. Data should be scrutinized when the signal is very low (close to the noise level) or very high (possibly resulting in detector saturation) because both will lead to distortion of the isotope envelope (Supplementary Glossary) intensity and result in inaccurate quantification. Accuracy also depends on the ability of the instrument to discriminate between interfering signals resulting from coeluting peptides of almost the same mass (a particular problem when using labels that are quantified in tandem mass spectra, such as in iTRAQ and TMT (tandem mass tags) techniques). This can be minimized by reducing the sample complexity through fractionation or by computational means110. A complicating factor is that analytes often do not elute in a narrow profile and sometimes even elute into two or more fractions in separate regions111.

In vitro activity profiling
Activity- and affinity-based approaches are finding application in proteomics because they directly or indirectly focus on protein function and thus add a dimension that has mostly been missing in expression proteomics. Activity profiling was first demonstrated for serine hydrolases112 but has since been applied to other enzyme classes, such as kinases, phosphatases and histone deacetylases113,114.
In a typical activity-profiling approach, a small molecule inhibitor that can bind to members of an enzyme class is used as an affinity tool to purify these enzymes from a complex proteome before quantification by MS. This generates an enzyme class–specific expression profile of the underlying biological material that can be used to identify enzymes over- or
underexpressed in healthy versus pathological conditions. Activity profiling enables the identification of the targets of small molecule drugs in a proteome-wide fashion. We envisage activity proteomics playing a significant role in drug discovery as it becomes possible to profile the selectivity of drugs and their mechanisms of action systematically in relevant tissues. An alternative to investigating enzymes is to look at their substrates. For kinases, this can be accomplished by techniques such as global and quantitative phosphopeptide profiling85. This is particularly attractive for studying cancer, as many individual tumor biologies arise from the dysregulation of signaling pathways. Global phosphorylation profiling therefore offers a route to classifying patients into groups on the basis of signaling pathways that underlie the development or progression of the disease115. An important future task will be to link quantitative phosphorylation profiles with the upstream kinases. This is not routinely possible now because the substrate specificities of most kinases are not precisely known. Even so, substrate trapping approaches116 may make these studies possible. The ability to link enzymes and substrates is important, as it will reveal regulatory mechanisms as well as potential therapeutic targets. The analytical hurdles are often not very high for activity proteomics (unless accurate quantitative data are required) because the approach drastically reduces the complexity of the proteome by focusing on a class of proteins. The downside is that synthetic activity probes are often not available. This is because a fair amount of structural data on the catalytic site of an enzyme class is required to design a probe of suitable potency, and the catalytic site must be accessible for a generic inhibitor to purify a class of enzymes. Proteins with highly constrained binding sites will therefore be difficult to target. We believe that once organic chemistry is further integrated into proteomics research, activity-based approaches will become mainstream.

Figure 4 Protein identification and quantification by mass spectrometry. A typical proteomic workflow starts by extracting proteins from cells (here metabolically labeled), followed by proteome complexity reduction by fractionation techniques before MS is used to identify and quantify the proteins present in the original sample. Each element in the tubes represents a peptide, with its identically shaped elements originating from the same protein. [Figure 4 panels: heavy and light cells → labeling → isolation → digestion → fractionation → LC-MS/MS → identification (intensity versus mass/charge) → quantification of light/heavy peptide pairs (intensity versus retention time).]

Translational studies
Above, we focused on the qualitative and quantitative characterization of in vitro systems. Here, we extend the discussion to proteomic characterization of in vivo systems. Studies in murine and human systems bear strong similarity to one another, and so the canon of techniques, except where noted, can typically apply to either. Specimen or sample extraction does add potential for introduction of bias that is largely irrelevant in in vitro studies. This is true for analysis of both body fluids and tissue biopsies. Furthermore, biological heterogeneity (genetic background, multiple cell types in organs and host/graft issues) poses significant technical and conceptual challenges. Unsurprisingly, as mice can be genetically identical and maintained on identical diets in near-identical environments (for example, adjacent cages with similar temperature and light), their biological heterogeneity is much less than that of humans. The small size of mice can make sample extraction from tissues, such as ovaries, prostate or brain substructures, difficult and can lead to sample-to-sample heterogeneity. Often, biological heterogeneity will require performing the biomarker discovery phase in a subgroup (to reduce the false discovery rate) or in cell culture models (molecular phenotyping), with subsequent corroboration in the relevant in vivo situation. One typical study type is the characterization of the protein content of an organ or biopsy sample. Such studies are used to define organ
proteomes (for example, the liver proteome) and to describe how such proteomes are altered by endogenous or exogenous perturbations. Studies have also been done to investigate the impact of wounds on the local proteome or the serum proteome. The challenge of these studies is primarily in sample extraction. For example, the extraction process itself can lead to inflammation and hypoxia, which significantly alters the proteome. In addition, contamination with vasculature, stroma and neighboring tissue (as is often encountered in tumor biopsies) may
lead to quantitative differences between samples that are a function of differences in sample collection. Generally, replicate analysis of a given tissue can help distinguish biological variability from technical variability. Another approach, mostly used in the study of cancer, has been to use nearby ‘normal’ tissue as a control. A further option is to use cell sorting or tissue microdissection before proteome analysis117–119. However, the low amounts of material available from these techniques or the presence of fixation or crosslinking reagents can markedly limit the desired analysis120. One broad question still under debate is how best to handle biological heterogeneity in discovery experiments. Practically, this issue is often handled by analysis of pooled samples rather than samples from individuals. In pooled samples, effects are averaged. In addition, if resources are a concern, pooling may reduce the amount of instrument time required or allow a deeper fractionation approach and thus a broader look at the proteome. Pooling is sometimes necessary because material collected from a single subject is insufficient for a desired analysis. For example, a single tail bleed from a mouse provides only 50 µl of blood; after depletion, one is left with less than 10 µg of protein, which is insufficient for extensive fractionation. On the other hand, if subpopulations exist in a group of individuals, such signals are likely to be averaged out by pooling. A common and successful strategy has been initial discovery in pooled samples to identify dominant effects and then verification and exploration of biodiversity in follow-up studies on individual samples121,122. Xenografts (Supplementary Glossary) or orthotopically implanted materials are extensively used in cancer research. One distinct benefit of these studies is their potential to differentiate proteins generated by the implant from host proteins, both in tissue and in the circulatory system.
This benefit is also a complication, as sometimes 30% of tryptic peptides cannot be distinguished as murine or human by sequence. When performing quantification studies, such as of tumor growth or response to therapy, key experimental design questions must also be addressed. For example, how does one synchronize the size of samples extracted from multiple animals? What time points are most appropriate? At those time points, how does the disease or drug burden affect the intended results? For instance, a common study comparing treated tumors to controls must consider the size and protein content of the tumor. If a tumor is smaller owing to treatment, what is the optimal way to normalize the samples before comparison?

Biofluid analysis
The analysis of biofluids is of interest for the discovery of serum- and plasma-borne markers. As with studies of tissue, biofluid analysis poses challenges in terms of sample collection and biological variability. So-called preanalytical variables have been confounding in serum studies because samples allowed to sit for varying amounts of time (from minutes to hours) have radically altered protein compositions. In addition, hemolysis, bacterial protein contamination and degradation are problems. There are also significant technical hurdles related to sample preparation, data reproducibility and protein dynamic range. A variety of chromatographic techniques for addressing sample complexity were described above, and each of these approaches has been used extensively for serum and plasma analysis. In addition, several chromatographic approaches have been developed specifically for serum and plasma111,123. Two techniques broadly used for improving the dynamic range of serum studies are targeted depletion of abundant proteins and selective enrichment of low-abundance proteins.
Depletion techniques using antibodies (specific) or immobilized peptide bead libraries (nonspecific)124 have been used to improve the dynamic range of proteomic analysis, as these techniques eliminate the most abundant proteins. Even
though 99% of a target protein can be removed using these approaches, this may be insufficient when certain proteins (for example, albumin) are eight or ten orders of magnitude more abundant than proteins of interest. Both specific and nonspecific depletion techniques semirandomly deplete off-target proteins125. Consequently, some intersample differences in protein abundance may result from the depletion procedure itself126. Furthermore, proteins such as albumin are natural buffering and carrier agents, so their depletion can lead to adverse effects, such as precipitation. In addition to targeted depletion, techniques to broadly or selectively enrich low-abundance proteins are becoming popular. An example of broad enrichment is the capture of glycopeptides from serum proteome digests on hydrazide beads91. Recently, immunoprecipitation with antipeptide antibodies and MS have been used to quantify troponin I and interleukin-33 in serum127,128. The concept of using antibodies directed toward tryptic peptides is intriguing and has potential because it combines the advantages of the classic enzyme-linked immunosorbent assay (ELISA), namely selectivity, with the multiplexing capability of a mass spectrometer. There are challenges, including the questions of which peptide to choose for antibody generation, whether a single peptide is sufficient and representative and how one makes sure that plasma or serum digestion can be done reliably. Focusing on the glycosylated part of the proteome can be used in discovery projects, whereas selective enrichment through antibodies constitutes an assay for a particular protein of interest (Fig. 1). As an alternative to discovering markers directly in samples from human subjects, several recent studies have first uncovered proteins in murine models and subsequently verified these findings in human clinical samples.
Most of these studies used a combination of depletion and extensive fractionation to overcome dynamic range issues129–134. Such approaches are highly attractive because they can draw on the many murine models of human disease that have been established over the years. Obviously, the known limitations of using rodents as models of human disease apply.

Tissue imaging
One area of proteomics that is attracting increasing attention, particularly from pathologists, is imaging mass spectrometry (IMS)135,136. In this technique, a MALDI mass spectrometer records spectra from thin tissue sections to produce molecular weight–encoded ‘images’ of the distribution of constituent biomolecules. In contrast to conventional histological staining, IMS acts as a molecular microscope that records the distribution of hundreds of molecular species simultaneously without the need for a priori information about their molecular identity. IMS has proven utility in imaging of small molecules, such as lipids and drug metabolites (in this case, the molecule of interest is known)137,138; it is as yet unclear what the technique can deliver with respect to the discovery of protein biomarkers. This is because it is rarely possible to identify the molecular nature (that is, the protein identity) of a peak in an IMS spectrum. It is of paramount importance to overcome this hurdle as it is not even clear whether differential signals recorded for particular tissue areas indicate the underlying cellular structure or are artifacts of sample preparation such as cell or blood vessel damage. Despite these issues, one IMS protocol, in which HER2 status can be determined directly in breast cancer tissues, has been approved for diagnostic use139.
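Conceptually, an IMS 'image' is just the per-pixel intensity of one m/z channel extracted from a grid of spectra. The toy sketch below (a synthetic 2×2 pixel grid with arbitrary m/z values and intensities) builds such an ion image:

```python
# Synthetic IMS data: one mass spectrum (m/z -> intensity) per pixel.
spectra = {
    (0, 0): {5000.2: 10.0, 6200.8: 1.0},
    (0, 1): {5000.2: 12.0},
    (1, 0): {6200.8: 8.0},
    (1, 1): {5000.2: 2.0, 6200.8: 9.0},
}

def ion_image(spectra, mz, tol=0.5):
    """Per-pixel summed intensity of peaks within tol of the chosen m/z."""
    return {pixel: sum(i for m, i in peaks.items() if abs(m - mz) <= tol)
            for pixel, peaks in spectra.items()}

print(ion_image(spectra, 5000.2))  # only pixels containing the 5000.2 channel light up
```

Repeating this extraction for every observed m/z channel yields the hundreds of simultaneous molecular images described above; the unresolved problem noted in the text is assigning a protein identity to any given channel.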
Population proteomics
Population proteomics studies have proven difficult, and no protein biomarker discovered using proteomics has yet attained a level of validation accepted by regulatory agencies such as the US Food and Drug
Administration (FDA; Rockville, Maryland) or the European Medicines Agency (EMA; London). One challenge of population studies relates to the genetic variations among subjects. MS techniques have detected polymorphisms, truncations and splicing events in data repositories, such as PeptideAtlas7,140–142 and INSPECT143–146, but the lack of complete coverage of proteins substantially impairs these studies, and genomics methods are at present far more powerful for these analyses. Although several pilot studies have been performed to quantify cardiovascular proteins and general serum proteins across a large number of people6,128,147, the large inter-person variability of proteomes in the background of complex genetic diseases poses enormous challenges to study design, statistical significance and technical viability. As mentioned before, it can be argued that before analyzing cohorts of individuals to discover biomarkers (Supplementary Glossary), the candidate list, regardless of which kind of molecular marker is sought (disease, diagnostic, prognostic, response, stratifying or other) and what molecular nature it may have (protein, peptide, modification), should be built from prior experiments in suitable and more controllable models. Once this information is available, emerging techniques such as MRM might be used to gather the many data points necessary for rigorous biomarker verification and validation. Proteomics is only one of many pieces in the biomarker puzzle, and other techniques may be much more suitable for particular parts of the discovery process. The verification and clinical applications aside, proteomics for discovering biomarkers in human populations is in early development, and it will be some time before significant results can be expected. But several proof-of-principle studies have been performed, and some of these will hopefully develop into full clinical applications148,149.
Conclusions
After 15 years of evolution, MS-based proteomics has measurably improved its robustness, sensitivity and usability and is now a routine part of biological inquiry workflows. MS-based proteomics is clearly a versatile tool and will become even more useful as approaches that are currently novel mature. Although proteomics technologies can now deliver very high quality data for basic biological research, their utility is most notable when the biological problem can be conceptually confined and experimentally approached in a focused fashion, with relevant discovery controls and extensive post-proteomic follow-up. It is critical for the field’s success that proteomics be treated as a component of broader biological studies. As with any experimental technique, the value of proteomics is determined not by the price of the instrumentation used but by the rigor and thoroughness of the overall experimental design. As part of larger studies, there is no doubt that proteomics technology can help ask and answer important biological questions. For example, with the rapid pace of technological improvements, systems-wide profiling experiments are emerging as valuable additions to genomic technologies. Proteomics at the organism level, however, continues to pose significant conceptual and technical challenges. As our ability to deeply profile proteomes becomes more time- and cost-effective and the general understanding of biological systems is refined, biomarker candidates are likely to surface at increasing rates. For the full utility of proteomics experiments to be realized, improvement in productivity in the discovery phase must be complemented by more rapid and more globally applicable verification approaches than are now available.
Though the gap between biologists’ expectations of proteomics and what proteomics can deliver has historically often been wide, we fully anticipate that, through close collaboration, biologists and proteome scientists will be able to bridge this gap and use proteomic technologies to significantly contribute to our understanding of biological systems.
Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
The authors thank M. Bergman, C. Flinders, K. Kramer and S. Mumenthaler for comments on the manuscript. The work of P.M. was supported by NCI 1U54CA143907 and NCI U54CA119367.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1.
Wasinger, V.C. et al. Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis 16, 1090–1094 (1995). 2. Ducret, A., Van Oostveen, I., Eng, J.K., Yates, J.R. III & Aebersold, R. High throughput protein characterization by automated reverse-phase chromatography/ electrospray tandem mass spectrometry. Protein Sci. 7, 706–719 (1998). 3. Washburn, M.P., Wolters, D. & Yates, J.R. III. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247 (2001). 4. Wilm, M. et al. Femtomole sequencing of proteins from polyacrylamide gels by nano-electrospray mass spectrometry. Nature 379, 466–469 (1996). 5. Aebersold, R. Constellations in a cellular universe. Nature 422, 115–116 (2003). 6. Keshishian, H. et al. Quantification of cardiovascular biomarkers in patient plasma by targeted mass spectrometry and stable isotope dilution. Mol. Cell Proteomics 8, 2339–2349 (2009). 7. Omenn, G.S. et al. Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245 (2005). 8. de Godoy, L.M. et al. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 455, 1251–1254 (2008). 9. Rush, J. et al. Immunoaffinity profiling of tyrosine phosphorylation in cancer cells. Nat. Biotechnol. 23, 94–101 (2005). 10. Olsen, J.V. et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648 (2006). 11. Bouwmeester, T. et al. A physical and functional map of the human TNF-alpha/ NF-κB signal transduction pathway. Nat. Cell Biol. 6, 97–105 (2004). 12. Muzio, M. et al. FLICE, a novel FADD-homologous ICE/CED-3-like protease, is recruited to the CD95 (Fas/APO-1) death–inducing signaling complex. Cell 85, 817–827 (1996). 13. Heck, A.J. 
Native mass spectrometry: a bridge between interactomics and structural biology. Nat. Methods 5, 927–933 (2008). 14. Sharon, M. & Robinson, C.V. The role of mass spectrometry in structure elucidation of dynamic protein complexes. Annu. Rev. Biochem. 76, 167–193 (2007). 15. Mann, M. & Jensen, O.N. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21, 255–261 (2003). 16. Ong, S.E., Mittler, G. & Mann, M. Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 1, 119–126 (2004). 17. Denison, C., Kirkpatrick, D.S. & Gygi, S.P. Proteomic insights into ubiquitin and ubiquitin-like proteins. Curr. Opin. Chem. Biol. 9, 69–75 (2005). 18. Zaia, J. Mass spectrometry of oligosaccharides. Mass Spectrom. Rev. 23, 161–227 (2004). 19. Gerber, S.A., Rush, J., Stemman, O., Kirschner, M.W. & Gygi, S.P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. USA 100, 6940–6945 (2003). 20. Steen, H., Jebanathirajah, J.A., Springer, M. & Kirschner, M.W. Stable isotope-free relative and absolute quantitation of protein phosphorylation stoichiometry by MS. Proc. Natl. Acad. Sci. USA 102, 3948–3953 (2005). 21. Zhang, X., Jin, Q.K., Carr, S.A. & Annan, R.S. N-terminal peptide labeling strategy for incorporation of isotopic tags: a method for the determination of site-specific absolute phosphorylation stoichiometry. Rapid Commun. Mass Spectrom. 16, 2325–2332 (2002). 22. Kirkpatrick, D.S., Gerber, S.A. & Gygi, S.P. The absolute quantification strategy: a general procedure for the quantification of proteins and post-translational modifications. Methods 35, 265–273 (2005). 23. Gerber, S.A., Kettenbach, A.N., Rush, J. & Gygi, S.P. The absolute quantification strategy: application to phosphorylation profiling of human separase serine 1126. Methods Mol. Biol. 359, 71–86 (2007). 24. Rudd, P.M. et al. The glycosylation of the complement regulatory protein, human erythrocyte CD59. J. Biol.
Chem. 272, 7229–7244 (1997). 25. Phanstiel, D. et al. Mass spectrometry identifies and quantifies 74 unique histone H4 isoforms in differentiating human embryonic stem cells. Proc. Natl. Acad. Sci. USA 105, 4093–4098 (2008). 26. Siuti, N. & Kelleher, N.L. Decoding protein modifications using top-down mass spectrometry. Nat. Methods 4, 817–821 (2007). 27. Mayya, V., Rezual, K., Wu, L., Fong, M.B. & Han, D.K. Absolute quantification of multisite phosphorylation by selective reaction monitoring mass spectrometry: determination of inhibitory phosphorylation status of cyclin-dependent kinases. Mol. Cell. Proteomics 5, 1146–1157 (2006). 28. Desiere, F. et al. Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol. 6, R9 (2005). 29. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005). 30. Alberts, B. The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell 92, 291–294 (1998). 31. Neubauer, G. et al. Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc. Natl. Acad. Sci. USA 94, 385–390 (1997). 32. Ong, S.E. et al. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376–386 (2002). 33. Blagoev, B. et al. A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling. Nat. Biotechnol. 21, 315–318 (2003). 34. Rual, J.F. et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173–1178 (2005). 35. Uetz, P. et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000). 36. Bauer, A. & Kuster, B. Affinity purification-mass spectrometry. Powerful tools for the characterization of protein complexes. Eur. J. Biochem. 270, 570–578 (2003). 37. Gingras, A.C., Gstaiger, M., Raught, B. & Aebersold, R. Analysis of protein complexes using mass spectrometry. Nat. Rev. Mol. Cell Biol. 8, 645–654 (2007). 38. Poser, I. et al. BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals. Nat. Methods 5, 409–415 (2008). 39. Rigaut, G. et al. A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17, 1030–1032 (1999). 40. Schmitt-Ulms, G.
et al. Time-controlled transcardiac perfusion cross-linking for the study of protein interactions in complex tissues. Nat. Biotechnol. 22, 724–731 (2004). 41. Andersen, J.S. et al. Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426, 570–574 (2003). 42. Pflieger, D. et al. Quantitative proteomic analysis of protein complexes: concurrent identification of interactors and their state of phosphorylation. Mol. Cell. Proteomics 7, 326–346 (2008). 43. Andersen, J.S. et al. Nucleolar proteome dynamics. Nature 433, 77–83 (2005). 44. Alber, F. et al. Determining the architectures of macromolecular assemblies. Nature 450, 683–694 (2007). 45. Alber, F. et al. The molecular architecture of the nuclear pore complex. Nature 450, 695–701 (2007). 46. Hochleitner, E.O., Sondermann, P. & Lottspeich, F. Determination of the stoichiometry of protein complexes using liquid chromatography with fluorescence and mass spectrometric detection of fluorescently labeled proteolytic peptides. Proteomics 4, 669–676 (2004). 47. Menetret, J.F. et al. Single copies of Sec61 and TRAP associate with a nontranslating mammalian ribosome. Structure 16, 1126–1137 (2008). 48. Nanavati, D., Gucek, M., Milne, J.L., Subramaniam, S. & Markey, S.P. Stoichiometry and absolute quantification of proteins with mass spectrometry using fluorescent and isotope-labeled concatenated peptide standards. Mol. Cell. Proteomics 7, 442–447 (2008). 49. Hernandez, H. & Robinson, C.V. Determining the stoichiometry and interactions of macromolecular assemblies from mass spectrometry. Nat. Protoc. 2, 715–726 (2007). 50. Lorenzen, K., Olia, A.S., Uetrecht, C., Cingolani, G. & Heck, A.J. Determination of stoichiometry and conformational changes in the first step of the P22 tail assembly. J. Mol. Biol. 379, 385–396 (2008). 51. Gavin, A.C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002). 52. Maiolica, A. et al. 
Structural analysis of multiprotein complexes by cross-linking, mass spectrometry, and database searching. Mol. Cell. Proteomics 6, 2200–2211 (2007). 53. Sinz, A. Chemical cross-linking and mass spectrometry to map three-dimensional protein structures and protein-protein interactions. Mass Spectrom. Rev. 25, 663– 682 (2006). 54. Ewing, R.M. et al. Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol. Syst. Biol. 3, 89 (2007). 55. Gavin, A.C. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006). 56. Krogan, N.J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006). 57. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002). 58. Kolch, W. Meaningful relationships: the regulation of the Ras/Raf/MEK/ERK pathway by protein interactions. Biochem. J. 351, 289–305 (2000). 59. Schubert, P., Hoffman, M.D., Sniatynski, M.J. & Kast, J. Advances in the analysis of dynamic protein complexes by proteomics and data processing. Anal. Bioanal. Chem. 386, 482–493 (2006). 60. White, F.M. Quantitative phosphoproteomic analysis of signaling network dynamics. Curr. Opin. Biotechnol. 19, 404–409 (2008). 61. Kung, L.A. & Snyder, M. Proteome chips for whole-organism assays. Nat. Rev. Mol. Cell Biol. 7, 617–622 (2006).
62. Paweletz, C.P. et al. Reverse phase protein microarrays which capture disease progression show activation of pro-survival pathways at the cancer invasion front. Oncogene 20, 1981–1989 (2001). 63. Speer, R. et al. Molecular network analysis using reverse phase protein microarrays for patient tailored therapy. Adv. Exp. Med. Biol. 610, 177–186 (2008). 64. Zhu, H. et al. Global analysis of protein activities using proteome chips. Science 293, 2101–2105 (2001). 65. Huang, P.H. & White, F.M. Phosphoproteomics: unraveling the signaling web. Mol. Cell 31, 777–781 (2008). 66. Picotti, P., Bodenmiller, B., Mueller, L.N., Domon, B. & Aebersold, R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 138, 795–806 (2009). 67. Van, P.T. et al. Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage. J. Proteome Res. 7, 3755–3764 (2008). 68. King, N.L. et al. Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas. Genome Biol. 7, R106 (2006). 69. Chen, E.I., Hewel, J., Felding-Habermann, B. & Yates, J.R. III. Large scale protein profiling by combination of protein fractionation and multidimensional protein identification technology (MudPIT). Mol. Cell. Proteomics 5, 53–56 (2006). 70. Malmstrom, J. et al. Optimized peptide separation and identification for mass spectrometry based proteomics via free-flow electrophoresis. J. Proteome Res. 5, 2241–2249 (2006). 71. Hubner, N.C., Ren, S. & Mann, M. Peptide separation with immobilized pI strips is an attractive alternative to in-gel protein digestion for proteome analysis. Proteomics 8, 4862–4872 (2008). 72. Chen, E.I., McClatchy, D., Park, S.K. & Yates, J.R. III. Comparisons of mass spectrometry compatible surfactants for global analysis of the mammalian brain proteome. Anal. Chem. 80, 8694–8701 (2008). 73. Graumann, J. et al. 
Stable isotope labeling by amino acids in cell culture (SILAC) and proteome quantitation of mouse embryonic stem cells to a depth of 5,111 proteins. Mol. Cell. Proteomics 7, 672–683 (2008). 74. Schirle, M., Heurtier, M.A. & Kuster, B. Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 2, 1297–1305 (2003). 75. Kuster, B., Schirle, M., Mallick, P. & Aebersold, R. Scoring proteomes with proteotypic peptide probes. Nat. Rev. Mol. Cell Biol. 6, 577–583 (2005). 76. Mallick, P. et al. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25, 125–131 (2007). 77. Tabb, D.L. et al. Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J. Proteome Res. 9, 761–776 (2009). 78. Swaney, D.L., Wenger, C.D. & Coon, J.J. Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J. Proteome Res. 9, 1323–1329 (2010). 79. Gruhler, A. et al. Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol. Cell. Proteomics 4, 310–327 (2005). 80. Pinkse, M.W. et al. Highly robust, automated, and sensitive online TiO2-based phosphoproteomics applied to study endogenous phosphorylation in Drosophila melanogaster. J. Proteome Res. 7, 687–697 (2008). 81. Reiland, S. et al. Large-scale Arabidopsis phosphoproteome profiling reveals novel chloroplast kinase substrates and phosphorylation networks. Plant Physiol. 150, 889–903 (2009). 82. Villen, J., Beausoleil, S.A., Gerber, S.A. & Gygi, S.P. Large-scale phosphorylation analysis of mouse liver. Proc. Natl. Acad. Sci. USA 104, 1488–1493 (2007). 83. Zielinska, D.F., Gnad, F., Jedrusik-Bode, M., Wisniewski, J.R. & Mann, M. Caenorhabditis elegans has a phosphoproteome atypical for metazoans that is enriched in developmental and sex determination proteins. J. Proteome Res. 8, 4039–4049 (2009). 84. Lemeer, S.
et al. Endogenous phosphotyrosine signaling in zebrafish embryos. Mol. Cell. Proteomics 6, 2088–2099 (2007). 85. Zhang, Y. et al. Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol. Cell. Proteomics 4, 1240–1250 (2005). 86. Au, C.E. et al. Organellar proteomics to create the cell map. Curr. Opin. Cell Biol. 19, 376–385 (2007). 87. Lilley, K.S. & Dupree, P. Plant organelle proteomics. Curr. Opin. Plant Biol. 10, 594–599 (2007). 88. Dunkley, T.P., Watson, R., Griffin, J.L., Dupree, P. & Lilley, K.S. Localization of organelle proteins by isotope tagging (LOPIT). Mol. Cell. Proteomics 3, 1128–1134 (2004). 89. Jang, J.H. & Hanash, S. Profiling of the cell surface proteome. Proteomics 3, 1947–1954 (2003). 90. Wollscheid, B. et al. Mass-spectrometric identification and relative quantification of N-linked cell surface glycoproteins. Nat. Biotechnol. 27, 378–386 (2009). 91. Zhang, H., Li, X.J., Martin, D.B. & Aebersold, R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat. Biotechnol. 21, 660–666 (2003). 92. Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 389, 1017– 1031 (2007). 93. Griffin, N.M. et al. Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat. Biotechnol. 28, 83–89 (2009).
94. Saito, A. et al. AYUMS: an algorithm for completely automatic quantitation based on LC-MS/MS proteome data and its application to the analysis of signal transduction. BMC Bioinformatics 8, 15 (2007). 95. Wolf-Yadlin, A., Hautaniemi, S., Lauffenburger, D.A. & White, F.M. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc. Natl. Acad. Sci. USA 104, 5860–5865 (2007). 96. Ono, M. et al. Label-free quantitative proteomics using large peptide data sets generated by nanoflow liquid chromatography and mass spectrometry. Mol. Cell. Proteomics 5, 1338–1347 (2006). 97. Parish, R. Comparison of linear regression methods when both variables contain error: relation to clinical studies. Ann. Pharmacother. 23, 891–898 (1989). 98. Mueller, L.N. et al. SuperHirn - a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 7, 3470–3480 (2007). 99. Cox, J. et al. A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat. Protoc. 4, 698–705 (2009). 100. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008). 101. Han, D.K., Eng, J., Zhou, H. & Aebersold, R. Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 19, 946–951 (2001). 102. Faca, V. et al. Quantitative analysis of acrylamide labeled serum proteins by LC-MS/MS. J. Proteome Res. 5, 2009–2018 (2006). 103. Rauch, A. et al. Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J. Proteome Res. 5, 112–121 (2006). 104. Jaffe, J.D. et al. PEPPeR, a platform for experimental proteomic pattern recognition. Mol. Cell.
Proteomics 5, 1927–1941 (2006). 105. Park, S.K., Venable, J.D., Xu, T. & Yates, J.R. III. A quantitative analysis software tool for mass spectrometry-based proteomics. Nat. Methods 5, 319–322 (2008). 106. Du, X. et al. A computational strategy to analyze label-free temporal bottom-up proteomics data. J. Proteome Res. 7, 2595–2604 (2008). 107. Domon, B. & Aebersold, R. Three strategies for quantitative proteomics and their use. Nat. Biotechnol. 28, 710–721 (2010). 108. Zhang, R. & Regnier, F.E. Minimizing resolution of isotopically coded peptides in comparative proteomics. J. Proteome Res. 1, 139–147 (2002). 109. Zhang, R., Sioma, C.S., Wang, S. & Regnier, F.E. Fractionation of isotopically labeled peptides in quantitative proteomics. Anal. Chem. 73, 5142–5149 (2001). 110. Zhang, Y. et al. A robust error model for iTRAQ quantification reveals divergent signaling between oncogenic FLT3 mutants in acute myeloid leukemia. Mol. Cell Proteomics 7, 780–790 (2009). 111. Faca, V. et al. Contribution of protein fractionation to depth of analysis of the serum and plasma proteomes. J. Proteome Res. 6, 3558–3565 (2007). 112. Liu, Y., Patricelli, M.P. & Cravatt, B.F. Activity-based protein profiling: the serine hydrolases. Proc. Natl. Acad. Sci. USA 96, 14694–14699 (1999). 113. Bantscheff, M. et al. Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat. Biotechnol. 25, 1035–1044 (2007). 114. Cravatt, B.F., Wright, A.T. & Kozarich, J.W. Activity-based protein profiling: from enzyme chemistry to proteomic chemistry. Annu. Rev. Biochem. 77, 383–414 (2008). 115. Rikova, K. et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell 131, 1190–1203 (2007). 116. Blethrow, J.D., Glavy, J.S., Morgan, D.O. & Shokat, K.M. Covalent capture of kinasespecific phosphopeptides reveals Cdk1-cyclin B substrates. Proc. Natl. Acad. Sci. USA 105, 1442–1447 (2008). 117. Emmert-Buck, M.R. et al. 
Laser capture microdissection. Science 274, 998–1001 (1996). 118. Lu, Q. et al. Analysis of mouse brain microvascular endothelium using immuno-laser capture microdissection coupled to a hybrid linear ion trap with Fourier transform-mass spectrometry proteomics platform. Electrophoresis 29, 2689–2695 (2008). 119. Johann, D.J. et al. Approaching solid tumor heterogeneity on a cellular basis by tissue proteomics using laser capture microdissection and biological mass spectrometry. J. Proteome Res. 8, 2310–2318 (2009). 120. Reimel, B.A. et al. Proteomics on fixed tissue specimens - a review. Curr. Proteomics 6, 63–69 (2009). 121. Faca, V.M. et al. A mouse to human search for plasma proteome changes associated with pancreatic tumor development. PLoS Med. 5, e123 (2008). 122. Harsha, H.C. et al. A compendium of potential biomarkers of pancreatic cancer. PLoS Med. 6, e1000046 (2009). 123. Bandhakavi, S., Stone, M.D., Onsongo, G., Van Riper, S.K. & Griffin, T.J. A dynamic range compression and three-dimensional peptide fractionation analysis platform
expands proteome coverage and the diagnostic potential of whole saliva. J. Proteome Res. 8, 5590–5600 (2009). 124. Righetti, P.G., Boschetti, E., Lomas, L. & Citterio, A. Protein equalizer technology: the quest for a “democratic proteome”. Proteomics 6, 3980–3992 (2006). 125. Brand, J., Haslberger, T., Zolg, W., Pestlin, G. & Palme, S. Depletion efficiency and recovery of trace markers from a multiparameter immunodepletion column. Proteomics 6, 3236–3242 (2006). 126. Seam, N. et al. Quality control of serum albumin depletion for proteomic analysis. Clin. Chem. 53, 1915–1920 (2007). 127. Anderson, N.L. et al. Mass spectrometric quantitation of peptides and proteins using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). J. Proteome Res. 3, 235–244 (2004). 128. Kuhn, E. et al. Developing multiplexed assays for troponin I and interleukin-33 in plasma by peptide immunoaffinity enrichment and targeted mass spectrometry. Clin. Chem. 55, 1108–1117 (2009). 129. Pitteri, S.J. et al. Integrated proteomic analysis of human cancer cells and plasma from tumor bearing mice for ovarian cancer biomarker discovery. PLoS ONE 4, e7916 (2009). 130. Katayama, H. et al. Application of serum proteomics to the Women’s Health Initiative conjugated equine estrogens trial reveals a multitude of effects relevant to clinical findings. Genome Med. 1, 47 (2009). 131. Faça, V., Wang, H. & Hanash, S. Proteomic global profiling for cancer biomarker discovery. Methods Mol. Biol. 492, 309–320 (2009). 132. Faça, V.M. & Hanash, S.M. In-depth proteomics to define the cell surface and secretome of ovarian cancer cells and processes of protein shedding. Cancer Res. 69, 728–730 (2009). 133. Faça, V.M. et al. Proteomic analysis of ovarian cancer cells reveals dynamic processes of protein secretion and shedding of extra-cellular domains. PLoS ONE 3, e2425 (2008). 134. Hanash, S.M., Pitteri, S.J. & Faça, V.M. Mining the plasma proteome for cancer biomarkers. 
Nature 452, 571–579 (2008). 135. Caprioli, R.M., Farmer, T.B. & Gile, J. Molecular imaging of biological samples: localization of peptides and proteins using MALDI-TOF MS. Anal. Chem. 69, 4751–4760 (1997). 136. Cornett, D.S., Reyzer, M.L., Chaurand, P. & Caprioli, R.M. MALDI imaging mass spectrometry: molecular snapshots of biochemical systems. Nat. Methods 4, 828–833 (2007). 137. Hsieh, Y., Chen, J. & Korfmacher, W.A. Mapping pharmaceuticals in tissues using MALDI imaging mass spectrometry. J. Pharmacol. Toxicol. Methods 55, 193–200 (2007). 138. Woods, A.S. & Jackson, S.N. Brain tissue lipidomics: direct probing using matrix-assisted laser desorption/ionization mass spectrometry. AAPS J. 8, E391–E395 (2006). 139. Taguchi, F. et al. Mass spectrometry to classify non-small-cell lung cancer patients for clinical outcome after treatment with epidermal growth factor receptor tyrosine kinase inhibitors: a multicohort cross-institutional study. J. Natl. Cancer Inst. 99, 838–846 (2007). 140. Deutsch, E.W. et al. Human Plasma PeptideAtlas. Proteomics 5, 3497–3500 (2005). 141. Deutsch, E.W., Lam, H. & Aebersold, R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 9, 429–434 (2008). 142. Zhang, Q. et al. A mouse plasma peptide atlas as a resource for disease proteomics. Genome Biol. 9, R93 (2008). 143. Castellana, N.E. et al. Discovery and revision of Arabidopsis genes by proteogenomics. Proc. Natl. Acad. Sci. USA 105, 21034–21038 (2008). 144. Gupta, N. et al. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res. 18, 1133–1142 (2008). 145. Gupta, N. et al. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res. 17, 1362–1377 (2007). 146. Tanner, S. et al. InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 77, 4626–4639 (2005).
147. Gerszten, R.E., Carr, S.A. & Sabatine, M. Integration of proteomic-based tools for improved biomarkers of myocardial injury. Clin. Chem. 56, 194–201 (2010). 148. Kentsis, A. et al. Discovery and validation of urine markers of acute pediatric appendicitis using high-accuracy mass spectrometry. Ann. Emerg. Med. 55, 62–70 (2010). 149. Rifai, N., Gillette, M.A. & Carr, S.A. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24, 971–983 (2006).
perspective
Options and considerations when selecting a quantitative proteomics strategy
Bruno Domon1 & Ruedi Aebersold1,2

The vast majority of proteomic studies to date have relied on mass spectrometric techniques to identify, and in some cases quantify, peptides that have been generated by proteolysis. Current approaches differ in the types of instrument used, their performance profiles, the manner in which they interface with biological research strategies, and their reliance on and use of prior information. Here, we consider the three main mass spectrometry (MS)-based proteomic approaches used today: shotgun (or discovery), directed and targeted strategies. We discuss the principles of each technique, their strengths and weaknesses, and the dependence of their performance profiles on the composition of the biological sample. Our goal is to provide a rational framework for selecting strategies optimally suited to address the specific research issue under consideration.

The proteome is more than the mere translation of the protein-coding regions of a genome. Processes such as alternative splicing, protein processing and post-translational modification are key to providing the full complexity of life. Moreover, because the abundances of proteins are often of great biological significance, they must often be tightly controlled1. Although the ultimate goal of proteomics is to both identify and quantify the full complement of proteins and their variants in any cell type under conditions of interest, neither the composition of a proteome nor the quantity of its constituents can be reliably predicted by computation or determined by experimentation. Nonetheless, several proteomic strategies now effectively support a range of approaches to biological experimentation. Owing to the limited availability and accessibility of suitable reagents, the majority of proteins in any species cannot be detected and quantified by affinity-based assays.
1Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland. 2Faculty of Science, University of Zurich, Zurich, Switzerland. Correspondence should be addressed to R.A. ([email protected]) or B.D. ([email protected]).

Published online 9 July 2010; doi:10.1038/nbt.1661

Therefore, essentially all proteomic studies have used mass spectrometric discovery techniques, which are now capable of unambiguously identifying and quantifying thousands of protein components of complex samples (see reviews by B.D. and R.A. and others; refs. 2,3). Of the several such discovery methods that have been developed, all involve digesting the protein sample into peptides, typically with trypsin, and then fractionating the resulting peptide mixture before it is subjected to mass spectrometric analysis. MS involves ionizing the peptides and selecting specific precursor ions from the pool of detected peptide ions for fragmentation. The resulting product-ion
mass spectra, commonly generated by collisional activation, are recorded and used to determine the amino acid sequence of the selected peptides. Finally, the proteins present in the sample are inferred from the ensemble of identified peptides. In the most common implementation of the method, the precursor ions are selected automatically from the ions detected in a survey scan immediately preceding the ion selection, a process referred to as data-dependent analysis (DDA). When combined with an appropriate stable-isotope labeling strategy, these protein identification methods also permit relative (involving comparison to a reference sample) or absolute quantification of the identified proteins2,4. The various methods differ in their requirements for sample preparation, the extent of sample fractionation and the level (protein or peptide) at which fractionation is performed, the type of mass spectrometer used and their requirements for data-processing tools5. Most common implementations rely on liquid chromatography directly coupled to the mass spectrometer via electrospray ionization. However, alternative or complementary methods based on matrix-assisted laser desorption ionization (MALDI), which enables repetitive, sequential interrogation of the same sample, have also been proposed6,7. Multiple incremental improvements at each level of this fundamental process have substantially increased the number of proteins typically identified in proteomic studies, the confidence with which fragment-ion spectra resulting from collision-activated dissociation (CAD) are assigned to peptide sequences and the confidence with which protein identities are inferred. Overall, liquid chromatography-tandem mass spectrometry (LC-MS/MS) with DDA is now a robust and powerful technology to detect and quantify proteins and their post-translational modifications, as exemplified by some recent large-scale studies8–10.
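The intensity-based DDA selection heuristic described above can be sketched in a few lines of Python. This is an illustrative simplification, not any vendor's acquisition software; the function name, the top-N count and the dynamic-exclusion model are hypothetical.

```python
# Sketch of data-dependent precursor selection: from each survey scan,
# fragment the top-N most intense ions that are not on the dynamic
# exclusion list (i.e., ions already fragmented in recent scans).
def select_precursors(survey_scan, excluded, top_n=5, mz_tol=0.01):
    """survey_scan: list of (mz, intensity) pairs; excluded: m/z values to skip."""
    def is_excluded(mz):
        return any(abs(mz - e) <= mz_tol for e in excluded)
    candidates = [(mz, i) for mz, i in survey_scan if not is_excluded(mz)]
    candidates.sort(key=lambda peak: peak[1], reverse=True)  # rank by intensity
    return [mz for mz, _ in candidates[:top_n]]

scan = [(445.12, 9e5), (512.30, 3e6), (623.28, 1e6), (701.40, 8e4)]
picked = select_precursors(scan, excluded=[512.30], top_n=2)  # [623.28, 445.12]
```

Because selection is driven purely by instantaneous signal intensity, low-abundance peptides that never rank in the top N are systematically missed, which underlies the sampling problems discussed next.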
As outlined elsewhere in this issue11, however, the perpetual de novo discovery of the proteome or fractions thereof in every proteomic study may not be the most suitable and effective strategy to interface proteomics with biological research. First, even though the fraction of the proteome identified in discovery proteomic studies has increased over time, the analysis of the complete proteome of even moderately complex cells remains challenging, expensive and slow. Second, the heuristics based on signal intensity used for the precursor-ion selection in DDA MS result in an irreproducible and incomplete sampling of the peptide mixture generated to represent the proteome. Consequently, different subsets of the proteome are identified and quantified after repeated analyses of identical or substantially similar samples. Therefore, partially overlapping proteome data sets are generated even when the parameters for measurement are carefully controlled12,13. Third, in each discovery experiment that is focused on a specific biological question (e.g., the proteomic changes induced by the stimulation of a cell sample with a
particular drug), a large number of proteins with no relevance to the particular question are identified and quantified. Conversely, some of the relevant proteins are missed. Consequently, a model of the studied biological process has to be assembled from data sets that are noisy, incomplete and contain large amounts of irrelevant data. Moreover, the absence of a protein from the list of identified proteins does not indicate the absence of the protein from the sample. This complicates the comparison of lists of proteins identified in different studies. Fourth, no prior information on the system studied is used to design the experiment, even when it is being conducted in an area with the benefit of vast amounts of prior biological knowledge. Finally, there is a large discrepancy between the size and quality of reported proteomic data sets. Whereas a few highly specialized laboratories now routinely and reliably identify thousands of proteins per study8–10,14, a more representative selection of proteomic laboratories identified the components of a sample consisting of 20 equimolar proteins only with considerable difficulty and after optimization of their methods15. In light of these challenges, we have argued that proteomics will make a larger and more immediate impact on progress in biology if reproducible and quantitatively accurate data can be generated for all the proteins that constitute a particular system or process (R.A. and colleagues16). In such a scenario, sets of proteins are defined from prior biological knowledge and then identified and quantified by targeted MS to generate complete, accurate and reproducible data sets that represent the whole system studied under different conditions. Such a proteomic strategy supports the standard way of biological inquiry, where specific hypotheses (proposed explanations for observable phenomena) are generated from the available knowledge and then tested. 
In contemporary proteomics, the hypothesis is almost invariably that unique and tight regulation of a group of proteins underlies the function or process of interest. Recent years have witnessed the emergence of several MS-based methods that advance such targeted proteomic strategies. The consistent and reproducible detection of complete sets of proteins in multiple samples and their accurate quantification is important for a wide range of biological studies. In particular, this applies to biomarker research and systems biology, where quantitative data of a system in multiple perturbed states are critical for the mathematical modeling of the process in question.

Three MS strategies
The generic overall process by which peptides are identified and quantified in MS-based proteomics follows the sequence of events indicated in the core section of Figure 1. First, the ionized peptides present in the sample solution are transferred into the gas phase, most commonly using the (nano)electrospray technique. Alternatively, peptides deposited on a solid surface are ionized by MALDI. Second, the mass-to-charge ratio (m/z) of the generated peptide ions (precursor ions) is measured, and the mass of the peptide is determined implicitly. Third, selected precursor ions are isolated sequentially in the gas phase. Fourth, the selected precursor ions are fragmented, most commonly through CAD. Fifth, fragment-ion masses are analyzed and recorded as product-ion spectra. From these, the peptide sequence is inferred
from the ensemble of fragment-ion masses. The quantity of a peptide is determined from the signal intensity of the precursor ion, most commonly by comparing this to the signal intensity of an isotopically labeled reference peptide of identical sequence. The MS steps in this technique are preceded by sample preparation protocols (that generate suitable protein samples, proteolyze the proteins into peptides and separate the peptide samples) and followed by post-acquisition data processing and analysis. Over the past decade, three main MS-based strategies have emerged, which we subsequently refer to as shotgun (or discovery), directed and targeted proteomic strategies. These are distinguished by the way in which the individual steps are performed and connected. The shotgun (discovery) approach has been the most widely used method and has generated the vast majority of proteomic data available today. Directed and targeted MS methods, which are still emerging, support proteomic strategies in which prior information is used to define sets of peptides or proteins to be analyzed selectively. We next describe the principles of each method.

Shotgun (or discovery) proteomics
The hallmark of the shotgun proteomic method is the selection, by simple heuristics based on signal intensity, of peptide ions detected in a particular sample and their subsequent fragmentation. The principles and information pertinent to the technique are summarized in Box 1 (see also Fig. 2) and some of its potential pitfalls are discussed elsewhere (R.A. and others17). In a shotgun experiment, the masses (more precisely, m/z) of the ions produced in the ion source at a particular time are recorded to generate a mass spectrum, often referred to as a survey scan. The mass spectrometer then automatically selects one of the detected peptide ions, called a precursor ion, isolates it, subjects it to fragmentation by CAD and records the resulting fragment-ion mass spectrum. This process is called product-ion scanning.
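The intensity-based selection heuristic just described can be made concrete with a short sketch. This is an illustrative model only: the top-N value, dynamic-exclusion window and survey-scan values below are hypothetical, not the parameters of any particular instrument.

```python
from dataclasses import dataclass, field

@dataclass
class TopNSelector:
    """Illustrative DDA precursor picker: choose the N most intense survey-scan
    ions, skipping any m/z fragmented within the dynamic-exclusion window."""
    top_n: int = 5
    exclusion_s: float = 30.0                      # exclusion window (seconds)
    _excluded: dict = field(default_factory=dict)  # m/z -> time it was picked

    def pick(self, survey_scan, now_s):
        # survey_scan: list of (m/z, intensity) pairs from one survey scan.
        # Drop ions fragmented too recently (dynamic exclusion).
        candidates = [(mz, i) for mz, i in survey_scan
                      if now_s - self._excluded.get(mz, -1e9) > self.exclusion_s]
        # The simple heuristic: rank by signal intensity, take the top N.
        candidates.sort(key=lambda p: p[1], reverse=True)
        picked = [mz for mz, _ in candidates[:self.top_n]]
        for mz in picked:
            self._excluded[mz] = now_s
        return picked

scan = [(421.7, 9e4), (502.3, 3e5), (633.8, 1.2e5), (702.4, 8e3)]
sel = TopNSelector(top_n=2)
print(sel.pick(scan, now_s=0.0))   # [502.3, 633.8] -- the two most intense ions
print(sel.pick(scan, now_s=1.0))   # [421.7, 702.4] -- the first two now excluded
```

Dynamic exclusion is what forces the instrument down the abundance ladder on repeat scans; without it, the same few intense ions would be fragmented over and over.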
Because a cycle that comprises a survey scan and a product-ion scan is fast (~100 ms) compared with the chromatographic elution time of a particular peptide (~30 s), and because many precursor ions are typically
perspective
Figure 1 Workflow of a typical proteomic experiment. Proteins are digested to produce a complex mixture of peptides, which are separated by HPLC before analysis by MS. The overall process consists of a number of steps, specifically the ionization of the peptides, acquisition of a full spectrum (survey scan) and selection of specific precursor ions to be fragmented, fragmentation, and acquisition of MS/MS spectra (product-ion spectra). The data are processed to either quantify the different species and/or determine the peptide amino acid sequence through a database search.
Box 1 Principles of shotgun (or discovery) proteomics
Discovery or shotgun proteomics is a universally and successfully applied proteomic method, used almost exclusively in a configuration in which an LC system is connected online to a tandem mass spectrometer operated in electrospray ionization mode (LC-MS/MS; Fig. 2). Less frequently, LC systems are used offline to deposit samples on a sample plate for analysis by MALDI-MS (LC-MALDI-MS). In LC-MS/MS the peptide components present in the sample are separated by reversed phase liquid chromatography and analyzed by MS in the full-scan and MS/MS modes. The method is uniquely suited for the identification of the protein components of samples, including their post-translational modifications. If used with stable isotope-based labeling, it is also suitable for protein quantification.
Figure 2 Workflow of a discovery proteomic experiment. The peptide mixture is separated by HPLC and analyzed by MS in full-scan mode. Using simple data-dependent acquisition heuristics based on signal intensity, peptide ions are selected for fragmentation and dissociated by collisional activation. The resulting MS/MS spectra permit determination of the amino acid sequence of the fragmented peptide. The intensity of the precursor-ion signal in the survey scan is used for quantification. The insert indicates the different modes of acquisition; either sequential MS and MS/MS analysis as performed using a quadrupole/time-of-flight instrument (A), or parallel analysis as performed on a linear ion trap/orbitrap mass spectrometer (B).
Instrumentation. The most common instrument types used are ion trap, hybrid quadrupole/TOF and hybrid ion trap/orbitrap mass spectrometers. Most modern tandem mass spectrometers are compatible with the method, although their respective performances vary considerably.
Workflow. A proteolytic digest of the protein sample is analyzed by LC-MS/MS while the mass spectrometer is operated in DDA mode. In this mode, the system continuously acquires series of survey scans (MS1 mode) and a set of subordinated MS/MS scans, generating fragment-ion spectra of selected peptide ions. These fragment-ion spectra, combined with information on the precursor ions, are then analyzed to determine the amino acid sequence of the fragmented peptides, and to infer the proteins from which the peptides originate.

Survey scan. The survey scan is critical in a shotgun experiment. It detects the peptide ions that are selected on the fly for CAD, using a simple heuristic method. Typically, a subset of three to eight ions per survey scan is selected for fragmentation. The resolution and mass accuracy achieved in the survey scans affect the subsequent database search to assign the amino acid sequences to the generated fragment-ion spectra. The capacities of the FT-ICR, orbitrap and recently developed QTOF instruments for accurate mass measurement have considerably increased the confidence with which peptides can be identified.

MS/MS mode. Most experiments are performed using CAD. Two parameters affect the quality of MS/MS spectra, and thus the results of a shotgun measurement. The first is the mass window used for precursor-ion selection, that is, the purity of the signal for sequencing. Typically, a broad window of 2–3 Th ensures sufficient sensitivity. The second is the analyzer performance in the MS/MS mode. Spectra obtained using ion-trap instruments are typically of low resolution and have limited mass accuracy (>0.1 Da), whereas TOF mass spectrometers and the orbitrap instruments provide high mass accuracy measurements for
fragment ions. This facilitates the assignment of sequences to the spectra. Accurate mass determination of the precursor ion adds a discriminating constraint in sequence database searching.

Selection of precursor. Precursor-ion selection is performed automatically by the spectrometer on the fly, based on the information detected in the survey scan.

Quantification. Quantification is coupled to protein identification. Because quantification is performed on the ‘sparse’ survey scan, data precision is limited.

Informatics. All data processing and data analysis occurs after the completion of the mass spectrometric analysis. The tasks of assigning the correct peptide sequence to each acquired fragment-ion spectrum and of inferring the correct set of proteins represented by the identified peptides are computationally challenging and represent a large overhead, especially considering the volume of data acquired during shotgun experiments. This issue and the computational tools developed to address it have been reviewed recently4,5.

Applications. The method is often used qualitatively, aiming at identifying large sets of proteins in complex samples. More recently, it has been used for differential quantification of the identified proteins. It is almost exclusively applied for discovery experiments. Because no prior knowledge is required, the method is ideally suited for open discovery experiments. The main limitation is its bias in the precursor selection process toward the more abundant components present in the sample, in particular for samples of very high complexity where the number of analytes exceeds the peak capacity of the LC-MS analytical system. This results in irreproducible replication of the DDA experiment, as the simple heuristics sample a different pool of peptides in each experiment11,12.
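The undersampling limitation described above can be illustrated with a toy simulation: when an abundance-biased selector can fragment only a fraction of the available peptides, repeat runs recover overlapping but non-identical subsets. The peptide names, abundance distribution and sampling capacity below are invented for illustration.

```python
import random

def dda_run(peptides, capacity, rng):
    """Toy DDA run: sample 'capacity' peptides without replacement,
    weighted by abundance (more intense ions are more likely to be picked)."""
    pool = dict(peptides)          # peptide -> abundance
    picked = set()
    for _ in range(capacity):
        names = list(pool)
        weights = [pool[n] for n in names]
        choice = rng.choices(names, weights=weights, k=1)[0]
        picked.add(choice)
        del pool[choice]           # each peptide is identified at most once
    return picked

rng = random.Random(7)
# 200 hypothetical peptides with a skewed abundance distribution.
peptides = {f"pep{i}": 1.0 / (i + 1) for i in range(200)}
run1 = dda_run(peptides, capacity=50, rng=rng)
run2 = dda_run(peptides, capacity=50, rng=rng)
# The overlap is typically well below 100%: abundant peptides recur,
# low-abundance peptides come and go between replicates.
print(f"overlap between replicate runs: {len(run1 & run2)}/50")
```

The abundant species dominate both runs; the irreproducibility is concentrated in the low-abundance tail, mirroring the bias discussed in the box.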
detected in a survey scan, one survey scan can be followed by several product-ion scans. The instrument selects the specific precursor ions of each fragment-ion spectrum on the fly by DDA. State-of-the-art
instruments permit data acquisition at a rate of a fraction of a second, enabling thousands of fragment-ion spectra to be collected during a typical reversed phase LC-MS/MS experiment. Although impressive,
Box 2 Principles of proteomic strategy based on directed MS
This method contrasts with the shotgun strategy in that protein identification (based on fragment-ion spectra) and protein quantification (based on survey scans) are decoupled and performed in two distinct experiments (Fig. 3). In fact, the two steps happen in the reverse order and, unlike shotgun proteomics, each sample is analyzed twice. A variant of directed sequencing, termed AIMS (accurate inclusion mass screening22), has been proposed to expedite the qualification of candidates and overcome some of the limits of an uncontrolled discovery experiment. LC-MALDI–based strategies also have the capability of performing inclusion list–driven peptide identification.

Instrumentation. This type of experiment is typically performed on high-performance instruments, such as QTOF or LIT-OT instruments, to leverage their high mass resolution and mass accuracy capabilities.

Workflow. A directed MS experiment includes at least two LC-MS or LC-MS/MS analyses. The first is focused primarily on collecting survey scans, which are processed offline to detect the features that will be selected for the inclusion list. This step creates an inventory of all detected peptide ions. This information is then used to design a second measurement of the same sample that aims at sequencing the analytes of interest, such as those that show differential expression between two conditions. The second LC-MS/MS run is performed in product-ion mode to generate tandem mass spectra used to identify specific targets listed on the inclusion list.

Survey scan. The survey scan remains mandatory in the second measurement, because the detection of a signal is required to trigger the MS/MS acquisition for an ion that is present in the inclusion list. As in the shotgun strategy, the resolution and accuracy of the survey scan are critical for the selection of the species of interest.
The high mass accuracy of high-performance mass spectrometers, coupled with their low tolerance for the detection of the precursor ion needed to trigger an MS/MS event, ensures more effective exclusion of contaminant species that have a similar m/z to the target peptide.

MS/MS mode. The MS/MS acquisition is performed in data-dependent mode, but the precursor mass selection takes into account the additional constraints of the inclusion list. To trigger a CAD event, an ion has to be observed in the survey scan with an intensity above a preset threshold, and it has to be present in the inclusion list.

Precursor selection. As in a shotgun experiment, a broad selection window ensures sensitivity. However, accurate masses are taken into account for the selection of the precursor and for database searching.

Quantification. As mentioned above, this method provides high quality LC-MS
data and precise quantification, using the chromatographic dimension. The quantification of any analyte present in the sample is independent of the sequencing events. Therefore, differential analyses can be performed on all detected analytes. Even low-intensity signals in noisy survey spectra that would not be selected in a shotgun experiment can be detected and identified. The method is compatible with stable isotope–based and label-free quantification schemes.

Informatics. The database-searching overhead to perform peptide identification is substantially reduced as the redundancy of the acquired data decreases. There is, however, a large additional cost in processing the LC-MS data to detect and inventory all the ions and their attributes (mass, charge, elution time and signal intensity), and possibly to align and compare data from multiple measurements for the selection of the precursor set that constitutes the inclusion list. Several commercial and open source software tools for feature detection and alignment have recently been developed.

Applications. The method is primarily used in discovery experiments with an emphasis on less abundant species. Directed MS/MS approaches improve the efficiency of peptide identification in complex samples. This strategy has significant advantages over a conventional LC-MS/MS experiment in that the bias in favor of the most intense signals is partially removed, thus providing deeper penetration into a proteome. In addition, decoupling the quantification and identification steps provides more reliable quantitative measurements than can be accomplished in shotgun experiments. Triggering an MS/MS acquisition remains contingent on the presence of signals corresponding to the peptide of interest in the survey spectrum. Nonetheless, the inclusion list allows the experiment to be tailored toward a specific set of ions.
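The two-condition trigger described in this box (intensity above threshold AND membership in the inclusion list) can be sketched as follows. The mass tolerance, intensity threshold and m/z values are hypothetical placeholders.

```python
def should_trigger_msms(ion_mz, ion_intensity, inclusion_list,
                        tol_ppm=10.0, min_intensity=1e4):
    """Directed-MS trigger: fragment an ion only if it is intense enough AND
    matches an inclusion-list m/z within the ppm tolerance."""
    if ion_intensity < min_intensity:
        return False
    return any(abs(ion_mz - target) / target * 1e6 <= tol_ppm
               for target in inclusion_list)

inclusion = [523.7749, 641.3012, 788.9925]            # hypothetical targets
print(should_trigger_msms(523.7752, 5e4, inclusion))  # True: within ~0.6 ppm
print(should_trigger_msms(523.9000, 5e4, inclusion))  # False: no list match
print(should_trigger_msms(641.3012, 2e3, inclusion))  # False: below threshold
```

The tight ppm tolerance is where the high mass accuracy of the survey scan pays off: a wide tolerance would let near-isobaric contaminants trigger MS/MS events wasted on off-target species.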
Figure 3 Workflow of a directed proteomic experiment. The sample is first analyzed in LC-MS mode, and the results are analyzed using a suite of bioinformatic tools to quantify the peptides. Typically, peptides that are of particular interest (e.g., those found to be regulated when comparing multiple samples) are included in a list of targets for MS/MS sequencing. In a second step, the sample is reanalyzed to sequence exclusively the peptide ions present on the target list. The resulting MS/MS spectra enable the amino acid sequence to be determined.
this number is small in relation to the number of peptides generated by tryptic digestion of a proteome. The substantial discrepancy between the number of peptides present in a digest of a proteome and the analytical capacity of the LC-MS/MS analytical system (that is, the number of components that can be separated, detected and identified) prevents a perfectly reproducible set of peptides from being identified in repeat analyses of the same sample. This arises because a different subset of the available precursor ions is sampled in each subsequent analysis. Proteome coverage and data reproducibility can be improved by increasing the fraction of available precursor ions selected for CAD. This can be accomplished by repeated analysis of the same sample or fractionating the sample for subsequent analysis
of each fraction9,10,14. With extensive sample prefractionation and the LC-MS/MS analysis of tens to hundreds of fractions per sample, the fraction of a proteome identified can be increased, presumably along with the reproducibility of the proteome patterns generated. These gains are, however, offset by the cost and time required to carry out such extensive proteome discovery experiments. Developments in MS instrumentation and software engineering have enabled substantial advances in shotgun proteomics over the past decade. Although initially performed on low-resolution ion-trap instruments, the technique is now commonly implemented on last-generation, high-performance, hybrid mass spectrometers (e.g., linear ion trap orbitrap (LIT-OT) or quadrupole time of flight (Q-TOF)
Box 3 Principles of proteomics based on targeted MS
This technique distinguishes itself from shotgun or directed MS in that it uses prior information to generate validated mass spectrometric assays for the detection and quantification of predetermined analytes in complex samples (Fig. 4). It is most frequently implemented on triple quadrupole instruments operated in the selected reaction monitoring mode (SRM, often also called MRM).

Instrumentation. This type of experiment is performed on triple quadrupole instruments in which the second analyzer (third quadrupole) is used in nonscanning mode, which concentrates the available measurement time on the targeted analytes. This signal accumulation translates into an improved limit of detection.
Figure 4 Workflow of a targeted proteomic experiment. As the experiment is hypothesis-driven, it targets a very specific subset of peptides uniquely associated with the proteins of interest. An instrument method is built using existing proteomic resources (e.g., peptide spectral libraries), and the analysis is typically performed using a triple-quadrupole instrument. For each peptide, a series of transitions (pairs of precursor and fragment ion m/z values) is monitored during a time window that specifically corresponds to its predicted elution time. This enables hundreds of peptides to be analyzed in a single experiment.
Workflow. The method is exclusively hypothesis driven, that is, it requires a priori information at the level of both assay design and target selection. For each peptide, the m/z of the precursor ion, its retention time and a set of high-intensity fragment ions unique to the targeted peptide need to be defined, and these values constitute a definitive assay for the detection of the targeted peptide in any sample. The generation of validated SRM assays can be performed at high throughput through the use of synthetic peptide libraries34.

Survey scan. No survey scan is performed in this mode.

MS/MS mode. As the SRM method is characterized by the measurement of only a few fragment ions of each targeted peptide, the second analyzer ‘jumps’ to a set of preset values, rather than scanning across the entire m/z range. The parameters required for each measurement (precursor and fragment-ion m/z values, collision energy, elution time, dwell time per transition) have to be defined in the analytical method uploaded to the instrument.

Selection of precursor. As the precursor ions are monitored by default, regardless of their presence in the sample or their detection as a precursor ion, the method is not data dependent. Because of its intrinsically improved limit of detection, narrower mass selection windows (down to ~1 Th) can be used. This substantially reduces co-eluting interferences, thus increasing the overall selectivity.
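The assay parameters listed above map naturally onto a small data structure. The sketch below is illustrative only: the peptide sequences, m/z values, collision energies and retention times are hypothetical placeholders, not validated assays.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    """One SRM transition: a precursor/fragment m/z pair plus the acquisition
    parameters that must be written into the instrument method."""
    peptide: str
    precursor_mz: float
    fragment_mz: float
    collision_energy: float   # eV
    rt_min: float             # expected elution time (minutes)
    rt_window: float = 2.0    # +/- minutes around rt_min to monitor

    def active_at(self, t_min: float) -> bool:
        # In a scheduled method, the transition is measured only near
        # the peptide's predicted elution time.
        return abs(t_min - self.rt_min) <= self.rt_window

method = [
    Transition("ELVISLIVESK", 604.35, 759.41, 24.0, rt_min=18.4),
    Transition("ELVISLIVESK", 604.35, 660.34, 24.0, rt_min=18.4),
    Transition("TASTEPEPTIDER", 717.34, 845.40, 27.5, rt_min=32.1),
]

# Which transitions does the instrument measure 19 minutes into the run?
active = [t for t in method if t.active_at(19.0)]
print([t.fragment_mz for t in active])   # [759.41, 660.34]
```

Monitoring several transitions per peptide, as the two ELVISLIVESK entries illustrate, is what lets co-eluting interferences be recognized from discordant fragment-ion ratios.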
Quantification. SRM is the prototypical mass spectrometric quantification method, yielding precise measurements with very low coefficients of variation and high reproducibility28. The limits of detection and quantification are typically two orders of magnitude lower than in conventional LC-MS experiments, especially if complex samples are being analyzed.

Informatics. Most of the informatics effort is performed upfront. In essence, SRM exploits existing information from proteomics databases, such as specific SRM assays stored in MRMAtlas25 or previous discovery information present in a repository such as PeptideAtlas (www.peptideatlas.org/).

Applications. The technique is exclusively hypothesis driven. It is focused on the detection and quantification of peptide candidates that are explicitly included in the experiment. The identification of the analytes relies on the elution time, and sometimes isotopically labeled internal standards are used for accurate quantification and for gaining confidence in the detected transition traces. Developments in instrument-control software that schedule the measurement of targeted peptides in predetermined time windows allow >1,000 transitions to be analyzed in a single LC-MS experiment, without compromising sensitivity25.
instruments), resulting in dramatically increased data quality and faster rates of data acquisition. Recent studies have demonstrated dramatic increases in the proteome coverage achieved and the ability to identify large numbers of modified peptides8,9. Furthermore, the recent implementation of alternative fragmentation techniques, such as electron transfer dissociation18, has further increased the range of peptide analytes accessible to mass spectrometric analysis. Specifically, large peptides and peptides subject to post-translational modifications show favorable electron transfer dissociation fragmentation patterns19. Therefore, shotgun proteomics is the method of choice for the a priori identification of the protein components of complex samples and the characterization of their post-translational modifications.

Directed proteomics
The hallmark of directed MS is the selection and fragmentation of a predetermined set of peptide ions detected in a survey scan20–22. The principles and information pertinent to directed MS are summarized in Box 2 (see also Fig. 3). In this method, the precursor ions that are of interest for a particular study (e.g., peptides that are differentially expressed between samples) are compiled into a master list, along with relevant attributes such as the precursor-ion charge state, m/z ratio and retention time. This list is the basis for the generation of one or several inclusion lists that are loaded into the computer controlling the mass spectrometer to ensure that the instrument exclusively selects for CAD those features that are detected in a survey scan and are present on the inclusion list. Selection of multiple precursors from a survey scan and tight scheduling of retention times have now increased the number of precursors selected in a 60 min or 90 min LC-MS/MS run to several thousand. Because the generation of the master list and its use for measurements are uncoupled in time, feature selection can be optimized according to the quality of the sample and the biological question at hand. A variant of the approach, referred to as LC-MALDI, involves spotting the column effluent on the solid surface of a sample plate and then sampling the contents of sequential spots by MALDI-MS/MS.

Different types of input data have been used to compile master lists (R.A. and colleagues23). They include, for example, prior quantitative proteome measurements by differential stable isotope labeling or by comparative analysis of LC-MS feature maps generated from different samples. Compared with a discovery proteomic experiment using DDA, precursor ions of lower abundance can be selected, especially if highly complex samples are being analyzed, and the identification rate is increased. Selection of the same set of precursor ions for fragmentation in repeat analyses of the same or substantially similar samples increases reproducibility between data sets. Finally, peptides with detectable features, such as distinctive isotopic signatures or mass defects, or peptide patterns indicating structurally related peptides (e.g., differentially modified peptides), can be detected in LC-MS feature maps and specifically selected for analysis in subsequent LC-MS/MS runs driven by inclusion lists.

Targeted proteomics
The hallmark of targeted MS is the detection of a set of predetermined fragment ions from precursor ions that are anticipated, but not necessarily detected, in a survey scan. Currently, the main implementation of this concept is selected reaction monitoring (SRM) using triple quadrupole instruments. SRM is a quantitatively accurate technique that has been well established in small-molecule MS24. The principles and information pertinent to targeted MS are summarized in Box 3 (see also Fig. 4). In this approach, the fragment-ion spectrum of the targeted peptide is determined in prior measurements. The precursor-ion mass, the charge state, elution time and characteristic high-intensity fragment ions represent a definitive assay for the targeted peptide, used to detect and quantify the targeted peptide in a sample. The relationship between a precursor ion and a specific fragment ion is referred to as a transition. Quantification is accomplished by relating the fragment-ion intensities of the targeted peptide to the corresponding signals of isotopically labeled reference peptides of identical sequence. If the elution times of the targeted peptides are used as a measurement constraint (that is, specific subsets of the targeted peptides are only detected in a narrow time window spanning a few minutes around their anticipated elution time), several hundred peptides can be targeted in a single LC-MS/MS analysis25.

The precursor ion of the targeted peptide does not need to be explicitly detected within the matrix of the sample, and background noise is filtered out sequentially at the precursor- and fragment-ion levels. These considerations make targeted MS the most sensitive mass spectrometric strategy and the one least affected by interference effects when analyzing complex samples. The optimal transitions (precursor- and fragment-ion pairs), retention time and collision energy that constitute a definitive assay need to be established once for a particular instrument type and can then be used perpetually. They can therefore be made accessible in public databases26.

Implementation of MS strategies
Each of the three strategies we have described relies on tandem MS. Each presents unique characteristics that determine its suitability for tackling a specific proteomic or biological research question. The strategies also differ in the way the mass spectrometers are used. The types of mass spectrometers commonly used in proteomics, along with some of their distinctive traits, are summarized in Table 1. The instrument characteristics pertinent to proteomics are the selectivity of measurement to avoid cross-talk from other analytes (resolving power), the linear dynamic range, the limits of detection and quantification and the mass accuracy (Box 4 and Fig. 5).

Shotgun proteomics depends on the ability of the instrument to reliably detect precursor ions in a survey scan, to select an optimal set of detected precursor-ion signals for CAD and to generate and acquire fragment-ion spectra with ion series sufficient for the unambiguous assignment of the correct peptide sequence to the spectrum. Additionally, these operations should be carried out at a high cycle frequency to maximize the number of peptide identifications, and the measurements should have high sensitivity, large dynamic range and high mass accuracy. These requirements are best matched by ion trap hybrid instruments such as ion trap–Fourier transform ion cyclotron resonance (FT-ICR) and ion trap–orbitrap, and Q-TOF instruments. Currently, shotgun proteomic measurements are most frequently carried out using LIT-OT instruments.

Table 1 Mass analyzers commonly used in proteomics
Analyzer    Implementation  Type      Resolving power  Mass accuracy  Limit of detection  Dynamic range
Quadrupole  TQ-QTOF         In-beam   1,000–2,000      Low            Very low            4–5
Ion trap    IT              Trapping  1,000–2,000      Low            Very low            2–3
TOF         Q-TOF           In-beam   >25,000          High           Low                 3
OT/ICR      Hybrid          Trapping  >50,000          Very high      Low                 3
TQ, triple quadrupole.
nature biotechnology volume 28 number 7 JULY 2010
Perspective

The main difference between shotgun and directed sequencing experiments is the method used to select precursor ions detected in survey scans for CAD. Although this process is instrument driven in both cases, in the directed method it is controlled by a time-constrained inclusion list and is no longer intensity dependent. Shotgun and directed sequencing therefore differ at the level of instrument control rather than at the level of instrument type, and the same considerations related to instrument performance and characteristics apply to both methods. Targeted experiments, which are based on SRM (Box 4), depend on the effective and sequential filtering of noise at the precursor-ion and fragment-ion levels, which increases the signal-to-noise ratio and therefore lowers the limit of detection. Targeted strategies are characterized by a dynamic range of concentrations spanning four to five orders of magnitude, high sensitivity and a relatively small number of analytes detected per unit time. To achieve precise quantification, enough data points must be acquired over the chromatographic elution range of a peptide to reconstruct the chromatographic peak; this limits the number of peptides detected per unit time. For instance, at a 2-s cycle time, 100 transitions using a 20-ms dwell time for each measurement would be acquired. At present, these requirements for SRM can be fulfilled only by triple quadrupole mass spectrometers. An interesting variant, useful for the development of SRM assays, is the capability of acquiring full fragment-ion spectra driven by an SRM transition25. An advantage of quadrupole/linear ion trap instruments is that they can be operated alternately in triple quadrupole and LIT mode to acquire MS/MS spectra.

In summary, the ideal universal mass spectrometer for proteomics has yet to be developed. The type of experiment performed and the method chosen for data acquisition determine the optimal type of instrument for each application. Moreover, every instrument and data acquisition mode presents a series of compromises that affects the performance of a given proteomic strategy.

Box 4 Key considerations when planning quantitative proteomics experiments

When conducting any proteomics experiment, several factors are key to the characterization of MS measurements. These are summarized in Figure 5 and described below.

Selectivity. The selectivity of a method is its ability to discriminate and quantify a particular analyte in a mixture or matrix without interference from the other components. The reliability of measurements depends on the selectivity of the analytical device. Increased selectivity is achieved by analyzers with higher resolving power, which separate near-isobaric ionic species and determine their respective accurate masses. High selectivity is particularly critical in the LC-MS analysis of complex mixtures, in which multiple components co-elute from the column. Analyzers such as FT-ICR, orbitrap or the latest-generation TOF analyzers offer high-resolution capabilities and thus increased selectivity. Alternatively, the selectivity of quantitative analyses can be improved by using a second level of mass selection, as in the SRM mode.

Limit of detection (LOD). The intrinsic LOD of an instrument or a method, often incorrectly called the sensitivity, is defined as the minimal quantity of an analyte that can be confidently detected. The related term, limit of quantification (LOQ), is defined as the minimal amount of an analyte that can be confidently quantified (Box 5). The instrument LOD is usually specified by measuring the components of a simple mixture or individual analytes in dilution series; in such samples, the chemical background is minimal. The limit of detection and dynamic range, which are pertinent in the context of complex biological samples, are modulated by the background and the interferences associated with it. The components of a complex sample affect the detected signal-to-noise ratio and may affect ionization efficiency through suppression effects. Under the conditions encountered with biological samples, the chemical background is significant, and poor signal-to-noise ratios are observed for analytes present at very low concentrations. Although state-of-the-art instruments have LODs and LOQs for single compounds or simple mixtures in the low-amol range, matrix and ion-suppression effects considerably reduce the practical ability to detect species of low abundance in complex samples, especially when the respective precursor-ion signal must be detected in a survey scan. Thus, sample preparation (that is, reduction of sample complexity) cannot be dissociated from the overall analytical protocol.

Dynamic range. The dynamic range of an instrument denotes the range between the highest signal and the lowest amount of an analyte detected in a single analysis. Often, the linear range of the response is also specified. The dynamic range is determined by performing dilution series of specific analytes, either by themselves or added to a matrix. The highest dynamic range is currently obtained on in-beam instruments such as quadrupoles, in which ions are continuously monitored. Sample overloading is possible in such instruments; this leads to saturation of the signals of the major components, whereas minor species emerge from the background. In-beam systems are therefore often preferred over trapping devices for quantitative analyses. Matrix and ion-suppression effects occur if multiple components eluting concurrently from the high-performance (HP)LC column are ionized together. As mentioned above, sample complexity and chemical background affect the dynamic range, in particular for trapping devices.

Data density. The data density is defined as the number of measurements acquired during one experiment. In a conventional shotgun experiment, the value indicates the number of MS/MS sequencing events. In a targeted experiment, it reflects the number of peptides analyzed, including multiple measurements for each peptide. The volume of data acquired is closely related to the sensitivity and the acquisition rate of the instrument.

Repeatability. The repeatability of a measurement refers to the ability of the method to generate identical results if identical test samples are processed with the same procedure under the same conditions (instrument settings, operator, apparatus and laboratory) within a short interval of time.

Reproducibility. The reproducibility of a method refers to the ability to replicate the measurement accurately by someone else working independently; that is, the ability to generate identical results with the same method on identical test material but under different conditions (different operators, different apparatus, different laboratories and/or after different intervals of time).

Figure 5 A representation of the desired characteristics of a proteomic experiment, shown on six axes: selectivity, dynamic range, limit of detection, repeatability, reproducibility, and data density and effectiveness. The actual performance of each of the approaches can be compared visually by representing the individual characteristics on each of the six axes (Fig. 6).
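The SRM duty-cycle arithmetic quoted earlier (100 transitions at a 20-ms dwell time filling a 2-s cycle) can be checked with a back-of-envelope calculation; the 30-s peak width used below is an assumed, illustrative value, not a figure from the text.

```python
# Back-of-envelope SRM duty-cycle arithmetic (illustrative values only).

dwell_s = 0.020   # dwell time per transition: 20 ms, as in the text
cycle_s = 2.0     # total cycle time: 2 s, as in the text

# Number of transitions that fit into one cycle
transitions_per_cycle = round(cycle_s / dwell_s)
print(transitions_per_cycle)  # 100

# Precise quantification needs several points across the chromatographic peak.
# Assuming an (illustrative) 30-s-wide elution peak, a 2-s cycle yields:
peak_width_s = 30.0
points_per_peak = peak_width_s / cycle_s
print(points_per_peak)  # 15.0 data points available to reconstruct the peak
```

Lengthening the dwell time improves the signal-to-noise ratio per transition but shrinks `transitions_per_cycle`, which is exactly the trade-off between limit of detection and the number of peptides measured per unit time described above.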
Performance profiles of the three strategies
There is currently no single method capable of routinely identifying and quantifying all the components of a proteome. Each method is therefore a compromise that maximizes performance at some levels while reducing it at others. For example, in an SRM-based targeted experiment, the recorded signal-to-noise ratio is related to the dwell time (that is, the time the spectrometer takes to record the signal of a given transition). The lower limit of detection achieved by longer dwell times negatively affects the number of transitions, and therefore the number of peptides, that can be analyzed during a time segment. Similarly, an increase in the resolving power of a quadrupole mass analyzer reduces sensitivity. As another example, in quantitative shotgun proteomics on trapping instruments, the limit of detection for precursor ions, and therefore the quantitative accuracy achieved, depends on the trapping time. Longer trapping times improve the limit of detection but reduce the number of different analytes measured per unit time. Furthermore, many of the performance characteristics depend on the source of the sample and its complexity. For example, the shotgun and directed MS methods, in which the precursor ion has to be explicitly detected in a matrix of background ions before selection for CAD, are more strongly affected by background noise than the targeted methods, in which the precursor ion does not need to be explicitly detected. A comprehensive discussion of the benefits and trade-offs of each strategy is beyond the scope of this account. We therefore summarize the trade-offs inherent to each method with respect to the main factors characterizing proteomic measurements: selectivity, dynamic range, limit of detection, repeatability, reproducibility, and data density and effectiveness. These terms are defined in Box 4, and the performance characteristics of each method are summarized in Figure 6.
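The detection-limit side of these trade-offs rests on the first-approximation thresholds given in Box 5 (LOD at three times and LOQ at nine times the noise level). A toy calculation, with invented noise values, shows how a raised chemical background shifts both limits:

```python
# Toy illustration of the Box 5 first-approximation thresholds:
# LOD ~ 3x the noise level, LOQ ~ 9x the noise level (arbitrary units).
# The two noise levels below are invented for illustration.

def lod_loq(noise):
    """Return (LOD, LOQ) as first-approximation multiples of the noise."""
    return 3 * noise, 9 * noise

lod_simple, loq_simple = lod_loq(noise=2.0)     # simple mixture, low background
lod_complex, loq_complex = lod_loq(noise=40.0)  # complex lysate, high background

print(lod_simple, loq_simple)    # 6.0 18.0
print(lod_complex, loq_complex)  # 120.0 360.0 -- both limits rise with background
```

The same arithmetic explains why targeted SRM, which filters background at two mass-selection stages and so effectively lowers the noise term, reaches lower limits of detection in complex samples than survey-scan-based methods.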
The above discussion of a few of the trade-offs that apply to proteomic measurements already suggests that there is no single best implementation of a particular strategy. The performance profiles discussed below therefore apply to implementation parameters that are commonly used in proteomics.

Performance profile of shotgun proteomics
Shotgun MS typically involves using a hybrid mass spectrometer with a fast cycle time to analyze complex sample mixtures comprising potentially hundreds of thousands of peptides with abundances that span up to ten orders of magnitude. The combination of intensity-based heuristics for precursor-ion selection, limited cycle speed, high sample complexity and lack of input of prior data for precursor selection contributes to the performance profile indicated in Figure 6a.

Figure 6 Performance profiles of the shotgun or discovery (a), directed (b) and targeted (c) proteomic methods, represented on the six axes defined in Box 4 (selectivity, dynamic range, limit of detection, repeatability, reproducibility, and data density and effectiveness). The characteristics are defined and discussed in Box 4. The terms 'high' and 'low' refer to sample complexity.

The high acquisition frequency (in the 1–10 Hz range) of modern spectrometers ensures that shotgun measurements produce a high data density (Box 4). Even so, extensive proteome coverage can be achieved only if the samples are fractionated before MS analysis and the individual fractions are sequentially analyzed. This is because the precursors are selected based on their signal intensity and, even in the fastest available instruments, the number of precursors in a proteome digest exceeds the number of sequencing events available in an LC-MS/MS run. Multiple, repeated selection of the same precursor in the same or sequential fractions results in the redundant identification of the same peptides and proteins. This also reduces the yield of newly identified peptides and proteins and limits the repeatability of the results from replicate analyses of identical or substantially similar samples, especially for proteins of lower abundance.

Another striking feature of the performance profile for shotgun proteomics is the strong dependence of most parameters on sample complexity. In particular, the limit of detection, the dynamic range and the reproducibility, three of the most critical parameters for proteome analysis, are negatively affected by increasing sample complexity. These considerations significantly affect the experimental strategy of shotgun proteomics, especially when substantially similar samples are analyzed repeatedly, as is the case, for example, in clinical, time-series or dose-response studies.

In conclusion, the shotgun proteomic strategy has a unique potential to discover new proteins and to determine the relative abundance of proteins identified in different samples. However, the extensive or complete analysis of complex samples, such as those representing whole proteomes, comes at a very high cost in measurement and computational time. Moreover, the performance of the method may vary substantially between samples. Therefore, the shotgun strategy is most frequently applied when samples of unknown composition are being analyzed to identify the largest possible number of proteins; shotgun proteomics is a uniquely powerful method to generate protein inventories. If combined with stable isotope labeling, shotgun proteomics is also commonly used for quantitative comparison of related subsets of the proteins in complex samples. The factors discussed above limit the number of samples that can be compared, the number of proteins that can be consistently identified and quantified in multiple samples and the range of protein abundances that can be accommodated. Typical applications of this strategy include the quantitative comparison of the proteomes of differentially perturbed cells, the comparison of protein extracts from diseased and healthy tissues and the analysis of specific subproteomes. More recently, shotgun proteomics has been expanded to the systematic analysis of post-translational modifications, specifically protein phosphorylation8,27. The implementation of electron transfer dissociation is expected to further advance the potential of discovery-based analyses, especially for modified subproteomes19.

Performance profile of directed proteomics
Shotgun and directed MS measurements are usually performed using identical instruments. The two methods are essentially identical, except that in directed sequencing, precursor-ion selection no longer follows abundance-dependent heuristics but is instead directed by a time-constrained inclusion list compiled on the basis of prior information. It is apparent from Figure 6b that this seemingly simple difference has several important implications for the performance profile of the method. First, the high cycle frequency is maintained, but the same precursor ion is analyzed with dramatically reduced redundancy (ideally once), even if multiple fractions are being analyzed. This significantly increases the repeatability and the reproducibility of the method. Second, the control of the sequencing events reduces the rate of futile repeated identifications, and the associated computational overhead for data analysis is reduced. Third, because the precursor-ion signal of the selected precursor still needs to be detected, the dynamic range and limit of detection of directed MS depend on the sample complexity, albeit less so than in shotgun methods. Finally, the overall dependency of the performance profile on sample complexity is reduced.

In summary, by virtue of its focus on sequencing only those peptides of particular interest, directed MS offers an effective way to characterize a proteome. It can also be used in a hypothesis-driven mode involving identification of a predetermined set of precursor ions in multiple samples28. The data sets generated by directed MS are generally of much higher information content and lower redundancy than those generated by DDA. It can be expected that the recent development of software tools to generate optimized inclusion lists will catalyze wider application of the method. Directed MS can be used for quantitative measurements in conjunction with stable isotope labeling or with label-free quantification, whereby peptide quantities are estimated from their precursor-ion current.

A wide range of typical applications of directed MS has been discussed recently23. They include the directed measurement and quantification of proteins that differ in abundance across samples, the directed measurement of modified peptides and the analysis of protein biomarkers in clinical samples. Because identical sets of peptides can be measured in multiple samples with a high degree of repeatability, the method, used in the context of predefined peptide lists, is well suited to generating reproducible, quantitative data sets.

Performance profile of targeted proteomics
Much like directed MS, the targeted method also depends on lists of peptides deemed important for detection and quantification in a sample based on prior information. However, in contrast to directed sequencing, the targeted precursor ions are not detected in a survey scan, and a full fragment-ion spectrum of the selected precursor is not generated. Instead, the targeted precursor is selected 'blindly' in an anticipated chromatographic time window, and the only signals detected are fragment ions derived from the targeted peptide (transitions). In the targeted method, an initial effort is required to determine the optimum fragmentation conditions and thus generate optimized assays for each peptide. However, the benefits of this one-time investment are apparent from the performance profile of the method (Fig. 6c).

Several important features are readily apparent. First, the targeted method is less affected by sample complexity and background, as noise signals are filtered out both at the precursor level, by a narrow (<1 Da) mass-selection window, and at the fragment-ion level, where the targeted precursor and background ions are expected to lead to distinct fragment ions for detection and quantification. Second, of the available mass spectrometric methods, the targeted method has the lowest limit of detection and the widest dynamic range, especially for complex samples. This is a result of the nonscanning nature of fragment-ion signal acquisition, which allows integration of the respective signal over extended periods (dwell time). Third, repeatability and reproducibility are excellent because of the nonredundant, targeted data acquisition29. Finally, the data density is lower than that of the shotgun or directed methods because of the increased time needed to measure each peptide to lower the limit of detection.

In targeted MS, accurate quantification is achievable by any of the commonly used stable isotope labeling techniques4. Of course, no new proteins are detected by the targeted method; the approach depends substantially on the prior measurement of the targeted proteins by discovery proteomics. This strategy is therefore an excellent choice for studies in which sufficient prior information on a system has been acquired and the research questions have shifted from identifying the full complement of proteins associated with a process or location to characterizing coordinated changes in the abundances of these proteins. Targeted MS will therefore likely become a key technology to test biological hypotheses, reproducibly generate complete quantitative data sets for systems biology or validate biomarkers by scoring changes in their abundances in large sets of clinical samples.

Figure 7 Effect of biochemical background on quantification by the shotgun (discovery), directed and targeted proteomics strategies, plotted as amount detected versus amount loaded (both in log (amol)). Dotted lines indicate a low-complexity background; full lines represent a complex background, such as a full cell lysate. LLOQ, lower limit of quantification; ULOQ, upper limit of quantification.

Differential and quantitative analyses
The term quantitative proteomics usually refers to measuring changes in the abundance of proteins in different samples. Typical studies include the quantitative comparison
of samples from 'different biological conditions,' with the underlying assumption that the proteins showing different abundance are functionally related to the processes affected by the applied conditions. Commonly, comparative studies use isotopic labeling approaches such as isotope-coded affinity tagging (ICAT), isotope-coded protein labeling or stable isotope labeling with amino acids in cell culture (SILAC)2,4. All involve labeling the peptides in a sample before the LC-MS analysis with reagents or labels that are chemically identical but differ in their isotope composition. The relative abundance of a specific peptide across the samples is then computed from the precursor-ion signals of the heavy and light forms of the peptide, respectively. It should be noted that other quantification methods, based on isobaric tagging reagents, or tandem mass tags, exemplified by isobaric tagging for relative and absolute quantification (iTRAQ)30, are also used for quantitative proteomics. In these methods, the determination of the relative abundance of peptides is performed on reporter fragment ions measured in the MS/MS mode. In such studies, four to eight samples are compared and analyzed concurrently.

Such relative abundance measurements, often also erroneously referred to as being semiquantitative, contrast with the definition of quantification used in analytical chemistry. The latter denotes the precise determination of the concentration of specific analytes present in the sample, with a coefficient of variation typically <20%. Such measurements require calibrants (internal standards) and have to be performed in the linear dynamic range of the analytical system25. Quantification by mass spectrometric techniques is usually performed by stable isotope dilution, that is, by adding a form of the analyte of interest in which some stable isotopes have been incorporated. Internal standards prepared by incorporation of stable isotopes such as 13C, 15N and 18O are most commonly used in proteomics. The use of deuterium is less desirable, as it changes the peptides' physicochemical properties such that the corresponding heavy and light compounds no longer co-elute under reversed-phase conditions. Dilution series of a limited number of the reference compounds are usually measured to ensure that measurements are performed within the linear dynamic range.

Box 5 Limit of detection and dynamic range

The limit of detection is defined, in a first approximation, as three times the signal-to-noise ratio. Correspondingly, the limit of quantification (LOQ) is defined as nine times the signal-to-noise ratio. The LOD/LOQ value of a measurement and its dynamic range are ultimately determined by the nature and the complexity of the analyzed sample (Fig. 8).

Although it is relatively straightforward to detect low concentrations of peptides in simple mixtures (not taking into account possible losses during the sample handling and the HPLC separation), the LOD of the same peptide in very complex samples is considerably higher, often by several orders of magnitude. The main factor for this apparent sample dependency of the achieved limit of detection is the low ratio of analyte to the total amount of peptide. It results in the co-elution of chemically similar species which, in turn, can cause ion suppression during the ionization process.

In addition, some mass analyzers, more specifically the trapping devices, show a decreased signal if the analyte in question is in a complex sample compared with the signal recorded if the same nominal amount of a pure analyte is injected. This is due to the limited total number of ions that can be stored in the trap without affecting its performance by space charging. In pure samples, a large fraction of the available ion capacity is occupied by the ion in question, whereas in complex samples the majority of the available ion capacity can be occupied by ions representing background signals.

In contrast, in-beam analyzers such as quadrupoles can handle large ion fluxes. Even if the most abundant components saturate the detector and can no longer be quantified accurately, low-abundance species can be detected as long as their signal exceeds the signal-to-noise ratio of the background. These dependencies are schematically illustrated in Figure 8.

Figure 8 Effect of background on the detection of a peptide mixture at various concentrations (arbitrary units; the x and y axes correspond to m/z values and signal intensities, respectively). (a) In-beam instrument, no background; (b) low/moderate background; (c) high background; (d) increased amount of sample loaded; (e) trapping instrument, no background; (f) moderate background; (g) high background; note the changes in the ordinate as ion counts decrease. c, concentration; u, arbitrary units; S/N, signal-to-noise.

Differential analysis and precise quantification differ in their scope, the experimental design and the platform used for such studies. Whereas comparative studies often deal with a limited number of samples (typically no more than a dozen), quantitative studies may include hundreds of samples. Quantitative analysis thus requires a rugged, high-throughput platform to generate reproducible data sets. Differential analyses can in principle be performed using any platform, whereas precise quantification is routinely performed on triple quadrupole instruments in the targeted mode. The design of quantitative experiments
perspecti v e a discovery mode, whereby the differentially abundant peptides can be detected first and Protein Peptide LC-MS Sample sequenced second, typically in sequential fraction digest quantification LC-MS runs. This simple reversal of steps compared to the shotgun method has profound implications: the available sequencing cycles can be focused on the differentially abundant peptides and wider array of comLabeled Labeled Labeled Internal standards proteins concatenate peptides parative quantification strategies can be used to initially identify differentially abundant Figure 9 Isotopically labeled internal standards can be added at various stages in quantitative proteomics analytes that are then subsequently subexperiments. Full-length reference proteins are added at the beginning of analyses, concatenated peptides jected to directed MS/MS sequencing. The are added prior to digestion and synthetic reference peptides are added prior to the LC-MS analysis. directed approach is also compatible with accurate quantification by means of stable needs to consider the sample preparation and the mass spectrometric isotope dilution, using isotopically labeled reference compounds. measurements. A detailed discussion of the different isotope labeling Quantification is based on the comparison of the precursor-ion curmethods is beyond the scope of this account and can be found else- rents of the heavy and light forms of the analytes, and quantitative where4. Overall, quantification is based on the first principle claiming accuracy might be compromised by contaminant signals interfering a direct relationship between signal measured and amount of analyte with either or both isotopic forms. The potential for such interferpresent in a sample (Signal = F × Amount, where F is the response factor ences increases with increasing sample complexity. In addition, apart from the differential isotope labeling methods specific to each analyte). 
Quantification in shotgun proteomics
Historically, shotgun proteomics has focused on identifying a large set of peptides and proteins. Because only precursor signals assigned to a sequence are quantified, the method has generally been used less frequently for protein quantification than for protein identification. Consequently, in such measurements, the majority of the data acquisition time is spent on analytes of unchanged abundance that are likely of no relevance to the biological question of the study. The focus has been on detecting relative changes in the concentrations of peptides (and indirectly the associated proteins) across samples. Such experiments, by analogy to genomics arrays, typically focus on changes of at least twofold and usually analyze all signals observed in an LC-MS experiment. The objective of such measurements is to maximize the number of sample components measured. Because relative changes are of primary interest, the precision of measurements becomes of secondary importance. There is an obvious trade-off between the comprehensiveness and the precision of the measurements.
Differential shotgun proteomics is a typical discovery technology. In addition to the technical limitations discussed above, the user is routinely faced with the problem of assigning biological significance and meaning to hundreds of proteins whose abundance is regulated. In principle, such comparative analyses can be performed on any platform capable of LC-MS analysis. However, the overall analytical precision depends heavily on the choice of the instruments and the characteristics of the analyzer in terms of resolving power, mass accuracy, limit of detection and dynamic range. For instance, ion-trap instruments commonly used in qualitative proteomics experiments have limited resolving power and dynamic range. Other platforms with increased resolving power or dynamic range might be more suited for such experiments (Table 1). Furthermore, as illustrated in Figure 7, the biochemical background inherent to complex samples greatly affects performance. Moreover, the lower limits of detection or quantification are significantly compromised in complex samples (Box 5 and Fig. 8).
In addition to the isotope-labeling methods commonly used for shotgun comparative studies, differentially abundant peptides are also frequently detected by comparative precursor-ion pattern analysis without isotope labeling5,23. In such cases, also referred to as label-free quantification, the different samples are analyzed sequentially under rigorously controlled instrument conditions. The signal intensity of each of the peptides is then used to assess the amount present in each sample. If advanced software tools for pattern comparison are used, the precursor-ion patterns of many samples can be overlaid, compared and statistically analyzed for the selection of the most significant differentially abundant species. There are underlying assumptions or approximations in this approach that, if violated, impinge on the precision of the measurements. More specifically, it is assumed that the response factor is not affected by the 'micro-environment'; in other words, ionization suppression or enhancement effects are negligible. Furthermore, the amount of material injected in each experiment has to be carefully controlled, or a correction factor has to be applied after the analyses.
Quantification in targeted proteomics
Targeted proteomics can be used for comparative quantitative analysis or for accurate, absolute quantification of the targeted peptides. Owing to its exquisite selectivity and very low limit of detection, SRM has de facto become the reference method for quantification in complex samples. As discussed above, quantitative, targeted MS is most frequently used in studies in which the same, predefined set of proteins is quantified in multiple samples. Accurate quantification depends on the addition of isotope-labeled reference molecules to the samples31–33. In a proteomic experiment, internal standards can be added at various stages, from the crude protein isolate, such as cell lysate or plasma sample, to the fractionated peptide sample immediately before injection into the LC-MS system (Fig. 9). By adding the reference samples at the step closest to the origin of the biological sample, results of higher precision are generated, because progressive losses and variability induced by sample processing are compensated for.
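The internal-standard scheme just described reduces, per peptide, to a single ratio: the endogenous ('light') signal over the spiked isotope-labeled ('heavy') signal, scaled by the known amount of standard. A minimal sketch of this calculation (the function name and numbers are ours, for illustration only):

```python
def absolute_amount(light_intensity, heavy_intensity, heavy_amount_fmol):
    """Estimate the endogenous peptide amount from one SRM measurement.

    Light and heavy forms co-elute and share a response factor, so
    ionization suppression or enhancement cancels out of the ratio.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy standard not detected")
    return (light_intensity / heavy_intensity) * heavy_amount_fmol

# 2.4e5 counts endogenous vs. 1.2e5 counts for 50 fmol of spiked
# standard -> 100 fmol of endogenous peptide injected.
amount = absolute_amount(2.4e5, 1.2e5, 50.0)
```

Because sample-processing losses accrue downstream of the spiking step, the earlier the standard is added, the more of those losses the ratio compensates for, which is the rationale given above for spiking as close to the crude sample as possible.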
© 2010 Nature America, Inc. All rights reserved.
Quantification in directed proteomics
In contrast to shotgun proteomics, which is used exclusively as a discovery method and in which only the identified analytes are subject to comparative quantification, the directed MS method offers an interesting range of strategic possibilities. The method can be used in
Conclusions
The development and commercialization of better mass spectrometers and software tools for data analysis have driven tremendous advances in proteomics over the past decade. This progress has translated into larger and more reliable data sets, mostly generated using the shotgun
(or discovery) approach. Concurrently, new proteomic strategies have emerged that are accurately quantitative, support rigorous testing of biological hypotheses and show improved reproducibility. The advent of these new strategies has been primarily driven by the need for comprehensive and reproducible data sets for applications such as biomarker validation or modeling processes of interest to systems biologists. In either case, large sets of peptides (used as surrogates for the proteins of interest) have to be analyzed precisely and reliably in each of many samples in the study. Like any other analytical process, the three proteomic strategies discussed above have limitations that set the boundaries of their respective performance and define the biological or biomedical research questions that best match the performance profile of each method. It can be expected that the emergence of approaches, such as directed and targeted MS, that are built on the use of often vast amounts of prior knowledge will increase the impact of proteomics in biomedical research. These techniques will increasingly augment more common types of experimentation, especially as they provide the capacity to generate data sets that can be compared across studies and laboratories29, and because quantitative proteomics data are generated with unprecedented sensitivity, accuracy and reproducibility.

ACKNOWLEDGMENTS
We thank P. Picotti, A. Schmidt and N. Selevsek for the preparation of figures. The Swiss National Science Foundation (grant no. 3100AO-107679), SystemsX.ch (the Swiss initiative for systems biology) and the European Research Council are acknowledged for financial support.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Schrimpf, S.P. et al. Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol. 7, e48 (2009).
2. Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
3. Domon, B. & Aebersold, R. Mass spectrometry and protein analysis. Science 312, 212–217 (2006).
4. Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 389, 1017–1031 (2007).
5. Listgarten, J. & Emili, A. Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 4, 419–434 (2005).
6. Shevchenko, A., Loboda, A., Ens, W. & Standing, K.G. MALDI quadrupole time-of-flight mass spectrometry: a powerful tool for proteomic research. Anal. Chem. 72, 2132–2141 (2000).
7. Medzihradszky, K.F. et al. The characteristics of peptide collision-induced dissociation using a high-performance MALDI-TOF/TOF tandem mass spectrometer. Anal. Chem. 72, 552–558 (2000).
8. Beausoleil, S.A., Villen, J., Gerber, S.A., Rush, J. & Gygi, S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 (2006).
9. de Godoy, L.M. et al. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 455, 1251–1254 (2008).
10. Denny, P. et al. The proteomes of human parotid and submandibular/sublingual gland salivas collected as the ductal secretions. J. Proteome Res. 7, 1994–2006 (2008).
11. Mallick, P. & Kuster, B. Nat. Biotechnol. 28, 695–709 (2010).
12. Tabb, D.L. et al. Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J. Proteome Res. 9, 761–776 (2010).
13. Paulovich, A.G. et al. Interlaboratory study characterizing a yeast performance standard for benchmarking LC-MS platform performance. Mol. Cell. Proteomics 9, 242–254 (2010).
14. Washburn, M.P., Wolters, D. & Yates, J.R. III. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247 (2001).
15. Bell, A.W. et al. A HUPO test sample study reveals common problems in mass spectrometry-based proteomics. Nat. Methods 6, 423–430 (2009).
16. Kuster, B., Schirle, M., Mallick, P. & Aebersold, R. Scoring proteomes with proteotypic peptide probes. Nat. Rev. Mol. Cell Biol. 6, 577–583 (2005).
17. Duncan, M., Aebersold, R. & Caprioli, R. Nat. Biotechnol. 28, 659–664 (2010).
18. Syka, J.E., Coon, J.J., Schroeder, M.J., Shabanowitz, J. & Hunt, D.F. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc. Natl. Acad. Sci. USA 101, 9528–9533 (2004).
19. Coon, J.J. Collisions or electrons? Protein sequence analysis in the 21st century. Anal. Chem. 81, 3208–3215 (2009).
20. Schmidt, A. et al. An integrated, directed mass spectrometric approach for in-depth characterization of complex peptide mixtures. Mol. Cell. Proteomics 7, 2138–2150 (2008).
21. Domon, B. & Broder, S. Implications of new proteomics strategies for biology and medicine. J. Proteome Res. 3, 253–260 (2004).
22. Jaffe, J.D. et al. Accurate inclusion mass screening: a bridge from unbiased discovery to targeted assay development for biomarker verification. Mol. Cell. Proteomics 7, 1952–1962 (2008).
23. Schmidt, A., Claassen, M. & Aebersold, R. Directed mass spectrometry: towards hypothesis-driven proteomics. Curr. Opin. Chem. Biol. 13, 510–517 (2009).
24. Baty, J.D. & Robinson, P.R. Single and multiple ion recording techniques for the analysis of diphenylhydantoin and its major metabolite in plasma. Biomed. Mass Spectrom. 4, 36–41 (1977).
25. Stahl-Zeng, J. et al. High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol. Cell. Proteomics 6, 1809–1817 (2007).
26. Picotti, P. et al. A database of mass spectrometric assays for the yeast proteome. Nat. Methods 5, 913–914 (2008).
27. Pan, C., Olsen, J.V., Daub, H. & Mann, M. Global effects of kinase inhibitors on signaling networks revealed by quantitative phosphoproteomics. Mol. Cell. Proteomics 8, 2796–2808 (2009).
28. Malmstrom, J. et al. Proteome-wide cellular protein concentrations of the human pathogen Leptospira interrogans. Nature 460, 762–765 (2009).
29. Addona, T.A. et al. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat. Biotechnol. 27, 633–641 (2009).
30. Ross, P.L. et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154–1169 (2004).
31. Gerber, S.A., Rush, J., Stemman, O., Kirschner, M.W. & Gygi, S.P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. USA 100, 6940–6945 (2003).
32. Rivers, J., Simpson, D.M., Robertson, D.H., Gaskell, S.J. & Beynon, R.J. Absolute multiplexed quantitative analysis of protein expression during muscle development using QconCAT. Mol. Cell. Proteomics 6, 1416–1427 (2007).
33. Wolf-Yadlin, A., Hautaniemi, S., Lauffenburger, D.A. & White, F.M. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc. Natl. Acad. Sci. USA 104, 5860–5865 (2007).
34. Picotti, P. et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat. Methods 7, 43–46 (2010).
letters
Live attenuated influenza virus vaccines by computer-aided rational design
Steffen Mueller1, J Robert Coleman1,3, Dimitris Papamichail2,3, Charles B Ward2, Anjaruwee Nimnual1, Bruce Futcher1, Steven Skiena2 & Eckard Wimmer1

Despite existing vaccines and enormous efforts in biomedical research, influenza annually claims 250,000–500,000 lives worldwide1, motivating the search for new, more effective vaccines that can be rapidly designed and easily produced. We applied the previously described synthetic attenuated virus engineering (SAVE)2 approach to influenza virus strain A/PR/8/34 to rationally design live attenuated influenza virus vaccine candidates through genome-scale changes in codon-pair bias. As attenuation is based on many hundreds of nucleotide changes across the viral genome, reversion of the attenuated variant to a virulent form is unlikely. Immunization of mice by a single intranasal exposure to codon pair–deoptimized virus conferred protection against subsequent challenge with wild-type (WT) influenza virus. The method can be applied rapidly to any emerging influenza virus in its entirety, an advantage that is especially relevant when dealing with seasonal epidemics and pandemic threats, such as H5N1 or 2009 H1N1 influenza.

Influenza viruses are negative-stranded, enveloped orthomyxoviruses with eight gene segments, each encoding one or two proteins3. The signature antigenicity of the A and B types of influenza viruses is determined by the viral glycoproteins hemagglutinin (HA) and neuraminidase (NA). The annual genetic drift in antigenicity, which is driven by point mutations, is responsible for seasonal influenza epidemics1,3,4. Swapping of gene segments by reassortment between viruses of aquatic birds, swine and humans (genetic shift) produces new type A influenza viruses with novel antigenicity that may cause devastating pandemics1,3,4. The capacity of influenza viruses for immune escape requires that vaccine strains be updated annually to reflect changes in the HA and NA genes within the impending seasonal strains.
Two types of vaccines are currently used: a chemically inactivated virus delivered by injection, and a live attenuated influenza virus vaccine of cold-adapted virus5, delivered as a nasal spray (FluMist) (http://www.cdc.gov/flu/protect/keyfacts.htm). Both vaccines have limitations. Whereas cell-mediated responses are increasingly recognized as a major determinant of influenza immunity6–9, traditional, killed influenza
virus vaccines act mainly by inducing neutralizing antibodies. Unfortunately, the killed vaccine appears to have suboptimal efficacy in the elderly population (>65 years old)10, which is the same population most prone to morbidity and mortality from seasonal influenza epidemics. In contrast, live attenuated influenza virus vaccine induces both humoral and cellular immunity, but its administration remains restricted to healthy children, adolescents and adults (nonpregnant females), ages 2–49. It works better in immunologically naive young children than in adults11,12.

Here we illustrate the use of SAVE2 to rationally design live attenuated influenza virus vaccines. The central idea of SAVE is to recode and synthesize a viral genome13 in a way that perfectly preserves the WT amino acid sequence, while rearranging existing synonymous codons to create a suboptimal arrangement of pairs of codons2. For reasons that are not understood, some pairs of codons occur more frequently, and others less frequently, than expected14. This codon-pair bias, which is found in every species examined15, evolves slowly. Yeast and humans have radically different codon-pair biases, but all mammals share essentially the same codon-pair bias (unpublished results). Codon-pair bias is independent of codon bias. For example, consider the amino acid pair Arg-Glu. As there are six codons for arginine and two for glutamic acid, there are twelve possible codon combinations that encode this pair of amino acids. Taking into account the frequency of the two contributing codons (codon bias), the pair CGC-GAA is expected 2,397 times in the annotated human ORFeome. However, it is instead observed only 268 times (observed/expected = 0.11); this is an infrequently used codon pair. In contrast, the Arg-Glu pair encoded by AGA-GAA is expected 2,644 times but is observed 4,195 times (observed/expected = 1.59); this is a frequently used codon pair.
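The Arg-Glu example can be put into numbers directly: a codon pair's bias is the (log) ratio of its observed count to the count expected from the individual codon frequencies. A short sketch using the counts quoted in the text (the function name is ours):

```python
import math

def codon_pair_score(observed, expected):
    """ln(observed/expected): negative for under-represented ('poor')
    codon pairs, positive for over-represented ones."""
    return math.log(observed / expected)

# Arg-Glu as CGC-GAA: observed 268 vs. expected 2,397 in the human ORFeome
rare = codon_pair_score(268, 2397)       # ratio 0.11, score about -2.2
# Arg-Glu as AGA-GAA: observed 4,195 vs. expected 2,644
frequent = codon_pair_score(4195, 2644)  # ratio 1.59, score about +0.46
```

Summing such scores over all adjacent codon pairs in an open reading frame gives the overall codon-pair bias values reported for the WT and deoptimized segments in Table 1.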
By whole-genome synthesis13,16, we previously recoded poliovirus to contain 'poor' (that is, infrequently used) codon pairs and found that this dramatically attenuated the virus2. Although the mechanism of attenuation is unclear, preliminary evidence suggests that translation is affected2. Attenuation can be 'titrated' by adjusting the extent of codon-pair deoptimization2. Because codon-pair deoptimization results from minuscule effects at each of hundreds or thousands of nucleotide
1Department of Molecular Genetics and Microbiology, Stony Brook University, Stony Brook, New York, USA. 2Department of Computer Science, Stony Brook University, Stony Brook, New York, USA. 3Present addresses: Department of Medicine, Division of Infectious Disease, Albert Einstein College of Medicine, Bronx, New York, USA (J.R.C.) and Department of Computer Science, University of Miami, Coral Gables, Florida, USA (D.P.). Correspondence should be addressed to S.M. ([email protected]).
Received 12 March; accepted 21 April; published online 13 June 2010; doi:10.1038/nbt.1636
Table 1  Characteristics of deoptimized influenza genome segments

Gene segment   Deoptimized coding region^a   CPB of WT segment^b   CPB of deoptimized segment^c   Number of silent mutations
NPMin          125–1426                      0.012                 −0.421                         314
PB1Min         519–1494                      0.007                 −0.386                         236
HAMin          157–1654                      0.019                 −0.420                         353
mutations (without changing amino acid sequences), reversion to virulence is extremely unlikely2. Aided by computer algorithms2, codon pair–deoptimized viral genomes can be rapidly designed and synthesized, and live virus can be generated by reverse genetics.

To attenuate influenza virus, we redesigned large parts of the coding regions of the polymerase subunit B1 (PB1), nucleoprotein (NP) and HA genes of influenza virus A/PR/8/34 (PR8), using our deoptimization computer program2. Along with other viral genes, these genes play important roles in replication and assembly of influenza virus. Without altering either the amino acid sequence or the codon bias, the program rearranged existing synonymous codons to deoptimize codon pairs. This resulted in hundreds of silent mutations per genome segment without any amino acid changes. The characteristics of the synthetic genome segments and their changes in codon-pair bias are summarized in Table 1 and Supplementary Figure 1.

The deoptimized segments were synthesized de novo and cloned into a standard ambisense, eight-plasmid system17,18. To generate influenza viruses carrying one or more deoptimized segments, the plasmids carrying the recoded, synthetic segments, together with the complement of the remaining PR8 WT plasmids, were transfected into susceptible cells. Each deoptimized segment (PB1Min, NPMin or HAMin) in the background of the complementing seven WT segments yielded a viable virus, as did any combination thereof, including the virus simultaneously containing all three deoptimized segments (PR8-PB1Min/NPMin/HAMin, abbreviated PR83F) (Fig. 1 and data not shown). Several of these new synthetic viruses were analyzed for their in vitro growth characteristics in Madin-Darby canine kidney epithelial (MDCK) cells. All mutant viruses formed plaques that were either indistinguishable from, or only slightly smaller than, those of the WT virus (Fig. 1a).
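The design constraint described above (rearrange only existing synonymous codons, so that neither the protein sequence nor the codon usage changes, while the summed codon-pair score drops) can be sketched as a greedy swap search. This is our illustrative reconstruction, not the authors' published algorithm2, and the score table is a toy: only the two Arg-Glu values come from the text, the other two are invented for the example.

```python
import itertools

# Toy codon-pair scores (ln observed/expected); a real table is genome-wide.
PAIR_SCORE = {
    ("CGC", "GAA"): -2.19,  # under-represented pair (from the text)
    ("AGA", "GAA"): +0.46,  # over-represented pair (from the text)
    ("GAA", "CGC"): -0.50,  # invented for the example
    ("GAA", "AGA"): +0.30,  # invented for the example
}
AA = {"CGC": "R", "AGA": "R", "GAA": "E"}  # codon -> amino acid

def total_score(codons):
    """Sum of codon-pair scores over adjacent codons; unknown pairs score 0."""
    return sum(PAIR_SCORE.get(p, 0.0) for p in zip(codons, codons[1:]))

def deoptimize(codons, rounds=100):
    """Greedily swap synonymous codons between positions whenever the swap
    lowers the total codon-pair score. Swapping (rather than substituting)
    preserves both the encoded protein and the codon usage exactly."""
    codons = list(codons)
    for _ in range(rounds):
        improved = False
        for i, j in itertools.combinations(range(len(codons)), 2):
            if codons[i] != codons[j] and AA[codons[i]] == AA[codons[j]]:
                before = total_score(codons)
                codons[i], codons[j] = codons[j], codons[i]
                if total_score(codons) < before:
                    improved = True
                else:
                    codons[i], codons[j] = codons[j], codons[i]  # revert
        if not improved:
            break
    return codons

wild_type = ["CGC", "GAA", "AGA", "GAA"]  # Arg-Glu-Arg-Glu
deopt = deoptimize(wild_type)
# Same protein, same codon multiset, lower total pair score.
```

Scaled to a full gene segment, this kind of rearrangement yields the hundreds of silent mutations per segment reported in Table 1.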
The mutant viruses did not grow as well as WT, but typically reached titers only about tenfold lower (Fig. 1b). The plaque and growth properties of viruses carrying combinations of synthetic segments other than those depicted in Figure 1 fall in between those of PR8 and PR83F (data not shown). The deoptimized virus PR83F had about the same response to changes in temperature (to either 33 °C or 39.5 °C) as the WT virus (Supplementary Fig. 2).
Figure 1 Characterization in tissue culture of synthetic codon pair–deoptimized influenza viruses. (a) Plaque phenotypes on MDCK cells of PR8 WT virus and synthetic PR8 derivatives carrying one (NPMin, HAMin or PB1Min), two (NPMin/HAMin or HAMin/PB1Min) or three (PR83F) deoptimized gene segments. (b) Growth kinetics of PR8 WT virus and three synthetic PR8 derivatives in MDCK cells after infection with 0.001 MOI of the indicated viruses. All combinations grew to within log 8–9, although only PR8-HAMin, PR8-NP/HAMin and PR83F are shown as compared to WT. (c) Analysis of influenza virus protein expression in infected cells. MDCK cells were infected with PR8 WT virus or synthetic PR8 derivatives carrying one deoptimized gene segment each (NPMin, HAMin or PB1Min), as indicated. Western blot (WB) analysis of proteins extracted from whole cell lysates was carried out with PB1, NP, HA or actin antibodies. Actin was used to indicate equal protein loading.
Table 1 footnotes: ^a Nucleotide positions within the genome segment that underwent the codon pair–deoptimization algorithm. ^b Original codon-pair bias (CPB) of the corresponding WT sequence. ^c CPB of the synthetic, codon pair–deoptimized gene segment.
Previously, we found that codon- or codon pair–deoptimized polioviruses had a reduced specific infectivity2,19. This was not the case for the deoptimized influenza viruses, as their ratio of plaque-forming units (PFU) to HA units was nearly identical to that of WT virus (data not shown). Our working hypothesis postulates that deoptimization of viral open reading frames reduces protein synthesis, which in turn generates an attenuated virus2. Therefore, we used western blot analysis of infected whole-cell lysates to test protein synthesis driven from the deoptimized gene segments in PR8-NPMin, PR8-HAMin and PR8-PB1Min (Fig. 1c). In all three viruses, the deoptimized viral gene product was specifically reduced compared to the other proteins from the same virus, to the same protein in WT-infected cells, and to the actin control in the same cells (Fig. 1c). Protein synthesis from the WT segments in a deoptimized virus was apparently not affected. The exact molecular mechanisms responsible for the reduced protein production remain to be determined.

Despite their reasonably robust growth, codon pair–deoptimized influenza viruses proved to be remarkably attenuated in mice (Table 2). Each individual deoptimized segment had a demonstrable attenuating effect, reducing the median lethal dose (the dose killing half the animals; LD50) ~10-, 30- and 500-fold for PR8-NPMin, PR8-HAMin and PR8-PB1Min, respectively. Combinations of two deoptimized genes have not been tested, but combining all three attenuating genes into one virus (PR83F) led to a cumulative attenuation of about 13,000-fold (Table 2).

To test the codon pair–deoptimized viruses in animals, BALB/c mice were infected intranasally with 10^4 PFU of PR83F or PR8 and monitored for disease symptoms (ruffled fur, lethargy, weight loss, death). At this dose, mice infected with PR8 WT virus developed severe symptoms with rapid weight loss and did not survive more than 5 d after infection (Fig. 2a).
Mice infected with PR83F, on the other hand, experienced no observable symptoms or weight loss, except for a small, transient delay in weight gain compared to mock-infected animals (Fig. 2a).

Live attenuated vaccines in general depend on a limited, yet safe, degree of replication within the host to stimulate the immune system. To assess the replicative potential of our influenza vaccine candidate in an immunocompetent host, we monitored viral load in the lungs of BALB/c mice infected intranasally with 10^3 PFU of either PR83F or PR8 WT virus. Within 24 h, WT-infected mice had a 3,000-fold higher viral load in their lungs than PR83F-infected mice, leading to death in less than 6 d (Fig. 2b). Conversely, in PR83F-infected animals, amplification of the vaccine virus progressed slowly and peaked at a lower viral load than the WT virus, resulting in a controlled infection with no overt disease symptoms and virus clearance after 9 d (Fig. 2b). Infection by a sub-lethal dose of WT virus can in principle provoke protective immunity, which is the usual course of natural human
Table 2  LD50 and PD50 of deoptimized influenza viruses

Virus                              LD50 (PFU)^a    PD50 (PFU)^b
PR8 (WT)                           6.1 × 10^1      ~1.0 × 10^0 ^c
PR8-NPMin                          5.0 × 10^2      n.d.^d
PR8-PB1Min                         3.2 × 10^4      n.d.
PR8-HAMin                          1.7 × 10^3      n.d.
PR8-NPMin/HAMin/PB1Min (PR83F)     7.9 × 10^5      1.3 × 10^1

^a The dose required to result in lethal disease in 50% of inoculated mice24. ^b The dose of vaccine required to protect 50% of mice with a single vaccination from a challenge infection with 1,000× LD50 of the PR8 WT virus on day 28 post vaccination. ^c At the lowest level of inoculum tested (1.0 × 10^0 PFU), 60% of mice were protected. ^d Not determined.
infections. The Chinese scholar Li Shizhen described the art of inoculating humans with live smallpox20. This method of smallpox vaccination, practiced in China for centuries, was very dangerous because the difference between the lethal dose and the immunizing, protective dose of WT smallpox is very small. To address the issue of safety margin quantitatively with our influenza viruses, we determined the LD50 and the protective dose 50 (the dose providing protective immunity to half the animals; PD50) for PR8 WT virus and for the attenuated strain PR83F (Fig. 3). PR8 had a very low PD50 of 1 PFU, which is equivalent to ~40 virus particles when titered on MDCK cells21. The LD50 of PR8 was 61 PFU, resulting in an LD50/PD50 ratio of about 60. This ratio between the LD50 dose and the PD50 dose is the 'safety margin' of a given virus if it were to be used as a vaccine. The narrow safety margin of WT virus (LD50/PD50 = 60) compromises its suitability for use as a vaccine. In contrast, the attenuated virus PR83F had a PD50 of 13 PFU, which, although higher than the PD50 of WT, is still very low. The attenuated PR83F had an LD50 of 790,000 PFU and, thus, an LD50/PD50 ratio (safety margin) of 60,000, which is 1,000-fold better than that of the WT virus (Fig. 3a versus Fig. 3b, shaded areas under the curve). Thus, it is easy to determine a dose of the attenuated virus PR83F that is both safe to administer and effective in provoking immunity (Supplementary Fig. 3).

To characterize the protective immunity in more detail, we immunized mice with a single intranasal vaccination of 10^4 PFU of PR83F. After 28 d, these mice were challenged with 1,000× LD50 of WT PR8. Three days after challenge, PR8 titers in lung homogenates were determined. In 80% of the mice, challenge virus was below the level of detection (suppressed at least 10^6-fold) (Fig. 3c; open circles), whereas titers of ~10^7 PFU were found in lungs of mock-vaccinated animals (Fig. 3c; open squares).
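The safety-margin arithmetic above is simply a ratio of two doses. A sketch reproducing the numbers quoted in the text (the function name is ours):

```python
def safety_margin(ld50_pfu, pd50_pfu):
    """LD50/PD50: how many protective doses fit under one lethal dose.
    The wider the ratio, the safer the virus is as a live vaccine."""
    return ld50_pfu / pd50_pfu

wt = safety_margin(61, 1)                # PR8 WT: a margin of about 60
attenuated = safety_margin(790_000, 13)  # PR83F: a margin of about 60,000
improvement = attenuated / wt            # roughly 1,000-fold wider margin
```

The same two numbers also bound the usable dose window: any inoculum well above the PD50 but well below the LD50 falls in the 'safe and effective' range shaded in Figure 3a.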
Figure 2 Attenuation of codon pair–deoptimized influenza virus PR83F in BALB/c mice. (a) Body weight curve after intranasal infection with 10^4 PFU of PR8 WT virus (triangles), 10^4 PFU of deoptimized PR83F virus (diamonds) or mock infection (saline; squares). Each time point indicates the average of five mice, with error bars indicating s.d. WT-infected mice did not survive beyond day 5 (indicated by a cross). (b) Virus titer in whole lung homogenate after infection with either 10^3 PFU of PR8 WT virus (squares) or deoptimized PR83F (circles). Average of three mice per time point. *On day 9 post infection, PR83F was no longer detectable (below 40 PFU/lung).
The mean anti-influenza serum IgG titer in mice immunized with 0.01× LD50 of the respective viruses was 312,500 for PR83F and 27,540 for PR8 (Fig. 3d). At an even lower, and thus safer, vaccine dose of 0.001× LD50, the immune response toward PR83F was nearly unchanged, with an antibody titer of 237,500 (Fig. 3d). Thus, at identical doses relative to their respective LD50, PR83F is a more potent inducer of influenza-specific antibodies than WT. These findings attest to the strong immunizing potential of a low-grade influenza virus infection in general, and to the safety profile of codon pair–deoptimized influenza viruses in particular.

Vaccines created in this way encode all viral proteins with the amino acid sequences found in WT viruses. Thus, the viruses express the entire WT repertoire of antigenic sites and would have the maximum chance of inducing both cellular and humoral immunity against all epitopes. As attenuation results from hundreds or thousands of nucleotide changes, the probability of reversion to virulence is extremely low, an advantage over other approaches to making live vaccines. Co-infection of the vaccine recipient with a naturally circulating WT virus could lead to gene reassortment, but this is unlikely to produce variants more virulent than the co-infecting virus. The deoptimized segments of the vaccine strain would 'poison' any such reassortant.

Figure 3 Immune responses and protection. (a,b) Vaccine margin of safety for PR8 WT and deoptimized PR83F viruses. The left ordinate indicates the percentage of animals surviving the primary inoculation (black squares) with (a) PR83F or (b) WT PR8, at doses ranging between 10^0 and 10^6 PFU. After 28 d, the surviving, vaccinated animals were challenged with a single dose of 1,000× LD50 of PR8 WT virus. Disease and survival were monitored (right ordinate; open circles) for (a) PR83F- and (b) PR8-vaccinated mice. (c) Virus load in mouse lungs after PR8 WT virus challenge of PR83F-vaccinated animals.
Twenty-eight days after a single intranasal vaccination with 10^4 PFU of PR83F, mice were challenged with 1,000× LD50 of PR8 WT virus. Three days thereafter, the level of challenge virus in lung homogenates was determined. (d) ELISA determination of influenza-specific serum antibodies. Twenty-eight days after a primary infection, serum was collected, and anti-influenza IgG serum titers were determined from animals that had received a primary inoculation of 0.01× LD50 (black diamonds) or 0.001× LD50 of PR83F (black circles), 0.01× LD50 of PR8 (white squares) or saline (black triangles). The ELISA antibody titer against PR8 virus antigen is expressed as the lowest reciprocal serum dilution that resulted in a positive ELISA signal (5 s.d. above background). In c and d, each symbol represents the data from one animal.
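The cutoff rule in the legend (a well is positive when its signal exceeds background by 5 s.d.) can be made concrete. In this sketch we read the endpoint titer as the largest reciprocal dilution that is still positive, which is the conventional endpoint reading and matches the magnitude of the titers quoted in the text; the OD values and dilution series are invented for illustration.

```python
from statistics import mean, stdev

def endpoint_titer(signals_by_dilution, background_wells, n_sd=5):
    """Endpoint ELISA titer: the largest reciprocal serum dilution whose
    signal still exceeds mean(background) + n_sd * s.d.(background).
    Returns 0 if no dilution is positive."""
    cutoff = mean(background_wells) + n_sd * stdev(background_wells)
    positive = [d for d, s in signals_by_dilution.items() if s > cutoff]
    return max(positive) if positive else 0

# Invented OD readings for a fivefold dilution series of one serum sample.
background = [0.05, 0.06, 0.04, 0.05]
signals = {500: 1.20, 2500: 0.90, 12500: 0.40, 62500: 0.12,
           312500: 0.095, 1562500: 0.06}
titer = endpoint_titer(signals, background)  # 312500
```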
In spite of striking genetic dissimilarities, codon-pair deoptimization has given similar results with both influenza virus and poliovirus2. Poliovirus is a plus-stranded RNA virus whose genome encodes a single polyprotein. Codon-pair deoptimization in the 5′ region of the poliovirus genome disturbs the synthesis of all viral proteins, with drastic effects on replication2. In contrast, influenza virus, a minus-stranded RNA virus, has a genome divided into eight segments, and its proteins are expressed independently from individual mRNAs. The fact that two such extremely different viruses are each highly sensitive to codon-pair deoptimization suggests that the strategy may operate quite generally.

As with any new technology, there are issues to resolve before applying this approach for human use. These include our incomplete knowledge of the molecular mechanism(s) involved in attenuation, the need for an in-depth assessment of the genetic stability of the attenuation phenotype, and the need to find the right balance between attenuation (for provoking an immune response) and robust replication (for purposes of production).

For seasonal epidemics, SAVE may allow a different strategy than the existing live vaccines, such as FluMist. For FluMist, the attenuating mutations (obtained by lengthy selection procedures) map to six viral 'backbone' genes, which therefore must be kept constant every year, with only HA and NA being updated annually (6:2 recombinants). Individuals immunized yearly could develop immunity against 'backbone' proteins, limiting the replication and thus the efficacy of the vaccine (possibly helping to explain the relatively poor efficacy of FluMist in previously immunized army personnel22). In contrast, use of SAVE could attenuate the entire genome of an impending seasonal or pandemic strain, providing a perfect antigenic match between the vaccine and the target virus.
However, this approach would require an appropriate regulatory environment that might, for instance, approve a method of attenuating each new strain to a certain standardized degree with a standardized change in codon-pair bias (just as the current inactivated virus vaccine uses a standardized method of inactivation). Attenuating in a standardized and predictable way will require many more experiments to exhaustively explore the variables of SAVE.

The recoded influenza viruses described here present a useful paradigm for vaccines, as attenuation is the result of hundreds or thousands of nucleotide changes without the change of a single amino acid. The attenuated phenotype results from large-scale rearrangements of existing synonymous codons, producing under-represented pairs of codons2. Considering (i) the expected high genetic stability of the attenuating genetic changes ("attenuation by a thousand cuts"23), (ii) the possibility of systematically designing such viruses, (iii) the favorable growth kinetics in tissue culture (10^8 PFU/ml), (iv) the small protective dose of 'deoptimized' influenza viruses, (v) the efficacy of PR83F as revealed in challenge experiments and, particularly, (vi) the wide safety margin, the SAVE technology sets the stage for making efficient live attenuated influenza vaccines. Although it is difficult to extrapolate to human use, in our system 10 ml of culture supernatant apparently contains enough virus to vaccinate and protect ~1 million mice with a single vaccination of 100 PD50 doses of PR83F (Fig. 3a and Supplementary Fig. 3). Vaccines based on changes in codon-pair bias could be generated within weeks for any emerging influenza virus once its genome sequence is known, although of course a further period of testing would be required before the vaccine could be used. The margin of safety appears to be unusually high.
This new strategy will be applicable to the rapid development of human vaccines against seasonal flu epidemics and pandemics.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
The authors are indebted to A. García-Sastre and P. Palese for sharing with us their 8-plasmid system for PR8, antibodies, and information. We thank A. Paul and J. Cello for comments on the manuscript. Supported by National Institutes of Health grants AI075219 and AI15122 (E.W.) and a TRO-Fusion Award from Stony Brook University (S.M. and S.S.).

AUTHOR CONTRIBUTIONS
S.M. designed and carried out the majority of the study and wrote the paper. J.R.C. and A.N. carried out experiments contributing to Figures 1c and 2. D.P., C.B.W. and S.S. developed computer design and analysis algorithms. B.F. and E.W. contributed to the design of the study and writing of the paper.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Salomon, R. & Webster, R.G. The influenza virus enigma. Cell 136, 402–410 (2009).
2. Coleman, J.R. et al. Virus attenuation by genome-scale changes in codon-pair bias. Science 320, 1784–1787 (2008).
3. Palese, P. & Shaw, M.L. in Fields Virology, vol. 2 (eds. Knipe, D.M. et al.) 1647–1689 (Lippincott Williams & Wilkins, Philadelphia, 2007).
4. Steinhauer, D.A. & Skehel, J.J. Genetics of influenza viruses. Annu. Rev. Genet. 36, 305–332 (2002).
5. Maassab, H.F. Adaptation and growth characteristics of influenza virus at 25 degrees C. Nature 213, 612–614 (1967).
6. Rimmelzwaan, G.F., Fouchier, R.A. & Osterhaus, A.D. Influenza virus-specific cytotoxic T lymphocytes: a correlate of protection and a basis for vaccine development. Curr. Opin. Biotechnol. 18, 529–536 (2007).
7. Sant, A.J. et al. Immunodominance in CD4 T-cell responses: implications for immune responses to influenza virus and for vaccine design. Expert Rev. Vaccines 6, 357–368 (2007).
8. Hikono, H. et al. T-cell memory and recall responses to respiratory virus infections. Immunol. Rev. 211, 119–132 (2006).
9. Swain, S.L. et al. CD4+ T-cell memory: generation and multi-faceted roles for CD4+ T cells in protective immunity to influenza. Immunol. Rev. 211, 8–22 (2006).
10. Simonsen, L. et al. Impact of influenza vaccination on seasonal mortality in the US elderly population. Arch. Intern. Med. 165, 265–272 (2005).
11. Belshe, R.B., Van Voris, L.P., Bartram, J. & Crookshanks, F.K. Live attenuated influenza A virus vaccines in children: results of a field trial. J. Infect. Dis. 150, 834–840 (1984).
12. Belshe, R.B. et al. The efficacy of live attenuated, cold-adapted, trivalent, intranasal influenzavirus vaccine in children. N. Engl. J. Med. 338, 1405–1412 (1998).
13. Cello, J., Paul, A.V. & Wimmer, E. Chemical synthesis of poliovirus cDNA: generation of infectious virus in the absence of natural template. Science 297, 1016–1018 (2002).
14. Gutman, G.A. & Hatfield, G.W. Nonrandom utilization of codon pairs in Escherichia coli. Proc. Natl. Acad. Sci. USA 86, 3699–3703 (1989).
15. Moura, G. et al. Large scale comparative codon-pair context analysis unveils general rules that fine-tune evolution of mRNA primary structure. PLoS ONE 2, e847 (2007).
16. Wimmer, E., Mueller, S., Tumpey, T.M. & Taubenberger, J.K. Synthetic viruses: a new opportunity to understand and prevent viral disease. Nat. Biotechnol. 27, 1163–1172 (2009).
17. Hoffmann, E., Neumann, G., Kawaoka, Y., Hobom, G. & Webster, R.G. A DNA transfection system for generation of influenza A virus from eight plasmids. Proc. Natl. Acad. Sci. USA 97, 6108–6113 (2000).
18. Schickli, J.H. et al. Plasmid-only rescue of influenza A virus vaccine candidates. Phil. Trans. R. Soc. Lond. B 356, 1965–1973 (2001).
19. Mueller, S., Papamichail, D., Coleman, J.R., Skiena, S. & Wimmer, E. Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J. Virol. 80, 9687–9696 (2006).
20. Shizhen, L. Compendium of Materia Medica (Bencao Gangmu, 1593) (Foreign Language Press, 2004).
21. Hutchinson, E.C., Curran, M.D., Read, E.K., Gog, J.R. & Digard, P. Mutational analysis of cis-acting RNA signals in segment 7 of influenza A virus. J. Virol. 82, 11869–11879 (2008).
22. Wang, Z., Tobler, S., Roayaei, J. & Eick, A. Live attenuated or inactivated influenza vaccines and medical encounters for respiratory illnesses among US military personnel. J. Am. Med. Assoc. 301, 945–953 (2009).
23. Coffin, J.M. Attenuation by a thousand cuts. N. Engl. J. Med. 359, 2283–2285 (2008).
24. Reed, L.J. & Muench, H. A simple method for estimating fifty percent endpoints. Am. J. Hyg. 27, 493–497 (1938).
VOLUME 28 NUMBER 7 JULY 2010 nature biotechnology
ONLINE METHODS
© 2010 Nature America, Inc. All rights reserved.
Cell lines. 293T and MDCK cells were obtained from the American Type Culture Collection (ATCC). Cells were grown in Dulbecco’s modified Eagle’s medium (Invitrogen), supplemented with 10% FBS (HyClone) and penicillin-streptomycin (Invitrogen).

Reverse genetics and generation of synthetic influenza viruses. All synthetic influenza viruses used here are based on the strain A/PR/8/34 (Mt. Sinai variant; PR8 for short). The reference sequences of the eight gene segments for this strain are available under GenBank accession numbers AF389115–AF389122. An eight-plasmid ambisense system for this strain, cloned in the vector pDZ (ref. 25), was kindly made available by Peter Palese and Adolfo García-Sastre (Mt. Sinai School of Medicine). Codon pair–deoptimized genome segments were designed using a computer algorithm previously described2. Coding regions of the segments PB1, HA and NP were targeted for recoding. A minimum of 120 nucleotides at either segment terminus was left unaltered because these sequences contain packaging signals3. All of the codon changes were synonymous; that is, none of them introduced any amino acid changes. The precise extent of recoded sequence for each target segment and the resulting changes in codon-pair bias are summarized in Table 1. The recoded gene segments were synthesized de novo (Mr. Gene) and introduced into WT plasmids to replace the respective WT counterpart sequence. Codon pair–deoptimized viruses were derived by DNA transfection of one, two or three deoptimized segments together with the remaining WT plasmids. For this purpose, a total of 2 μg plasmid DNA (250 ng of each of 8 plasmids) was transfected into co-cultures of 293T and MDCK cells in 35-mm dishes using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s recommendations.
After 6 h of incubation at 37 °C, the serum-free Opti-MEM containing the transfection mix was replaced with DMEM containing 0.2% bovine serum albumin (BSA). After a further 24 h of incubation, 1 μg/ml TPCK-trypsin was added to the dishes. Two days later, virus-containing cell supernatants were collected and amplified on MDCK cells.

In vitro growth characteristics and titration of synthetic influenza viruses. The growth characteristics of codon pair–deoptimized synthetic viruses were analyzed by infecting confluent monolayers of MDCK cells in 100-mm dishes at a multiplicity of infection (MOI) of 0.001. Infected cells were incubated at 37 °C in DMEM containing 0.2% BSA and 2 μg/ml TPCK-trypsin (Pierce). At the given time points, 200 μl of supernatant was removed and stored at −80 °C until titration. Viral titers and plaque phenotypes were determined by plaque assay on confluent monolayers of MDCK cells in 35-mm six-well plates using a semisolid overlay of 0.6% tragacanth gum (Sigma-Aldrich) in minimal Eagle medium (MEM) containing 0.2% BSA and 4 μg/ml TPCK-trypsin. After 72 h of incubation at 37 °C, plaques were visualized by staining the wells with crystal violet.

Mouse pathogenicity, in vivo virus replication and vaccination. A minimum of five BALB/c mice (5–6 weeks old) per group were infected once by intranasal inoculation with doses ranging from 10^0 to 10^6 PFU of PR83F or PR8 WT virus. Inoculum virus was diluted in 25 μl PBS and administered evenly into both nostrils. A control group of 5 mice was inoculated with PBS only (mock). Venous blood from the tail vein was collected from all animals before initial infection for subsequent determination of pre-vaccination antibody titers.
doi:10.1038/nbt.1636
Morbidity and mortality (weight loss, reduced activity, death) were monitored. The LD50 values of the WT virus and the vaccine candidates were calculated as described24. Mice experiencing severe disease symptoms (rapid, excessive weight loss of >25%) were euthanized and scored as a lethal outcome. For vaccination experiments, mice were infected as above. Twenty-eight days after the initial infection (vaccination), venous blood from the tail vein was drawn for subsequent determination of post-vaccination antibody titers. The mice were then challenged with 10^5 PFU of the PR8 WT virus, corresponding to >1,000 times the LD50. Mortality and morbidity (weight loss, reduced activity, death) were monitored. The PD50 of codon pair–deoptimized PR83F versus that of PR8 was determined as the dose required to protect 50% of mice from a challenge with 1,000× LD50 of the WT virus, 28 d after a single inoculation with the vaccine virus. To assess virus replication in the lungs of infected animals, BALB/c mice were infected intranasally with 10^3 PFU of either PR8 or PR83F. At 1, 3, 5, 7 and 9 d after infection, the lungs of three mice each were collected (WT-infected mice did not survive beyond day 6). Lungs were homogenized in 1 ml of PBS and the virus load per organ was determined by plaque assay on MDCK cells, as described above. Similarly, replication of PR8 challenge virus in lungs of PR83F-vaccinated mice was determined. Twenty-eight days after a single intranasal inoculation with 10^4 PFU of PR83F or saline, five animals each were challenged with 1,000× LD50 of PR8. Three days after challenge (the usual peak of viral replication), lungs were processed and viral load per organ determined as described above.

Western blot analysis of influenza virus proteins in infected cells. MDCK cells were infected with virus at an MOI of 5 and incubated for 4 h at 37 °C. Subsequently, the cells were harvested in Laemmli buffer.
The proteins were resolved by SDS-PAGE, analyzed by western blotting with α-HA, α-PB1 and α-actin antibodies (Santa Cruz Biotechnology) and an α-NP antibody (generous gift from Peter Palese) followed by HRP-conjugated secondary antibodies, and detected by autoradiography.

Enzyme-linked immunosorbent assay (ELISA) determination of influenza-specific serum IgG antibodies after vaccination. Nunc Maxisorp ELISA 96-well plates were coated overnight with 100 ng purified influenza PR8 virus in 100 μl PBS, followed by blocking with 100 μl 1% BSA in PBS. Serial fivefold dilutions in PBS/1% BSA of mouse sera obtained before and 28 d after a single intranasal vaccination were incubated for 2 h at 22 °C. Mice had previously been vaccinated with ~0.01× or 0.001× LD50 of PR83F (10^3 PFU or 10^4 PFU, respectively), 0.01× LD50 of PR8 WT (10^0 PFU) or mock vaccinated. After four washes with PBS, the wells were incubated with a 1:500 dilution of alkaline phosphatase-conjugated secondary anti-mouse IgG antibody (Santa Cruz) for another 2 h at 22 °C. After four washes with PBS and brief rinsing with distilled water, 100 μl of a chromogenic substrate solution containing 9 mg/ml p-nitrophenyl phosphate in 200 mM diethanolamine, 1 mM MgCl2, pH 9.8 was added. The color reaction was stopped by addition of an equal volume of 500 mM NaOH. Absorbance at 405 nm was read using a Molecular Devices ELISA reader. The endpoint antibody titer was defined as the highest dilution of serum that gave a signal >5 s.d. above background. Background level was determined from wells processed identically to experimental samples, in the absence of any mouse serum.
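The endpoint-titer rule just described (highest dilution whose signal exceeds background by >5 s.d.) is a few lines of arithmetic; the dilution series and absorbance readings below are invented for illustration:

```python
import statistics

def endpoint_titer(dilutions, od_values, background_ods, n_sd=5):
    """Return the highest dilution whose A405 reading exceeds the cutoff
    (background mean + n_sd sample standard deviations), or None if none does."""
    cutoff = statistics.mean(background_ods) + n_sd * statistics.stdev(background_ods)
    positive = [d for d, od in zip(dilutions, od_values) if od > cutoff]
    return max(positive) if positive else None

# Invented fivefold dilution series and readings
dilutions = [25, 125, 625, 3125, 15625]
od_values = [1.80, 1.20, 0.60, 0.20, 0.05]
background_ods = [0.04, 0.05, 0.06, 0.05]  # no-serum control wells
titer = endpoint_titer(dilutions, od_values, background_ods)
```

With these made-up readings, the 1:15,625 well falls below the cutoff and the titer is reported as the next-highest dilution.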
25. Quinlivan, M. et al. Attenuation of equine influenza viruses through truncations of the NS1 protein. J. Virol. 79, 8431–8439 (2005).
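The 50% endpoints above (LD50, PD50) were calculated by the Reed and Muench cumulative method24. A minimal sketch, with made-up dose groups rather than the paper's data:

```python
import math

def reed_muench_50(doses, n_pos, n_total):
    """Log10 of the 50% endpoint dose by the Reed-Muench method.
    doses: ascending; n_pos: responders (e.g., deaths) per group."""
    neg = [t - p for p, t in zip(n_pos, n_total)]
    # A responder at a lower dose is assumed to respond at every higher dose,
    # and a non-responder at a higher dose to survive every lower dose.
    cum_pos = [sum(n_pos[: i + 1]) for i in range(len(doses))]
    cum_neg = [sum(neg[i:]) for i in range(len(doses))]
    pct = [100.0 * p / (p + n) for p, n in zip(cum_pos, cum_neg)]
    for i in range(len(pct) - 1):
        if pct[i] < 50.0 <= pct[i + 1]:
            # proportionate distance between the bracketing doses
            pd = (50.0 - pct[i]) / (pct[i + 1] - pct[i])
            return math.log10(doses[i]) + pd * (math.log10(doses[i + 1]) - math.log10(doses[i]))
    raise ValueError("50% endpoint not bracketed by the tested doses")

# Made-up titration: 5 mice per group, deaths rise with dose
log_ld50 = reed_muench_50([10, 100, 1000, 10000], [0, 1, 4, 5], [5, 5, 5, 5])
```

Here the 50% endpoint interpolates to 10^2.5, i.e. roughly 316 PFU.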
letters
Pairwise agonist scanning predicts cellular signaling responses to combinatorial stimuli
Manash S Chatterjee1, Jeremy E Purvis2, Lawrence F Brass3 & Scott L Diamond1,2 Prediction of cellular response to multiple stimuli is central to evaluating patient-specific clinical status and to basic understanding of cell biology. Cross-talk between signaling pathways cannot be predicted by studying them in isolation and the combinatorial complexity of multiple agonists acting together prohibits an exhaustive exploration of the complete experimental space. Here we describe pairwise agonist scanning (PAS), a strategy that trains a neural network model based on measurements of cellular responses to individual and all pairwise combinations of input signals. We apply PAS to predict calcium signaling responses of human platelets in EDTA-treated plasma to six different agonists (ADP, convulxin, U46619, SFLLRN, AYPGKF and PGE2) at three concentrations (0.1, 1 and 10 × EC50). The model predicted responses to sequentially added agonists, to ternary combinations of agonists and to 45 different combinations of four to six agonists (R = 0.88). Furthermore, we use PAS to distinguish between the phenotypic responses of platelets from ten donors. Training neural networks with pairs of stimuli across the dose-response regime represents an efficient approach for predicting complex signal integration in a patient-specific disease milieu. Because cells produce integrated responses to dose-dependent combinations of numerous external signals, efficient methods are needed to survey such high-dimensional systems. Primary human tissues such as blood, marrow or biopsies provide a limited number of cells, generally allowing only ~10^2 or fewer phenotypic tests. Evaluating the cellular response to pairs of stimuli offers a direct and rapid sampling of a response space that can be built up into a higher-level predictive tool through the use of neural networks. Such methods are needed to better phenotype platelets to predict cardiovascular risk.
Platelets are cells that respond in a donor-specific manner to multiple signals in vivo, and their activation in response to thrombotic signals is central to the thrombotic risks and events surrounding 1.74 million heart attacks and strokes, 1.115 million angiograms and 0.652 million stent placements in the United States each year1. Moreover, platelets are ideal ‘reduced’ cellular systems for quantifying the effects of multiple signaling pathways because they are anucleate, easily obtained from donors and amenable to automated liquid handling.
During clotting, platelets experience diverse signaling cues simultaneously. Collagen activates glycoprotein VI (GPVI)-dependent tyrosine kinase signaling. ADP is released from dense granules to activate the G protein–coupled receptors P2Y1 and P2Y12. Thromboxane A2 (TxA2) is synthesized by platelet cyclooxygenase-1 (COX-1) and binds thromboxane-prostanoid (TP) receptors. Tissue factor at the damaged vasculature leads to the production of thrombin, which cleaves the protease-activated receptors PAR1 and PAR4. These activating signals occur in the context of inhibitory signals from endothelial nitric oxide and prostacyclin (PGI2). Platelets receive these signaling events simultaneously in vivo, and platelet signaling varies spatially and temporally in growing thrombi2, but few experimental or computational tools are available for building a global understanding of how the platelet integrates multiple stimuli present at varying levels. To predict cellular responses to multiple stimuli, we developed PAS (Fig. 1). This strategy involves selecting stimulus molecules based on prior knowledge (Fig. 1a), measuring cellular responses to all pairwise combinations of stimuli in a high-throughput manner (Fig. 1b), and then training a two-layer, nonlinear, autoregressive neural network with the cellular responses to exogenous inputs (Fig. 1c). Neural networks are remarkably effective at learning patterns of inputs and predicting outputs by optimizing intermediate connection weights, akin to a platelet’s ability to respond to multiple thrombotic signals through coupled biochemical reactions. Motivated by the notion that a living cell is essentially a neural network whose connection weights have been selectively adjusted during evolution3, we took a ‘top-down’ approach4 to model platelet signaling.
The application of neural networks for predicting dynamic cellular signaling is beneficial because neural networks are ‘dense’ modeling structures, meaning that they do not require detailed knowledge of the kinetic structure of a system. By comparison, an ordinary differential equation model of ADP-stimulated calcium mobilization through P2Y1 required almost 80 reactions and over 100 kinetic parameters to describe just this single pathway5. We estimate that an ordinary differential equation model describing the signaling mechanisms of the six agonists in this study (Fig. 1a) at a similar level of detail would require >500 parameters, many of which are currently unavailable.
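As a caricature of the autoregressive network in Figure 1c (not the authors' trained model), the sketch below feeds two constant 'agonist' inputs plus the fed-back output through a single 12-node tanh layer and fits it to a synthetic saturating trace by one-step-ahead stochastic gradient descent; every size, rate and trace here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy NARX network: 2 agonist inputs plus 1 fed-back output node,
# one 12-node tanh hidden layer, linear output (sizes are illustrative)
n_in, n_hid = 3, 12
W1 = rng.normal(0.0, 0.3, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0.0, 0.3, n_hid);         b2 = 0.0

def step(u, y_prev):
    """One network tick: predict the next [Ca2+]i from inputs and feedback."""
    x = np.append(u, y_prev)
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h + b2), h, x

def simulate(U, y0=0.0):
    """Closed-loop run: each prediction is fed back as the next input."""
    y, out = y0, []
    for u in U:
        y, _, _ = step(u, y)
        out.append(y)
    return np.array(out)

# Synthetic 'measured' trace: a saturating calcium-like rise to a step of agonist
T = 60
U = np.tile([1.0, 0.5], (T, 1))
target = 0.8 * (1.0 - np.exp(-np.arange(T) / 10.0))

def teacher_forced_mse():
    errs = [(step(U[t], target[t - 1] if t else 0.0)[0] - target[t]) ** 2
            for t in range(T)]
    return sum(errs) / T

mse_before = teacher_forced_mse()

# One-step-ahead SGD with teacher forcing (backprop through the hidden layer)
lr = 0.02
for _ in range(500):
    for t in range(T):
        y_hat, h, x = step(U[t], target[t - 1] if t else 0.0)
        err = y_hat - target[t]
        gh = err * W2 * (1.0 - h ** 2)
        W2 -= lr * err * h; b2 -= lr * err
        W1 -= lr * np.outer(gh, x); b1 -= lr * gh

mse_after = teacher_forced_mse()
```

After training, `simulate` runs the network closed-loop, mirroring how the fitted model in the paper is used to predict whole time courses for agonist combinations it was never trained on.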
1Institute for Medicine and Engineering, Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA. 2Institute for Medicine and Engineering, Genomics and Computational Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA. 3Institute for Medicine and Engineering, Department of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA. Correspondence should be addressed to S.L.D. ([email protected]).
Received 18 October 2009; accepted 6 May 2010; published online 20 June 2010; doi:10.1038/nbt.1642
Figure 1 Experimental and computational methods to study platelet signaling. (a) Signaling pathways in platelets converge on intracellular calcium release. (b) High-throughput experimental procedure. An agonist plate containing combinatorial agonist combinations and a platelet plate containing dye-loaded platelets were separately assembled. Agonists were dispensed onto platelet suspensions and fluorescence changes were measured to quantify platelet calcium concentrations [Ca2+]i. [Ca2+]i transients can be represented as overlapping plots (lower right) or parallel heat maps (lower left). RFU, relative fluorescent units. (c) Dynamic neural network used to train platelet response to combinatorial agonist activation. A constant sequence of input signals (agonist concentrations) is introduced to the two-layer, 12-node network at each time point. Processing layers integrate input values with feedback signals to predict [Ca2+]i at the next time point.

We selected six major agonists of human platelets—convulxin (CVX; GPVI activator), ADP, the thromboxane analog U46619, PAR1 agonist peptide (SFLLRN), PAR4 agonist peptide (AYPGKF) and prostaglandin E2 (PGE2; activator of the prostacyclin receptor IP and the E series prostanoid receptors EP1-4). These agonists activate platelet signaling pathways that converge on the release of intracellular calcium (Ca2+) (Fig. 1a), which we measured using a fluorescent calcium dye. Calcium mobilization is critical to physiologically important platelet responses needed for aggregation and clotting, including granule release, exposure of phosphatidylserine, actin polymerization, shape change and integrin activation6. To determine appropriate dynamic ranges and the effective concentration for half-maximum response (EC50) values for the six agonists, we first tested each compound individually to determine dose-response relationships (Supplementary Fig. 1). The inhibitory response of PGE2 was studied by concomitantly stimulating the platelet with 60 μM SFLLRN.

To eliminate the sensitivity of cells to confounding autocrine effects of soluble mediators that are dependent on platelet concentrations and transport processes, we conducted all experiments in 5 mM EDTA, which chelates extracellular calcium. The removal of external calcium does not affect the ability of the studied receptors to signal, as no appreciable difference in EC50s was noted with or without external calcium (Supplementary Fig. 1a,b). Although this experimental design does not capture the contribution of store-operated calcium entry, it offers several operational advantages by (i) lowering background fluorescence without extensive platelet washing, (ii) preventing thrombin production, (iii) inhibiting granule release7,8 as well as TxA2 formation9 and (iv) inhibiting integrin-mediated signaling downstream of Ca2+ release10. The operational advantages of using EDTA, however, prevent prediction of important physiologic phenomena like granule release, integrin activation and outside-in signaling.

To test whether the intracellular Ca2+ signal detected was being influenced by endogenously released agonists, we studied the effects of 2 units/ml apyrase (which hydrolyzes released ADP) or 15 μM indomethacin (which inhibits production of TxA2). Neither of these inhibitors had an effect on individual responses (Supplementary Fig. 2 and Supplementary Tables 1 and 2), suggesting that endogenous autocrine activators have no effect on the Ca2+ signal. This confirms that the resulting Ca2+ traces depend only on receptor-mediated release from intracellular stores.

We applied the PAS method by first measuring platelet responses to all 135 pairwise combinations of low (0.1 × EC50), moderate (1 × EC50) and high (10 × EC50) agonist concentrations (Fig. 2a). Then, we trained a neural network model on 154 time-course traces (135 pairwise responses, 18 single-agonist responses, 1 null control response). We defined a pairwise agonist synergy score (Sij) to be the scaled difference between the integrated transient (area under the curve) for the combined response and the integrated areas for the individual responses (Fig. 2b) (Sij > 0, synergism; Sij = 0, additivity; Sij < 0, antagonism). The trained network accurately reproduced the time-course behavior (R = 0.968 for correlation between time points) and the pairwise agonist synergy (R = 0.884 for correlation between Sij scores) (Fig. 2a,b and Supplementary Fig. 3).

As an initial test of the trained network, we predicted the response of platelets to all 64 ternary combinations of the agonists ADP, SFLLRN and CVX at 0, 0.1, 1 and 10 × EC50 concentrations and compared the predictions to experimentally measured responses (Fig. 3a). A CVX response requires GPVI multimerization11 and is characterized by a slow rise to a large peak signal followed by a slow decline. Gq-coupled responses (ADP or SFLLRN) produce rapid bursts that are quickly brought down to baseline. Increasing CVX for a fixed ADP level resulted in a steady increase in Ca2+ on longer timescales. In contrast, increasing ADP for a fixed CVX level bolstered early Ca2+ release. A moderate dose of both ADP and CVX (for 0 and low SFLLRN) produced a response that almost instantaneously plateaued at a steady level above baseline. Both the time-course behavior (R = 0.844) and ternary agonist synergy scores (R = 0.881) (Supplementary Fig. 4) were accurately reproduced for the 27 unique ternary conditions in this experiment that were not present in the training set.

To fully test and utilize the predictive power of the neural network, we made in silico time-course and synergy predictions for the complete six-dimensional agonist space consisting of 4,077 unique agonist combinations of two to six agonists at 0.1, 1 or 10 × EC50 concentrations (Supplementary Fig. 5). Based on these predictions, we selected 45 combinations of four, five or six agonists that displayed a range of predicted synergy scores from synergy to strong
[Figure 2 display: input matrix of agonist doses (CVX, ADP, U46619, SFLLRN, AYPGKF and PGE2 at 0, 0.1, 1 and 10 × EC50) alongside heat maps of experimental and neural network (NN)-predicted intracellular [Ca2+]i over 0–250 s; panel b defines the pairwise synergy score Sij = [∫AB − (∫A + ∫B)] / max|∫AB − (∫A + ∫B)|.]
antagonism and tested them experimentally in addition to no-agonist and 18 single-agonist controls (Fig. 3b). To prevent any bias in the selection, we picked conditions that had maximal dissimilarity in the types and concentrations of agonists. We found strong agreement between predicted and measured transient shapes (R = 0.845) (Fig. 3b and Supplementary Fig. 6a), as well as between predicted and measured Sij scores (R = 0.883, slope = 1.08) (Fig. 3c). For comparison, the full distribution of synergy predictions for all 4,077 agonist combinations is shown as a vertical heat map in Figure 3c. To investigate whether smaller subsets of inputs, such as dominant pairs, could account for the network’s predictive accuracy, we retrained the neural network on different subsets of inputs. This almost always reduced predictive accuracy (Supplementary Fig. 6b), suggesting that the neural network does not rely exclusively on smaller subsets of input. Conditions containing high levels of all agonists showed especially low synergy due to saturation of Ca2+ release. The highest synergy was observed for agonist combinations that contained high levels of the thromboxane analog U46619 with no PGE2 present (Fig. 3c, orange bar). Given that only 8 of 45 conditions had a maximal U46619/PGE2 ratio, this ordering of the top three conditions was highly significant (P < 0.004): there are 14,190 possible ways to choose the top three conditions, of which only 56 would contain high U46619 and low PGE2. Thus, the neural network model trained on pairwise data facilitated discovery of a high-dimensional synergy that occurs at a high U46619/PGE2 ratio (at low levels of ADP and SFLLRN and submaximal levels of AYPGKF), consistent with the known cardiovascular risks of COX-2 inhibitors that prevent endothelial production of PGI2 without affecting platelet production of thromboxane12.
This points to a ‘high-dimensional’ COX-2 inhibition risk: high concentrations of thromboxane, in the absence of PGI2, potentiate the effects of other agonists. We also explored the effect of adding the agonists ADP, SFLLRN and CVX in various sequential combinations (Fig. 3d). Several notable behaviors were accurately predicted by the neural network model despite the network being trained on purely synchronous interactions. For instance, the temporal sequence ADP-SFLLRN-CVX (Fig. 3d, panel 1) produced three distinct Ca2+ bursts, whereas the ADP response was completely abolished in the sequence SFLLRN-ADP-CVX (Fig. 3d, panel 3). This behavior points to mechanisms of cross-downregulation of ADP signaling by component(s) of the PAR1 cascade. (See Supplementary Discussion and Supplementary Fig. 7 for tests with thrombin compared to SFLLRN+AYPGKF.) To investigate the reproducibility of the PAS procedure and the potential for using it to stratify individuals’ platelet responses, we performed PAS twice in a 2-week period for ten healthy male donors (Fig. 4). The 135 conditions containing pairs of agonists in a single PAS experiment make up the synergy map for each donor experiment (Supplementary Fig. 8) and individual columns of the synergy matrix (Fig. 4). The standard errors in synergy scores across all 135 conditions were uncorrelated with the magnitude of
Figure 2 PAS. (a) All 154 binary combinations of the agonists CVX, ADP, U46619, SFLLRN, AYPGKF and PGE2 at concentrations of 0, 0.1, 1 and 10 × EC50 were combined on the same plate (in replicates of 2) and the dynamic response of the platelet to each combination was recorded. The neural network model was trained on this dataset. (b) Pairwise agonist synergy scores, which reflect the gain or loss in calcium response due to agonist cross-talk, were calculated for both experimental and predicted time-course traces. EC50: PGE2, 24.6 μM; AYPGKF, 112 μM; SFLLRN, 15.2 μM; U46619, 1.19 μM; ADP, 1.17 μM; CVX, 0.00534 μM.
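The synergy score defined in Figure 2b is straightforward to compute from the calcium traces; the sketch below uses invented areas under the curve, and reads the max|·| term as normalization by the largest absolute difference observed in the scan (our assumption):

```python
def auc(trace, dt=1.0):
    # Trapezoidal area under a [Ca2+]i time course sampled every dt seconds
    return sum((a + b) / 2.0 * dt for a, b in zip(trace, trace[1:]))

def synergy_scores(pair_auc, single_auc):
    """S_ij = [AUC(A+B) - (AUC(A) + AUC(B))], scaled by the largest |difference|
    in the scan so that scores fall in [-1, 1]; >0 synergy, <0 antagonism."""
    diffs = {pair: ab - (single_auc[pair[0]] + single_auc[pair[1]])
             for pair, ab in pair_auc.items()}
    norm = max(abs(d) for d in diffs.values()) or 1.0
    return {pair: d / norm for pair, d in diffs.items()}

# Invented integrated responses (arbitrary units)
single_auc = {"ADP": 10.0, "CVX": 20.0, "U46619": 5.0}
pair_auc = {("ADP", "CVX"): 45.0, ("ADP", "U46619"): 10.0}
scores = synergy_scores(pair_auc, single_auc)
```

In a real PAS run the dictionaries would hold one AUC per agonist condition from the 384-well plate reads.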
synergy and are measures of the experimental uncertainty and day-to-day fluctuations in mean synergy values at these conditions. The mean uncertainty for a representative donor (donor A) was ±0.0523 for Sij ranging from −1 to 1 (uncertainties across all 135 conditions are shown in Supplementary Fig. 9). The mean standard error in synergy scores for all ten donors ranged from ±0.0347 to ±0.0627 (Supplementary Table 3). We generated a hierarchical cluster tree using the Euclidean distances between donor experiments. Seven of the ten donor pair vectors (donor pairs D, C, A, H, E, F and I) self-clustered, demonstrating that despite variation between samples from the same donor, pronounced inter-donor variations allow us to distinguish donors. This pattern of clustering was highly significant (P < 8 × 10^−7), as assessed by randomizing observed donor synergies (Supplementary Fig. 10). The observed pattern of self-clustering was platelet signaling dependent (and not related to donor plasma), as the PAS scans of an individual donor’s platelets with autologous or heterologous plasma self-clustered (Supplementary Fig. 11). In general, across all conditions and donors, the highest probability of pairwise synergy was observed when moderate doses of both agonists were used. Low doses of both agonists produced additive responses, whereas high doses of both agonists skewed synergy distributions toward antagonism (Supplementary Fig. 12). Donors separated into at least two major subgroups, with the cluster of donor experiments D1, D2, J2, C1, C2, B1 and B2 characterized by a relative lack of synergy in comparison to other experiments. The cluster of experiments A1, A2, H1, H2, J1, E1, E2, F1, F2, G1, I1, I2 and G2 had marked synergy between moderate doses of SFLLRN and all doses of U46619 or ADP, as
[Figure 3 display: (a) input matrices with experimental and NN-predicted [Ca2+]i heat maps for ternary ADP/SFLLRN/CVX combinations; (b) the 45 sampled higher-order conditions; (c) NN-predicted versus measured synergy with best linear fit, the predicted synergy distribution for all 4,077 conditions, and the 45 experimental conditions ordered by synergy; (d) time courses for sequential agonist additions. Generalized synergy score: Synergy = [∫Xall − Σ∫Xi] / max|∫Xall − Σ∫Xi|.]
Figure 3 Neural network model reveals the global platelet response to all agonist combinations. (a) Measurement and prediction of the platelet response to all 64 ternary combinations of ADP, SFLLRN and CVX at 0, 0.1, 1 and 10 × EC50. The neural network model was trained only on pairwise interactions but successfully predicted ternary interactions. (b) Measurement and prediction of the platelet response for 45 conditions sampled from the full combinatorial agonist space. (c) Predicted versus measured synergy scores for the 45 conditions in b (upper left). Distribution of synergy scores for all 4,077 possible experimental conditions (upper right). Experimental conditions for the 45 sampled combinations of agonists, arranged in order of increasing synergy (bottom). The orange bar denotes the three most highly synergistic conditions, which all contained high U46619, no PGE2 and low levels of other agonists. (d) Measured and predicted platelet responses to sequential additions of ADP, SFLLRN and CVX.
well as marked synergy for moderate U46619 and high CVX. All donors showed some synergism between low and moderate doses of SFLLRN and U46619. We also typically observed synergy between AYPGKF and U46619. Moreover, synergistic or additive interactions were also noted between low and moderate doses of SFLLRN and AYPGKF. These results suggested a mechanism of
synergy between thrombin and thromboxane. To test this, binary synergy maps of the physiological agonist thrombin and U46619 were constructed for donors A and E (Supplementary Fig. 13) over seven doses spanning the active concentration ranges. To our knowledge, this is the first report of conserved synergy between thrombin and thromboxane mimetics.

VOLUME 28 NUMBER 7 JULY 2010 nature biotechnology
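The donor-clustering analysis described above can be sketched as follows. This is an illustrative reconstruction on synthetic data: the paper used hierarchical clustering of 135-condition synergy vectors with Euclidean distances, whereas here a simpler mutual-nearest-neighbour statistic and permutation test stand in for the authors' actual procedure.

```python
# Sketch of donor self-clustering with a permutation test (synthetic data;
# the mutual-nearest-neighbour statistic is an illustrative stand-in for
# the hierarchical clustering used in the paper).
import numpy as np

rng = np.random.default_rng(0)

def self_match_count(X, donor_ids):
    """Count donors whose two repeat experiments are each other's nearest
    neighbours by Euclidean distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)           # an experiment cannot match itself
    nn = D.argmin(axis=1)                 # nearest neighbour of each row
    matches = sum(donor_ids[nn[i]] == donor_ids[i] for i in range(len(donor_ids)))
    return matches // 2                   # each mutual pair is counted twice

# Synthetic donors: two repeats per donor share a donor-specific synergy
# vector (135 conditions) plus small experimental noise.
donor_ids = [d for d in "ABCDE" for _ in range(2)]
centers = rng.normal(size=(5, 135))
X = np.repeat(centers, 2, axis=0) + 0.05 * rng.normal(size=(10, 135))

observed = self_match_count(X, donor_ids)

# Permutation test: shuffle which vector carries which donor label.
null = [self_match_count(X[rng.permutation(10)], donor_ids) for _ in range(200)]
p = (1 + sum(n >= observed for n in null)) / (1 + len(null))
```

With well-separated synthetic donors all five self-match and the permutation p-value is small, mirroring the significance argument made in the text.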
Figure 4 Donor-specific synergy maps. Ten healthy donors were phenotyped for platelet calcium response to all pairwise agonist combinations. Repeat experiments were conducted within 2 weeks. Donors (ages, 22–30 years) spanned several ethnic groups (three Western Europeans, two Asians, two Indians, one Caribbean, one African American and one African). The magnitudes of synergy in each of the 20 donor-specific synergy maps were arranged as columns of the synergy matrix. These vectors were clustered according to similarity using a distance-based clustering algorithm.

Studying the combinatorial effects of pairs of agonists at low, moderate and high concentrations allowed a rapid, donor-specific phenotypic scan that was predictive of responses to multiple agonists. Importantly, a single 384-well plate of data was sufficient to train a neural network model (Fig. 2) capable of making accurate predictions of the global six-dimensional agonist reaction space (Fig. 3), which is difficult to probe experimentally but fundamental to the processes of thrombosis. Synergies between platelet agonists depend not just on agonist pairs and doses but also vary from donor to donor (Fig. 4). In contrast to PAS, current measurements of platelet phenotype can only coarsely stratify healthy donors. For instance, platelet aggregometry has been used13 to classify 359 individuals as "hypo- or hyper-" reactive to platelet agonists, and flow cytometry was used14 to classify 26 individuals as high, medium or low responders. Previous studies have reported synergistic aggregation responses of platelets to combinations of multiple agonists15–17. Such unique patterns of synergism could be used to distinguish donors and be correlated with certain risk factors. Clinically, we anticipate that PAS profiles will depend on variables such as ancestry, age, sex, pharmacology and cardiovascular state (all of which require further testing), although linking genotype (1,327 single nucleotide polymorphisms) to phenotype (flow cytometric measurement of P-selectin exposure and fibrinogen binding) in 500 individuals18 demonstrated only weak association probabilities.

The PAS approach works because individual and binary interactions dominate, and they are sampled across the full dose range of inputs. We expect the method to break down when ternary interactions in excess of the sum of binary interactions become strong. We show that the residual ternary synergy (Δ(ABC) = SABC − SAB − SBC − SAC) was ~0 in each of 27 responses of platelets to different ternary combinations of CVX, ADP and SFLLRN and was minimized in the neural network model training (Supplementary Fig. 14 and Supplementary Discussion). In general, knowledge of pairwise interactions alone cannot be expected to predict the response to several (>2) simultaneously present stimuli. However, certain characteristics of platelets and the conditions under which they were studied made such an approach feasible in this instance. These include (i) the relative abundance of binary interactions in signaling systems with minimized ternary interactions (Supplementary Fig. 14)19; (ii) the efficient utilization of system history (Supplementary Fig. 15); (iii) the dense sampling of interactions across a full dose-response range; (iv) known intracellular wiring that rapidly converges on Ca2+, without the possibility of higher-order effects from genetic regulation or other interactions on long time scales; and (v) the choice of well-characterized extracellular ligands and careful design to avoid autocatalytic feedback.

Further, application of PAS to stimuli including epinephrine, soluble CD40L, serotonin and nitric oxide would map a major portion of the entire platelet response space. The use of PAS with orthogonal pharmacological agents (indomethacin, P2Y12 inhibitors, selective PAR antagonists, guanylate cyclase or adenylate cyclase inhibitors) would allow further assessment of individual clinical risk or sensitivity to therapy. The PAS method demonstrates that sampling all dual orthogonal 'axes' (every agonist pair) can successfully predict the dynamic responses and cross-talk of a system receiving complex combinations of inputs.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
The authors thank H. Li for suggesting the permutation test to evaluate the significance of donor clustering. This work was supported by the US National Institutes of Health R01-HL-56621 (S.L.D.), R33-HL-87317 (S.L.D. and L.F.B.) and T32-HG000046 (J.E.P.).

AUTHOR CONTRIBUTIONS
M.S.C. designed and performed all experiments. J.E.P. constructed neural network models of platelet activation. M.S.C. wrote the paper with contributions from all authors. L.F.B. advised on experimental conditions, and S.L.D. conceived the study.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Lloyd-Jones, D. et al. Heart disease and stroke statistics—2009 update: a report from the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Circulation 119, e21–e181 (2009).
2. Furie, B. & Furie, B.C. In vivo thrombus formation. J. Thromb. Haemost. 5, 12–17 (2007).
3. Bray, D. Protein molecules as computational elements in living cells. Nature 376, 307–312 (1995).
4. Bray, D. Molecular networks: the top-down view. Science 301, 1864–1865 (2003).
5. Purvis, J.E., Chatterjee, M.S., Brass, L.F. & Diamond, S.L. A molecular signaling model of platelet phosphoinositide and calcium regulation during homeostasis and P2Y1 activation. Blood 112, 4069–4079 (2008).
6. Siess, W. Molecular mechanisms of platelet activation. Physiol. Rev. 69, 58–178 (1989).
7. Bohne, A., Fukami, M.H. & Holmsen, H. EDTA inhibits collagen-induced ATP plus ADP secretion and tyrosine phosphorylation in platelets independently of Mg2+ chelation and decrease in pH. Platelets 13, 437–442 (2002).
8. Lages, B. & Harvey, J.W. Heterogeneous defects of platelet secretion and responses to weak agonists in patients with bleeding disorders. Br. J. Haematol. 68, 53–62 (1988).
9. Cho, M.J. et al. The roles of alpha IIbbeta 3-mediated outside-in signal transduction, thromboxane A2, and adenosine diphosphate in collagen-induced platelet aggregation. Blood 101, 2646–2651 (2003).
10. Brass, L.F., Shattil, S.J., Kunicki, T.J. & Bennett, J.S. Effect of calcium on the stability of the platelet membrane glycoprotein IIb-IIIa complex. J. Biol. Chem. 260, 7875–7881 (1985).
11. Polgar, J. et al. Platelet activation and signal transduction by convulxin, a C-type lectin from Crotalus durissus terrificus (tropical rattlesnake) venom via the p62/GPVI collagen receptor. J. Biol. Chem. 272, 13576–13583 (1997).
12. Mukherjee, D., Nissen, S.E. & Topol, E.J. Risk of cardiovascular events associated with selective COX-2 inhibitors. JAMA 286, 954–959 (2001).
13. Yee, D.L., Sun, C.W., Bergeron, A.L., Dong, J.-F. & Bray, P.F. Aggregometry detects platelet hyperreactivity in healthy individuals. Blood 106, 2723–2729 (2005).
14. Panzer, S., Höcker, L. & Koren, D. Agonists-induced platelet activation varies considerably in healthy male individuals: studies by flow cytometry. Ann. Hematol. 85, 121–125 (2006).
15. Packham, M.A., Guccione, M.A., Chang, P.L. & Mustard, J.F. Platelet aggregation and release: effects of low concentrations of thrombin or collagen. Am. J. Physiol. 225, 38–47 (1973).
16. Grant, J.A. & Scrutton, M.C. Positive interaction between agonists in the aggregation response of human blood platelets: interaction between ADP, adrenaline and vasopressin. Br. J. Haematol. 44, 109–125 (1980).
17. Hallam, T.J., Scrutton, M.C. & Wallis, R.B. Synergistic responses and receptor occupancy in rabbit blood platelets. Thromb. Res. 27, 435–445 (1982).
18. Jones, C.I. et al. A functional genomics approach reveals novel quantitative trait loci associated with platelet signaling pathways. Blood 114, 1405–1416 (2009).
19. Hsueh, R.C. et al. Deciphering signaling outcomes from a system of complex networks. Sci. Signal. 2, ra22 (2009).
ONLINE METHODS
Materials. PAR1-agonist peptide SFLLRN (thrombin receptor agonist peptide, TRAP) and the PAR4-agonist peptide AYPGKF were obtained from Bachem. Convulxin (CVX) was obtained from Centerchem. Thrombin and GGACK were obtained from Haematologic Technologies. Clear, flat-bottom, black 384-well plates were obtained from Corning. ADP, U46619, PGE2, EDTA, HEPES, the fibrin polymerization inhibitor Gly-Pro-Arg-Pro (GPRP), NaCl, NaOH, apyrase, indomethacin and sodium citrate were all from Sigma. Fluo-4 NW Calcium assay kits were obtained from Invitrogen. The buffer used for all dilutions was HEPES-buffered saline (HBS; sterile-filtered 20 mM HEPES and 140 mM NaCl in deionized water, adjusted to pH 7.4 with NaOH).
Platelet preparation. Whole blood was drawn from healthy male volunteers, according to University of Pennsylvania Institutional Review Board guidelines, into citrate anticoagulant (1 part sodium citrate to 9 parts blood). All donors affirmed that they had taken no medications in the 10 d and consumed no alcohol in the 3 d before phlebotomy. After centrifugation at 120g for 12 min to obtain platelet-rich plasma, 2 ml of platelet-rich plasma was incubated for 30 min with each vial of Fluo-4 NW dye mixture reconstituted in 8 ml of buffer.

High-throughput experimentation. An 'agonist plate' containing varying combinatorial concentrations of platelet agonists was prepared on a PerkinElmer Janus (PerkinElmer Life and Analytical Sciences) using 10× stock solutions of ADP, CVX, SFLLRN, AYPGKF and U46619. A separate 'platelet plate' containing dye-loaded platelets was prepared on a PerkinElmer Evolution. Final platelet-rich plasma (PRP) concentrations were 12% by volume (6 μl/well) after agonist addition, and 5 mM EDTA was included in every well. Agonists (10 μl/well) were dispensed after a 20-s baseline read from columns of the 'agonist plate' onto the corresponding columns of the 'platelet plate' on a Molecular Devices FlexStation III. Fluo-4 fluorescence was measured at 485 nm excitation and 535 nm emission for 4 min in every column of the plate. The fluorescence F(t) was scaled to the mean baseline value for each well, F0, and relative calcium concentrations were quantified as F(t)/F0. An entire 384-well plate was read in ~90 min.

Agonist selection. The number of agonists tested in a PAS experiment is limited to six by the need to test all 154 conditions in duplicate on a single 384-well plate. Agonists were chosen to be representative of physiological signaling cascades. Convulxin is a selective GPVI activator11, and under static conditions this receptor is the predominant determinant of collagen-induced signal strength20.
In contrast, the soluble monomeric form of collagen interacts only with α2β1, which regulates platelet adhesion but has little direct effect in mediating signaling21,22. 'Horm' collagen preparations are insoluble, making them poorly suited for automated liquid handling. Although ADP stimulates both P2Y1 and P2Y12, the latter receptor has a minor effect on calcium mobilization23, allowing us to use the physiological agonist ADP instead of specific P2Y1 ligands. Thrombin signals through two separate Gq-coupled receptors, PAR1 and PAR4, both of which produce temporally separate calcium signals24,25. This prompted us to use selective PAR agonist peptides (SFLLRN and AYPGKF) to distinguish the separate signal contributions of these two receptor pathways. Moreover, thrombin stimulation of unwashed PRP requires inhibition of fibrin and coagulation factor Xa (FXa) formation (Supplementary Fig. 13). Washing or gel-filtering platelets are processing steps that decrease throughput in a large-scale experiment and often cause residual platelet activation in the absence of PGE2 or other PGI2 analogs. A short-lived prostaglandin like PGI2 (ref. 26) is unsuitable for assembly of agonist plates (requiring ~120 min) and plate reading (requiring ~90 min). In contrast, prostaglandins of the E series are chemically stable, prompting us to use PGE2 as an agonist causing elevation of intracellular cAMP. Similarly, for reasons of stability during the course of the experiment, the thromboxane analog U46619 was used instead of its physiological equivalent TxA2 (ref. 27).

Definition of synergy score. To quantify cross-talk between agonist combinations, we defined the 'synergy score' as the difference between the observed
doi:10.1038/nbt.1642
and the predicted additive response. For ease of visualization, this difference was scaled to the maximum synergy score observed in an experiment (or simulation), giving a metric that ranges from −1 (antagonism) to +1 (positive synergy). A similar synergy metric was previously defined as the ratio of the observed and the predicted additive response to demonstrate synergistic calcium signaling between C5a and UDP in RAW264.7 cells and bone marrow–derived macrophages28. The use of a ratio rather than a difference is prone to numerical errors for small values of the predicted additive response.

Neural network model construction, training and simulation. Neural network modeling and analysis was performed using the Neural Network Toolbox for MATLAB (The MathWorks). Training data consisted of (i) the dynamic inputs, which represent the combination of agonist concentrations present at each time point for a particular experiment (because the concentration of agonists remains essentially constant throughout each experiment, these values were generally a constant vector of concentration values repeated at 1-s intervals), and (ii) the dynamic outputs, which represent the experimentally measured calcium concentrations, also interpolated at 1-s intervals. To normalize the input data, agonist concentrations of 0, 0.1, 1 and 10 × EC50 were mapped to the values (−1, −0.333, +0.333, +1) before introducing them to the network, so as to fall within the working range of the hyperbolic tangent sigmoid transfer function, which was used for all processing nodes. Output values (fluorescence measurements) were normalized between −1 and +1, so that the basal concentration of calcium at t = 0 was defined to be 0.
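The synergy-score definition above (observed minus predicted additive response, scaled by the experiment's largest magnitude) can be sketched as follows. Names and data structures are illustrative, and a simple sum-times-dt integral stands in for whatever quadrature the authors used.

```python
# Minimal sketch of the synergy score: integrated combined response minus
# the sum of integrated single-agonist responses, scaled into [-1, +1].
import numpy as np

def synergy_scores(pair_traces, single_traces, dt=1.0):
    """pair_traces: condition -> [Ca2+] trace for the agonist combination.
    single_traces: condition -> list of traces for the individual agonists.
    Returns scores from -1 (antagonism) to +1 (positive synergy)."""
    raw = {}
    for cond, x_all in pair_traces.items():
        integral_all = float(np.sum(x_all)) * dt
        integral_singles = sum(float(np.sum(x)) * dt for x in single_traces[cond])
        raw[cond] = integral_all - integral_singles
    scale = max(abs(v) for v in raw.values())   # max|integral difference|
    return {cond: v / scale for cond, v in raw.items()}

# Toy constant traces: supra-additive, additive and sub-additive combinations.
pairs = {"syn": np.full(10, 3.0), "add": np.full(10, 2.0), "ant": np.full(10, 1.5)}
singles = {c: [np.ones(10), np.ones(10)] for c in pairs}
scores = synergy_scores(pairs, singles)
```

With these toy traces the supra-additive condition scores +1, the additive condition 0 and the sub-additive condition −0.5, matching the sign convention in the text.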
After training all 420 possible one- and two-layer neural networks with between 1 and 20 nodes in each processing, or 'hidden', layer and testing each network for accuracy, a final topology with a six-node input layer (representing the six agonists), two processing layers (eight nodes/four nodes) and a single-node output layer (representing the intracellular calcium concentration)29 proved optimal (it best predicted the output [Ca2+]i for a given multivariate input using the fewest neurons) and was selected to predict successive time points from all 154 Ca2+ release curves gathered experimentally (Fig. 2). For simplicity, and because the model already yielded reasonably accurate time-series predictions of [Ca2+]i, more processing layers or >20 neurons per layer were not tested. From a purely biological perspective, the model architecture is arbitrary, and no particular meaning should be inferred from the narrowing of eight nodes in the first processing layer to four nodes in the second. Moreover, this neural network model (Fig. 1c) does not correspond to an actual signaling network (Fig. 1a), but it does provide a highly efficient framework for use as an independent signaling module in multiscale models of thrombosis under flow. From a mathematical perspective, the architecture represents a multivariate regression that obtains good fits of high-dimensional data and allows extrapolation onto experimentally unexplored spaces. NARX (nonlinear autoregressive network with exogenous inputs) models are recurrent dynamic networks with feedback connections enclosing multiple layers of the network and are well suited to predicting time-series data30 because they process inputs sequentially, that is, at successive time points. Calcium outputs before the current instant were fed back to the hidden layers using a delay line spanning 128 s.
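As a quick check of the search-space size quoted above (the original search was performed in MATLAB's Neural Network Toolbox; this enumeration is only an illustration), the 420 candidate topologies are the 20 one-layer plus 20 × 20 two-layer node-count combinations:

```python
# Enumerate all one- and two-layer hidden topologies with 1-20 nodes per layer:
# 20 + 20*20 = 420 candidates, including the selected (8, 4) architecture.
topologies = [(n,) for n in range(1, 21)]
topologies += [(n1, n2) for n1 in range(1, 21) for n2 in range(1, 21)]
```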
Initial states of the delay line were set to 0, corresponding to the steady state of the platelet before agonist stimulation. This structure allows the network output to evolve over time, using the 'memory' of the previous 128 s in calculating the current output. Training was performed using Levenberg-Marquardt back-propagation until the performance of the model (the mean squared error between the simulated and experimentally measured PAS responses) no longer improved by more than 1 × 10−5. During training, the pairwise agonist data (154 time-course traces) were divided into training, validation and testing vectors. Validation and testing vectors were each generated by randomly selecting 23 (15%) of the 154 pairwise time-course traces. The training vectors were used to directly optimize network edge weights and bias values to match the target output. The validation set was used to ensure that there was no overfitting in the final result. The test vectors provide an independent measure of how well the network can be expected to perform on data not used to train it.

Mathematically, the output y at an instant t, for an input vector Ī of the concentrations of the six input species, can be compactly described by
y(t) = f( L3(1 × 4) × f( H2(4 × 8) × yh(8 × 1) + L2(4 × 8) × f( H1(8 × 8) × yh(8 × 1) + IW(8 × 6) × Ī(6 × 1) + b1(8 × 1) ) + b2(4 × 1) ) + b3(1 × 1) )

where the innermost f(...) is the first-layer output (8 × 1), the middle f(...) is the second-layer output (4 × 1) and the outermost f(...) is the scalar output (1 × 1),
and where IW is the matrix of input weights, and L2 and L3 are the weight matrices that operate on the outputs of the first and second processing layers, respectively. H1 and H2 are matrices of history coefficients that weight the history vector yh (containing the output of the system 1, 2, 4, 8, 16, 32, 64 and 128 s before the current instant). b1, b2 and b3 are bias vectors that add constant offsets to the weighted inputs and weighted histories to produce the 'net input' to each transfer function. f is the hyperbolic tangent function, applied to a vector of net inputs to yield the corresponding transformed output. Numbers in parentheses show the sizes of the relevant matrices or vectors. The NARX model presented here represents a nonlinear regression model with input stimuli and system history. The use of simple 1st and 2nd order polynomial terms (with a lower number of optimizable parameters) did not produce acceptable fits (not shown), necessitating the NARX architecture. A 3rd order polynomial was not attempted because it requires 316 fitting parameters, far exceeding the number of parameters in the neural network model. It should be noted that each trained neural network model produces a deterministic prediction of platelet activation. Experimental variations are inherent in replicates of donor-specific training data (Supplementary Fig. 9), and the tightness of the measured mean will determine the predictive quality of such a donor-specific neural network model. The fold-expression kinetics of nine 'top-ranked' genes involved in the sustained migration of keratinocytes after hepatocyte growth factor (HGF) treatment have been described by means of a continuous-time recurrent neural network, with the network weights used to define the modulation and control elements of the response31.
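A single output step of the NARX equation above can be sketched as follows. The matrices are random stand-ins for trained weights, so the example only illustrates the dimensions and data flow (six agonist inputs, 8- and 4-node processing layers, one output, an 8-tap history line), not a trained model.

```python
# Illustrative forward pass of the NARX equation (random weights, not trained).
import numpy as np

rng = np.random.default_rng(1)
IW = rng.normal(size=(8, 6))                                # input weights
H1, H2 = rng.normal(size=(8, 8)), rng.normal(size=(4, 8))   # history weights
L2, L3 = rng.normal(size=(4, 8)), rng.normal(size=(1, 4))   # layer weights
b1, b2, b3 = (rng.normal(size=(8, 1)), rng.normal(size=(4, 1)),
              rng.normal(size=(1, 1)))                      # bias vectors
f = np.tanh                                                 # transfer function

def narx_step(I, yh):
    """One output sample: I is the 6x1 input of agonist concentrations, yh
    the 8x1 history of outputs 1, 2, 4, ..., 128 s before the current instant."""
    a1 = f(H1 @ yh + IW @ I + b1)   # first processing layer (8x1)
    a2 = f(H2 @ yh + L2 @ a1 + b2)  # second processing layer (4x1)
    return f(L3 @ a2 + b3)          # output layer (1x1)

# Resting platelet: history line initialized to 0, as in the text.
y = narx_step(rng.normal(size=(6, 1)), np.zeros((8, 1)))
```

Because the output node also uses the tanh transfer function, the predicted normalized calcium signal is bounded in (−1, +1), matching the output normalization described above.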
Also, previous studies have used partial least-squares regression (PLSR) analysis to understand the interplay of molecular mechanisms during signaling32,33. PLSR measures multiple intermediate signaling molecules at various time points for a relatively small number of inputs and identifies principal components that capture the phenotype of the system. In comparison, the PAS approach offers less mechanistic dissection but provides rapid
(a 2-h experiment) and efficient prediction of dynamic input-output relationships at numerous (~10²) physiologically relevant conditions.

20. Nieswandt, B. & Watson, S.P. Platelet-collagen interaction: is GPVI the central receptor? Blood 102, 449–461 (2003).
21. Knight, C.G. et al. Collagen-platelet interaction: Gly-Pro-Hyp is uniquely specific for platelet GpVI and mediates platelet activation by collagen. Cardiovasc. Res. 41, 450–457 (1999).
22. Hers, I. et al. Evidence against a direct role of the integrin alpha2beta1 in collagen-induced tyrosine phosphorylation in human platelets. Eur. J. Biochem. 267, 2088–2097 (2000).
23. Foster, C.J. et al. Molecular identification and characterization of the platelet ADP receptor targeted by thienopyridine antithrombotic drugs. J. Clin. Invest. 107, 1591–1598 (2001).
24. Covic, L., Gresser, A.L. & Kuliopulos, A. Biphasic kinetics of activation and signaling for PAR1 and PAR4 thrombin receptors in platelets. Biochemistry 39, 5458–5467 (2000).
25. Shapiro, M.J., Weiss, E.J., Faruqi, T.R. & Coughlin, S.R. Protease-activated receptors 1 and 4 are shut off with distinct kinetics after activation by thrombin. J. Biol. Chem. 275, 25216–25221 (2000).
26. Cho, M.J. & Allen, M.A. Chemical stability of prostacyclin (PGI2) in aqueous solutions. Prostaglandins 15, 943–954 (1978).
27. Coleman, R.A., Humphrey, P.P.A., Kennedy, I., Levy, G.P. & Lumley, P. Comparison of the actions of U-46619, a prostaglandin H2-analogue, with those of prostaglandin H2 and thromboxane A2 on some isolated smooth-muscle preparations. Br. J. Pharmacol. 73, 773–778 (1981).
28. Roach, T.I.A. et al. Signaling and crosstalk by C5a and UDP in macrophages selectively use PLCbeta3 to regulate intracellular free calcium. J. Biol. Chem. 283, 17351–17361 (2008).
29. Jain, L.C. & Medsker, L.R. Recurrent Neural Networks: Design and Applications (CRC Press, 1999).
30. Eugen, D. The use of NARX neural networks to predict chaotic time series. WSEAS Trans. Comp. Res. 3, 182–191 (2008).
31. Busch, H. et al. Gene network dynamics controlling keratinocyte migration. Mol. Syst. Biol. 4, 199 (2008).
32. Janes, K.A. et al. Systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science 310, 1646–1653 (2005).
33. Kemp, M.L., Wille, L., Lewis, C.L., Nicholson, L.B. & Lauffenburger, D.A. Quantitative network signal combinations downstream of TCR activation can predict IL-2 production response. J. Immunol. 178, 4984–4992 (2007).
An allosteric inhibitor of substrate recognition by the SCFCdc4 ubiquitin ligase
Stephen Orlicky1, Xiaojing Tang1, Victor Neduva2, Nadine Elowe3, Eric D Brown3, Frank Sicheri1,4 & Mike Tyers1,2,4

The specificity of SCF ubiquitin ligase–mediated protein degradation is determined by F-box proteins1,2. We identified a biplanar dicarboxylic acid compound, called SCF-I2, as an inhibitor of substrate recognition by the yeast F-box protein Cdc4, using a fluorescence polarization screen to monitor the displacement of a fluorescein-labeled phosphodegron peptide. SCF-I2 inhibits the binding and ubiquitination of full-length phosphorylated substrates by SCFCdc4. A co-crystal structure reveals that SCF-I2 inserts itself between the β-strands of blades 5 and 6 of the WD40 propeller domain of Cdc4, at a site that is 25 Å away from the substrate binding site. Long-range transmission of SCF-I2 interactions distorts the substrate binding pocket and impedes recognition of key determinants in the Cdc4 phosphodegron. Mutation of the SCF-I2 binding site abrogates its inhibitory effect and explains the specificity of the allosteric inhibition mechanism. Mammalian WD40 domain proteins may exhibit similar allosteric responsiveness and hence represent an extensive class of druggable targets.

The ubiquitin-proteasome system mediates the intracellular degradation of many proteins through a cascade of enzyme activities, termed E1, E2 and E3, which serially activate and then transfer ubiquitin to substrate proteins3. E3 enzymes, also referred to as ubiquitin ligases, specifically recognize discrete sequence motifs in substrates termed degrons. The human genome encodes at least 600 E3 enzymes, each of which has the potential to recognize multiple substrates4. The largest class of E3 enzymes, the cullin-RING ligases, was discovered through identification of the multi-subunit Skp1–Cdc53/Cullin–F-box protein (SCF) complexes1,2.
A large family of F-box proteins recruit substrates to the core SCF complex by means of protein interaction domains, typically leucine-rich repeats or WD40 repeats, often in a phosphorylation-dependent manner1,2,5–7. The SCF enzymes likely target hundreds of different substrates4,8–10 and thus hold untapped potential for drug discovery4. The WD40 repeat is an ancient, conserved motif that functions in many different cellular processes11,12. Tandem arrays of five to eight WD40 repeats form a circularly permuted β-propeller domain structure13. In yeast, recognition of the cyclin-dependent kinase (CDK) inhibitor Sic1 by the WD40 domain of the F-box protein Cdc4 depends on phosphorylation of multiple Cdc4 phosphodegron (CPD) motifs in Sic1 (refs. 6,14). SCFCdc4 also targets other substrates
including Far1, Cdc6 and Gcn4 (ref. 1). Human CDC4, also known as FBW7, recruits a number of important regulatory factors for ubiquitination, including cyclin E, MYC, JUN, NOTCH, SREBP and presenilin9. FBW7 is a haploinsufficient tumor suppressor that is mutated in many cancer types9,15 and also likely influences stem cell renewal by virtue of its effects on MYC and other factors16. Given the central role of Cdc4/FBW7 in growth and division, we sought to identify small molecules that inhibit substrate recognition by Cdc4. We adapted a previously established fluorescence polarization assay to monitor the displacement of a fluorescein-labeled CPD peptide (Kd ≈ 0.2 μM) from yeast Cdc4 (Supplementary Fig. 1a)14. The fluorescence polarization assay achieved a Z-factor of 0.8, based on negative (DMSO solvent only) and positive (unlabeled CPD peptide) controls. A screen of a 50,000-compound library enriched for drug-like molecules17 yielded 44 hits that inhibited the CPD-Cdc4 interaction by at least 50% (Fig. 1a). Two of these compounds, denoted SCF-I2 and SCF-I6, strongly inhibited the interaction of full-length phospho-Sic1 with Cdc4 and prevented Sic1 ubiquitination by SCFCdc4 (Fig. 1b). We pursued only SCF-I2, because SCF-I6 appeared to cause nonspecific loss of the Skp1-Cdc4 complex from the capture resin (Fig. 1b). SCF-I2 corresponds to 1-(2-carboxynaphth-1-yl)-2-naphthoic acid, a derivative of 1,1′-binaphthyl-2,2′-diol, also known as BINOL, a biplanar, axially chiral atropisomer that is widely used as a scaffold in chiral synthesis18. The two hydroxyl groups of BINOL are substituted by carboxylic acid groups in SCF-I2 (Fig. 1c). The form of 1-(2-carboxynaphth-1-yl)-2-naphthoic acid used in all of our assays was an undefined racemic mixture of the R- and S-enantiomers, which are noninterconvertible even at high temperature18.
SCF-I2 was tenfold less potent than unlabeled CPD peptide in the fluorescence polarization assay, with an IC50 of 6.2 μM versus 0.5 μM, respectively (Fig. 1c). SCF-I2 inhibited binding and/or ubiquitination of both full-length Sic1 and Far1 with an IC50 of ~60 μM (Supplementary Fig. 1b,c); the weaker apparent affinity of SCF-I2 in these assays may reflect differences in the interaction of peptides and full-length substrates with Cdc4. SCF-I2 failed to inhibit Cdc4 activity in vivo (data not shown), presumably because the two carboxylate groups prevented efficient partitioning of the inhibitor into yeast cells. SCF-I2 did not affect the in vitro activity of the closely related E3 enzyme SCFMet30, which recruits its substrate Met4 by means of the WD40 domain of the F-box protein Met30 (Supplementary Fig. 1d)19.
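An IC50 like those quoted above is typically extracted by fitting a competition curve to the polarization readings. The following is a hedged sketch on synthetic, noise-free data with a simple grid-search fit; it is illustrative only and not the authors' actual fitting procedure.

```python
# Sketch: extract an IC50 from a fluorescence polarization competition curve.
import numpy as np

def competition_curve(conc, top, bottom, ic50, hill=1.0):
    """Four-parameter logistic: polarization falls from top to bottom as
    the competitor displaces the labeled peptide."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

def fit_ic50(conc, signal, grid):
    """Return the grid IC50 minimizing squared error; plateaus are taken
    from the data extremes for simplicity."""
    top, bottom = signal.max(), signal.min()
    sse = [np.sum((competition_curve(conc, top, bottom, g) - signal) ** 2)
           for g in grid]
    return grid[int(np.argmin(sse))]

conc = np.logspace(-9, -3, 25)                     # inhibitor conc., molar
signal = competition_curve(conc, top=300.0, bottom=0.0, ic50=6.2e-6)
ic50_est = fit_ic50(conc, signal, np.logspace(-9, -3, 601))
```

On this synthetic curve the grid search recovers an IC50 close to the 6.2 μM value planted in the data; real data would of course need noise handling and a free Hill slope.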
1Center for Systems Biology, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada. 2Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Mayfield Road, Edinburgh, Scotland. 3Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada. 4Department of Molecular Genetics, University of Toronto, Toronto, Canada. Correspondence should be addressed to F.S. ([email protected]) or M.T. ([email protected]).
Received 9 March; accepted 10 May; published online 27 June 2010; doi:10.1038/nbt.1646
Figure 1 Small-molecule inhibitors of the Cdc4-substrate interaction. (a) Distribution of hits from the 50,000-compound Maybridge library screen. Interaction between a fluorescein-labeled, high-affinity, cyclin E-derived phosphopeptide (GLLpTPPQSG) and recombinant Cdc4 was monitored by fluorescence polarization. Forty-four compounds fell below the 50% inhibition cutoff (red line). Yellow dashed lines indicate three s.d. above and below the mean. Z and Z′ factor scores were 0.8 and 0.66, respectively. At one s.d. (σ), high controls were 4.6%, low controls 6.8% and sample data 7.0%. (b) Inhibition of the interaction between full-length phospho-Sic1 and Cdc4 (top panel). Phosphorylated Sic1 (0.1 μM) was incubated in the presence of recombinant Skp1-Cdc4 resin (500 ng) and the indicated compounds (50 μM). Bound protein was visualized by anti-Sic1 immunoblot. Total protein on resin after capture and wash was determined by Ponceau S stain (middle panel). DMSO solvent alone and 10 μM Gcn4 phosphopeptide (FLPpTPVLED) served as negative and positive controls. Inhibition of Sic1 ubiquitination in vitro (bottom panel). Phosphorylated Sic1 (0.2 μM) was incubated with recombinant SCFCdc4 (0.2 μM), E1 (0.4 μM), E2 (2 μM), ubiquitin (24 μM) and ATP (1 mM) in the presence of 80 μM of the indicated compound or control. Reaction products were visualized by anti-Sic1 immunoblot. Full-length blots are shown in Supplementary Figure 5. (c) Inhibition curves for SCF-I2 (red; IC50 = 6.2 ± 1 μM) and unlabeled CPD peptide (black; IC50 = 0.5 ± 1 μM) in the fluorescence polarization assay. R- and S-enantiomers of 1-(2-carboxynaphth-1-yl)-2-naphthoic acid (SCF-I2) are shown.
We determined the crystal structure of SCF-I2 bound to a Skp1-Cdc4 complex20 to 2.6 Å resolution (see Supplementary Table 1 for data collection and refinement statistics). Unbiased difference electron-density maps revealed that SCF-I2 binds to the WD40 repeat domain of Cdc4 at a site that is 25 Å distant from the CPD binding pocket (Fig. 2a). The eight WD40 repeat motifs of Cdc4 form a canonical propeller structure in which each propeller blade consists of four anti-parallel β-strands and intervening loop regions (Supplementary Fig. 2)20. SCF-I2 embeds itself in a deep pocket on the lateral surface of the β-propeller between blades 5 and 6 (Fig. 2a,b and Supplementary Fig. 2). Cdc4 engages only one of the two enantiomers of SCF-I2, the (R)-(+) equivalent of BINOL. The top naphthalene ring system of SCF-I2 inserts itself deeply between blades 5 and 6, forming extensive hydrophobic contacts with Leu628, Ile594, Leu634, Trp657 and Ala649 (Fig. 2b). In addition, the carboxyl group of the top ring system hydrogen bonds to the NH group of the Trp657 side chain and forms a salt bridge with the side chain of Arg664. The bottom naphthalene ring system is more exposed to solvent and forms a stabilizing co-planar stacking interaction with the side chain of Arg664 and van der Waals interactions with the side chains of Ser667 and Arg655. The carboxyl group of the bottom ring system also forms ionic interactions with the side chains of Arg655 and His631. In the apo–Skp1-Cdc4 structure, there is no obvious pre-existing pocket that might anticipate the binding mode of SCF-I2 (Fig. 2c). Rather, the SCF-I2 binding pocket is induced by separation of blades 5 and 6 and a drastic shift of the β21-β22 linker that connects the two blades (Fig. 2d). The reorientation of the β21-β22 linker entails a 5 Å shift of the main chain and a massive 13 Å shift of the side chain of His631 from a buried to a solvent-exposed position (Fig. 2d,e).
These large conformational alterations create an interblade void that is filled by the rearrangement of residues proximal to the CPD binding pocket (Fig. 2d,e). The void is filled in part by a swap of side chain positions between Val635, which is normally buried and adjacent to His631, and the normally solvent-exposed Leu634 side chain; as a consequence of this rearrangement, the side chains of Val635 and Leu634 traverse 6 and 8 Å, respectively. The position vacated by Leu634 in turn is filled by rotation
of the side chain of Tyr574. Critically, both Tyr574 and Leu634 constitute part of the highly conserved CPD binding infrastructure. In the CPD peptide–Skp1-Cdc4 complex20, Tyr574 and Leu634 line the hydrophobic P-2 binding pocket within the central pore (Fig. 2e) and thereby dictate the preference for hydrophobic residues at the P-2 position of the CPD consensus motif 14,20. The P-2 pocket is thus severely distorted by the reoriented side chains of Tyr574 and Leu634 in the SCF-I2 bound structure. In addition, the hydroxyl group of Tyr574 participates in stabilizing H-bond interactions with the side chain of Arg572, one of the four invariant essential arginine residues found in all Cdc4 orthologs20. Arg572 stabilizes the orientation of Tyr548, which in turn directly hydrogen-bonds to the CPD phosphate group in the P0 position. Thus, SCF-I2 critically compromises the main binding pockets for the P-2 and P0 positions of the CPD consensus sequence14,20. As predicted by this structural model, the effects of SCF-I2 are mimicked by a Tyr574Ala mutation, which results in a 20-fold reduction in the affinity of Cdc4 for the CPD peptide (Fig. 2f). We explored the determinants of the SCF-I2–Cdc4 interface. The two carboxylic acid groups of SCF-I2 exhibit marked charge complementarity with the guanidinium side chains of Arg655 and Arg664. Mutation of each arginine residue individually to alanine attenuated the inhibition of Cdc4 by SCF-I2 by at least 50-fold (Fig. 2g). Alleles bearing either mutation fully complemented Cdc4 function in vivo, indicating that this region of Cdc4 is not normally critical in substrate recognition or SCF catalytic activity (Supplementary Fig. 3a). To investigate the structural features of SCF-I2 required for Cdc4 inhibition, we tested a panel of available BINOL analogs for activity in the fluorescence polarization assay (Supplementary Fig. 3b,c). 
This series demonstrated the importance of the naphthalene ring systems that participate in numerous hydrophobic interactions and the carboxylate groups that form electrostatic interactions with the two arginine residues on Cdc4. These mutational and structure-activity results validate the binding mode for SCF-I2 observed in the crystal structure. We next assessed the activity of SCF-I2 toward human FBW7. The key Cdc4 residues Arg655 and Arg664 are replaced in FBW7 by lysine
letters
and cysteine, respectively, suggesting that FBW7 might be resistant to inhibition by SCF-I2. This proved to be the case, as SCF-I2 inhibited the CPD-FBW7 interaction only at high concentrations (Fig. 3a). The residual inhibitory activity of SCF-I2 toward FBW7 might be due to the conservative Arg-to-Lys substitution and the conservation of most other residues that form the induced SCF-I2 binding pocket (Fig. 3b and Supplementary Fig. 2). Alignment of all human WD40 domains revealed that, aside from the two surface arginine residues, the pattern of SCF-I2 contact residues is often conserved (Supplementary Fig. 4). We are currently exploring whether the BINOL scaffold can be modified to interact more potently with FBW7 and other human WD40 domain proteins. The most thoroughly studied WD40 domain proteins are the β subunits of heterotrimeric G proteins, which transduce signals from
Figure 2 Structure analysis of the SCF-I2–Skp1-Cdc4 complex. (a) SCF-I2 intercalates between β-propeller blades 5 and 6 of the Cdc4 WD40 repeat domain, ~25 Å from the CPD phosphopeptide binding site. SCF-I2 is shown in yellow. Red dot indicates modeled position of the P0 phosphate position. PB indicates propeller blade. (b) Stereo diagram of SCF-I2 bound between PB5 and PB6 of the WD40 domain of Cdc4. SCF-I2 is shown in yellow; critical contact residues in Cdc4 are shown in blue stick representations. (c) Surface representation of the SCF-I2 binding region on Cdc4 in the absence (top) and presence (bottom) of bound SCF-I2. (d) Stereo diagram of main chain conformational shifts induced by SCF-I2. The structure of Cdc4 in the absence of SCF-I2 but in the presence of a CPD phosphopeptide substrate (yellow) is shown in purple; the structure of Cdc4 in the presence of SCF-I2 (yellow) is shown in blue. (e) Schematic of allosteric alterations caused by binding of SCF-I2. Positions of SCF-I2 bound conformations are shown in red; X indicates abrogation of a hydrogen bond caused by rotation of Tyr574. (f) Binding curves for WT Skp1-Cdc4 (black; Kd = 0.19 ± 0.07 μM) and Skp1-Cdc4 Y574A (red; Kd = 3.5 ± 0.2 μM) interactions with cyclin E-derived phosphopeptide by fluorescence polarization. (g) SCF-I2 inhibition curves for WT Skp1-Cdc4 (black; IC50 = 1.9 ± 1 μM), Skp1-Cdc4 R655A (green; IC50 > 200 μM) and Skp1-Cdc4 R664A (blue; IC50 > 200 μM) binding to cyclin E phosphopeptide by fluorescence polarization. Inset shows binding inhibition by unlabeled cyclin E phosphopeptide for the same three proteins.
a host of G protein–coupled receptors21. Notably, the interaction of the regulatory protein phosducin with the Gtβ subunit of the heterotrimeric G-protein transducin also causes substantial structural rearrangements between adjacent WD propeller blades22. These rearrangements induce a binding pocket for the C-terminal farnesyl moiety of the partner Gtγ subunit, which may serve to regulate membrane association of the Gtβγ complex22. Comparison of our SCF-I2–Cdc4 structure and the phosducin-Gtβγ structure reveals three highly similar features. First, the ligand-bound forms of both structures exhibit an analogous buried-to-exposed transition of the conserved histidine residue at the apex of the connector between the affected blades (Fig. 3c). Second, the Cdc4 and Gtβ structures show a close juxtaposition of induced binding pockets for the SCF-I2 and farnesyl ligands, respectively (Fig. 3c). Third, these rearrangements
Figure 3 Inhibition and allosteric modulation of human WD40 domains. (a) Fluorescence polarization competition binding curves for S. cerevisiae Cdc4 (black; IC50 = 1.9 ± 1 μM) and human FBW7 (red; IC50 = 274 ± 3 μM) with SCF-I2. Inset shows inhibition by unlabeled cyclin E phosphopeptide for yeast Cdc4 (black) and human FBW7 (red). (b) Stereo view overlay of the inhibitor binding site region of S. cerevisiae Cdc4 (PDB: 1NEX) in the absence of SCF-I2 (blue) with the corresponding region of human FBW7 (green) (PDB: 2OVR)20,29. Only residues that differ between the human and S. cerevisiae proteins are labeled (Cdc4 Ala649/FBW7 Asn598, Arg655/Lys604, Ser667/Thr616, Arg664/Cys613). (c) Stereo view comparison of induced pockets in the WD40 repeat domain of Cdc4 and the bovine transducin Gβ subunit. Top displays a superposition of Cdc4 (blue) bound to SCF-I2 (yellow) with the Gtβ subunit (dark green) bound to bovine retinal phosducin (pink) and a farnesyl ligand (magenta) from an associated Gtγ subunit (PDB: 1A0R). Bottom displays a superposition of unliganded forms of Cdc4 (gray) and the Gtβ subunit (light green) (PDB: 1TBG). For illustrative purposes, SCF-I2 and farnesyl ligands from the top image have been modeled into the lower image.
occur between blades 5 and 6 for both WD40 structures. That two functionally unrelated and evolutionarily distant proteins undergo similar induced conformational changes hints that allosteric responsiveness may be an intrinsic and conserved feature of the WD40 domain.
In contrast to conventional protein interaction inhibitors that directly block the substrate binding site, such as the p53-MDM2 inhibitor nutlin23, SCF-I2 elicits its effect by an allosteric mechanism. A structural feature of WD40 domains and other β-propeller structures such as the Kelch domain is the variability in blade number, which in known WD40 structures ranges from five to eight blades per domain13. The circular β-propeller structure can exhibit interblade separation24 and structural tolerance to artificial insertion of an additional repeat25. WD40 domains may thus be inherently susceptible to disruption by insertion of appropriately configured small molecules between adjacent blades. Although it remains to be determined whether all WD40 domains exhibit allosteric responsiveness, in other protein families ultraconserved residues can transmit long-range allosteric effects26.
To our knowledge, SCF-I2 represents the first example of a WD40 domain inhibitor. As our data with Cdc4 and FBW7 show, allosteric inhibition can discriminate between even highly related domains that recognize identical substrate motifs; thus, it may be feasible to design other inhibitors that are selective for particular WD40 domain proteins. Moreover, allosteric inhibitors may be combined with conventional binding pocket inhibitors to increase potency27. The yeast genome encodes at least 113 proteins with WD40 or WD40-like domains that function in signaling, transcription, chromatin remodeling, mRNA splicing, DNA replication and repair, protein synthesis, the ubiquitin system, autophagy, vesicle trafficking, the cytoskeleton and organelle biogenesis (Supplementary Table 2). In humans, WD40 domains occur in at least 256 different proteins and perform similarly diverse functions (Supplementary Table 3). Biomedically important WD40 domain proteins include the F-box proteins FBW7 and β-TrCP8, target of rapamycin kinase complex subunits28 and Gβ-subunits of heterotrimeric G proteins21,27. Our findings suggest that the WD40 domain may be generally accessible to allosteric modulation by small molecules.
Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
VOLUME 28 NUMBER 7 JULY 2010 nature biotechnology
Accession codes. Coordinates have been deposited in the Protein Data Bank (accession code 3MKS).
Note: Supplementary information is available on the Nature Biotechnology website.
Acknowledgments
We thank M. Auer, J. Walton and M. Bradley for stimulating discussions. This work was supported by grants to F.S. and M.T. from the Canadian Institutes of Health Research (MOP-57795), to E.D.B. from the Ontario Research and Development Challenge Fund and to M.T. from the National Cancer Institute of Canada and the European Research Council. F.S. is supported by a Canada Research Chair in Structural Biology of Signal Transduction and M.T. is supported by a Research Chair of the Scottish Universities Life Sciences Alliance and a Royal Society Wolfson Research Merit Award.
AUTHOR CONTRIBUTIONS
S.O., small-molecule library screen, affinity determinations and structural analysis; X.T., in vitro substrate binding and ubiquitination assays; V.N., bioinformatic analysis and sequence alignments; N.E. and E.D.B., small-molecule library screen; F.S. and M.T. conceived and directed the project, interpreted results and wrote the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Willems, A.R., Schwab, M. & Tyers, M. A hitchhiker’s guide to the cullin ubiquitin ligases: SCF and its kin. Biochim. Biophys. Acta 1695, 133–170 (2004). 2. Petroski, M.D. & Deshaies, R.J. Function and regulation of cullin-RING ubiquitin ligases. Nat. Rev. Mol. Cell Biol. 6, 9–20 (2005). 3. Hershko, A. & Ciechanover, A. The ubiquitin system. Annu. Rev. Biochem. 67, 425–479 (1998). 4. Nalepa, G., Rolfe, M. & Harper, J.W. Drug discovery in the ubiquitin-proteasome system. Nat. Rev. Drug Discov. 5, 596–613 (2006). 5. Bai, C. et al. SKP1 connects cell cycle regulators to the ubiquitin proteolysis machinery through a novel motif, the F-box. Cell 86, 263–274 (1996). 6. Verma, R. et al. Phosphorylation of Sic1p by G1 Cdk required for its degradation and entry into S phase. Science 278, 455–460 (1997). 7. Patton, E.E. et al. Cdc53 is a scaffold protein for multiple Cdc34/Skp1/F-box protein complexes that regulate cell division and methionine biosynthesis in yeast. Genes Dev. 12, 692–705 (1998).
8. Frescas, D. & Pagano, M. Deregulated proteolysis by the F-box proteins SKP2 and beta-TrCP: tipping the scales of cancer. Nat. Rev. Cancer 8, 438–449 (2008). 9. Welcker, M. & Clurman, B.E. FBW7 ubiquitin ligase: a tumour suppressor at the crossroads of cell division, growth and differentiation. Nat. Rev. Cancer 8, 83–93 (2008). 10. Yen, H.C. & Elledge, S.J. Identification of SCF ubiquitin ligase substrates by global protein stability profiling. Science 322, 923–929 (2008). 11. Smith, T.F., Gaitatzes, C., Saxena, K. & Neer, E.J. The WD repeat: a common architecture for diverse functions. Trends Biochem. Sci. 24, 181–185 (1999). 12. Makarova, K.S., Wolf, Y.I., Mekhedov, S.L., Mirkin, B.G. & Koonin, E.V. Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell. Nucleic Acids Res. 33, 4626–4638 (2005). 13. Fulop, V. & Jones, D.T. Beta propellers: structural rigidity and functional diversity. Curr. Opin. Struct. Biol. 9, 715–721 (1999). 14. Nash, P. et al. Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication. Nature 414, 514–521 (2001). 15. Rajagopalan, H. et al. Inactivation of hCDC4 can cause chromosomal instability. Nature 428, 77–81 (2004). 16. Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861–872 (2007). 17. Blanchard, J.E. et al. High-throughput screening identifies inhibitors of the SARS coronavirus main proteinase. Chem. Biol. 11, 1445–1453 (2004). 18. Brunel, J.M. BINOL: a versatile chiral reagent. Chem. Rev. 105, 857–897 (2005). 19. Barbey, R. et al. Inducible dissociation of SCF(Met30) ubiquitin ligase mediates a rapid transcriptional response to cadmium. EMBO J. 24, 521–532 (2005). 20. Orlicky, S., Tang, X., Willems, A., Tyers, M. & Sicheri, F. Structural basis for phosphodependent substrate selection and orientation by the SCFCdc4 ubiquitin ligase. Cell 112, 243–256 (2003). 21. Lagerstrom, M.C. & Schioth, H.B. 
Structural diversity of G protein-coupled receptors and significance for drug discovery. Nat. Rev. Drug Discov. 7, 339–357 (2008). 22. Loew, A., Ho, Y.K., Blundell, T. & Bax, B. Phosducin induces a structural change in transducin beta gamma. Structure 6, 1007–1019 (1998). 23. Vassilev, L.T. et al. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 303, 844–848 (2004). 24. Fulop, V., Bocskei, Z. & Polgar, L. Prolyl oligopeptidase: an unusual beta-propeller domain regulates proteolysis. Cell 94, 161–170 (1998). 25. Juhasz, T., Szeltner, Z., Fulop, V. & Polgar, L. Unclosed beta-propellers display stable structures: implications for substrate access to the active site of prolyl oligopeptidase. J. Mol. Biol. 346, 907–917 (2005). 26. Suel, G.M., Lockless, S.W., Wall, M.A. & Ranganathan, R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat. Struct. Biol. 10, 59–69 (2003). 27. May, L.T., Leach, K., Sexton, P.M. & Christopoulos, A. Allosteric modulation of G protein-coupled receptors. Annu. Rev. Pharmacol. Toxicol. 47, 1–51 (2007). 28. Wullschleger, S., Loewith, R. & Hall, M.N. TOR signaling in growth and metabolism. Cell 124, 471–484 (2006). 29. Hao, B., Oehlmann, S., Sowa, M.E., Harper, J.W. & Pavletich, N.P. Structure of a Fbw7-Skp1-cyclin E complex: multisite-phosphorylated substrate recognition by SCF ubiquitin ligases. Mol. Cell 26, 131–143 (2007).
ONLINE METHODS
Chemicals and reagents. An N-terminally labeled fluorescein phosphopeptide derived from cyclin E (GLLpTPPQSG, called CPD) was synthesized by the W.M. Keck Biotechnology Resource Center. Nonfluorescently labeled peptide Ac-GLLpTPPQSG was synthesized by Dalton Chemical. Small molecules were purchased from Maybridge plc. Skp1-Cdc4263-744, Skp1-Cdc4263-744 R655A, Skp1-Cdc4263-744 R664A and Skp1-FBW7 were purified as previously described20. Purified complexes were passed over a Superdex S75 gel filtration column (GE Healthcare) equilibrated in 10 mM HEPES (pH 7.5), 250 mM NaCl and 1 mM DTT and concentrated to 20 mg/ml.
Fluorescence polarization assays. A 50,000-compound Maybridge Screening Collection library (http://www.maybridge.com/) was screened in a 384-well format on a Beckman-Coulter Integrated Robotic System at the McMaster University HTS Laboratory. Assays contained 0.21 μM Skp1-Cdc4 complex and 10 nM fluorescently labeled cyclin E-derived phosphopeptide in 10 mM HEPES (pH 7.5), 40 mM NaCl, 0.1 mg/ml BSA and 1 mM DTT in a final volume of 23.5 μl per well. 1.5 μl of each library compound from a 1 mM stock in DMSO was added to a final concentration of 60 μM, mixed and incubated at 23 °C for 30 min. Samples were excited at 485 nm and emission was read at 535 nm on an Analyst HT plate reader (Molecular Devices). High controls contained 1.5 μl of DMSO only and low controls contained 1.5 μl of unlabeled cyclin E peptide in DMSO at a final concentration of 10 μM. Binding activity was calculated as the average sample value minus the mean of low controls, divided by the mean of high controls minus the mean of low controls. Compounds were classified as initial hits if the binding value was <50% of control. Dose-response curves were carried out using the same conditions as above and EC50 values calculated as previously described17.
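The binding-activity normalization and the Z′ screening statistic quoted in Figure 1 can be sketched as follows. This is a minimal illustration, not code from the study: the function names and NumPy usage are my own, and the Z′ formula is the standard Zhang screening-window statistic rather than anything specified in the text.

```python
import numpy as np

def percent_binding(sample_mP, high_mP, low_mP):
    """Normalize raw polarization (mP) readings to percent binding.

    High controls (DMSO only) define 100% binding; low controls
    (excess unlabeled CPD peptide) define 0%, per the methods above.
    """
    high, low = np.mean(high_mP), np.mean(low_mP)
    return 100.0 * (np.mean(sample_mP) - low) / (high - low)

def z_prime(high_mP, low_mP):
    """Standard Z' screening-window statistic for the control wells."""
    high = np.asarray(high_mP, dtype=float)
    low = np.asarray(low_mP, dtype=float)
    return 1.0 - 3.0 * (high.std(ddof=1) + low.std(ddof=1)) / abs(high.mean() - low.mean())
```

Under the cutoff described above, a compound whose normalized binding falls below 50% of control would be flagged as an initial hit.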
Determination of IC50 was performed by nonlinear regression analysis using a sigmoidal dose-response equation (variable slope) with no weighting or restraints (GraphPad Software). mP units were normalized to no-compound controls, that is, 100% binding, for graphical representation.
Interaction and ubiquitination assays. For full-length phospho-Sic1 substrate interaction assays, compounds (50 μM final) were added to 500 ng of GST-Skp1-HisCdc4263–744 immobilized on glutathione sepharose resin in PBS, incubated for 30 min at 4 °C, followed by incubation with 0.1 μM of phosphorylated Sic1 for 1 h at 4 °C. Resin was washed three times with PBS and captured Sic1 protein was visualized by anti-Sic1 immunoblot30. For ubiquitination assays, compounds were pre-incubated with recombinant E1 (0.4 μM), Cdc34 (2 μM) and SCFCdc4 (0.2 μM) for 15 min at 23 °C in 12.5 μl reaction buffer containing 50 mM Tris (pH 7.5), 10 mM MgCl2, 2 mM ATP, 50 μM DTT and 24 μM ubiquitin. Reactions were initiated by addition of recombinant HisSic1 (0.2 μM) that had been previously phosphorylated with Cln2-Cdc28 kinase; reactions were incubated at 30 °C for the indicated times and products detected by anti-Sic1 immunoblot30. Ubiquitination of Met4 by SCFMet30 was carried out as described19.
X-ray structure determination. Crystals of Skp1-Cdc4263–744 were derived as described previously20. Crystals were transferred to buffer containing 100 mM Tris (pH 8.5), 1.5 M ammonium sulfate, 15% glycerol and 1 mM 1-(2-carboxynaphth-1-yl)-2-naphthoic acid (SCF-I2) and incubated at 23 °C for 30 min to incorporate SCF-I2 into the crystal lattice and to cryoprotect the crystal. Soaked crystals
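The "sigmoidal dose response (variable slope)" fit was done in GraphPad; an equivalent four-parameter logistic fit can be sketched in Python with SciPy. The function and parameter names and starting values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(log_conc, bottom, top, log_ic50, hill):
    """Four-parameter logistic (sigmoidal dose response, variable slope)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ic50 - log_conc) * hill))

def fit_ic50(log_conc, response):
    """Fit percent binding vs. log10[inhibitor] (M); return IC50 in M."""
    # For an inhibition curve the Hill slope is negative, hence -1.0.
    p0 = [float(np.min(response)), float(np.max(response)),
          float(np.median(log_conc)), -1.0]
    popt, _ = curve_fit(four_pl, log_conc, response, p0=p0, maxfev=10000)
    return 10.0 ** popt[2]
```

Plotting the fitted curve over the normalized mP data would reproduce inhibition curves of the kind shown in Figures 1c and 2g.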
retained the same space group of P32 (a = 108.281, b = 108.281, c = 165.594, α = β = 90°, γ = 120°). All structural methods were carried out essentially as described previously20. High-resolution data were collected on a frozen crystal at the Advanced Photon Source on beamline BM-14C (λ = 0.9 Å) using a Quantum ADSC CCD detector. Data were processed using the HKL program suite31. The structure was solved by molecular replacement using PDB 1NEX as the search model in CNS32 and model building in O33. Geometric parameters and restraints for SCF-I2 were determined in HIC-UP34 and used in the refinement. Final rounds of refinement were carried out in Refmac35. The structure was refined to 2.6 Å to a working R-value of 21.1% and Rfree of 26.3% (Supplementary Table 1).
Bioinformatic analysis of WD40 repeat domain proteins. Yeast WD40 repeat domains listed in Supplementary Table 2 were extracted manually from the Saccharomyces Genome Database (http://www.yeastgenome.org/), based on matches to the InterPro database using the InterPro scan (iprscan) program36. InterPro collates motif and domain information from the PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, PIR SUPERFAMILY, Gene3D and PANTHER databases37. Human WD40 repeat domain proteins were identified by scanning human annotated proteins from UniProt38 using a hidden Markov model approach39 for the presence of the WD40 repeat domain. 255 significant matches to WD40 repeats were obtained (Supplementary Table 3). Gene and protein annotations were obtained from the above databases and condensed into unified designations. We then used the amino acid residues contacting SCF-I2 to define a perfect-match search pattern, which was then modified to allow gaps within loop regions between WD40 β-strands and conserved amino acid substitutions. Regions containing the best matches to the query pattern were aligned using ProbCons40.
Human WD40 repeat domain proteins that contain a putative SCF-I2 binding pocket are listed in Supplementary Figure 4.
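As a rough illustration of the pattern-search step described above, the contact-residue query could be compiled into a regular expression whose bounded gaps tolerate variable loop lengths between WD40 β-strands. The motif, gap ranges and sequences below are invented for demonstration only; the study's actual pattern was derived from the SCF-I2 contact residues and further refined with conserved substitutions.

```python
import re

# Hypothetical query pattern: fixed contact residues separated by
# bounded gaps (.{m,n}) standing in for loops of variable length.
PATTERN = re.compile(
    r"I.{2,8}L.{3,10}[LIV]V.{10,20}A.W.{5,12}[RK].{6,12}R.{2,6}[ST]"
)

def putative_pocket_matches(proteins):
    """Return the names of sequences that match the query pattern."""
    return [name for name, seq in proteins.items() if PATTERN.search(seq)]
```

A real implementation would scan full proteome sequence sets and score near-misses rather than require an exact regex hit.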
30. Tang, X. et al. Suprafacial orientation of the SCFCdc4 dimer accommodates multiple geometries for substrate ubiquitination. Cell 129, 1165–1176 (2007). 31. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 (1997). 32. Brunger, A.T. et al. Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. 54, 905–921 (1998). 33. Jones, T.A., Zou, J.Y., Cowan, S.W. & Kjeldgaard, M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110–119 (1991). 34. Kleywegt, G.J. Crystallographic refinement of ligand complexes. Acta Crystallogr. D Biol. Crystallogr. 63, 94–100 (2007). 35. Murshudov, G.N., Vagin, A.A. & Dodson, E.J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D Biol. Crystallogr. 53, 240–255 (1997). 36. Christie, K.R. et al. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 32, D311–D314 (2004). 37. Zdobnov, E.M. & Apweiler, R. InterProScan–an integration platform for the signaturerecognition methods in InterPro. Bioinformatics 17, 847–848 (2001). 38. UniProt Consortium The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, D142–D148 (2010). 39. Eddy, S.R., Mitchison, G. & Durbin, R. Maximum discrimination hidden Markov models of sequence consensus. J. Comput. Biol. 2, 9–23 (1995). 40. Do, C.B., Mahabhashyam, M.S., Brudno, M. & Batzoglou, S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340 (2005).
doi:10.1038/nbt.1646
Chemical genetics screen for enhancers of rapamycin identifies a specific inhibitor of an SCF family E3 ubiquitin ligase
Mariam Aghajan1,11, Nao Jonai1,11, Karin Flick2,11, Fei Fu1, Manlin Luo3, Xiaolu Cai4, Ikram Ouni2, Nathan Pierce5, Xiaobo Tang6, Brett Lomenick1, Robert Damoiseaux7, Rui Hao1, Pierre M del Moral8, Rati Verma5, Ying Li4, Cheng Li9, Kendall N Houk4, Michael E Jung4, Ning Zheng6, Lan Huang10, Raymond J Deshaies5, Peter Kaiser2 & Jing Huang1
The target of rapamycin (TOR) plays a central role in eukaryotic cell growth control1. With prevalent hyperactivation of the mammalian TOR (mTOR) pathway in human cancers2, strategies to enhance TOR pathway inhibition are needed. We used a yeast-based screen to identify small-molecule enhancers of rapamycin (SMERs) and discovered an inhibitor (SMER3) of the Skp1-Cullin-F-box (SCF)Met30 ubiquitin ligase, a member of the SCF E3-ligase family, which regulates diverse cellular processes including transcription, cell-cycle control and immune response3. We show here that SMER3 inhibits SCFMet30 in vivo and in vitro, but not the closely related SCFCdc4. Furthermore, we demonstrate that SMER3 diminishes binding of the F-box subunit Met30 to the SCF core complex in vivo and show evidence for SMER3 directly binding to Met30. Our results show that there is no fundamental barrier to obtaining specific inhibitors to modulate function of individual SCF complexes.
Conserved from yeast to humans, the target of rapamycin (TOR) protein is a serine/threonine protein kinase that controls various aspects of cellular growth by regulating translation, transcription, autophagy, cytoskeletal organization and metabolism1. Rapamycin, a secondary metabolite produced by Streptomyces hygroscopicus, specifically inhibits the activity of TOR, resulting in starvation-like phenotypes.
Over the past few years, deregulation of pathways upstream and downstream of mTOR has been implicated in a variety of cancers, making the TOR signaling pathway a potential target for cancer therapy and rapamycin (and its analogs) an attractive anticancer agent2. Results from first-round clinical trials suggest that different types of tumors have different sensitivities to rapamycin and in many cases rapamycin does not completely halt the progress
of the disease4,5, thus making it desirable to identify small molecules that can act in concert with rapamycin. Although combination strategies taking advantage of known interacting pathways (e.g., mTOR and IGF1R, PI3K or AKT) are being pursued6–8, an unbiased search for novel exploitable pathways has not been reported. The unbiased cell-based approach described here has the potential to elucidate new interactions of TOR signaling with other pathways and to provide valuable chemical tools to study signaling networks in various settings. We and others have previously shown that yeast is a promising platform for high-throughput discovery of small-molecule modifiers of rapamycin-sensitive TOR functions, including both suppressors (small-molecule inhibitors of rapamycin or SMIRs) and enhancers (SMERs), which show potential for modulating TOR-related processes in higher organisms9,10. Here we used the yeast-based screen to identify new SMERs targeting cell growth control (Online Methods). Using a ChemBridge DiverSet library containing 30,000 small molecules, we identified >400 compounds that, in the presence of a suboptimal rapamycin concentration, gave a 'no growth' phenotype (Supplementary Data Set 1). After removing toxic compounds using unrelated screening data sets (Online Methods), a total of 86 potential SMERs were identified, which were synthetic sick/lethal with rapamycin but showed little toxicity by themselves at the concentrations used (Supplementary Data Set 2 and Supplementary Fig. 1). The SMERs encompass a variety of modes of action and biological activities, including direct inhibition of mTOR kinase activity, new post-translational regulation of mTOR function, and inhibition of patient-derived brain tumor initiating cells (unpublished data). Five structurally distinct molecules that exhibited differing effects on growth (Online Methods) were selected for further analysis (Fig. 1a).
1Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, and the Molecular Biology Institute, University of California, Los Angeles, California, USA. 2Department of Biological Chemistry, School of Medicine, University of California, Irvine, California, USA. 3Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. 4Department of Chemistry and Biochemistry, University of California, Los Angeles, California, USA. 5Department of Biology, Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California, USA. 6Department of Pharmacology, Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA. 7Molecular Screening Shared Resource, University of California, Los Angeles, California, USA. 8Roche Diagnostics Corporation, Roche Applied Science, Indianapolis, Indiana, USA. 9Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. 10Departments of Physiology & Biophysics and Developmental & Cell Biology, University of California, Irvine, California, USA. 11These authors contributed equally to this work. Correspondence should be addressed to J.H. ([email protected]) or P.K. ([email protected]).
Received 9 March; accepted 9 May; published online 27 June 2010; doi:10.1038/nbt.1645
VOLUME 28 NUMBER 7 JULY 2010 nature biotechnology
The primary challenge for phenotype-based chemical genetic screens is the subsequent target identification, for which a variety of technologies, from affinity-based to genomics-based, have been developed (see ref. 11 and reviews therein). We first sought to take advantage of the wealth of information on gene expression related to various cellular pathways in yeast and performed genome-wide expression profiling using DNA microarrays. We expected that expression profile changes induced by SMERs could be linked to gene expression changes caused by genetic perturbations12. To capture early and/or direct transcriptome changes and avoid secondary effects, cells were treated with SMERs for a short period (30 min) and the extracted RNA was processed to probe Affymetrix GeneChips (Online Methods). The hierarchical clustering pattern of our microarray data classified the five SMERs identified from the screen into three distinct groups (Fig. 1b). Treatment of yeast cells with SMER2, 4 or 5 had no obvious effect on global gene transcription, whereas SMER1's effect on transcription shared extensive similarity with that of rapamycin (M.A., unpublished data). SMER3's expression profile, on the other hand, was distinct from all the others. Consistent with hierarchical clustering, principal component analysis (Fig. 1c) also readily distinguished these effects on gene expression.
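The two unsupervised analyses can be outlined as follows. This is an illustrative sketch on a simulated expression matrix, not the published microarray data or the dChip implementation used in the paper; the sample names mirror Figure 1, but all values are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Simulated log-expression matrix (genes x samples); values are invented.
rng = np.random.default_rng(0)
samples = ["DMSO", "SMER2", "SMER4", "SMER5", "SMER1", "Rapa", "SMER3"]
common = rng.normal(0, 0.1, size=(500, 1))           # shared baseline variation
expr = common + rng.normal(0, 0.1, size=(500, 7))
expr[:50, 4:6] += 3.0    # SMER1 and rapamycin induce a shared gene set
expr[50:100, 6] += 3.0   # SMER3 induces a distinct set (cf. the MET genes)

# Hierarchical clustering of samples on correlation distance (cf. Fig. 1b)
tree = linkage(pdist(expr.T, metric="correlation"), method="average")
groups = fcluster(tree, t=3, criterion="maxclust")
print(dict(zip(samples, groups)))

# PCA of the samples via SVD (cf. Fig. 1c)
centered = expr.T - expr.T.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T   # coordinates on the first two principal components
```

With the shared induced gene set dominating the correlation structure, both methods place SMER1 with rapamycin, SMER3 on its own, and the remaining treatments with the DMSO control.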
Figure 1 Two unsupervised data analyses classify five SMERs into three different groups based on their gene expression profiles. (a) Chemical structures of SMER1 to SMER5. (b) Two-dimensional hierarchical clustering reveals that the expression profile of SMER1 is similar to that of rapamycin (Rapa), whereas the profiles of SMERs 2, 4 and 5 are indistinguishable from that of DMSO (solvent) control. The profile of SMER3 is distinct. Each row corresponds to a gene, and each column corresponds to an experimental sample. (c) Principal component analysis is consistent with hierarchical clustering. Light blue, DMSO; blue, SMER1; cyan, SMER2; red, SMER3; sage, SMER4; chartreuse, SMER5; green, rapamycin. Replicates were obtained from independent small-molecule treatments in separate experiments.
We focused primarily on SMER3, given its distinct profile. Notably, a set of methionine biosynthesis genes (referred to as MET-genes hereafter) was upregulated in SMER3-treated cells (Supplementary Tables 1 and 2). GO analysis revealed that, in addition to the enrichment of sulfur metabolism genes among the induced group, genes involved in cell-cycle regulation were overrepresented in the downregulated group of SMER3-specific genes (Supplementary Tables 1 and 2). Induction of MET-gene expression in response to SMER3 exposure suggested that the cellular pathway controlling homeostasis of sulfur-containing compounds was a possible target of SMER3. The key regulator of this pathway is the ubiquitin ligase SCFMet30, which restrains the transcriptional activator Met4 in an inactive state in methionine-replete media by attachment of a regulatory ubiquitin chain13. Inactivation of SCFMet30 prevents Met4 ubiquitination, permitting the formation of an active Met4-containing transcription complex that induces expression of the MET-genes and blocks cell proliferation. One hypothesis to explain the MET-gene activation and growth inhibition in SMER3-treated cells is that SMER3 inhibits SCFMet30. In agreement with this notion, Met4 ubiquitination was blocked in cells exposed to SMER3 (but not to rapamycin) (Fig. 2a). Furthermore, genetic analyses have previously demonstrated that deubiquitinated Met4 mediates cell cycle arrest upon inactivation of SCFMet30 (ref. 13), and deletion of MET4 rescues the lethality of met30Δ (ref. 14). Notably, met4Δ cells were also less susceptible to growth inhibition by SMER3 (but not rapamycin, exemplifying specificity) (Fig. 2b and Supplementary Fig. 2). These findings are consistent with SMER3 being an inhibitor of SCFMet30. However, the incomplete resistance of met4Δ to SMER3 (Fig. 2b) suggests that SMER3 likely has additional targets beyond SCFMet30 and that cell growth inhibition by SMER3 is not solely due to SCFMet30 inhibition. This is not uncommon: even imatinib (Gleevec), originally believed to be a highly specific inhibitor of BCR-Abl, is now appreciated to exert its biological effects through protein kinases in addition to its intended target15.
SMER3 enhances rapamycin's effect and also inhibits SCFMet30, suggesting a connection between the TOR and SCFMet30 pathways. To test whether SMER3 functions as an enhancer of rapamycin through inhibition of SCFMet30, we asked whether genetic inhibition of SCFMet30 could mimic the synergistic effect of SMER3 with rapamycin. Indeed, hypomorphic alleles of the individual components of SCFMet30 and its E2 ubiquitin-conjugating enzyme, Cdc34, were hypersensitive to rapamycin (Fig. 2c). The synthetic lethality with rapamycin appears to arise largely from reduced SCFMet30 activity because inhibition of Cdc4, which forms a related, essential SCFCdc4 ubiquitin ligase, resulted in only minor rapamycin hypersensitivity (Fig. 2c). Together these results suggest that SMER3 enhances rapamycin's growth-inhibitory effect by inhibition of SCFMet30.
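The selection of SMER3-specific genes follows the thresholds given in Online Methods (at least a twofold difference and a two-tailed, two-sample, equal-variance t-test P < 0.05). A minimal sketch of that decision rule, with invented log2 expression values rather than the actual array data:

```python
import numpy as np
from scipy.stats import ttest_ind

def is_differentially_expressed(treated, control, fold=2.0, alpha=0.05):
    """Apply the Online Methods criteria to log2 expression replicates."""
    treated, control = np.asarray(treated), np.asarray(control)
    delta = treated.mean() - control.mean()              # log2 fold change
    p = ttest_ind(treated, control, equal_var=True).pvalue
    return abs(delta) >= np.log2(fold) and p < alpha

# Example: a MET-like gene induced ~4-fold by SMER3 (values invented)
print(is_differentially_expressed([8.1, 8.0, 8.2], [6.0, 6.1, 5.9]))  # True
```

A gene passes only if both the fold-change and the significance thresholds are met, which guards against calling genes with large but noisy shifts, or statistically significant but biologically trivial ones.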
Figure 2 SMER3 targets SCFMet30. (a) Biochemical evidence for SCFMet30 inhibition by SMER3 but not rapamycin. Yeast cells were cultured in YPDA medium to mid-log phase, treated with the indicated concentrations of SMER3 or rapamycin for 45 min, and total protein was extracted for western blot analyses (Online Methods). Met4 ubiquitination in vivo can be directly assessed by immunoblotting because ubiquitinated forms of Met4 are not degraded by proteasomes and can thus be detected due to a characteristic mobility shift on denaturing gels29. The asterisks (*) denote nonspecific bands immunoreactive to the anti-Met4 antibody (generous gift from M. Tyers). (b) SMER3 resistance in met4Δ cells. Yeast cells were treated with either vehicle (DMSO) or 4 μM SMER3, and growth curve analysis was performed with an automated absorbance reader measuring O.D. at 595 nm every 30 min (Online Methods). Cell growth was measured in liquid because SMER3 activity is undetectable on agar. (c) Genetic interaction between SCFMet30 and TOR. Temperature-sensitive mutants as indicated were grown at 25 °C to mid-log phase in YPDA medium and serial dilutions were spotted onto plates with or without 2.5 nM rapamycin. The plates were incubated at the permissive temperatures for the mutants: 28 °C for cdc34-3, cdc53-1, cdc4-3 and met30-6, because these mutants exhibited fitness defects at 30 °C even without rapamycin, or 30 °C (standard growth temperature) for met30-9 and skp1-25, because these alleles are not temperature sensitive until 37 °C. (d) SMER3 specifically inhibits the SCFMet30 E3 ligase in vitro. Components of the SCFMet30 and SCFCdc4 E3 ligases were expressed and purified from insect cells and used in in vitro ubiquitination assays. Reaction products were analyzed by immunoblotting. The asterisk indicates a protein cross-reacting with the anti-Met4 antibody. (e) The amount of unubiquitinated substrate (Met4 and Sic1) was quantified on a Fuji LAS-4000 imaging system, and inhibition was expressed as the ratio of unubiquitinated substrate in DMSO/SMER3.
To test whether SMER3 can directly inhibit SCF ubiquitin ligases, we assayed ubiquitination of well-established SCF substrates by purified SCF complexes in vitro. Indeed, addition of SMER3 to the ligase reactions inhibited ubiquitination of Met4 by SCFMet30 in a dose-dependent manner, whereas SMER1 had no effect (Supplementary Fig. 3). To assess the specificity of SMER3, we also examined in vitro ubiquitination of Sic1 by the related, WD40 repeat-containing SCFCdc4. For direct comparison of SMER3's effects, the activities of SCFMet30 and SCFCdc4 were analyzed in a single reaction mix containing both ligase complexes and their substrates Met4 and Sic1 (Fig. 2d). Owing to the faster kinetics of the SCFCdc4-catalyzed ubiquitination, the Sic1 reaction was probed at two incubation times: first at 5 min, corresponding to the linear range of the SCFCdc4 reaction (at which time there was no Met4 ubiquitination by SCFMet30), then at 25 min, corresponding to the linear range of the SCFMet30 reaction. Consistent with the selective in vivo effect of SMER3 on SCFMet30, in vitro ubiquitination of Sic1 was unaffected by SMER3 (Fig. 2d,e). In some experiments with SCFCdc4, a modest effect was seen on high-molecular-weight conjugates (data not shown), but the direct head-to-head comparison, with both enzymes in the same tube, shows a very large difference in the sensitivity of the two ligase complexes toward SMER3. To investigate the mechanism of specificity in the inhibition of SCFMet30 by SMER3, we examined the association of Met30 and the
SCF core component Skp1. We found that Met30 was no longer bound to Skp1 immunopurified from cells treated with SMER3 (Fig. 3a), suggesting that SMER3 prevents the assembly of SCFMet30 or induces SCF complex dissociation (Supplementary Note). We next asked whether SMER3 affects the binding of other Skp1 interactors or acts specifically on SCFMet30. Skp1-bound proteins were purified from cells treated with SMER3 or DMSO solvent control, and their relative abundance was determined using stable isotope labeling with amino acids in cell culture (SILAC)-based quantitative mass spectrometry. Among the 11 identified F-box proteins, only the binding of Met30 to Skp1 was substantially inhibited by SMER3 (Fig. 3b). Skp1 and Met30 protein levels were not affected by SMER3, nor were the interactions of the SCF core components Cdc53 (cullin) and Hrt1 (RING component) with Skp1 (Fig. 3b and Supplementary Fig. 4). To further address the specificity of SMER3 for Met30 in vivo, we compared the cell cycle arrest phenotype induced by SMER3 to those of temperature-sensitive mutants of Met30, Cdc4 and the SCF core components at nonpermissive temperatures. SMER3 induces a phenotype resembling that of genetic inhibition of Met30, whereas genetic inhibition of general SCF components or the specific F-box subunit Cdc4 gives a completely different, elongated cell cycle arrest phenotype (Fig. 3c). Inhibition of any of the SCF core components simultaneously blocks SCFMet30 and SCFCdc4, yet the arrest phenotypes
Figure 3 Molecular mechanism for the specificity of SCFMet30 inhibition by SMER3. (a) Protein-protein interaction between Met30 and Skp1 is diminished by SMER3 in vivo. Yeast cells expressing endogenous 13Myc-tagged Met30 were either untreated, or treated with solvent control (DMSO) or 30 μM SMER3 for 30 min at 30 °C. 13Myc-Met30 was immunopurified and immunocomplexes were analyzed for Skp1 binding by western blot analysis. (b) SMER3 specifically targets SCFMet30 in vivo as determined by quantitative mass spectrometry. A yeast strain expressing endogenous HBTH-tagged Skp1 was grown in medium containing either heavy (13C/15N) or light (12C/14N) arginine and lysine to metabolically label proteins. Skp1-bound proteins were purified and analyzed by mass spectrometry. Abundance ratios for SCF components identified by multiple quantifiable peptides are shown as SILAC ratios of 'light' (SMER3-treated) over 'heavy' (DMSO-treated) peptide intensities. (c) SMER3 specificity for SCFMet30 versus SCFCdc4 as verified by cell cycle arrest morphology. Temperature-sensitive mutants were shifted to 37 °C for 4 h. The Skp1 depletion phenotype was observed after repression of Skp1 expression in dextrose medium for 12 h. SMER3 treatment of cells was for 6 h. Scale bar, 10 µm. (d) SMER3 protects endogenous Met30 from protease digestion. Yeast cells expressing Met30-RGS6H were lysed and digested with thermolysin in the presence of SMER3 versus DMSO control, and the extent of proteolysis was analyzed by immunoblotting. (e) SMER3 protects recombinant Met30 from protease digestion. The asterisks (*) indicate the Met30 fragment that is protected by SMER3 from protease digestion. Full-length blots for a, d and e are in Supplementary Figure 8.
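The SILAC quantification underlying Figure 3b reduces to a light/heavy intensity ratio per protein. A toy sketch, with invented peptide intensities, assuming summed-intensity ratios (one of several common ways to combine peptide-level measurements into a protein ratio):

```python
def silac_ratio(peptides):
    """peptides: list of (light_intensity, heavy_intensity) pairs for one protein."""
    light = sum(l for l, _ in peptides)
    heavy = sum(h for _, h in peptides)
    return light / heavy

# Hypothetical peptide intensities for two Skp1-bound proteins:
# 'light' = SMER3-treated culture, 'heavy' = DMSO-treated culture.
skp1_bound = {
    "Cdc53": [(1.0e6, 1.1e6), (8.0e5, 7.8e5)],   # core subunit: ratio near 1
    "Met30": [(9.0e4, 1.2e6), (7.0e4, 9.5e5)],   # Skp1 binding lost upon SMER3
}
ratios = {name: silac_ratio(p) for name, p in skp1_bound.items()}
print(ratios)
```

A ratio near 1 means the interaction is unchanged by SMER3; a ratio far below 1, as simulated here for Met30, mirrors the selective loss of the Met30-Skp1 interaction.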
of SCF core mutants strongly resemble Cdc4 inhibition (Fig. 3c). This indicates that the cdc4 cell cycle arrest morphology is 'dominant' over that of met30 and that inhibition by SMER3 is indeed specific for Met30, without affecting Cdc4 or SCF in general. Additionally, whereas SMER3-treated cdc4 temperature-sensitive mutant cells have a phenotype at permissive temperatures resembling genetic inhibition of Met30, their phenotype changes to one resembling Cdc4 inhibition when shifted to nonpermissive temperatures (Fig. 3c), further demonstrating that SMER3 has little effect on Cdc4 in vivo. To test direct binding of SMER3, we employed differential scanning fluorimetry16 using purified Met30-Skp1 versus Skp1 proteins (Met30 cannot be obtained in isolation without Skp1). The addition of SMER3 altered the melting temperature of Met30-Skp1, but not that of Skp1 alone, indicating that SMER3 does indeed directly target the Met30-Skp1 complex (Table 1). The simplest model to explain the biochemical specificities of SMER3 is that it binds directly to Met30 but not Skp1. Because drug binding often stabilizes a folded state or conformation of its protein target, leading to increased resistance to protease digestion (as assayed by drug affinity responsive target stability, or DARTS11), we tested whether the protease susceptibility of Met30 is altered by the presence of SMER3. Indeed, when yeast cell lysates were treated with the protease thermolysin, we observed SMER3-dependent protection of Met30 (Fig. 3d,e and Supplementary Fig. 5), but not of Skp1. These experiments suggest that Met30 is the direct molecular target of SMER3, although we cannot exclude that SMER3 binding to Met30 may require Skp1. Met30 contains at its N terminus the F-box motif, which binds Skp1, and at its C terminus the WD40 repeats, which serve as protein-protein interaction motifs for substrate binding17.
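In differential scanning fluorimetry, fluorescence rises sigmoidally as the protein unfolds and the dye binds exposed hydrophobic surfaces; the melting temperature is commonly taken as the temperature of the maximal fluorescence increase. A minimal sketch on a simulated melt curve (the instrument software used here, the LightCycler Protein Melt Analysis Tool, performs its own fitting):

```python
import numpy as np

# Simulated, idealized melt curve; real DSF traces also show a
# post-transition decay that this sketch ignores.
temps = np.arange(20.0, 70.0, 0.5)
tm_true = 45.2
curve = 1.0 / (1.0 + np.exp(-(temps - tm_true) / 1.5))

def melting_temperature(t, f):
    """Tm estimated as the temperature of the maximum first derivative dF/dT."""
    dfdt = np.gradient(f, t)
    return t[np.argmax(dfdt)]

print(melting_temperature(temps, curve))
```

A ligand-induced shift in Tm, as seen for Met30-Skp1 with SMER3 in Table 1, is then simply the difference between the Tm estimates with and without compound.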
We found that the Met30 F-box, but not the Cdc4 F-box, was protected to a similar extent as full-length Met30 by the presence of SMER3 in DARTS experiments (Supplementary Figs. 6 and 7). In contrast, SMER3
failed to protect the WD40 repeat domain of Met30 (Supplementary Fig. 6 and Supplementary Note). These results suggest that SMER3 may recognize the F-box motif of Met30, yet further investigation is required to understand how SMER3 binds to Met30. In this study, we demonstrated that SMER3 (i) specifically inhibits in vitro ubiquitination by recombinant, reconstituted SCFMet30 (Fig. 2d,e and Supplementary Fig. 3), (ii) selectively disassembles or prevents assembly of SCFMet30 but not other SCF complexes in vivo (Fig. 3a–c) and (iii) directly binds to Met30 (or the Met30-Skp1 complex), but not Skp1 alone (Fig. 3d and Table 1). Together, these experiments suggest that SMER3 specifically inactivates SCFMet30 by binding to Met30. Designing specific inhibitors for SCFs has historically been considered highly challenging owing to their common scaffolding subunits and similar enzymatic steps18–21, reminiscent of the obstacles faced with kinase inhibitors22. The biological specificities demonstrated by this first-generation hit provide encouraging examples for
Table 1 SMER3 binding to Met30-Skp1 in DSF

Tm (°C)           Met30 (2 μM)   Met30 (4 μM)   Skp1 (5 μM)
DMSO              45.17          48.03          45.65
1 μM SMER3        46.00          48.02          46.17
10 μM SMER3       27.85          26.90          46.42
100 μM SMER3      27.13          26.45          45.02
1 μM Rapamycin    46.30          –              46.08
10 μM Rapamycin   47.70          –              45.98

SMER3 directly binds to Met30-Skp1, but not Skp1 alone, as determined by differential scanning fluorimetry (DSF). Met30 and Skp1 were co-expressed, or Skp1 was expressed alone, in insect cells; the complex was purified based on a GST tag fused to Met30, whereas Skp1 alone was purified based on a His tag fused to Skp1. Protein, drug and SYPRO Orange dye were added to 384-well plates and the melting-curve fluorescent signal was detected using the LightCycler 480 System II. Melting temperatures (Tm) were determined by the LightCycler 480 Protein Melt Analysis Tool.
such potential and highlight the importance of unbiased cell-based approaches in drug discovery and in biological studies. In conclusion, we identified several small-molecule enhancers of rapamycin from a phenotype-based chemical genetic screen. Genomic, genetic and biochemical analyses indicate that one of the SMERs (SMER3) inhibits an E3 ubiquitin ligase in yeast, SCFMet30, which coordinates nutritional responses with cell proliferation. Because increasing evidence suggests that ubiquitin E3 ligases are involved in tumorigenesis23, we believe that SMER3 and SMER3-like molecules represent a class of E3 ubiquitin ligase inhibitors that could potentially be used as anti-cancer drugs in the future. In addition, our study provides a link between the TOR pathway and a separate network that monitors the sulfur-containing amino acids methionine and cysteine and the primary methyl-group donor S-adenosylmethionine (SAM). This genetic interaction may be explained simply by the convergence of these two pathways on regulation of the G1 cyclins (refs. 14,24 and see Supplementary Table 2 for SMER3). Alternatively, it is possible that more complicated co-regulation occurs, in which TOR inhibition, although insufficient for activation of the 'sulfur starvation' response, may in fact enhance this response during times of sulfur depletion (Supplementary Note). We have preliminary data suggesting that SMER3 also acts in a synthetic lethal fashion with rapamycin in human A549 lung cancer cells (data not shown), but the target pathway for SMER3 in mammalian cells has yet to be determined. It is noteworthy that cancer cells and tumors are particularly dependent on metabolic networks linked to methionine25,26, indicating that mammalian processes similar to those controlled by SCFMet30 in yeast might provide potential anti-cancer targets. Synthetic lethal interactions between rapamycin and the ubiquitin-like modification systems (Fig. 2c) suggest potential therapeutic benefit for combination therapy with rapamycin and any small molecule that inhibits a component of SCF or an activator of SCF, such as in sensitizing a tumor's response to rapamycin and/or preempting the development of drug resistance. Beyond cancer and tumor-prone syndromes, a variety of other diseases, including hypertrophy, neurodegeneration and aging, are linked to the TOR pathway27,28. For example, several SMERs have been identified that effectively enhance autophagy and reduce toxicity in Huntington's disease models through unknown mechanisms10. Similar chemical genetic approaches are applicable to the study of other pathways, drugs and diseases.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Accession codes. NCBI Gene Expression Omnibus (GEO), GSE22269. The library database and complete genomic data sets are also available on the web (http://labs.pharmacology.ucla.edu/huanglab).

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We are grateful for grant support from the American Cancer Society and the U.S. National Institutes of Health and for traineeship support of M.A. and B.L. by the NIH UCLA Chemistry−Biology Interface Predoctoral Training Program. N.Z. and R.J.D. are investigators of the Howard Hughes Medical Institute. We thank D. Skowyra (Saint Louis University) and M. Tyers (University of Edinburgh, UK) for their generous gifts of baculovirus constructs and anti-Met4 antibody, respectively. We also thank J. Salcedo (Roche Diagnostics Corporation) for support toward differential scanning fluorimetry experiments.

AUTHOR CONTRIBUTIONS
Figure 1a, M.A. and R.D.; 1b, F.F. and M.L.; 1c, C.L. and J.H.; 2a, N.J.; 2b, N.J. and R.H.; 2c, K.F.; 2d,e, I.O. and N.P.; 3a, K.F.; 3b, K.F. and L.H.; 3c, K.F.; 3d, N.J.; 3e,
M.A.; Table 1, X.T., M.A. and P.M.d.M.; X.C., B.L., R.V., Y.L., K.N.H., M.E.J. and N.Z. contributed new reagents and analysis; all authors discussed data; M.A., F.F., M.E.J., R.J.D., P.K. and J.H. wrote the paper with input from all authors.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.
1. Wullschleger, S., Loewith, R. & Hall, M.N. TOR signaling in growth and metabolism. Cell 124, 471–484 (2006).
2. Bjornsti, M.A. & Houghton, P.J. The TOR pathway: a target for cancer therapy. Nat. Rev. Cancer 4, 335–348 (2004).
3. Petroski, M.D. & Deshaies, R.J. Function and regulation of cullin-RING ubiquitin ligases. Nat. Rev. Mol. Cell Biol. 6, 9–20 (2005).
4. Easton, J.B. & Houghton, P.J. mTOR and cancer therapy. Oncogene 25, 6436–6446 (2006).
5. Cloughesy, T.F. et al. Antitumor activity of rapamycin in a Phase I trial for patients with recurrent PTEN-deficient glioblastoma. PLoS Med. 5, e8 (2008).
6. Chiang, G.G. & Abraham, R.T. Targeting the mTOR signaling network in cancer. Trends Mol. Med. 13, 433–442 (2007).
7. Shaw, R.J. & Cantley, L.C. Ras, PI(3)K and mTOR signalling controls tumour cell growth. Nature 441, 424–430 (2006).
8. Guertin, D.A. & Sabatini, D.M. Defining the role of mTOR in cancer. Cancer Cell 12, 9–22 (2007).
9. Huang, J. et al. Finding new components of the target of rapamycin (TOR) signaling network through chemical genetics and proteome chips. Proc. Natl. Acad. Sci. USA 101, 16594–16599 (2004).
10. Sarkar, S. et al. Small molecules enhance autophagy and reduce toxicity in Huntington's disease models. Nat. Chem. Biol. 3, 331–338 (2007).
11. Lomenick, B. et al. Target identification using drug affinity responsive target stability (DARTS). Proc. Natl. Acad. Sci. USA (in the press).
12. Hughes, T.R. et al. Functional discovery via a compendium of expression profiles. Cell 102, 109–126 (2000).
13. Kaiser, P., Su, N.Y., Yen, J.L., Ouni, I. & Flick, K. The yeast ubiquitin ligase SCFMet30: connecting environmental and intracellular conditions to cell division. Cell Div. 1, 16 (2006).
14. Patton, E.E. et al. SCF(Met30)-mediated control of the transcriptional activator Met4 is required for the G(1)-S transition. EMBO J. 19, 1613–1624 (2000).
15. Sawyers, C.L. Imatinib GIST keeps finding new indications: successful treatment of dermatofibrosarcoma protuberans by targeted inhibition of the platelet-derived growth factor receptor. J. Clin. Oncol. 20, 3568–3569 (2002).
16. Niesen, F.H., Berglund, H. & Vedadi, M. The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nat. Protoc. 2, 2212–2221 (2007).
17. Bai, C. et al. SKP1 connects cell cycle regulators to the ubiquitin proteolysis machinery through a novel motif, the F-box. Cell 86, 263–274 (1996).
18. Zheng, N. et al. Structure of the Cul1-Rbx1-Skp1-F-box Skp2 SCF ubiquitin ligase complex. Nature 416, 703–709 (2002).
19. Chen, Q. et al. Targeting the p27 E3 ligase SCF(Skp2) results in p27- and Skp2-mediated cell-cycle arrest and activation of autophagy. Blood 111, 4690–4699 (2008).
20. Nakajima, H., Fujiwara, H., Furuichi, Y., Tanaka, K. & Shimbara, N. A novel small-molecule inhibitor of NF-kappaB signaling. Biochem. Biophys. Res. Commun. 368, 1007–1013 (2008).
21. Soucy, T.A. et al. An inhibitor of NEDD8-activating enzyme as a new approach to treat cancer. Nature 458, 732–736 (2009).
22. Knight, Z.A. & Shokat, K.M. Features of selective kinase inhibitors. Chem. Biol. 12, 621–637 (2005).
23. Nalepa, G., Rolfe, M. & Harper, J.W. Drug discovery in the ubiquitin-proteasome system. Nat. Rev. Drug Discov. 5, 596–613 (2006).
24. Zinzalla, V., Graziola, M., Mastriani, A., Vanoni, M. & Alberghina, L. Rapamycin-mediated G1 arrest involves regulation of the Cdk inhibitor Sic1 in Saccharomyces cerevisiae. Mol. Microbiol. 63, 1482–1494 (2007).
25. Halpern, B.C., Clark, B.R., Hardy, D.N., Halpern, R.M. & Smith, R.A. The effect of replacement of methionine by homocystine on survival of malignant and normal adult mammalian cells in culture. Proc. Natl. Acad. Sci. USA 71, 1133–1136 (1974).
26. Guo, H. et al. Therapeutic tumor-specific cell cycle block induced by methionine starvation in vivo. Cancer Res. 53, 5676–5679 (1993).
27. Lee, C.H., Inoki, K. & Guan, K.L. mTOR pathway as a target in tissue hypertrophy. Annu. Rev. Pharmacol. Toxicol. 47, 443–467 (2007).
28. Harrison, D.E. et al. Rapamycin fed late in life extends lifespan in genetically heterogeneous mice. Nature 460, 392–395 (2009).
29. Flick, K. et al. Proteolysis-independent regulation of the transcription factor Met4 by a single Lys 48-linked ubiquitin chain. Nat. Cell Biol. 6, 634–641 (2004).
ONLINE METHODS
Chemical genetic screen. The screen for SMERs was carried out as described9 with several modifications. The earlier screen, using a high rapamycin concentration, was designed to identify potent SMIR activities. Here, the following modifications were made to optimize the identification of SMERs: (i) lowering the concentration of rapamycin such that it inhibits wild-type yeast only partially, thereby facilitating the detection of synthetic lethal hits, and (ii) raising the final concentrations of library compounds in the medium (2.5×; ~25 μM) to better recognize (and eliminate) hits that are cytotoxic even without rapamycin. Other changes include the use of a larger collection of the ChemBridge DiverSet library and implementation of robotic pin transfer to deliver library compounds. The library database is available on our web site (http://labs.pharmacology.ucla.edu/huanglab/).

Selection of SMERs. Yeast growth was scored by visual inspection once a day for 5 d using a scale of "1–" (least severe) to "6–" (most severe growth inhibition), resulting in 446 compounds that caused varying degrees of 'no growth' phenotype. Growth results were compared to OD data obtained from an unrelated screen30 performed with the same compound library to eliminate potential nonspecific toxic hits. Compounds were categorized as giving growth at least 1 s.d. below the average OD reading, at least 1 s.d. above the average reading, or no significant change. If a compound significantly inhibited growth in both the unrelated screen and ours, it was eliminated as a nonspecific toxic hit, narrowing the list of 446 hits to 86 compounds (SMERs). The SMERs were then sorted by the OD readings from the unrelated screen and by our own growth rankings, allowing growth upon compound treatment to be compared between the two screens.
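The toxicity filter described above can be sketched as follows, with hypothetical OD values standing in for the unrelated-screen data:

```python
import statistics

def nonspecific_toxic(od, mean_od, sd_od):
    """A compound is flagged toxic if growth in the unrelated screen
    was at least 1 s.d. below the average OD reading."""
    return od < mean_od - sd_od

# Hypothetical unrelated-screen ODs for three rapamycin-screen hits
unrelated_od = {"cpd_A": 0.95, "cpd_B": 0.30, "cpd_C": 1.02}
mean_od = statistics.mean(unrelated_od.values())
sd_od = statistics.stdev(unrelated_od.values())

rapamycin_hits = ["cpd_A", "cpd_B", "cpd_C"]
smers = [c for c in rapamycin_hits
         if not nonspecific_toxic(unrelated_od[c], mean_od, sd_od)]
print(smers)   # cpd_B is removed as nonspecifically toxic
```

Only compounds that inhibit growth specifically in combination with rapamycin, not on their own in the unrelated screen, survive as SMER candidates.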
SMERs 1, 3 and 4 were selected based on their ability to severely inhibit growth in our screen (6– score) while exhibiting no effects on growth in unrelated screens. SMERs 2 and 5, on the other hand, only weakly affected growth in our screen (2– score) and displayed no effects on growth in unrelated screens. By selecting structurally distinct compounds that exhibit differing degrees of growth inhibition, we hoped to isolate SMERs that have different cellular targets and/or mechanisms of action.

Expression analysis (experimental part). Yeast cells were grown to mid-log phase (0.5–2 × 10^7 cells/ml) at 30 °C in YPD medium, unless otherwise specified, before treatment with small molecules for 30 min. Treated cells were harvested and flash frozen in liquid nitrogen. Total RNA was isolated using the RiboPure Yeast kit (Ambion) and RNA quality was checked using an Agilent 2100 Bioanalyzer (Agilent Technologies). Biotin-labeled cRNA probes were generated from total RNA using the One-Cycle Target Labeling Assay and used for hybridization to Affymetrix GeneChip Yeast Genome 2.0 arrays (Affymetrix), according to the manufacturer's specifications. The Yeast 2.0 array includes ~5,744 probe sets for 5,841 of the 5,845 genes present in S. cerevisiae and 5,021 probe sets for all 5,031 genes present in S. pombe. The arrays were scanned using an Affymetrix GeneChip Scanner 3000 (Affymetrix) and raw data were generated using the GeneChip Operating System GCOSv1.4. Raw data were further processed and analyzed using GCOS or dChip (see below) as indicated.

Gene expression analysis (computational part). Gene expression data were normalized in dChip (http://www.dchip.org/)31. Model-based expression indices were calculated and log2 transformed. Genes were filtered by two criteria: the s.d. across the samples had to be between 0.5 and 1,000, and the genes had to be present in at least 20% of the samples.
Hierarchical clustering and principal component analysis on filtered genes were performed in dChip. Differentially expressed genes were selected by the following criteria: at least a twofold difference in expression between treatments and controls, and a P-value of the two-tailed, two-sample unpaired equal-variance t-test <0.05. GO (gene ontology)32 analysis was performed using GO Term Finder on the Saccharomyces Genome Database (http://www.yeastgenome.org/). The complete data sets are available on the web (http://labs.pharmacology.ucla.edu/huanglab/).

Protein analyses. Yeast cells were cultured to mid-log phase (~0.8 × 10^7 cells/ml) at 30 °C in YPD medium, or YPD plus adenine (YPDA) medium
where indicated, for small-molecule treatment. An equal concentration of DMSO carrier (0.45% here) was used across all samples. For western blot analysis, protein extracts were prepared in 8 M urea buffer (8 M urea, 0.2% SDS, 200 mM NaCl, 100 mM Tris-HCl pH 7.5, phosphatase inhibitors (10 mM sodium pyrophosphate, 5 mM EDTA, 5 mM EGTA, 50 mM NaF and 0.1 mM orthovanadate) and complete protease inhibitor cocktail (Roche)). Cell pellets were broken with glass beads for 2 × 40 s at 4 °C in a FastPrep-24 (MP Biomedicals). Whole-cell lysates were collected after centrifugation (18,000g, 10 min) and diluted to a final concentration of 4 M urea for SDS-PAGE. Proteins were transferred to a PVDF membrane and probed with antibodies as indicated. For immunoprecipitation, cells were lysed in lysis buffer (50 mM HEPES pH 7.5, 0.2% Triton X-100, 200 mM NaCl, 10% glycerol, 1 mM DTT, 10 mM Na-pyrophosphate, 5 mM EDTA, 5 mM EGTA, 50 mM NaF, 0.1 mM orthovanadate, 1 mM PMSF, and 1 μg/ml each aprotinin, leupeptin and pepstatin). Protein complexes were immunopurified with anti-myc antibodies (Santa Cruz Biotechnology), washed three times with 1 ml lysis buffer, and eluted by boiling in SDS-PAGE loading buffer before analysis by immunoblotting.

Generation of the met4 null strain. The met4Δ strain was generated in the BY4741 strain background via one-step replacement of the MET4 open reading frame with a kanMX6 cassette33 and selected on YPD/G418 plates supplemented with 20 μM S-adenosylmethionine (SAM, which is required for viability of the met4Δ mutant in this background). All deletions were verified by PCR. Primer sequences are shown below.
MET4-F1: 5′-aagcgcacttctgataagcacttttattcctttttttccactgtgaacgcggatccccgggttaattaa-3′
MET4-R1: 5′-tgcacgtatatatatatatatatataattaaactgtatagtctgttattgaattcgagctcgtttaaac-3′
MET4-C1: 5′-ctcgtcgcacatgctattgt-3′
MET4-C2: 5′-ccacgtaggccaactgttct-3′
Kan_755R: 5′-atacctggaatgctgttttgccgg-3′

Growth curve analysis.
Wild-type or met4Δ yeast cells in the BY4741 background were seeded in 96-well plates at an initial density of 2 × 105 cells/ml in YPD + SAM (50 μM), in the presence of SMER3, rapamycin, or DMSO carrier control. Plates were incubated, without shaking, at 30 °C, and automated absorbance (optical density, OD) measurements of each culture well were taken at 595 nm every 30 min for 30 h using a SpectraMax 340PC microplate reader (Molecular Devices). Synthesis of 9H-Indeno[2,1-b][1,2,5]oxadiazolo[3,4-e]pyrazin-9-one, SMER3. To a stirred solution of 3,4-diaminofurazan (100 mg, 1.0 mmol) in acetic acid (2.5 ml) and ethanol (2.5 ml) was added ninhydrin (178 mg, 1.0 mmol). The mixture was stirred at reflux overnight and cooled to 22 °C. The precipitate was collected by filtration, washed with ethanol (20 ml) and dried to give the product (SMER3, 198 mg, 88%) as a yellow solid. 1H NMR (400 MHz, DMSO-d6) δ 8.29 (m, 1H), 8.05 (m, 2H), 7.95 (t, J = 6.8 Hz, 1H). SCF E3 ligase assay. Met4 in vitro ubiquitination assays were carried out as previously described34, except that recombinant SCFMet30 components and FLAG-6xHisMet4 were expressed in Hi5 cells. Briefly, GstSkp1, Cdc53, 6xHisMet30 and Rbx1 were co-expressed in Hi5 cells and SCFMet30 was purified on glutathione sepharose. Recombinant FLAG-6xHisMet4 was bound to SCFMet30 and the ligase/substrate complex was eluted with glutathione. Sic1 in vitro ubiquitination by recombinant SCFCdc4 was performed as described35, except that ubiquitinated Sic1 was detected by immunoblotting. To directly compare the effect of SMER3 on SCFMet30 and SCFCdc4, Met4 and Sic1 ubiquitination were assayed in a single reaction. Approximately 150 nM SCFMet30 and SCFCdc4 were incubated with DMSO (solvent control) or SMER3 for 20 min at 25 °C. The reaction was started by addition of (final concentrations) 250 nM yeast E1 (Boston Biochem), 0.8 μM recombinant
Cdc34 purified from E. coli36, 5 mM ATP, and 80 μM ubiquitin (Sigma). Reactions were incubated at 30 °C and samples were taken after 5 min and 25 min of reaction time to accommodate the different reaction kinetics of the two SCFs. Products were separated by SDS-PAGE and analyzed by immunoblotting using anti-Met4 and anti-Sic1 antibodies. Quantification was done on a Fuji LAS-4000 imaging system. Quantitative comparison of Skp1 complexes after SMER3 treatment by mass spectrometry. A yeast strain expressing N-terminally HBTH-tagged Skp1 from the endogenous locus was generated by a PCR-based approach as described37. Briefly, PCR-based integration of a TRP1-GAL1-HBTH fragment was used to generate a strain expressing HBTH-tagged Skp1. This strain is viable only on galactose plates. A PCR fragment encoding the SKP1 promoter with flanking homology regions for the HBTH tag and 5′ regions of the SKP1 gene was then used to replace the TRP1-GAL1 fragment at the SKP1 locus. Transformants were selected for growth on dextrose plates. The resulting strain carried the HBTH tag inserted into the SKP1 locus before the coding region without any other changes at the locus. To quantitatively analyze changes in Skp1-associated proteins in response to SMER3, we used the QTAX strategy38. Briefly, for SILAC labeling, 200 ml cultures of cells expressing HBTH-Skp1 were grown in medium containing either 30 mg/l 12C6,14N4-arginine and 100 mg/l lysine (‘light’) or the same amounts of 13C6,15N4-arginine (isotopic purity > 98 atom %) and 13C6,15N2-lysine (isotopic purity > 98 atom %) (Cambridge Isotope Laboratories) (‘heavy’). When cells reached an A600 of 0.5, the light culture was treated with 20 μM SMER3 for 30 min at 30 °C. The same amount of DMSO was added to the heavy culture as solvent control. Formaldehyde was then added to both cultures to a final concentration of 1% to cross-link and stabilize protein complexes in vivo, and cells were incubated at 30 °C for 10 min. 
Cross-linking was quenched by the addition of 125 mM glycine for 5 min at 30 °C. Cells were harvested by filtration and stored at –80 °C. Cell lysis and purification of proteins were performed as described38,39 with the following modifications. Cells were lysed with glass beads in 500 μl of buffer-1 (8 M urea, 300 mM NaCl, 0.5% Nonidet P-40, 50 mM sodium phosphate, 50 mM Tris, pH 8, 20 mM imidazole) per tube in a FastPrep FP120 system. Cleared lysates were pooled, and 10 mg of total protein extract from each of the light and heavy lysates were mixed and then incubated with Ni2+-sepharose (pre-equilibrated in buffer-1) (Amersham Biosciences) overnight at 25 °C. Ni2+-sepharose was then washed once in buffer-1 and twice in buffer-1, pH 6.3. Proteins were eluted in buffer-2 (8 M urea, 200 mM NaCl, 50 mM sodium phosphate, 2% SDS, 10 mM EDTA, 100 mM Tris, pH 4.3). The eluate was adjusted to pH 8.0 and then loaded onto immobilized streptavidin (pre-equilibrated in buffer-3 (8 M urea, 200 mM NaCl, 0.2% SDS, 100 mM Tris, pH 8.0)) (Pierce). After incubation for 5 h at 25 °C, the streptavidin beads were washed three times in buffer-3 and three times in buffer-3 without SDS. Streptavidin beads were then washed extensively with 25 mM NH4HCO3, pH 8, and the proteins were released by on-bead digestion with trypsin at 37 °C for 12−16 h. Tryptic peptides were extracted three times using 25% (vol/vol) acetonitrile, 0.1% (vol/vol) formic acid. The peptides were further purified on Vivapure C18 micro spin columns according to the manufacturer’s instructions (Sartorius Biotech). Peptides were analyzed by 1D LC-MS/MS using a nanoLC system (Eksigent) coupled online to a Linear Ion Trap (LTQ)-Orbitrap XL mass spectrometer (Thermo-Electron) as described40. Data were analyzed using Protein Prospector developmental version 5.1.7. The relative abundance of proteins was determined by measuring peptide peak intensities. DARTS experiment using recombinant Met30. 
Met30 was PCR-subcloned into pcDNA3.1(-) (Invitrogen) and expressed using Promega TnT T7 Quick
Coupled Transcription/Translation System. Thermolysin digestion was performed on translated lysate incubated with SMER3 or vehicle control, and stopped by adding EDTA, pH 8.0. Samples were run on 4–12% NuPAGE gradient gels (Invitrogen), and western blot analysis was carried out with anti-RGSH (Qiagen) and anti-GAPDH (Ambion) antibodies. Protein expression and purification. Full-length Met30 and Skp1 (yeast) proteins were overexpressed in insect cells as a glutathione S-transferase (GST) fusion protein and an N-terminal 6X His-tagged protein, respectively. After co-infection with both viruses expressing GST-Met30 and His-Skp1, the GST-Met30/His-Skp1 complex was isolated from the soluble cell lysate by glutathione affinity chromatography. The Met30/Skp1 protein complex was released from the column after cleavage by TEV protease. The protein sample was in a final solution of 20 mM Tris-HCl (pH 8.0), 200 mM NaCl and 5 mM DTT. Full-length yeast Skp1 (N-terminal 6X His-tagged protein, as above) was overexpressed in insect cells and isolated from the soluble cell lysate by Ni-NTA affinity chromatography. The protein sample was in a final solution of 20 mM Tris-HCl, 300 mM NaCl and 15 mM imidazole. Differential scanning fluorimetry. Protein melting experiments were carried out on the LightCycler 480 System II (Roche). Protein melting was monitored by measuring the fluorescence of the hydrophobic dye Sypro Orange (Invitrogen) as it binds to exposed residues of the denatured protein. The instrument was set up with a detection format of 465 nm excitation and 580 nm emission to detect the Sypro Orange–specific signal. The melting curve fluorescence signal was acquired between 20 °C and 85 °C using a ramp rate of 0.06 °C s−1 and an acquisition of ten data points per degree Celsius. Melting temperatures (Tm) were determined with the LightCycler 480 Protein Melt Analysis Tool.
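The Tm determination above was done with the instrument's Protein Melt Analysis Tool; as an illustrative sketch only (the function name and synthetic curve are ours, not the instrument's algorithm), Tm can be estimated from an exported melting curve as the temperature of the steepest fluorescence increase, i.e., the maximum of dF/dT:

```python
import numpy as np

def melting_temperature(temps, fluorescence):
    """Estimate Tm as the temperature of the steepest fluorescence
    increase (maximum of dF/dT), a common DSF read-out."""
    dF_dT = np.gradient(np.asarray(fluorescence, float),
                        np.asarray(temps, float))
    return float(temps[int(np.argmax(dF_dT))])

# Synthetic two-state melt centred at 55 degrees C; the grid mimics the
# acquisition in the text (~10 data points per degree, 20-85 degrees C).
temps = np.arange(20.0, 85.0, 0.1)
fluor = 1.0 / (1.0 + np.exp(-(temps - 55.0) / 2.0))
print(round(melting_temperature(temps, fluor), 1))  # 55.0
```

A sigmoid (Boltzmann) fit gives a smoother estimate on noisy data; the derivative maximum is the simpler, assumption-light version.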
30. Duncan, M.C., Ho, D.G., Huang, J., Jung, M.E. & Payne, G.S. Composite synthetic lethal identification of membrane traffic inhibitors. Proc. Natl. Acad. Sci. USA 104, 6235–6240 (2007).
31. Li, C. & Wong, W.H. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA 98, 31–36 (2001).
32. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
33. Longtine, M.S. et al. Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast 14, 953–961 (1998).
34. Chandrasekaran, S. et al. Destabilization of binding to cofactors and SCFMet30 is the rate-limiting regulatory step in degradation of polyubiquitinated Met4. Mol. Cell 24, 689–699 (2006).
35. Feldman, R.M., Correll, C.C., Kaplan, K.B. & Deshaies, R.J. A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p. Cell 91, 221–230 (1997).
36. Petroski, M.D. & Deshaies, R.J. In vitro reconstitution of SCF substrate ubiquitination with purified proteins. Methods Enzymol. 398, 143–158 (2005).
37. Booher, K.R. & Kaiser, P. A PCR-based strategy to generate yeast strains expressing endogenous levels of amino-terminal epitope-tagged proteins. Biotechnol. J. 3, 524–529 (2008).
38. Guerrero, C., Tagwerker, C., Kaiser, P. & Huang, L. An integrated mass spectrometry-based proteomic approach: quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes (QTAX) to decipher the 26S proteasome-interacting network. Mol. Cell. Proteomics 5, 366–378 (2006).
39. Tagwerker, C. et al. A tandem affinity tag for two-step purification under fully denaturing conditions: application in ubiquitin profiling and protein complex identification combined with in vivo cross-linking. Mol. Cell. Proteomics 5, 737–748 (2006).
40. Meierhofer, D., Wang, X., Huang, L. & Kaiser, P. Quantitative analysis of global ubiquitination in HeLa cells by mass spectrometry. J. Proteome Res. 7, 4566–4576 (2008).
letters
Engineered allosteric activation of kinases in living cells
Andrei V Karginov1, Feng Ding2, Pradeep Kota2, Nikolay V Dokholyan2 & Klaus M Hahn1
Studies of cellular and tissue dynamics benefit greatly from tools that can control protein activity with specificity and precise timing in living systems. Here we describe an approach to confer allosteric regulation specifically on the catalytic activity of protein kinases. A highly conserved portion of the kinase catalytic domain is modified with a small protein insert that inactivates catalytic activity but does not affect other protein functions (Fig. 1a). Catalytic activity is restored by addition of rapamycin or non-immunosuppressive rapamycin analogs. Molecular modeling and mutagenesis indicate that the protein insert reduces activity by increasing the flexibility of the catalytic domain. Drug binding restores activity by increasing rigidity. We demonstrate the approach by specifically activating focal adhesion kinase (FAK) within minutes in living cells and show that FAK is involved in the regulation of membrane dynamics. Successful regulation of Src and p38 by insertion of the rapamycin-responsive element at the same conserved site used in FAK suggests that our strategy will be applicable to other kinases. Recently described methods for regulation of kinases with precise timing in living cells include induced dimerization, subcellular localization, proteolytic degradation and chemical rescue from an inactivating mutation1–4. Engineered allosteric regulation has also been used for precise control of protein activity5–7. Nonetheless, existing methods are limited to specific targets, inactivate rather than activate kinases and/or do not enable regulation of a particular domain within the target. Here we describe a method to activate specifically the catalytic domain within a multidomain protein kinase, using FAK as a model. FAK has been implicated in a wide variety of cell behaviors, including proliferation, apoptosis, migration and tumorigenesis8–11. 
It is a multidomain protein that functions as both a scaffold and a kinase11, and relatively little is known about the role of its catalytic activity. It therefore served as a good test case for our method, which enabled us to specifically dissect the function of FAK kinase activity, controlling it with a temporal resolution of 1–2 min, without affecting the scaffolding function. To allosterically regulate FAK’s catalytic activity, we used a portion of the small protein FKBP12 (Fig. 1a). A previous study has shown that ligand binding to FKBP12 greatly increases its conformational rigidity12, suggesting that insertion of FKBP12 near the catalytic site of kinases could be used to control the conformational mobility of the kinase active site. It was, however, unclear whether FKBP12 could be
inserted into the kinase sequence without disrupting kinase structure or FKBP12 binding interactions. We therefore tested truncated forms of FKBP12, leading to an FKBP12 derivative named insertable FKBP (iFKBP, Fig. 1b). In iFKBP, the N and C termini are positioned near one another for minimal perturbation of kinase secondary structure (Fig. 1b). Co-immunoprecipitation experiments showed that iFKBP binds rapamycin and the FKBP12-rapamycin binding domain (FRB) as efficiently as wild-type FKBP12, even when inserted in the middle of the FAK molecule (Fig. 1c and Supplementary Fig. 1). Molecular dynamics studies of iFKBP indicated that its conformational fluctuation is reduced by interaction with rapamycin or by rapamycin-induced heterodimerization with FRB (Fig. 1d and Supplementary Fig. 2). Changes in conformational fluctuations were especially pronounced at the N and C termini where iFKBP would be linked to FAK, suggesting that the effects of rapamycin and FRB binding could be communicated to FAK. Optimization of the insertion site and the linkers connecting iFKBP to FAK led to a version of FAK that was susceptible to regulation by rapamycin-induced FRB binding. The insertion of iFKBP at Glu445 (FAK-iFKBP445 construct) substantially reduced the catalytic activity of FAK. Rapamycin-induced binding to FRB restored activity (Fig. 2a). Treatment with rapamycin did not affect the activity of wild-type FAK or a construct with iFKBP attached to the FAK N terminus, demonstrating that regulation of catalytic activity is dependent on specific placement of the insert in the catalytic subunit. To optimize regulation of FAK by rapamycin, we introduced several modifications into the regions that connect iFKBP to FAK. iFKBP was positioned within the FAK loop Met442–Ala448, between two β-strands in the N-terminal lobe of the FAK catalytic domain (Fig. 2b). 
Replacing FAK residues Met442–Ala448 with iFKBP, using no linkers, negated the effect of iFKBP on FAK activity and dramatically reduced interaction with rapamycin and FRB (Fig. 2a,b, construct FAK-iFKBP442–448). Computational analysis revealed that the construct without linkers is locked in a distorted conformation that prevents ligand binding (Supplementary Fig. 3). In contrast, introduction of short linkers to connect iFKBP with the β-strands of the FAK catalytic domain led to the optimized structure used henceforth, rapamycin-regulated FAK (RapR-FAK). In RapR-FAK, activity in the absence of rapamycin was considerably lower than that of FAK-iFKBP445 (Fig. 2a). Rapamycin-induced FRB binding restored activity to near wild-type level. Activation of RapR-FAK catalytic activity was achieved in living cells within 2 min and with 50 nM rapamycin (Fig. 2c,d). Activation was also achieved by treatment
1Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. 2Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Correspondence should be addressed to K.M.H. ([email protected]) or N.V.D. ([email protected]).
Received 21 January; accepted 27 April; published online 27 June 2010; doi:10.1038/nbt.1639
nature biotechnology VOLUME 28 NUMBER 7 JULY 2010
with rapamycin alone, without co-expression of FRB (Supplementary Fig. 4). However, this required significantly higher concentrations of rapamycin (up to 4 μM), so the remaining studies described here were carried out using rapamycin-induced FRB binding. Computational analysis indicated that rapamycin alone does not stabilize iFKBP to the same extent as rapamycin together with FRB (Fig. 1d and Supplementary Fig. 2). iFKBP-mediated FAK regulation was designed to specifically control catalytic activity without perturbing other FAK functions. Thus, it was important to test the effects of iFKBP insertion on normal FAK binding interactions and FAK regulation. FAK catalytic activity is regulated by an autoinhibitory interaction between the N-terminal FERM domain and the catalytic domain13. Two amino acids known to be involved in this interaction were mutated to alanines (Y180A and M183A, previously described13) to test whether RapR-FAK remains regulated by autoinhibition. When activated by rapamycin, the Y180A/M183A construct (RapR-FAK-YM) demonstrates considerably higher activity than RapR-FAK (Fig. 2e), consistent with published results for constitutively active FAK13 and demonstrating that RapR-FAK is still regulated by autoinhibition. RapR-FAK-YM is therefore solely regulated by rapamycin and not by endogenous mechanisms. To confirm that RapR-FAK phosphorylates substrates in a rapamycin-dependent manner in cells, phosphorylation of two known FAK substrates was tested before and after addition of rapamycin. Upon activation of RapR-FAK-YM, phosphorylation of paxillin on residue Tyr31 and autophosphorylation of FAK on residue Tyr397 are substantially increased (Fig. 2f). A control construct lacking catalytic activity (RapR-FAK-YM-KD, with additional mutation D546R) failed to demonstrate any change in phosphorylation. RapR-FAK and wild-type FAK showed similar binding to paxillin and Src in co-immunoprecipitation assays (Supplementary Fig. 
5), indicating that introduction of iFKBP into the catalytic domain of FAK does not affect interaction with binding partners. Also, iFKBP insertion did not perturb the intracellular distribution of RapR-FAK; its localization was identical to that of wild-type FAK (Supplementary Fig. 6). Activation of RapR-FAK was accompanied by translocation of fluorescently labeled FRB into focal adhesions and co-localization with
Figure 1 Design and generation of RapR-FAK. (a) Schematic representation of the approach used to regulate the catalytic activity of FAK. A fragment of FKBP is inserted at a position in the catalytic domain where it abrogates catalytic activity. Binding to rapamycin and FRB restores activity. (b) The truncated fragment of human FKBP12 (amino acids Thr22 through Glu108) inserted into the kinase domain. Blue and red, full-length FKBP12; red, proposed structure of the inserted fragment. The FKBP12 is shown in complex with rapamycin and FRB (cyan). (c) Immunoblot analysis of iFKBP interaction with rapamycin and FRB. Myc-tagged FKBP12 and iFKBP constructs were immunoprecipitated from cells treated for 1 h with either 200 nM rapamycin or ethanol (solvent control). Co-immunoprecipitation of co-expressed GFP-FRB was detected using anti-GFP antibody. (d) Changes in the molecular dynamics of iFKBP upon binding to rapamycin and FRB. Warmer colors and thicker backbone indicate increasing root mean square fluctuation.
fluorescent RapR-FAK (Supplementary Fig. 7). The translocation of fluorescent FRB into adhesions served as a useful marker of FAK activation in live cells. Overall we conclude that RapR-FAK enables robust and specific activation of FAK catalytic activity in living cells without perturbation of other FAK properties. FAK is known to be overexpressed and activated in human tumors14–16, but the specific role of its catalytic activity remains unclear. To identify processes affected specifically by FAK catalytic activity, we examined the activation of RapR-FAK-YM in HeLa cells. The Y180A/M183A mutant was used to ensure the regulation of RapR-FAK by rapamycin only and to exclude modulation by endogenous upstream factors. Consistent with previous reports showing that catalytic activity is not required for FAK’s role in growth factor–stimulated motility17, activation of RapR-FAK-YM did not significantly affect cell movement (Supplementary Fig. 8). However, we did observe a distinct effect on membrane dynamics. HeLa cells normally show small peripheral ruffles that remain near the cell border. Upon addition of rapamycin, the extent of ruffling greatly increased, and very large and dynamic ruffles appeared across the dorsal surface (Fig. 3a,b and Supplementary Movie 1, 36/64 analyzed cells). In control studies, cells expressing similar levels of catalytically inactive RapR-FAK-YM-KD showed no change in normal ruffling activity (Fig. 3b, 34/35 analyzed cells). RapR-FAK was localized to these ruffles (Supplementary Fig. 9). Notably, wild-type FAK was also detected in ruffles stimulated by RapR-FAK and in those produced by platelet-derived growth factor (PDGF) treatment (Supplementary Figs. 10–12), indicating that dorsal ruffles are not an artifact of RapR-FAK mislocalization. 
Furthermore, FAK-null fibroblasts failed to produce PDGF-stimulated dorsal ruffles (158 cells analyzed), whereas 50% of control fibroblasts expressing FAK (59/118 analyzed) exhibited distinct dorsal ruffling under the same stimulation conditions. These data implicate FAK catalytic activity in the regulation of dorsal membrane protrusions.
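Count data of this kind (0 of 158 FAK-null cells ruffling versus 59 of 118 controls) are naturally compared with Fisher's exact test. The sketch below is purely illustrative; the paper itself reports the proportions without naming a statistical test:

```python
from scipy.stats import fisher_exact

# Dorsal-ruffle counts quoted in the text: (ruffling, non-ruffling).
fak_null = [0, 158]   # FAK-null fibroblasts, PDGF-stimulated
control = [59, 59]    # FAK-expressing controls: 59 of 118 ruffled

# Two-sided Fisher exact test on the 2x2 contingency table.
odds_ratio, p_value = fisher_exact([fak_null, control])
print(p_value < 0.05)  # True: the difference is highly significant
```

With a zero cell, the sample odds ratio is 0 and the p-value is vanishingly small, consistent with the all-or-nothing difference described in the text.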
[Figure 2 panel lane labels — (c) rapamycin doses: 0, 0.01, 0.05, 0.2, 0.5 and 1 µM; (d) treatment times: 0, 2, 5, 10, 20, 40 and 60 min; blots probed with anti-pY31 paxillin, anti-pY397 FAK, anti-myc and anti-GFP antibodies]
Figure 2 Development and biochemical characterization of RapR-FAK. (a) Rapamycin regulation of FAK variants with iFKBP inserted at different positions. HEK293T cells co-expressing myc-tagged FAK constructs and GFP-FRB were treated for 1 h with either 200 nM rapamycin or ethanol (solvent control). The activity of immunoprecipitated FAK variants was tested using the N-terminal fragment of paxillin as a substrate. (b) Sites of iFKBP insertion (green) and connecting linkers (red). (c,d) HEK293T cells co-expressing RapR-FAK and FRB were treated with the indicated amount of rapamycin for 1 h or with 200 nM rapamycin for the indicated period of time. The kinase was immunoprecipitated and its activity tested as described above. (e) FAK Y180A and M183A mutations were introduced to eliminate autoinhibitory interactions, thereby generating RapR-FAK-YM, which was tested as in a. (f) HEK293T cells co-expressing Cherry-FRB, GFP-paxillin and either myc-tagged RapR-FAK-YM or its kinase-inactive mutant (RapR-FAK-YM-KD) were treated with rapamycin or ethanol (solvent control) for 1 h. GFP-paxillin was immunoprecipitated and its phosphorylation was assessed using anti-phospho-Tyr31 antibody. Autophosphorylation of FAK on Tyr397 was analyzed using total cell lysate. (Full-length blots are provided in Supplementary Fig. 19.)
Figure 3 Activation of FAK catalytic activity initiates large dorsal ruffles through the activation of Src. (a) Rapamycin treatment of HeLa cells co-expressing RapR-FAK-YM and FRB caused formation of large dorsal ruffles. Scale bars in a and c, 20 µm. (b) HeLa cells expressing either GFP-RapR-FAK-YM (YM, 64 cells), GFP-RapR-FAK kinase-dead mutant (YM-KD, 35 cells) or GFP-tagged Y397F mutant (YM-Y397F, 47 cells) were scored for ruffle induction by rapamycin. No dorsal ruffles were seen before rapamycin addition. (c) Inhibition of Src family kinases eliminated the FAK-induced ruffles. Cells co-expressing GFP-RapR-FAK-YM and Cherry-FRB were treated with rapamycin for 1 h and imaged before and after addition of the Src family kinase inhibitor PP2. PP2 addition stopped dorsal protrusion in all cells analyzed (16 cells). (d) Activation of FAK leads to activation of Src. HeLa cells co-expressing myc-tagged Src, Cherry-FRB and either GFP-RapR-FAK-YM or its Y397F mutant were treated with rapamycin for 1 h. Src was immunoprecipitated using anti-myc antibody, and its phosphorylation on Tyr418 was assessed by immunoblotting. (Full-length blot is provided in Supplementary Fig. 19.)
FAK-induced dorsal ruffling (Fig. 3c and Supplementary Movie 2). In contrast, control compound PP3, an inactive PP2 stereoisomer, or imatinib (Gleevec), an inhibitor of Abl kinase, had no effect (data not shown). Phosphorylation of Src Tyr418 (Tyr416 in avian Src) is known to occur upon Src activation21,22. Rapamycin addition to cells transfected with RapR-FAK-YM led to increased Src Tyr418 phosphorylation, whereas cells expressing RapR-FAK-YM with an
Published work has demonstrated that FAK autophosphorylation of Tyr397 plays an important role in FAK-mediated signaling, and that Tyr397 phosphorylation level correlates with FAK activation18. Because autophosphorylation of FAK on Tyr397 creates a binding site for Src family kinases18,19, it has been proposed that interaction of FAK with Src leads to Src activation18. Furthermore, Src is involved in the formation of dorsal protrusions stimulated by PDGF20. Together these observations led us to hypothesize that the FAK-stimulated formation of dorsal protrusions occurs through activation of Src. In our studies, mutation of Tyr397 to phenylalanine in RapR-FAK completely abolished the formation of dorsal protrusions (Fig. 3b). To test the potential role of Src, cells were treated with PP2, an inhibitor of Src family kinases, after stimulation of RapR-FAK-YM. This abrogated the
[Figure 2b insertion-site sequences — FAKwt: HGGVYMSPENPALAVA; 445 insertion: HGGVYMSP-iFKBP-G-NPALAVA; 442–448 insertion: HGGVY-iFKBP-LAVA; RapR-FAK: HGGVY-GPG-iFKBP-GPG-LAVA]
additional mutation that abolishes Src binding (Y397F mutation) showed no effect (Fig. 3d). Together, these data directly demonstrate that FAK catalytic activity stimulates Src, and that this in turn leads to dorsal protrusions. Dorsal protrusions are important in the invasive migration of cells into extracellular matrix23, and enhanced FAK expression in tumor cells is associated with cell invasiveness. Our data suggest a mechanism whereby FAK overexpression contributes to the invasive nature of tumors. To understand the molecular mechanism of RapR-FAK allosteric regulation and explore the generalizability of the approach, we carried out molecular dynamics simulations24,25. Combined with the biochemical data, the computational analysis indicated a mechanism for iFKBP-mediated regulation. The iFKBP insertion site connects via a β-strand to FAK’s glycine loop (G-loop), a structural feature critical for positioning the ATP phosphate groups in the catalytic site (Fig. 4a)26. Molecular dynamics analysis indicated that the conformational mobility of the G-loop is correlated with that of the FAK region where iFKBP is inserted (the ‘insertion loop’, Fig. 4b), suggesting that the dynamics of the insertion loop could affect the dynamics of the G-loop and thereby change the catalytic activity. Comparison of wild-type FAK and RapR-FAK dynamics indicated that the amplitude of G-loop conformational dynamics is dramatically increased when iFKBP is inserted in the catalytic domain. These dynamics decreased back to wild-type levels upon binding to rapamycin and FRB (Fig. 4c,d and Supplementary Movie 3). Based on this analysis, we postulate that the effectiveness of the G-loop in the phosphate transfer reaction is reduced owing to the greater conformational flexibility produced by insertion of iFKBP. Interaction with rapamycin and FRB stabilizes the G-loop to rescue FAK catalytic activity. 
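The quantity underlying this argument, per-residue root mean square fluctuation (RMSF), has a standard definition that can be sketched briefly. This is an illustrative implementation on a toy, pre-aligned trajectory, not the discrete molecular dynamics protocol of refs. 24,25:

```python
import numpy as np

def rmsf(trajectory):
    """Per-residue root mean square fluctuation for an aligned
    trajectory of shape (n_frames, n_residues, 3)."""
    traj = np.asarray(trajectory, float)
    mean_structure = traj.mean(axis=0)             # average coordinates
    sq_disp = ((traj - mean_structure) ** 2).sum(axis=-1)
    return np.sqrt(sq_disp.mean(axis=0))           # RMS over frames

# Toy two-residue trajectory: residue 0 is rigid, residue 1 fluctuates
# along x with unit standard deviation (cf. rigid vs. flexible G-loop).
rng = np.random.default_rng(0)
traj = np.zeros((1000, 2, 3))
traj[:, 1, 0] = rng.normal(0.0, 1.0, size=1000)
print(rmsf(traj))  # rigid residue ~0; fluctuating residue ~1
```

Higher RMSF for G-loop residues in RapR-FAK than in wild-type FAK, dropping back on rapamycin/FRB binding, is exactly the pattern plotted in Figure 4d.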
Molecular dynamics analysis was consistent with empirical measurements; dynamics analysis of the FAK-iFKBP445 variant suggested that its longer linkers decreased coupling between the iFKBP insert and G-loop dynamics (Supplementary Fig. 13), resulting in the less effective FAK inhibition observed in biochemical studies (Fig. 2a, FAK-iFKBP445 construct). In contrast, insertion of iFKBP without any linkers restricted the structural dynamics of iFKBP, consistent with the observed minimal effects on catalytic activity (Supplementary Fig. 13, FAK-iFKBP442–448 construct). In summary, computational analysis indicates that the allosteric modulation of RapR-FAK activity results from dynamic coupling of the optimized iFKBP insertion and the kinase G-loop, highly conserved structural features in all known kinases26. Because the mechanism of allosteric regulation is based on coupling of highly conserved structural elements, the rapamycin-mediated
Figure 4 Mechanism of regulation by iFKBP; Src regulation. (a) The portion of the FAK catalytic domain targeted for insertion of iFKBP (blue) and the G-loop (red). (b) Dynamic correlation analysis of the wild-type FAK catalytic domain (red, positive correlation; blue, negative correlation). The circled region indicates strong negative correlation between the movement of the insertion loop and the G-loop. (c) Tube representation depicting changes in the dynamics of the FAK catalytic domain’s N-terminal lobe, based on molecular dynamics simulations. Warmer colors and thicker backbone correspond to higher root mean square fluctuation (RMSF) values, reflecting the degree of free movement within the structure. The red arrow points to the G-loop. (d) RMSF of amino acids in FAK and RapR-FAK (arrow indicates G-loop). The break in the wild-type FAK graph corresponds to the iFKBP insert in RapR-FAK. (e) Regulation of Src kinase by insertion of iFKBP. HEK293T cells co-expressing the indicated myc-tagged Src construct and GFP-FRB were treated with either 200 nM rapamycin or ethanol solvent control. The kinase activity of immunoprecipitated Src was tested as in Figure 2a. (Full-length blot is provided in Supplementary Fig. 19.)
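The correlation map in panel b is a dynamic cross-correlation matrix: the normalized covariance of residue displacement vectors over the trajectory. A minimal, unweighted sketch (assuming a pre-aligned trajectory; the function name and toy data are illustrative) is:

```python
import numpy as np

def dynamic_cross_correlation(trajectory):
    """Normalized covariance of residue displacement vectors over an
    aligned trajectory (n_frames, n_residues, 3): +1 means fully
    correlated motion, -1 fully anticorrelated (a Fig. 4b-style map)."""
    traj = np.asarray(trajectory, float)
    disp = traj - traj.mean(axis=0)                    # displacements
    cov = np.einsum('fid,fjd->ij', disp, disp) / len(traj)
    norm = np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    return cov / norm

# Toy case: residues 0 and 1 move together along x, residue 2 opposes
# them (cf. the anticorrelated insertion loop and G-loop).
t = np.linspace(0.0, 2.0 * np.pi, 200)
traj = np.zeros((200, 3, 3))
traj[:, 0, 0] = np.sin(t)
traj[:, 1, 0] = np.sin(t)
traj[:, 2, 0] = -np.sin(t)
c = dynamic_cross_correlation(traj)
print(np.round(c[0], 2))  # [ 1.  1. -1.]
```

Mass weighting and restriction to Cα atoms are common refinements; the normalization step is what maps the covariance onto the red-to-blue scale of the figure.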
regulation approach may well be applicable to other kinases. We explored this by inserting iFKBP into a tyrosine kinase, Src, and into a serine/threonine kinase, p38, at a site analogous to that used in FAK (Gly288 in Src, Lys45 in p38α; Supplementary Fig. 14). In both Src and p38, insertion of iFKBP strongly inhibited activity, and activity was rescued by interaction with rapamycin and FRB (Fig. 4e and Supplementary Fig. 15). Treatment with rapamycin did not affect wild-type Src or control Src constructs in which iFKBP was added to the C terminus, nor did it have any significant effect on wild-type p38. In molecular dynamics simulations, Src showed the same coupling between iFKBP and the G-loop that was observed for FAK (Supplementary Figs. 16 and 17). These data suggest that the iFKBP cassette can be used for allosteric regulation of a wide variety of both tyrosine and serine/threonine kinases. Although we saw no effects of rapamycin in the absence of rapamycin-regulated kinases, we were concerned that some potential studies could be complicated by the known immunosuppressive effects of rapamycin27. We therefore tested the ability to regulate rapamycin-regulated kinases using known non-immunosuppressive analogs of rapamycin, iRap and AP21967. Both compounds regulated RapR-FAK activity at concentrations comparable to those reported previously for dimerization of proteins in living cells28 (Supplementary Fig. 18). AP21967 and a similar analog of rapamycin (C20-MaRap) have been successfully used for experiments in animals29,30, indicating that the RapR method can be applied in live animal studies. The F36V mutant of FKBP, which interacts tightly with the Shield 1 compound4, could potentially eliminate the requirement for FRB and minimize effects on endogenous FKBP12 function. In summary, we describe a protein modification to confer rapamycin sensitivity specifically on the catalytic activity of kinases. 
The approach is based on the addition of a small protein insert into highly conserved regions of either serine/threonine or tyrosine kinases, promising broad applicability. It can be used with non-immunosuppressive rapamycin analogs suitable for in vivo studies. The approach combines the temporal resolution of small-molecule inhibitors with the absolute specificity of genetic approaches and enables allosteric regulation of a single domain in a multidomain protein. A mechanistic model based on molecular dynamics and application to analogous sites in FAK, Src and p38α indicate that rapamycin exerts its effect by modulating the conformational flexibility of the conserved catalytic subunit. By selectively activating FAK catalytic activity in living cells, we directly demonstrated that FAK catalysis activates Src to trigger large dorsal protrusions, a potential mechanism explaining how overexpression and activation of FAK contributes to tumor progression.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.
Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank J. Edwards, D. Dominguez and V. Rao for help with construction and testing of RapR-Src and RapR-p38α constructs, and B. Clarke for her design of figures, and are grateful to the UNC Cancer Research Fund and the National Institutes of Health for funding (GM64346 and GM057464 to K.M.H.; GM080742 and GM080742-03S1 to N.V.D.). We acknowledge the following gifts: anti-paxillin antibodies and the construct expressing the GST-tagged N-terminal fragment of paxillin from M. Schaller, Department of Biochemistry, West Virginia University; iRap from T. Wandless, Molecular Pharmacology Department, Stanford University; the construct for myc-tagged mouse FAK from S.K. Hanks, Vanderbilt University Medical Center; the flag-tagged mouse p38α, human FKBP12 and FRB domain of human FRAP1 DNA constructs from G. Johnson, Department of Pharmacology, University of North Carolina at Chapel Hill. The AP21967 compound was provided by ARIAD Pharmaceuticals.

Author Contributions
A.V.K. initiated the project, developed and validated regulation of RapR-kinases and performed the studies of FAK biological function. F.D. performed molecular modeling of FKBP12 variants, RapR-FAK and RapR-Src. P.K. performed biochemical characterization of RapR-p38. N.V.D. coordinated molecular dynamics studies. K.M.H. coordinated the study and wrote the final version of the manuscript, based on contributions from all authors.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Spencer, D.M., Wandless, T.J., Schreiber, S.L. & Crabtree, G.R. Controlling signal transduction with synthetic ligands. Science 262, 1019–1024 (1993). 2. Bishop, A.C. et al.
A chemical switch for inhibitor-sensitive alleles of any protein kinase. Nature 407, 395–401 (2000). 3. Qiao, Y., Molina, H., Pandey, A., Zhang, J. & Cole, P.A. Chemical rescue of a mutant enzyme in living cells. Science 311, 1293–1297 (2006). 4. Banaszynski, L.A., Chen, L.C., Maynard-Smith, L.A., Ooi, A.G. & Wandless, T.J. A rapid, reversible, and tunable method to regulate protein function in living cells using synthetic small molecules. Cell 126, 995–1004 (2006).
5. Tucker, C.L. & Fields, S. A yeast sensor of ligand binding. Nat. Biotechnol. 19, 1042–1046 (2001). 6. Guntas, G., Mansell, T.J., Kim, J.R. & Ostermeier, M. Directed evolution of protein switches and their application to the creation of ligand-binding proteins. Proc. Natl. Acad. Sci. USA 102, 11224–11229 (2005). 7. Radley, T.L., Markowska, A.I., Bettinger, B.T., Ha, J.H. & Loh, S.N. Allosteric switching by mutually exclusive folding of protein domains. J. Mol. Biol. 332, 529–536 (2003). 8. Zhao, J. & Guan, J.L. Signal transduction by focal adhesion kinase in cancer. Cancer Metastasis Rev. 28, 35–49 (2009). 9. Gabarra-Niecko, V., Schaller, M.D. & Dunty, J.M. FAK regulates biological processes important for the pathogenesis of cancer. Cancer Metastasis Rev. 22, 359–374 (2003). 10. Tilghman, R.W. & Parsons, J.T. Focal adhesion kinase as a regulator of cell tension in the progression of cancer. Semin. Cancer Biol. 18, 45–52 (2008). 11. Schlaepfer, D.D., Mitra, S.K. & Ilic, D. Control of motile and invasive cell phenotypes by focal adhesion kinase. Biochim. Biophys. Acta 1692, 77–102 (2004). 12. Marquis-Omer, D. et al. Stabilization of the FK506 binding protein by ligand binding. Biochem. Biophys. Res. Commun. 179, 741–748 (1991). 13. Lietha, D. et al. Structural basis for the autoinhibition of focal adhesion kinase. Cell 129, 1177–1187 (2007). 14. Golubovskaya, V.M., Kweh, F.A. & Cance, W.G. Focal adhesion kinase and cancer. Histol. Histopathol. 24, 503–510 (2009). 15. Chatzizacharias, N.A., Kouraklis, G.P. & Theocharis, S.E. Clinical significance of FAK expression in human neoplasia. Histol. Histopathol. 23, 629–650 (2008). 16. Sood, A.K. et al. Biological significance of focal adhesion kinase in ovarian cancer: role in migration and invasion. Am. J. Pathol. 165, 1087–1095 (2004). 17. Sieg, D.J. et al. FAK integrates growth-factor and integrin signals to promote cell migration. Nat. Cell Biol. 2, 249–256 (2000). 18. Schaller, M.D. et al. 
Autophosphorylation of the focal adhesion kinase, pp125FAK, directs SH2-dependent binding of pp60src. Mol. Cell. Biol. 14, 1680–1688 (1994). 19. Xing, Z. et al. Direct interaction of v-Src with the focal adhesion kinase mediated by the Src SH2 domain. Mol. Biol. Cell 5, 413–421 (1994). 20. Veracini, L. et al. Two distinct pools of Src family tyrosine kinases regulate PDGF-induced DNA synthesis and actin dorsal ruffles. J. Cell Sci. 119, 2921–2934 (2006). 21. Smart, J.E. et al. Characterization of sites for tyrosine phosphorylation in the transforming protein of Rous sarcoma virus (pp60v-src) and its normal cellular homologue (pp60c-src). Proc. Natl. Acad. Sci. USA 78, 6013–6017 (1981). 22. Playford, M.P. & Schaller, M.D. The interplay between Src and integrins in normal and tumor biology. Oncogene 23, 7928–7946 (2004). 23. Suetsugu, S., Yamazaki, D., Kurisu, S. & Takenawa, T. Differential roles of WAVE1 and WAVE2 in dorsal and peripheral ruffle formation for fibroblast cell migration. Dev. Cell 5, 595–609 (2003). 24. Ding, F. & Dokholyan, N.V. Dynamical roles of metal ions and the disulfide bond in Cu, Zn superoxide dismutase folding and aggregation. Proc. Natl. Acad. Sci. USA 105, 19696–19701 (2008). 25. Ding, F., Tsao, D., Nie, H. & Dokholyan, N.V. Ab initio folding of proteins with all-atom discrete molecular dynamics. Structure 16, 1010–1018 (2008). 26. Krupa, A., Preethi, G. & Srinivasan, N. Structural modes of stabilization of permissive phosphorylation sites in protein kinases: distinct strategies in Ser/Thr and Tyr kinases. J. Mol. Biol. 339, 1025–1039 (2004). 27. Foster, D.A. & Toschi, A. Targeting mTOR with rapamycin: one dose does not fit all. Cell Cycle 8, 1026–1029 (2009). 28. Inoue, T., Heo, W.D., Grimley, J.S., Wandless, T.J. & Meyer, T. An inducible translocation strategy to rapidly activate and inhibit small GTPase signaling pathways. Nat. Methods 2, 415–418 (2005). 29. Stankunas, K. et al.
Conditional protein alleles using knockin mice and a chemical inducer of dimerization. Mol. Cell 12, 1615–1624 (2003). 30. Vogel, R., Mammeri, H. & Mallet, J. Lentiviral vectors mediate nonimmunosuppressive rapamycin analog-induced production of secreted therapeutic factors in the brain: regulation at the level of transcription and exocytosis. Hum. Gene Ther. 19, 167–178 (2008).
ONLINE METHODS
Antibodies and reagents. Anti-phospho-paxillin (Tyr31), anti-phospho-FAK (Tyr397), anti-phospho-Src (Tyr418) and anti-GFP antibodies were purchased from Invitrogen. Anti-myc antibodies and IgG-coupled agarose beads were purchased from Millipore. Anti-paxillin antibodies were a gift from M. Schaller. Rapamycin was purchased from Sigma. All restriction enzymes were purchased from New England Biolabs. iRap was a gift from T. Wandless. AP21967 was provided by Ariad Pharmaceuticals.

Molecular biology. The construct for myc-tagged mouse FAK was a gift from S.K. Hanks. The construct expressing the GST-tagged N-terminal fragment of paxillin was a gift from M. Schaller. The mouse Src gene was purchased from Upstate. The flag-tagged mouse p38α, human FKBP12 and FRB domain of human FRAP1 DNA constructs were a gift from G. Johnson. The iFKBP domain consisted of amino acids Thr22 through Glu108 of human FKBP12. Insertion of wild-type FKBP12 or iFKBP at the ends or in the middle of the FAK, p38 and Src genes was performed using a modification of the QuikChange site-directed mutagenesis kit (Stratagene). The FKBP12 and iFKBP inserts were created by PCR such that their 5′- and 3′-end sequences annealed at the desired insertion site within the p38, Src and FAK genes. The PCR products were used as mega-primers for QuikChange mutagenesis reactions. In the RapR-Src construct, the iFKBP insert contained GPG linkers on both sides. In RapR-p38, iFKBP was flanked by PE and NP linkers at the N and C termini, respectively. The FRB domain of human FRAP1 protein was cloned into the pmCherry-C1 vector using EcoRI/BamHI cloning sites. GFP-tagged FAK variants were created by subcloning the FAK gene into the pEGFP-C1 vector (Clontech) using BglII/BamHI cloning sites. The myc-tagged Src gene was constructed by insertion of a myc-tag sequence at the 3′-end of the Src gene using the QuikChange mutagenesis kit.

Immunoprecipitation and kinase assay.
Cells expressing FAK or Src were treated with either rapamycin or equivalent volumes of ethanol (solvent control). After treatment, cells were washed with ice-cold PBS and lysed with lysis buffer (20 mM HEPES-KOH, pH 7.8, 50 mM KCl, 100 mM NaCl, 1 mM EGTA, 1% NP40, 1 mM NaF, 0.1 mM Na3VO4, 0.033% ethanol). Cells treated with rapamycin were lysed with lysis buffer containing 200 nM rapamycin. Cleared lysates were incubated for 2 h with IgG-linked agarose beads prebound with the antibody used for immunoprecipitation. The beads were washed twice with wash buffer (20 mM HEPES-KOH, pH 7.8, 50 mM KCl, 100 mM NaCl, 1 mM EGTA, 1% NP40) and twice with kinase reaction buffer (25 mM HEPES, pH 7.5, 5 mM MgCl2, 5 mM MnCl2, 0.5 mM EGTA, 0.005% BRIJ-35). No MnCl2 was used in the kinase reaction buffer for Src kinase immunoprecipitation and assay. Bead suspension (20 μl) was used in kinase assays with the N-terminal fragment of paxillin as substrate, as previously described31. The kinase assay for p38α was performed as previously described32.

Cell imaging. Cells were plated on fibronectin-coated coverslips (10 mg/l fibronectin) 2 h before imaging, then transferred into L-15 imaging medium (Invitrogen) supplemented with 5% FBS. Live-cell imaging was performed in an open heated chamber (Warner Instruments) using an Olympus IX-81 microscope equipped with an objective-based total internal reflection fluorescence (TIRF) system and a PlanApo N 60× TIRFM objective (NA 1.45). All images were collected using a Photometrics CoolSnap ES2 CCD camera controlled by Metamorph software. The 468 nm and 568 nm lines from an Omnichrome Series 43 Ar/Kr laser were used for TIRF imaging. Epifluorescence images were taken using a high-pressure mercury arc light source. Cells expressing GFP-RapR-FAK constructs and mCherry-FRB were selected using epifluorescence
imaging. Time-lapse movies were taken at 1 min intervals. Quantification of GFP-RapR-FAK expression levels and other image analysis were performed using Metamorph software.

Thermodynamics study of FKBP and the FKBP deletion mutant with and without binding partners. We performed replica exchange discrete molecular dynamics (DMD) simulations of various molecular systems to estimate the thermostabilities and to study the conformational dynamics of FKBP and its deletion mutant, dFKBP. Details of the DMD method and simulation protocols can be found in previous studies24,25. Briefly, DMD is an efficient conformational sampling algorithm, and an all-atom DMD model has been shown to fold several small proteins to their native states ab initio25. Using replica exchange DMD simulations, the folding thermodynamics of superoxide dismutase (SOD1) and its variants were computationally characterized in agreement with experiments24. We applied a similar method to study the folding thermodynamics and conformational dynamics of FKBP and dFKBP bound to either rapamycin or both rapamycin and FRB. The X-ray crystal structure of FKBP, FRB and rapamycin was used to set up the simulations (PDB code: 3FAP).

Model construction of chimeric kinase. To model FAK with the dFKBP insertion, we first manually positioned dFKBP with various linkers in the proximity of the insertion loci of FAK (PDB code: 2J0J) using PyMol (http://www.pymol.org/). To model the relative orientation of iFKBP with respect to FAK, we performed all-atom DMD simulations at 27 °C (ref. 25) with the FAK molecule kept static, whereas dFKBP and the linkers were allowed to move. As the simulation temperature is below the folding transition temperature of dFKBP, the inserted domain stays folded while the DMD simulation optimizes its relative orientation with respect to FAK. By clustering the snapshot conformations from equilibrium DMD simulations, the centroid structure was identified.
We modeled the chimera in complex with rapamycin and FRB by aligning the corresponding FKBP domains in the chimera and in the complex structure of FKBP, rapamycin and FRB. Similarly, we also constructed the model of FKBP insertion into Src kinase (PDB code: 1Y57).

DMD simulations of chimeric kinases. We performed equilibrium DMD simulations of the FAK-dFKBP chimera with different linkers at 27 °C. We also studied wild-type FAK, the FAK-dFKBP chimera and the FAK-dFKBP chimera in complex with rapamycin and FRB. To reduce the computational overhead, we kept the distal FERM domain of FAK and the alpha-helical subdomain of the catalytic domain fixed. We allowed the inserted FKBP and the directly modified catalytic subdomain to sample their conformational space. Similarly, we also studied the Src-dFKBP chimera. The dynamic coupling of wild-type FAK was obtained by computing the normalized correlation matrix33,34 from DMD simulation trajectories. In the calculation of the dynamic coupling and RMSF, the translational and rotational freedom was reduced by translating the center of mass to the origin and then aligning each snapshot with respect to the average structure.
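The trajectory post-processing described here (per-residue RMSF and the normalized dynamic cross-correlation matrix) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name is ours, and snapshots are assumed to be already superposed onto the average structure as described above.

```python
import numpy as np

def rmsf_and_correlation(traj):
    """traj: array (n_frames, n_atoms, 3) of snapshot coordinates,
    assumed already superposed (translated/rotated) onto the
    average structure, as in the methods above."""
    mean = traj.mean(axis=0)                    # average structure
    disp = traj - mean                          # per-frame displacements
    # RMSF per atom: sqrt of the time-averaged squared displacement
    rmsf = np.sqrt((disp ** 2).sum(axis=2).mean(axis=0))
    # normalized cross-correlation C_ij = <dr_i.dr_j> / sqrt(<|dr_i|^2><|dr_j|^2>)
    cov = np.einsum('fik,fjk->ij', disp, disp) / traj.shape[0]
    corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    return rmsf, corr
```

Negative entries of `corr` correspond to anticorrelated motions, such as the coupling between the insertion loop and the G-loop highlighted in Figure 4b.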
31. Cai, X. et al. Spatial and temporal regulation of focal adhesion kinase activity in living cells. Mol. Cell. Biol. 28, 201–214 (2008). 32. Gerwins, P., Blank, J.L. & Johnson, G.L. Cloning of a novel mitogen-activated protein kinase kinase kinase, MEKK4, that selectively regulates the c-Jun amino terminal kinase pathway. J. Biol. Chem. 272, 8288–8295 (1997). 33. Sharma, S., Ding, F. & Dokholyan, N.V. Multiscale modeling of nucleosome dynamics. Biophys. J. 92, 1457–1470 (2007). 34. Teotico, D.G. et al. Active nuclear receptors exhibit highly correlated AF-2 domain motions. PLoS Comput. Biol. 4, e1000111 (2008).
doi:10.1038/nbt.1639
resource
A mouse knockout library for secreted and transmembrane proteins
Tracy Tang1, Li Li2, Jerry Tang2, Yun Li2, Wei Yu Lin3, Flavius Martin3, Deanna Grant1, Mark Solloway1, Leon Parker4, Weilan Ye4, William Forrest5, Nico Ghilardi1, Tamas Oravecz6, Kenneth A Platt6, Dennis S Rice6, Gwenn M Hansen6, Alejandro Abuin6, Derek E Eberhart6, Paul Godowski3, Kathleen H Holt6, Andrew Peterson1, Brian P Zambrowicz6 & Frederic J de Sauvage1

Large collections of knockout organisms facilitate the elucidation of gene functions. Here we used retroviral insertion or homologous recombination to disrupt 472 genes encoding secreted and membrane proteins in mice, providing a resource for studying a large fraction of this important class of drug targets. The knockout mice were subjected to a systematic phenotypic screen designed to uncover alterations in embryonic development, metabolism, the immune system, the nervous system and the cardiovascular system. The majority of knockout lines exhibited altered phenotypes in at least one of these therapeutic areas. To our knowledge, a comprehensive phenotypic assessment of a large number of mouse mutants generated by a gene-specific approach has not been described previously.

The sequence of the human genome predicts the existence of ~20,500 protein-coding genes1. Computational analyses, mainly sequence alignment, protein structure prediction and protein family classification, can only speculatively predict the molecular functions of newly discovered genes and cannot replace experimental validation of the role of gene products. One of the most effective ways to determine the physiological function and potential therapeutic utility of a gene is to study the phenotypic consequences of its disruption in the mouse germline. The mouse has been established as the premier genetic model organism for studying gene function in development and disease, and efforts are underway to generate a comprehensive embryonic stem (ES) cell–based resource of loss-of-function alleles2,3.
The near-complete sequence of the mouse genome reveals that 99% of mouse genes have a homolog in the human genome and that most mouse and human ortholog pairs have a high degree of protein sequence conservation, with a mean amino acid identity of 78.5%4. Knockout mice have also been shown to be predictive models of drug activity in a retrospective analysis of knockout phenotypes for genes encoding existing drug targets5. Secreted and transmembrane proteins are attractive therapeutic targets because they are accessible to drugs delivered by various modalities. We have previously identified ~1,000 novel secreted and transmembrane proteins in a large-scale effort, the secreted protein discovery initiative (SPDI), using a combination of several different approaches, including signal sequence trap screens in yeast, algorithms that recognized features of signal sequences and a BLAST algorithm that searched for protein sequence similarity to known receptors and ligands6. Based on homology, many of the SPDI genes can be placed in families
consisting of known regulators of key physiological processes such as angiogenesis, apoptosis and immune response. However, despite large efforts involving various in vitro assays, the functions of many of these SPDI proteins remain largely unknown. Here we report a large-scale functional screen using mouse knockouts to identify the biological functions and therapeutic utilities of 472 selected SPDI genes. This work provides a framework for how similar screens can be carried out in the future using the ES cell resources that are nearing completion.

RESULTS
Gene selection
We selected 475 of the ~1,000 SPDI genes for a large-scale functional screen of mouse knockouts based on sequence homology to members of important protein families and on expression profiles that suggest importance in key disease or developmental processes. The gene families of interest included, among others, those encoding cytokines, chemokines, leucine-rich repeat proteins and immunoglobulin domain–containing proteins (Supplementary Table 1). Analysis of gene expression in human adult tissues and during mouse embryonic development provided additional selection criteria. To this end, we examined a microarray database of 11,914 normal and diseased human tissue samples across 34 different tissue types (Supplementary Table 2). For 917 SPDI genes, we also used whole-mount in situ hybridization to assess expression profiles at various stages of mouse embryonic development (Supplementary Table 3). We focused on genes with tissue-specific and/or disease-specific expression in adult humans as well as those with specific expression patterns during mouse embryonic development (e.g., vascular) (Fig. 1).
1Department of Molecular Biology, Genentech, Inc., South San Francisco, California, USA. 2Bioinformatics, Genentech, Inc., South San Francisco, California, USA. 3Immunology, Genentech, Inc., South San Francisco, California, USA. 4Tumor Biology & Angiogenesis, Genentech, Inc., South San Francisco, California, USA. 5Nonclinical Biostatistics, Genentech, Inc., South San Francisco, California, USA. 6Lexicon Pharmaceuticals, Inc., The Woodlands, Texas, USA. Correspondence should be addressed to F.J.d.S. ([email protected]).
Received 6 July 2009; accepted 11 May 2010; published online 20 June 2010; doi:10.1038/nbt.1644
Figure 1 Gene selection based on expression. (a) In situ hybridization analysis was carried out for 917 SPDI genes, and those with a restricted expression pattern in mouse embryos were selected for the screen. Examples of three genes are shown (mouse orthologs of: left, TMEM108; middle, PLVAP; right, DKK2). (b) Examples of three genes with kidney-, pancreas- and liver-specific expression. Each dot in the graph represents the signal intensity of the indicated microarray probe in a single tissue sample. FXYD4 (231058_at) is expressed highly and specifically in the kidney, REG3G (231661_at) in the pancreas and PGLYRP2 (242817_at) predominantly in the liver. Gray line marks the grand mean intensity across all tissue samples. (c) An example of a gene with elevated expression in tumor samples. The level of SLC39A6 (1555460_a_at) gene expression as measured by Affymetrix signal intensities is higher in the HER2-negative infiltrating ductal carcinoma samples (n = 74) than the normal breast samples (n = 31; one-sided t-test P = 7.2e-4). Gray line marks the grand mean intensity across all tissue samples.
In total, 475 genes were selected for knockout analysis (listed in Supplementary Table 2 along with their expression characteristics). Among these 475 genes, 169 showed specific expression during development (Fig. 1a and Supplementary Table 2), 98 showed adult tissue-specific expression (Fig. 1b and Supplementary Table 4) and 12 had elevated expression in tumor samples (Fig. 1c). Based on Protcomp 6.0, 199 genes were predicted to encode secreted proteins, 217 plasma membrane–bound proteins and 59 intracellular membrane-bound proteins (Supplementary Table 1).
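A tissue-specificity filter of the kind used above (a gene's signal in one tissue standing out against the grand mean across all samples, cf. the gray line in Fig. 1b) could be prototyped as below. This is an illustrative sketch only; the fold-change cutoff and all names are our assumptions, not the authors' actual selection rule.

```python
import numpy as np

def tissue_specific_genes(expr, tissues, fold=5.0):
    """Flag genes whose mean signal in one tissue exceeds `fold` times
    the per-gene grand mean across all samples.
    expr: (n_genes, n_samples) signal intensities;
    tissues: one tissue label per sample.
    Returns {gene_index: tissue}. The cutoff is illustrative."""
    tissues = np.asarray(tissues)
    grand_mean = expr.mean(axis=1)              # per-gene grand mean
    hits = {}
    for t in np.unique(tissues):
        tissue_mean = expr[:, tissues == t].mean(axis=1)
        for g in np.where(tissue_mean > fold * grand_mean)[0]:
            hits[int(g)] = str(t)
    return hits
```

A disease-enrichment comparison like the one in Figure 1c (tumor versus normal samples) would replace the fold-change rule with a one-sided t-test between the two sample groups.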
Knockout generation and functional screen
Because all the SPDI genes are of human origin, we first identified each mouse ortholog using four criteria: reciprocal BLAST analysis, confirmation of synteny, confirmation of similar or identical domain structure, and comparable expression pattern. Based on these criteria, a mouse ortholog could be identified for every gene on the list. The mouse genes were then disrupted either by homologous recombination or by retroviral insertion. Four hundred twenty-one genes were targeted by homologous recombination, and correct targeting was confirmed by Southern blot analysis. The remaining 54 genes were knocked out by gene trapping using preexisting ES cell clones from the OmniBank library7. Germline transmission was successfully obtained for 472 of the 475 genes. As part of the data available for each mutant line, the targeting strategy and PCR/Southern blot data can be found for each allele in the database (http://mmrrc.mousebiology.org/phenotype/) under the “Expression” program.
All knockout lines were subjected to a broad, unbiased phenotypic screen aimed at identifying potential defects in general metabolism, in bone metabolism, or in the function of the cardiovascular, immune or neural systems. We also investigated involvement in oncogenesis. Supplementary Table 5 lists the assays that are directly relevant to these phenotypic categories or therapeutic areas. The phenotyping of each knockout line required the offspring of 16 heterozygote matings to provide a target number of eight homozygous animals (four males and four females) per line for the screen, and all animals went through the same battery of assays in the same order, from the least to the most invasive assay, as outlined and described previously8. All phenotypic assays were validated before being implemented in the full screen (as illustrated for dual-energy X-ray absorptiometry applications in obesity research9), and the testing order of the assays in the screen was carefully designed and tested to ensure that each assay was self-contained and had no effect on subsequent assays8. In spite of variability in the assays due to the mixed 129/B6 genetic background and the relatively small cohort sizes (2–4 wild-type, 0–4 heterozygous, 4–8 homozygous animals), the majority of the knockouts exhibited significant changes in one or more phenotypic categories (Fig. 2a). The number of genes exhibiting changes in each phenotypic category when deleted is shown in Figure 2b. Positive phenotype calls (Supplementary Tables 6 and 7) were made using the criteria and statistical rules described in Online Methods. The phenotypic data of individual knockout strains are accessible through http://mmrrc.mousebiology.org/phenotype/. It is important to note that although all phenotypes counted reached statistical significance, their deviation from the norm varied widely.
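The reciprocal BLAST criterion used above for ortholog identification amounts to a reciprocal-best-hit check. The sketch below is a minimal illustration with hypothetical inputs; the paper's other three criteria (synteny, domain structure and expression pattern) are not modeled here.

```python
def reciprocal_best_hits(human_vs_mouse, mouse_vs_human):
    """Reciprocal-best-hit pairing. Each argument maps a query gene to a
    list of (subject_gene, bit_score) BLAST hits. A human/mouse pair is
    kept only if each gene is the other's top-scoring hit.
    Function and input names are illustrative."""
    def best(hits):
        # top-scoring subject for each query; skip queries with no hits
        return {q: max(s, key=lambda h: h[1])[0] for q, s in hits.items() if s}
    h2m = best(human_vs_mouse)
    m2h = best(mouse_vs_human)
    return {h: m for h, m in h2m.items() if m2h.get(m) == h}
```

Genes whose best hits disagree in the two directions simply drop out of the returned mapping and would need manual review against the remaining criteria.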
Figure 2 Summary of phenotypes. (a) The majority of the knockouts showed changes in one or more phenotypic categories. (b) Breakdown of phenotypic categories, which are defined by the assays listed in Supplementary Table 5. (c–h) Histograms of gene distribution as a function of changes in cholesterol level (c), triglyceride level (d), lymphocyte count (e), amount of OVA-specific IgG1 (f), red blood cell count (g) and total body volumetric bone mineral density (vBMD; mg/cm3) (h). Median homozygote (hom) values were evaluated and compared to median wild-type (wt) values by calculating a ratio of the two cohorts (hom/wt). The resulting ratios were arranged in a histogram format for evaluation and plotting (orange dots). The histogram data were then normalized and plotted as an overlay (blue line). Several genes of interest (highlighted in purple) and a few known benchmark genes (highlighted in pink) are labeled. Gpihbp1-knockout mice showed greatly elevated cholesterol and triglyceride levels, whereas Sost-knockout mice displayed the highest vBMD observed among all knockout and wild-type mice analyzed.
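The median-ratio and histogram procedure described in this caption can be prototyped as follows. The function names and fixed-width bin edges are illustrative assumptions; the published panels use uneven bins.

```python
import numpy as np

def hom_wt_ratio(hom_values, wt_values):
    # ratio of cohort medians (hom/wt), as in Figure 2c-h
    return float(np.median(hom_values) / np.median(wt_values))

def ratio_histogram(ratios, width=0.3, start=0.001):
    """Bin per-line hom/wt ratios into fixed-width intervals for
    plotting. Bin edges here are illustrative, not the paper's."""
    ratios = np.asarray(ratios)
    edges = np.arange(start, ratios.max() + width, width)
    counts, edges = np.histogram(ratios, bins=edges)
    return counts, edges
```

A ratio near 1 indicates no change relative to wild type; strong outliers such as the Gpihbp1 line (ratio ~80 for triglycerides) land in the extreme bins and stand out immediately.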
To illustrate the range of phenotypic changes, we also plotted the results of a few assays with easily quantifiable data as histograms (Fig. 2c–h). This type of graph clearly highlights the phenotypes that most strongly deviate from the norm.

Utility of the KO resource
A search of the OMIM (Online Mendelian Inheritance in Man) database, which contains information on all known human Mendelian disorders and >12,000 genes, revealed that 98 of the 475 SPDI genes overlap with genes for which human mutations or genetic linkages have been described (Supplementary Table 2). Thus, these mice provide a number of animal models for human diseases that can be used to study pathophysiology or test potential therapeutics (Table 1). For example, mutations in TMPRSS3 have been identified in humans as responsible for autosomal recessive nonsyndromic hearing loss10,11. We identified seven mouse genes (Cldn14, 5930434B04Rik, Flrt2, Myadm, Lrrc4, Sumf1 and Tmprss3) that appear to be essential for normal hearing. One phenotype identified with a high degree of confidence in the neurology category is the absence of a startle response to 120 dB in the knockout mice (pre-pulse inhibition assay), reflecting hearing impairment. Degeneration of sensory cochlear hair cells in the inner ear in the case of Cldn14 knockout mice12 and degeneration of the organ of Corti in Tmprss3 knockout mice were also observed (data not shown). Thus, these mice will help elucidate the basis of hereditary deafness in these families, and the genes identified may point toward additional loci that may harbor mutations in cases of hereditary deafness in man13.
Embryonic lethal knockouts are of particular interest as they reflect the potential role of a gene in one of many cellular processes that not only occur during embryonic development but may also be misregulated in diseases, such as cancer14,15. The observed 8% pre-weaning lethality in our screen is within the range of estimates of essential genes reported for mice (8–20% based on region-specific N-ethyl-N-nitrosourea (ENU) mutagenesis screens) and for other organisms (yeast Saccharomyces cerevisiae, 19%; zebrafish, 6–10%; Caenorhabditis elegans, 7%)16. Further studies of these lethal lines are needed to determine the age and/or developmental stage and cause of lethality. The list of lethal genes can be found in Supplementary Table 2. Several of these genes, including ALG2, CDAN1, CLDN1, COG7, CRELD1, DOLK, EFEMP2 and GPC6, have been linked to congenital human disorders (Table 1).
The primary phenotypic screen described here provides clues as to the functions of a large set of secreted and transmembrane proteins. In several cases these results led directly to further characterization of the knockout mice and the identification of important biological processes in which these molecules are involved, such as Tspan12 in angiogenesis17, Angptl4 and Gpihbp1 in metabolism18,19, Epha6 in neurology20, and Sulf1 and Sulf2 in development21.
[Figure 2a,b: bar charts of the number of knockout (KO) lines and the number of genes called in each phenotypic category, including embryonic lethality, reduced viability, growth change, reproductive biology, neurology, metabolism, immunology, oncology, bone metabolism, ophthalmology and cardiology.]
Table 1 Mouse models of human Mendelian disordersa

Gene symbol | Human disorderb | Mouse knockout phenotypes
ADAM33 | Asthma susceptibility | Immunological abnormality
AGR3 | Breast cancer | Ovarian and uterine hypoplasia; few ducts in mammary glands
ALB | Bisalbuminemia; analbuminemia; familial dysalbuminemic hyperthyroxinemia | Metabolic abnormalities
ALG2 | Congenital disorders of glycosylation | Lethality of (−/−) mutants
AMICA1 | Acute promyelocytic leukemia | Immunological abnormality
ANGPTL4 | Susceptibility to reduced triglycerides | Metabolic abnormalities
APOA5 | Hypertriglyceridemia; type III hyperlipidemia | Metabolic abnormalities
ASPN | Osteoarthritis; lumbar disc degeneration | Immunological abnormality
C1QTNF5 | Late-onset retinal degeneration | Retinal degeneration and blood vessel attenuation
CDAN1 | Congenital dyserythropoietic anemias | Lethality of (−/−) mutants
CLDN1 | Neonatal ichthyosis-sclerosing cholangitis | Lethality of (−/−) mutants
CLDN14 | Autosomal recessive deafness | Hearing impairment; degeneration of sensory cochlear hair cells in the inner ear
COG7 | Congenital disorder of glycosylation, type IIe | Lethality of (−/−) mutants
CRELD1 | Atrioventricular septal defect | Lethality of (−/−) mutants
DOLK | Dolichol kinase deficiency | Lethality of (−/−) mutants
EFEMP2 | Doyne honeycomb retinal dystrophy; cutis laxa | Lethality of (−/−) mutants
FGF23 | Autosomal dominant hypophosphatemic rickets; familial tumoral calcinosis with hyperphosphatemia | Growth retardation; reduced viability; diffuse osteodystrophy and metastatic calcification
GPC4 | Simpson dysmorphia syndrome; Simpson-Golabi-Behmel syndrome | Bone and metabolic abnormalities
GPC6 | Omodysplasia | Lethality of (−/−) mutants
HHIP | Association with height | Lethality of (−/−) mutants; increased weight in HETs
MUC20 | Inflammatory bowel disease | Immunological abnormalities
NTNG1 | Rett syndrome | Neurological abnormalities
OSTM1 | Recessive osteopetrosis | Decreased mean body weight and length; diffuse marked osteopetrosis, diffuse moderate retinal degeneration and multifocal mild neuronal necrosis
OTOR | Inner ear dysfunction | Decreased time in center may indicate increased anxiety-related response
POMGNT1 | Muscle-eye-brain disease | Developmental malformation of the brain; retinal degeneration; decreased body weight
REG4 | Crohn disease and ulcerative colitis; mucinous tumors or neuroendocrine tumors | Bone, metabolic and immunological abnormalities
RETN | Type II diabetes and insulin resistance-related hypertension; obesity-related phenotypes | Metabolic abnormalities
SEL1L | Pancreatic carcinomas | Lethality of (−/−) mutants
SEMA4A | Retinitis pigmentosa-35 | Retinal degeneration, attenuated retinal vessels, microaneurysms and a decreased mean retinal artery-to-vein ratio
SLC29A3 | H syndrome | Bone, cardiology, metabolic, neurological and immunological abnormalities
SOST | Sclerosteosis; van Buchem disease | Bone osteopetrosis
STRA6 | Microphthalmia, syndromic 9; Matthew-Wood syndrome | Growth retardation; bone, cardiology and metabolic abnormalities
SUMF1 | Multiple sulfatase deficiency | Reduced viability; growth retardation; histologic changes consistent with lysosomal storage disease; hypoactivity; deficits in motor coordination; no startle response
TBL2 | Williams-Beuren syndrome | Increased mean body weight, body length and bone-related measurements
TMPRSS3 | Autosomal recessive deafness | Impaired hearing; degeneration of the organ of Corti
TMPRSS4 | Pancreatic cancer | Increased mean skin fibroblast proliferation rate
TMPRSS6 | Iron-refractory iron deficiency anemia | Hypochromasia and anisocytosis; decreased mean hemoglobin and hematocrit; increased mean RBC and platelet counts; variable size of the RBCs
VAPB | Amyotrophic lateral sclerosis (ALS8) | Enhanced motor coordination

HET, heterozygotes; RBC, red blood cell. aSubset of the 98 SPDI genes that overlap with genes for which human mutations or genetic linkages have been described. bOMIM. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, Maryland) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, Maryland), (Oct. 22, 2009), http://www.ncbi.nlm.nih.gov/omim/.
We highlight here a specific unpublished example to illustrate how the resource can be called upon to interrogate gene function. CLEC1B (also known as CLEC2) was originally identified by sequence similarity to C-type lectin-like receptors and found to be expressed in the liver and in some blood cells, mostly of myeloid origin22. In the initial broad phenotypic characterization, the Clec1b-deficient mice exhibited reduced viability, which was subsequently confirmed in larger cohorts (Supplementary Table 8). The two surviving Clec1b mutant mice showed numerous abnormalities in the phenotypic screening panel, including reduced platelet counts, anemia, decreased blood pressure and an increased percentage of CD4+ cells among peripheral blood mononuclear cells (Fig. 3a and data not shown). Because the P values in these assays were not calculated for the Clec1b homozygous animals (n = 2) compared to the wild-type littermates or the wild-type reference controls, these hematological changes were not called based on the statistical thresholds and rules described in the
Online Methods. Histological analysis of mutant embryos revealed multiple loci of hemorrhage (Fig. 3b). Follow-up characterization confirmed the hemorrhage phenotypes in Clec1b-deficient embryos; hemorrhagic lesions were also observed outside the nervous system (Fig. 3c and data not shown). This set of abnormalities suggested defects in blood vessel integrity and/or a coagulation defect caused by the reduction in platelet counts. To determine whether the expression of Clec1b is consistent with a role in vascular integrity or in coagulation, we carried out flow cytometry and immunohistochemistry analyses. Clec1b protein is highly expressed on the cell surface of platelets and megakaryocytes, at lower levels on liver Kupffer cells, and not on T or B cells (Fig. 3d,e, Supplementary Fig. 1 and data not shown). In addition, co-staining with CD41 confirmed expression of Clec1b in megakaryocytes (Supplementary Fig. 2). Clec1b is not detected by immunostaining on PECAM-positive endothelial cells (data not shown).
Figure 3 Clec1b-deficient mice. (a) Surviving Clec1b mutant mice show decreased platelet (PLT) count, mean red blood cell (RBC) count, hemoglobin (HGB) concentration, and hematocrit (HCT) level (WT = wild type, n = 4 (green dots); HET = heterozygotes, n = 8 (blue dots); HOM = homozygotes, n = 2 (red dots); historical wild-type means ± 1 s.d. (light gray); historical wild-type means ± 2 s.d. (darker gray)). (b) Clec1b mutant embryos show multiple hemorrhagic lesions. At 12.5 days post coitus (dpc), multiple foci of congestion and hemorrhage in the brain (diencephalon) and vestibulocochlear ganglion were observed (arrow). In addition, an increased number of dilated capillaries was observed in the affected areas of the developing brains. Bilateral hemorrhages were also observed adjacent to the neuroepithelium (arrowheads). Scale bar, 500 μm. (c) Hemorrhages in 12 dpc and 13 dpc mutant embryos. Top and bottom right, whole embryos. Bottom left, close-up of dilated vasculature adjacent to the mutant neural epithelium. Note the broken vessel wall and the many enucleated red blood cells mixed throughout the population (arrow). Scale bar, 50 μm. (d) Clec1b protein is highly expressed on platelets and at low levels on Kupffer cells. (e) In adult mouse tissues, Clec1b is detected by immunohistochemistry in spleen (top, 60× magnification) and liver (bottom, 40× magnification). The most intensely stained cells are megakaryocytes (arrowheads). Scale bars, upper, 30 µm; lower, 50 µm. (f) Clec1b-Fc injected into adult mice has a modest (P = 0.13) but reproducible effect on bleeding time (left, one of two experiments shown; CLEC1B-Fc or isotype control: 150 μg/100 μl intraperitoneal, 3×/week). Anti-CD41 mAb was used as a positive control (right, anti-CD41 or rat IgG1: 30 μg/100 μl intravenous).
The systemic hemorrhage and reduced viability of the Clec1b mutants resemble those of Syk-deficient mice. Syk, a tyrosine kinase also expressed in platelets, is activated by stimulation with thrombin and platelet-derived growth factor β or by binding to integrins23,24. Thus, we examined whether blocking endogenous Clec1b activity acutely in adult mice would affect platelet function. We found that a soluble form of CLEC1B leads to a modest, nonsignificant (P = 0.13) increase in tail bleeding time when injected in adult mice (Fig. 3f). In vitro studies with isolated platelets demonstrated induced tyrosine phosphorylation of Clec1b upon platelet activation25. Furthermore, anti-CLEC1B antibody treatment in vivo showed that Clec1b is required in adult mice for normal hemostasis and thrombosis26. Taken together, these results indicate that Clec1b is important in platelet function. The Clec1b knockout mice provide a resource for interrogating the physiological significance of CLEC1B on platelets and an illustration of how the phenotypic information can be mined for clues that, in combination with other information, can lead directly to biological insights.

DISCUSSION
Several large-scale, ‘phenotype-driven’ screens have been performed in the mouse using ENU to induce mutations at random and identifying mutants based on phenotype. ENU introduces primarily point mutations, and these mutations commonly result in hypomorphic alleles and occasionally in hypermorphic, neomorphic or antimorphic alleles. The expected frequency of generating a null allele for a given gene using ENU mutagenesis is estimated to be only roughly 1 out of 10 mutations27. Although the completion of the mouse genome sequence and advances in genetic techniques accelerate the genetic mapping of mutations, the complexity associated with the identification of mutated genes remains a major challenge for phenotype-driven screens in mouse. The availability in the near future of a comprehensive set of loss-of-function alleles in mouse ES cells means that genotype-driven approaches are likely to become an important approach for mouse genetics.

Although considerable improvements in technology render the generation of a comprehensive collection of knockout mice for each gene achievable in the near future, the phenotypic analysis of the mutant strains will constitute a major challenge. To carry out an initial evaluation of the genotype-driven approach to a large-scale screen, we generated mouse knockouts for 472 genes encoding secreted and transmembrane proteins and subjected the resulting mice to a large set of assays across multiple functional areas. Cohorts of all 472 mutant lines were examined in an unbiased manner in a large panel of different assays, revealing phenotypes in areas of neurology,
immunology, metabolism, cardiology and/or bone metabolism. This is the largest genotype-driven phenotypic screen reported in mice and illustrates both the power and the limitations of such an approach. The ability to select a specific set of genes for phenotypic screening allows us not only to determine whether gene families grouped by structural similarities, enzymatic activities or other attributes share common roles in specific biological processes but also to identify functions that are unique to individual genes by comparing their knockout phenotypes under the same assay conditions. In addition, by examining the phenotypes of mice deficient in genes associated with a specific disease on a large scale, as reported here, we may be able to identify the driver genes and pathways that are misregulated in the disease of interest. However, the incorporation of challenge assays in such a screen may be necessary (e.g., dietary modifications for metabolic disorders, or sensitizing the mice for cancer studies by crossing them with mice deficient in the tumor suppressor p53).

The primary screen phenotypes allowed us to define the physiological roles of the disrupted genes and to identify their molecular functions through further detailed characterization of the individual knockouts. The screen also provided us with a large set of phenotypic data that, together with the gene expression data, advances our understanding at the system level. Gene expression is often used as an initial guide for further functional analysis and was indeed one of the main criteria we used to select the genes for this screen. We evaluated the predictive value of gene expression profiling by examining whether there are associations between phenotype and tissue distribution for genes with expression restricted to specific tissues, using both human (Supplementary Table 2) and mouse (downloaded from the Genomics Institute of the Novartis Research Foundation SymAtlas website http://symatlas.gnf.org/)28 gene expression data sets. Although we detected some interesting trends, no statistically significant correlation between phenotype and expression could be detected, even when the 199 predicted secreted protein-coding genes were excluded from the analysis because they may act on distant tissues (data not shown). Significant correlations may emerge after additional phenotypic characterization, but these results highlight the utility of an unbiased phenotypic screen to discover unexpected gene functions that may be overlooked by more targeted analysis of mutant mice.

The majority of the knockout phenotypes shed light on the physiological functions of the molecules that were targeted. Several strains also provide models for human disorders. However, as with any primary screen, follow-up validation using larger cohorts of mice and repeated tests is essential. For example, a large number of the neurology mutants exhibited altered anxiety-like and/or depression-like responses, as indicated by alterations in the open field test and tail suspension assays. However, mice with abnormal activity levels caused by non-neurological defects will also score in these assays. Thus, most of the neurological phenotypes seen in this primary screen await confirmation by repeated testing with larger numbers of mice as well as further in-depth phenotypic characterization. In addition, for mouse screens, the genetic background, whether mixed or purebred, is especially important, as phenotypes are often influenced by the genetic background of the knockout analyzed. Future screens based on a pure C57Bl/6 background are likely to have increased effectiveness, and testing on different genetic backgrounds will be important to elucidate the true function of specific genes. The resource we present here provides a starting point for much more detailed analysis of mutants of interest to investigators specialized in specific biological or therapeutic areas.
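One way to formalize the phenotype-expression association check discussed above is a per-category 2×2 Fisher's exact test (tissue-restricted vs. not, called vs. not called in a phenotypic category). The function below and its example counts are illustrative assumptions, not the authors' analysis; the paper reports that no statistically significant correlation was found.

```python
# Right-tail Fisher's exact test for enrichment of phenotype calls among
# tissue-restricted genes. Counts here are invented for illustration.
from math import comb

def fisher_right_tail(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    a: tissue-restricted genes called in the category
    b: tissue-restricted, not called; c: called, not restricted; d: neither.
    """
    n, row1, col1 = a + b + c + d, a + b, a + c
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)

# e.g., 6 of 30 bone-restricted genes vs. 40 of 442 other genes called in
# the "bone metabolism" category (hypothetical numbers):
p = fisher_right_tail(6, 24, 40, 402)
```

A test like this would be run per (category, tissue) pair, with multiple-testing correction across pairs.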
The coming few years will see the creation of loss-of-function alleles for every gene in the mouse genome, an endeavor currently
undertaken by the Knockout Mouse Project, the European Conditional Mouse Mutagenesis Program and the North American Conditional Mouse Mutagenesis Project2,3. Comprehensive phenotyping of these mutant lines is being undertaken using large-scale platforms for systemic phenotypic analysis that are being developed by a consortium of research institutes, such as the National Research Center for Environment and Health (Neuherberg, Germany), the Institut Clinique de la Souris (Illkirch, France) and Medical Research Council Harwell (UK) (http://www.eumorphia.org/)29,30. The challenges of systematically phenotyping large numbers of genetically modified mouse knockout lines should not be underestimated. To generate data that can be meaningfully analyzed by statistical methods, a sufficient number of mice must be included in each assay, which requires multiple matings. If one is studying a small number of knockout lines per year, very large cohort sizes can be used. To perform the broad phenotypic screen on the 472 lines in a high-throughput manner, as we have done here, we limited the cohort size while maintaining the ability to detect significant phenotypic changes. We chose to use eight homozygous mice (equally divided between males and females) per assay, generated by 16 heterozygous matings, as we were most interested in effects shared by the sexes. Furthermore, to establish the systematic screening approach, we first identified the assays that are directly relevant to the phenotypic categories or therapeutic areas. All assays were then validated to ensure reproducibility and robustness before being implemented in the full screen. As all animals went through the same phenotypic screen in the same order, the testing order of the assays and the age of the mice at which a particular assay was performed were important parameters that were carefully evaluated in the design of the screen8.
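The cohort-size tradeoff described above can be made concrete with a small simulation: how often would a two-sided Wilcoxon rank-sum test at alpha = 0.05 detect a one-s.d. shift with eight homozygotes against a pooled wild-type group? The pool size (30), simulation count and normality assumption below are illustrative choices, not the authors' actual power analysis.

```python
# Monte Carlo power sketch for n = 8 homozygotes vs. a pooled wild-type
# control group, using a hand-rolled rank-sum test to stay dependency-free.
import math
import random

def ranksum_p(x, y):
    """Two-sided Wilcoxon rank-sum P value via the normal approximation
    (adequate here; simulated continuous values have essentially no ties)."""
    nx, ny = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_x = sum(i + 1 for i, (_, grp) in enumerate(pooled) if grp == 0)
    mu = nx * (nx + ny + 1) / 2.0
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12.0)
    return math.erfc(abs(rank_x - mu) / sigma / math.sqrt(2))

def power(n_hom=8, n_wt=30, shift=1.0, alpha=0.05, sims=2000, seed=0):
    """Fraction of simulated screens in which the shift is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        hom = [rng.gauss(shift, 1.0) for _ in range(n_hom)]
        wt = [rng.gauss(0.0, 1.0) for _ in range(n_wt)]
        hits += ranksum_p(hom, wt) <= alpha
    return hits / sims
```

Under these assumptions the detected fraction comes out moderate rather than high, which is consistent with the authors' point that pooling non-littermate wild-type controls was needed to power the analysis.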
Our work provides a glimpse of the complexity associated with large-scale phenotyping of knockout mouse lines. The collection of mutant mice reported here should help to accelerate further investigation of the genes we mutated and represents a meaningful addition to the worldwide effort aimed at generating a comprehensive collection of gene-deficient mice, ultimately targeting every protein-encoding gene in the mouse genome.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

Acknowledgments
We thank J. Brennan, S. Bunting, L. Corson, P. Fielder, E. Filvaroff, D. French, J. Junutula, F. Peale, H. Phillips, M. Rohrer, H. Stern, J. Zha, R. Watts, B. Wolf and scientists in the Genentech Immunology Department for critical review of the knockout phenotypes, and E. Bierwagen and D. Wan for the bioinformatics infrastructure used to track phenotypic calls. We thank J. Mitchell for analysis and plotting of the histograms depicting phenotypic ranges. We also thank M. Tessier-Lavigne, F. Bazan, M. Kong-Beltran, J. Theunissen, S. Warming and Z. Zhang for critical reading of the manuscript.

Author Contributions
F.J.d.S., T.T., A.P. and B.P.Z. designed the project, analyzed data and wrote the manuscript. F.M. and P.G. designed experiments and analyzed data. J.T. and N.G. contributed to the identification of murine orthologs. K.H.H. and N.G. contributed to the design and verification of targeting strategies. T.O., K.A.P., D.S.R., G.M.H., A.A., D.E.E., K.H.H. and B.P.Z. designed, performed and supervised the knockout generation and phenotype screen. T.T., L.L. and W.F. contributed to the statistical analysis and making the phenotype calls. J.T. and Y.L. implemented the database for public access. L.L. compiled the adult tissue calls. L.P. and W.Y. performed the embryonic in situ hybridization screen.
W.Y.L., F.M., D.G. and M.S. performed the follow-up characterization of the Clec1b mutant line.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.
Published online at http://www.nature.com/naturebiotechnology/. Reprints and permissions information is available online at http://npg.nature.com/ reprintsandpermissions/.
1. Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. USA 104, 19428–19433 (2007).
2. Austin, C.P. et al. The knockout mouse project. Nat. Genet. 36, 921–924 (2004).
3. Friedel, R.H., Seisenberger, C., Kaloff, C. & Wurst, W. EUCOMM–the European conditional mouse mutagenesis program. Brief. Funct. Genomics Proteomics 6, 180–185 (2007).
4. Mouse Genome Sequencing Consortium & Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
5. Zambrowicz, B.P. & Sands, A.T. Knockouts model the 100 best-selling drugs–will they model the next 100? Nat. Rev. Drug Discov. 2, 38–51 (2003).
6. Clark, H.F. et al. The secreted protein discovery initiative (SPDI), a large-scale effort to identify novel human secreted and transmembrane proteins: a bioinformatics assessment. Genome Res. 13, 2265–2270 (2003).
7. Zambrowicz, B.P. et al. Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608–611 (1998).
8. Beltrandelrio, H. et al. Saturation screening of the druggable mammalian genome. in Model Organisms in Drug Discovery (eds. Carroll, P.M. & Fitzgerald, K.) 251–278 (John Wiley & Sons, Chichester, West Sussex, England, 2003).
9. Brommage, R. Validation and calibration of DEXA body composition in mice. Am. J. Physiol. Endocrinol. Metab. 285, E454–E459 (2003).
10. Scott, H.S. et al. Insertion of beta-satellite repeats identifies a transmembrane protease causing both congenital and childhood onset autosomal recessive deafness. Nat. Genet. 27, 59–63 (2001).
11. Guipponi, M., Antonarakis, S.E. & Scott, H.S. TMPRSS3, a type II transmembrane serine protease mutated in non-syndromic autosomal recessive deafness. Front. Biosci. 13, 1557–1567 (2008).
12. Ben-Yosef, T. et al. Claudin 14 knockout mice, a model for autosomal recessive deafness DFNB29, are deaf due to cochlear hair cell degeneration. Hum. Mol. Genet. 12, 2049–2061 (2003).
13. Friedman, L.M., Dror, A.A. & Avraham, K.B. Mouse models to study inner ear development and hereditary hearing loss. Int. J. Dev. Biol. 51, 609–631 (2007).
14. Fan, B. et al. Hepatocyte growth factor activator inhibitor-1 (HAI-1) is essential for the integrity of basement membranes in the developing placental labyrinth. Dev. Biol. 303, 222–230 (2007).
15. Yan, M. & Plowman, G.D. Delta-like 4/Notch signaling and its therapeutic implications. Clin. Cancer Res. 13, 7243–7246 (2007).
16. Wilson, L. et al. Random mutagenesis of proximal mouse chromosome 5 uncovers predominantly embryonic lethal mutations. Genome Res. 15, 1095–1105 (2005).
17. Junge, H.J. et al. TSPAN12 regulates retinal vascular development by promoting Norrin- but not Wnt-induced FZD4/β-catenin signaling. Cell 139, 299–311 (2009).
18. Beigneux, A.P. et al. Glycosylphosphatidylinositol-anchored high-density lipoprotein-binding protein 1 plays a critical role in the lipolytic processing of chylomicrons. Cell Metab. 5, 279–291 (2007).
19. Desai, U. et al. Lipid-lowering effects of anti-angiopoietin-like 4 antibody recapitulate the lipid phenotype found in angiopoietin-like 4 knockout mice. Proc. Natl. Acad. Sci. USA 104, 11766–11771 (2007).
20. Savelieva, K.V. et al. Learning and memory impairment in Eph receptor A6 knockout mice. Neurosci. Lett. 438, 205–209 (2008).
21. Holst, C.R. et al. Secreted sulfatases Sulf1 and Sulf2 have overlapping yet essential roles in mouse neonatal survival. PLoS ONE 2, e575 (2007).
22. Colonna, M., Samaridis, J. & Angman, L. Molecular characterization of two novel C-type lectin-like receptors, one of which is selectively expressed in human dendritic cells. Eur. J. Immunol. 30, 697–704 (2000).
23. Turner, M. et al. Perinatal lethality and blocked B-cell development in mice lacking the tyrosine kinase Syk. Nature 378, 298–302 (1995).
24. Cheng, A.M. et al. Syk tyrosine kinase required for mouse viability and B-cell development. Nature 378, 303–306 (1995).
25. Suzuki-Inoue, K. et al. A novel Syk-dependent mechanism of platelet activation by the C-type lectin receptor CLEC-2. Blood 107, 542–549 (2006).
26. May, F. et al. CLEC-2 is an essential platelet activating receptor in hemostasis and thrombosis. Blood 114, 3464–3472 (2009).
27. Cordes, S.P. N-ethyl-N-nitrosourea mutagenesis: boarding the mouse mutant express. Microbiol. Mol. Biol. Rev. 69, 426–439 (2005).
28. Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).
29. The Eumorphia Consortium. EMPReSS: standardized phenotype screens for functional annotation of the mouse genome. Nat. Genet. 37, 1155 (2005).
30. Morgan, H. et al. EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res. 38, D577–D585 (2010).
ONLINE METHODS
Knockout mice generation and phenotypic screen. All animal procedures were conducted in conformity with Institutional Animal Care and Use Committee guidelines as previously described31–34. To generate the knockouts by homologous recombination, correctly targeted 129S5/SvEvBrd ES cell clones were microinjected into C57BL/6J-Tyrc-Brd blastocysts. Resulting chimeras were mated with C57BL/6J-Tyrc-Brd females to produce F1 heterozygotes. For the broad, unbiased phenotypic screen, F1 heterozygotes were intercrossed to produce F2 wild-type, heterozygote and homozygote cohorts. All phenotypic analyses were performed on a cohort of 2–4 wild-type, 0–4 heterozygous and 4–8 homozygous mutant mice between 12 and 16 weeks of age unless reduced viability necessitated earlier testing. In addition to the wild-type littermate controls, the mutant phenotypes were also compared to those of the cumulative wild-type historical controls and the wild-type reference controls from the same 3- to 6-week time windows during the screen. Methods for knockout generation and screen assays have been published previously8,31–34. Gene trap knockout lines were generated using OmniBank ES cell clones as previously described35,36. All gene targeting was done using vectors made by simple PCR amplification of the homology arms from genomic DNA or by the methods described previously37. As part of the data available for each mutant line, the targeting strategy and PCR/Southern blot data can be found for each allele in the MMRRC database (http://mmrrc.mousebiology.org/phenotype/) under the “Expression” program. Namely, one can find in the database the schematic showing the insertion site and the PCR data showing the transcript loss for each gene trap allele, and, for each homologous recombination allele, a map of the gene, the targeting vector, the Southern blot probes used to confirm correct targeting and the Southern blot data.
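As background to the F2 cohorts described above, heterozygote intercrosses are expected to yield wt:het:hom genotypes at a 1:2:1 ratio. The chi-square check below is an illustrative way to flag a shortfall of homozygotes at genotyping; it is not the screen's actual lethality rule, which is defined later in these Methods.

```python
# Chi-square goodness of fit against the 1:2:1 (wt:het:hom) ratio expected
# from het x het intercrosses. Illustrative sketch only; the screen's
# lethality and viability calls were categorical, as described in the text.
def chi_square_1_2_1(n_wt, n_het, n_hom):
    total = n_wt + n_het + n_hom
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    return sum((o - e) ** 2 / e
               for o, e in zip((n_wt, n_het, n_hom), expected))

# Values above 5.99 (chi-square, 2 df, alpha = 0.05) suggest deviation
# from Mendelian expectations, e.g., missing homozygotes.
```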
The genomic sequence information (the exact genomic insertion site for each gene trap allele; the deleted genomic sequence for each homologous recombination allele) and the primer sequences for genotyping will be provided upon request.

Knockout phenotype calls. For assays with numerical measurements that are represented in the phenotype database (http://mmrrc.mousebiology.org/phenotype/) as dot plot graphs, we used the following rules to identify those with a significant change between the homozygous animals and the wild-type animals: (i) both “pval_hom_wt” and “pval_hom_wtref” must be ≤ 0.05; (ii) when “pval_hom_wt” is not available due to a small sample size (n < 3), “pval_hom_wtref” must be ≤ 0.05; and (iii) the absolute value of the difference between the mean of the homozygous animals (hom_mean) and the mean of the wild-type littermate controls (wt_mean) must be greater than or equal to the s.d. of the wild-type reference controls (wt_ref_std_dev) (that is, |hom_mean − wt_mean| ≥ wt_ref_std_dev). “pval_hom_wt” is defined as the P value for the comparison between the homozygous animals and the age-matched wild-type littermate controls; “pval_hom_wtref”, for the comparison between the homozygous animals and the age-matched wild-type reference controls, which were wild-type animals from other lines that were analyzed within the same 3- to 6-week time window during the screen. These non-littermate wild-type animals were included to increase the wild-type sample size and power the statistical analysis. Both “pval_hom_wt” and “pval_hom_wtref” were calculated using a two-sided Wilcoxon rank sum test; the calculations were done only when the sample size of each group was ≥ 3. The P values were calculated for the male and female combined (mf), the male alone (m) and the female alone (f) data sets, and filtered with the above statistical thresholds independently. An assay was called as long as one data set met the above criteria (Supplementary Table 6).
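The three call rules above can be sketched directly in code; the `AssayStats` class and its field names are hypothetical stand-ins for the database columns quoted in the text.

```python
# Sketch of call rules (i)-(iii) for quantitative assays; field names are
# illustrative stand-ins for the database columns described above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssayStats:
    pval_hom_wt: Optional[float]  # None when the littermate sample size is n < 3
    pval_hom_wtref: float
    hom_mean: float
    wt_mean: float
    wt_ref_std_dev: float

def meets_criteria(s: AssayStats) -> bool:
    if s.pval_hom_wt is not None:                      # rule (i)
        pvals_ok = s.pval_hom_wt <= 0.05 and s.pval_hom_wtref <= 0.05
    else:                                              # rule (ii)
        pvals_ok = s.pval_hom_wtref <= 0.05
    # rule (iii): effect size of at least one reference s.d.
    return pvals_ok and abs(s.hom_mean - s.wt_mean) >= s.wt_ref_std_dev

def assay_called(mf, m, f) -> bool:
    """An assay is called if any of the combined/male/female data sets passes."""
    return any(meets_criteria(s) for s in (mf, m, f))
```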
Supplementary Table 6 shows the assays in which significant changes were observed between the homozygous animals and the wild-type animals, detailing the number of mice, the mean values and the P values for the male and female combined (mf), the male alone (m), and the female alone (f) data sets in these assays. These statistical thresholds and rules were not applied to nonquantitative assays such as pathology, angiogram and fundus picture data, or to assays for which we could not calculate the P values by the method mentioned above, such as inverted screen, body weight over time and startle response at 120 dB when mutants have hearing impairment. Findings from pathology (gross and microscopic) and ophthalmology (fundus picture data and angiogram) were categorized based on the observed histological phenotypes. Whereas fundus
and angiogram allowed for the detection of defects in the vasculature of the eye, alterations found in these assays were categorized in both cardiology and ophthalmology. A knockout line was called embryonic lethal when either no homozygous animals were observed at the time of genotyping (2 weeks of age), or homozygous animals were observed at genotyping in fewer numbers than expected and all died before testing (starting at ~12 weeks of age). A knockout line was scored as having reduced viability when either homozygous animals were observed at the expected Mendelian ratio at the time of genotyping (2 weeks of age) but all or a portion of them died before testing (starting at ~12 weeks of age), or fewer homozygous animals than expected were observed at genotyping and the surviving homozygous animals completed the phenotypic analysis or a portion thereof. Supplementary Table 7 lists the calls from nonquantitative assays. The assay calls listed in both Supplementary Tables 6 and 7 were then translated into calls in the phenotypic categories based on the breakdown of the assays into the 11 phenotypic categories shown in Supplementary Table 5. A knockout line was positive in a phenotypic category as long as one of the assays in that category was called (Supplementary Tables 2, 6 and 7). The gene lists for knockouts in the 11 phenotypic categories are shown in Supplementary Table 2.

Histograms. Median homozygote (hom) values were evaluated and compared to median wild-type (wt) values by calculating a ratio of the two cohorts (hom/wt). The resulting ratios were arranged in a histogram format for evaluation and plotting (orange dots in Fig. 2c–h). The histogram data were then normalized and plotted as an overlay (blue line in Fig. 2c–h). Several genes of interest and a few known benchmark genes were labeled.
If the assay was affected by gender differences, this was accounted for by modifying the ratio as follows:

Ratio = {[DMH × (NMW + NMH)/DMW] + [DFH × (NFW + NFH)/DFW]}/(NMW + NMH + NFW + NFH)

where NMW = number of male wts; NFW = number of female wts; NMH = number of male homs; NFH = number of female homs; DMW = mean of male wts; DFW = mean of female wts; DMH = mean of male homs; DFH = mean of female homs.

Microarray expression analysis. A compendium of microarray data sets including 11,914 normal and diseased human tissue samples across 34 different tissue types (GeneLogic, Gaithersburg, Maryland) was used to examine the expression profiles of SPDI genes. The data were obtained from assays for mRNA abundance using the Affymetrix U133 Plus 2.0 Array and were preprocessed using Affymetrix MAS 5.0. Expression profiles across normal human tissues were examined to assess the adult tissue distribution (number of samples in parentheses): adipose (97); adrenal (14); blood vessels (67); bone (8); bone marrow (4); breast (32); cervix (67); brain (1,791); colorectal (226); endometrium (16); esophagus (18); gall bladder (10); head and neck (6); heart (126); kidney (77); liver (37); lung (115); lymphoid (39); muscle (90); myometrium (159); nerve (15); ovary (127); pancreas (22); placenta (15); prostate (40); skin (52); small intestine (171); soft tissue (3); stomach (47); testis (20); thymus (71); thyroid (14); urinary (8); white blood cell (222).
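Translated into code, the gender-adjusted ratio above is a count-weighted average of the per-sex hom/wt ratios. The variable names follow the definitions in the text; the function itself is a hypothetical sketch, not the authors' implementation:

```python
def gender_adjusted_ratio(nmw, nfw, nmh, nfh, dmw, dfw, dmh, dfh):
    """Count-weighted average of the male and female hom/wt ratios.

    nmw/nfw/nmh/nfh: numbers of male/female wild types and homozygotes;
    dmw/dfw/dmh/dfh: the corresponding cohort means.
    """
    male_term = (dmh / dmw) * (nmw + nmh)    # DMH x (NMW + NMH) / DMW
    female_term = (dfh / dfw) * (nfw + nfh)  # DFH x (NFW + NFH) / DFW
    return (male_term + female_term) / (nmw + nmh + nfw + nfh)

# If both sexes show the same 20% reduction, the adjusted ratio is 0.8
# regardless of the absolute scale of the male and female measurements:
print(gender_adjusted_ratio(5, 5, 5, 5, 100.0, 50.0, 80.0, 40.0))  # 0.8
```

Weighting each sex's ratio by its animal count keeps an assay with few animals of one sex from dominating the combined value.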
The tissue calls shown in the Human Tissue Expression column in Supplementary Table 2 were made using the following rules: (i) a tissue is called when the mean Affymetrix signal intensity for that tissue is >500 and its z-score is >1.5, calculated using the distribution of mean intensities from all tissues; (ii) the expression is considered tissue specific if the tissue call is not null and the coefficient of variation (CV) is >1, calculated using the distribution of the mean intensities from all tissues; (iii) the gene is considered to have broad expression if it has mean Affymetrix signal intensities >500 in >75% of the tissue types and is not tissue specific based on rule (ii). CV is defined as the ratio of the s.d. to the mean.

Embryonic in situ hybridization. Relevant murine expressed sequence tags and cDNA clones were obtained from Open Biosystems, and primers specific
to the plasmid backbone of each clone were used to amplify their respective inserts by PCR. The PCR product was verified by dideoxy sequencing and subsequently used as a template for the synthesis of digoxigenin-labeled riboprobes using a kit from Roche Applied Science, following the manufacturer’s instructions. Embryos were dissected from CD-1 mice at approximately the following stages: E7.5, E8.5, E9.5, E10.5, E11.5 and E12.5, in L15 medium with 4% heat-inactivated horse serum, and were fixed in 4% paraformaldehyde (PFA) in PBS for 4 h at room temperature (20–25 °C) or overnight at 4 °C with rocking. They were then washed, dehydrated on ice through a methanol/PBS-0.1% (vol/vol) Tween-20 (PBST) series, bleached for 1 h in methanol/H2O2 (4:1), washed again in methanol and stored at −20 °C in methanol until needed for analysis. Before whole-mount in situ hybridization, the embryos were rehydrated through a methanol/PBST series and permeabilized for 10–30 min by treatment with proteinase K at 10 or 20 μg/ml, depending on the stage. Proteinase K digestion was stopped with two rapid washes in cold PBST and refixation in 4% PFA/0.2% glutaraldehyde in 1× PBS for 20 min, followed by extensive washing in PBST. At this point, 2–3 embryos at each of the aforementioned stages were equilibrated into hybridization buffer (50% de-ionized formamide, 5× SSC, 40 μg/ml heparin, 100 μg/ml denatured salmon sperm DNA, 50 μg/ml yeast tRNA, 0.1% Tween-20, pH 4.5–5) and, as a group, loaded into the columns of the Intavis InSituPro whole-mount in situ hybridization robot. In the robot, the embryos were prehybridized for 4 h at 68 °C in hybridization buffer with 1% (wt/vol) SDS added. Hybridization was performed overnight with 1 ng/ml riboprobe in hybridization buffer without SDS.
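The three tissue-call rules described above (intensity, z-score and CV cutoffs) can be sketched as follows, assuming each gene’s data arrive as a mapping of tissue type to mean signal intensity; the function name, defaults and return shape are illustrative:

```python
from statistics import mean, pstdev

def classify_expression(tissue_means, intensity_cutoff=500.0,
                        z_cutoff=1.5, cv_cutoff=1.0, broad_fraction=0.75):
    """Apply the three tissue-call rules to one gene's per-tissue means.

    Returns (called_tissues, is_tissue_specific, is_broad).
    """
    values = list(tissue_means.values())
    mu = mean(values)
    sigma = pstdev(values)
    cv = sigma / mu  # coefficient of variation across tissues

    # Rule (i): call a tissue when intensity > 500 and z-score > 1.5.
    called = [t for t, v in tissue_means.items()
              if v > intensity_cutoff and sigma > 0
              and (v - mu) / sigma > z_cutoff]

    # Rule (ii): tissue specific if any tissue is called and CV > 1.
    specific = bool(called) and cv > cv_cutoff

    # Rule (iii): broad if intensity > 500 in >75% of tissue types
    # and the gene is not tissue specific.
    frac_high = sum(v > intensity_cutoff for v in values) / len(values)
    broad = frac_high > broad_fraction and not specific

    return called, specific, broad
```

For example, a gene at 5,000 in brain and ~100 elsewhere comes out as brain-called and tissue specific, whereas a gene at a uniform 1,000 across tissues is classified as broadly expressed.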
After hybridization, the embryos were washed over several hours with 50% formamide/5× SSC and then 50% formamide/2× SSC, followed by equilibration into Tris-buffered saline with 0.1% Tween-20 (TBST). Embryos were blocked for 4 h in 10% heat-inactivated lamb serum in 1× TBST, followed by an overnight incubation in a 1:2,000 dilution of alkaline phosphatase (AP)-conjugated anti-digoxigenin antibody (Roche Applied Science) in 1% lamb serum/1× TBST. The embryos were then washed for nearly 24 h with 1× TBST, with frequent changes, and probes were detected using a premixed NBT/BCIP solution in an AP buffer following the manufacturer’s instructions (Roche Applied Science).

Equipment and settings. For the embryonic in situ hybridization screen, each embryo was photographed using an MZ FLIII stereo-dissecting microscope (Leica Microsystems) equipped with a SPOT RT digital camera and SPOT RT image acquisition software (Diagnostic Instruments).
doi:10.1038/nbt.1644
For the follow-up characterization of the Clec1b mutant line, all transmitted light images were digitally captured as TIFF files using a Zeiss Discovery.V12 microscope and an AxioCam HRc color camera. TIFF files of fluorescent images were captured using a Zeiss Axioskop 2 plus microscope, an X-Cite series lamp and an AxioCam HRc camera. All images were captured using the Axiovision (release 4.5) software interface with standard settings. White balance was modified using the levels function in Adobe Photoshop CS3, and fluorescent overlays were generated by placing the appropriate images into the red, green and blue channels.

Material availability. The 472 knockout mouse lines described in our study, along with all the corresponding allele information, are available to investigators at nonprofit institutions. All strains are archived at the University of California Davis (UCD) Mutant Mouse Regional Resource Center (MMRRC) at http://mmrrc.mousebiology.org/ for distribution as frozen germplasm (embryos and/or sperm) and ES cells, or as live mice recovered from frozen formats. Orders can be placed from the online catalog at http://www.mmrrc.org/catalog/StrainCatalogSearchForm.jsp. In addition, all phenotyping data and information on available alleles are freely viewable, searchable and downloadable from UCD-MMRRC at http://mmrrc.mousebiology.org/phenotype/. The UCD-MMRRC is part of the NIH/NCRR-sponsored MMRRC National Consortium (http://www.mmrrc.org/).

31. Zambrowicz, B.P., Holt, K.H., Walke, D.W., Kirkpatrick, L.L. & Eberhart, D.E. Generation of transgenic animals. in Target Validation in Drug Discovery (eds. Metcalf, B.W. & Dillon, S.) 3–26 (Academic Press, Burlington, Massachusetts, USA, 2007).
32. Friddle, C.J. et al. High-throughput mouse knockouts provide a functional analysis of the genome. Cold Spring Harb. Symp. Quant. Biol. 68, 311–315 (2003).
33. Pogorelov, V.M., Baker, K.B., Malbari, M.M., Lanthorn, T.H. & Savelieva, K.V. A standardized behavioral test battery to identify and validate targets for neuropsychiatric diseases and pain. in Experimental Animal Models in Neurobehavioral Research (eds. Kalueff, A.V. & LaPorte, J.L.) 17–45 (Laboratory of Clinical Science, Nat. Inst. of Mental Health, Bethesda, Maryland, USA, 2008).
34. Brommage, R. et al. High-throughput screening of mouse knockout lines identifies true lean and obese phenotypes. Obesity (Silver Spring) 16, 2362–2367 (2008).
35. Zambrowicz, B.P. et al. Wnk1 kinase deficiency lowers blood pressure in mice: a gene-trap screen to identify potential targets for therapeutic intervention. Proc. Natl. Acad. Sci. USA 100, 14109–14114 (2003).
36. Abuin, A., Hansen, G.M. & Zambrowicz, B. Gene trap mutagenesis. in Conditional Mutagenesis: An Approach to Disease Models (eds. Feil, R. & Metzger, D.) 129–147 (Springer, 2007).
37. Wattler, S., Kelly, M. & Nehls, M. Construction of gene targeting vectors from lambda KOS genomic libraries. Biotechniques 26, 1150–1156, 1158, 1160 (1999).
corrigenda & errata
Corrigendum: Safety signal dampens reception for mipomersen antisense Jim Kling Nat. Biotechnol. 28, 295–297 (2010); published online 8 April 2010; corrected after print 9 July 2010 In the version of this article initially published, some of the oligos in Table 1 are described as phosphorothioate modified. In fact, all antisense oligonucleotides are phosphorothioate-modified oligos. In addition, Lucanix, which is not an antisense oligo, has been removed from the table. The error has been corrected in the HTML and PDF versions of the article.
Corrigendum: Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs
Mitchell Guttman, Manuel Garber, Joshua Z Levin, Julie Donaghey, James Robinson, Xian Adiconis, Lin Fan, Magdalena J Koziol, Andreas Gnirke, Chad Nusbaum, John L Rinn, Eric S Lander & Aviv Regev Nat. Biotechnol. 28, 503–510 (2010); published online 02 May 2010; corrected after print 9 July 2010 In the version of this article initially published, the fourth sentence in the Online Methods section “RNA extraction and library preparation,” that read in part “procedure that combines a random priming step with a shearing step8,9,28 and results in fragments of ~700 bp in size,” should have read, “procedure that combines fragmentation of mRNA to a peak size of ~750 nucleotides by heating6 followed by random-primed reverse transcription8.” The error has been corrected in the HTML and PDF versions of the article.
Erratum: US biodefense contracts continue to lure biotechs Catherine Shaffer Nat. Biotechnol. 28, 187–188 (2010); published online 8 March 2010; corrected after print 9 July 2010 In the version of this article initially published, in Table 1, the list of Emergent BioSolutions’ anthrax countermeasures in development gave AV-7909 as being in phase 2 under a $447.6 million BARDA contract; AV-7909 is in phase 1 and the BARDA contract is for $29.7 million. AIGIV is in phase 1/3, not phase 1/2. Finally, a third product was omitted; an anthrax monoclonal is in preclinical testing under a $24 million BARDA contract. The $447.6 million BARDA contract was for procurement and product enhancements on BioThrax. Also, on p. 188, column 2, line 7, the vaccine requires five injections, not six as originally stated. The errors have been corrected in the HTML and PDF versions of the article.
Erratum: Single base–resolution methylome of the silkworm reveals a sparse epigenomic map Hui Xiang, Jingde Zhu, Quan Chen, Fangyin Dai, Xin Li, Muwang Li, Hongyu Zhang, Guojie Zhang, Dong Li, Yang Dong, Li Zhao, Ying Lin, Daojun Cheng, Jian Yu, Jinfeng Sun, Xiaoyu Zhou, Kelong Ma, Yinghua He, Yangxing Zhao, Shicheng Guo, Mingzhi Ye, Guangwu Guo, Yingrui Li, Ruiqiang Li, Xiuqing Zhang, Lijia Ma, Karsten Kristiansen, Qiuhong Guo, Jianhao Jiang, Stephan Beck, Qingyou Xia, Wen Wang & Jun Wang Nat. Biotechnol. 28, 516–520 (2010); published online 02 May 2010; corrected after print 9 July 2010 In the version of this article initially published, references 4 and 7 were inadvertently interchanged. The error has been corrected in the HTML and PDF versions of the article.
Erratum: Up for grabs Michael Eisenstein Nat. Biotechnol. 28, 544–546 (2010); published online 7 June 2010; corrected after print 9 July 2010 In the version of the article originally published, it was stated that the Cohen-Boyer patents generated hundreds of billions of dollars in licensing revenue. It should have read hundreds of millions of dollars. The error has been corrected in the HTML and PDF versions of the article.
volume 28 number 7 JULY 2010 nature biotechnology
careers and recruitment
Advancing the careers of life science professionals of Indian origin Jagath R Junutula, Praveena Raman, Darshana Patel, Holly Butler & Anula Jayasuriya
Indian-American life scientists can advance their careers by networking, receiving help from mentors and pursuing collaborations in academia, industry and the nonprofit sector.
Recent studies have shown that diversity of thought and the use of a range of approaches are crucial to innovation. Consequently, top universities and businesses are altering how they select employees and learning to embrace diverse thinking. This has led to an important trend in the last few decades: the US workforce deployed in the science and technology sectors has become more ethnically diversified, and today about 14% are Asian-Americans1. Americans of Indian origin constitute the third-largest subset of Asian-Americans2, and their educational qualification levels are among the highest of all ethnic groups in the United States. Over 65% of Indian-Americans have a bachelor’s or higher degree, compared to 28% of all Americans, and nearly 40%—five times the national figure—have a master’s, doctoral or other professional degree2. This has allowed for a growing presence of people of Indian origin in various roles throughout the life sciences in the United States, which in turn has resulted in new alliances between the US life science industry and its emerging Indian counterpart. But despite this significant progress, many hurdles remain for scientists of Indian origin trying to advance their careers in the US life science sector, mostly stemming from marked differences in culture and family values. Commonly recognized obstacles include a tendency to understate one’s contributions, risk aversion and fear of failure. First-generation life science professionals might grapple with additional issues, such as immigration and visa requirements; the lack of US academic mentors for those who obtained their PhDs abroad; communication gaps with US colleagues owing to barriers of language, culture and communication style; lack of awareness in the United States about the expertise and academic standards of Indian universities and life science research institutions; an Indian educational system that focuses more on rote learning than on problem solving; and the need for those who hold non-US PhDs to establish their scientific credibility through extensive postdoctoral training.

There are three key elements to breaking down barriers and advancing careers: mentorship, networking and collaboration. Professional relationships of these three forms are valuable whether you come from India, China or small-town America.

Jagath R. Junutula, Praveena Raman, Darshana Patel, Holly Butler and Anula Jayasuriya are volunteers at EPPIC Global, Los Altos, California, USA. http://www.eppicglobal.org e-mail: [email protected]

At regular networking events and conferences, EPPIC helps life science professionals of Indian origin advance their careers.
Mentors can open doors, give priceless insights into cultural nuances and provide feedback on the effectiveness of presentations and other communication. Networking builds connections that will facilitate career moves and scientific progress. Collaboration allows scientists to celebrate their ‘sameness’, share scientific passions and transcend cultural differences; it is the common language, the bond that provides support as researchers navigate their careers.

Filling a need
EPPIC (formerly known as “Enterprising Pharmaceutical Professionals from the Indian sub-Continent”) is a nonprofit organization founded 12 years ago in the San Francisco Bay Area to address many of the barriers that face Indian life science professionals. EPPIC’s mission is to advance the careers of life science professionals in the Indian-American community by promoting networking, collaboration and mentoring. The group also fosters US-India life science synergies and provides a resource for industry and academia. The vibrant and growing community at EPPIC includes scientists, inventors, entrepreneurs, managers, executives, specialized service providers, consultants and investors. In the past 12 years, EPPIC has organized over 40 quarterly networking events and four annual conferences to promote the success of the Indian-American life science community.

Over the last 10 years there have been several studies and articles focused on the careers of immigrant Asian scientists. These articles discuss the effects of so-called ‘glass ceilings’, ‘bamboo ceilings’ and ‘silicon ceilings’ and try to identify the barriers and challenges faced by these immigrant ethnic communities3–5. In 2006, Roli Varma conducted a study on India’s ‘techno-immigrants’ working in the United States6. A study with specific emphasis on the Indian-American life science community does not exist but would be extremely valuable. There is no organization better suited than EPPIC to conduct such a study. Therefore, EPPIC strongly urges all Indian-American life science professionals to participate in our survey (http://www.surveymonkey.com/s/EPPIC_Survey). The survey results will be made available in late 2010 and will form the basis for further targeted research and strategies aimed at improving career trajectories and promoting success in the life science community of Indian origin.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. National Science Board. Science and Engineering Indicators 2008. National Science Foundation (2008).
2. Reeves, T.J. & Bennett, C.E. We the People: Asians in the United States (Census 2000 Special Reports; December 2004) <www.census.gov/prod/2004pubs/censr-17.pdf> (accessed 22 June 2010).
3. Ruttiman, J. Breaking through the “bamboo ceiling” for Asian American scientists. Science Careers (29 May 2009).
4. Mervis, J. A glass ceiling for Asian scientists? Science Careers (28 October 2005).
5. Gimm, G. Shattering the glass ceiling: an interview with journalist Peter Hong. Yisei (Spring 1992).
6. Varma, R. Harbingers of Global Change: India’s Techno-Immigrants in the United States (Lexington, Lanham, Maryland, USA, 2006).
people
PolyTherics (London) has announced the appointment of Ken Cunningham (left) as chairman of the company’s board of directors. Mike Hayes, the current nonexecutive chairman, will continue as a nonexecutive director. Cunningham has over 20 years’ experience in the pharmaceutical industry. He is currently CEO of SkyePharma and previously served as CEO of Arakis, vice president, European Affairs at Alza and vice president, clinical development at Sequus. He is also a nonexecutive director of Xention. PolyTherics CEO Keith Powell says, “We are delighted that Ken has agreed to join PolyTherics as its chairman and we are sure that his broad experience of development and business will help successfully steer the company to the next stage of its evolution. I would like to thank Mike for his dedication to the company and am pleased that we will continue to benefit from his invaluable insights.”
Genmab (Copenhagen) has announced that its cofounder, Lisa N. Drakeman, has retired from her position as CEO and member of the board of directors of the company. She is succeeded by Jan G.J. van de Winkel, Genmab’s former president of R&D and CSO. Under Drakeman’s leadership, Genmab raised over $1 billion in capital, completed the largest initial public offering of any biotech company in Europe, received the annual James D. Watson Helix Award as the best international biotech company in 2005 and received regulatory approval in the United States and Europe for its chronic lymphocytic leukemia treatment, Arzerra.

Myriad Genetics (Salt Lake City) has elected Heiner Dreismann to its board of directors. Dreismann has more than 24 years of experience in the healthcare industry, including stints as president and CEO of Roche Molecular Systems and head of global business development of Roche Diagnostics.

Benitec (Melbourne) has announced the appointment of Peter French as CEO, replacing Sue MacLeman. French has been Benitec’s CSO since August 2009.

StemCells (Palo Alto, CA, USA) has named R. Scott Greer to its board of directors. Greer has more than 25 years of life sciences and financial services industry experience and is currently a principal and managing director of Numenor Ventures, which he founded in 2002, and chairman of Acologix. He also serves on the boards of Nektar Therapeutics and BAROnova. He was previously a founder,
CEO and chairman of Abgenix, and senior vice president, corporate development and CFO of Cell Genesys.

Genzyme (Cambridge, MA, USA) agreed to settle its proxy contest with activist investor Carl C. Icahn and his affiliated private investment funds. Under the agreement, the Icahn funds withdrew their slate of four nominees for Genzyme’s board and voted in favor of the company’s board slate. In return, Genzyme appointed two of Icahn’s nominees, Steven Burakoff and Eric Ende, to serve as directors. Burakoff is professor of medicine, hematology and medical oncology at the Mount Sinai School of Medicine and director of the Tisch Cancer Institute at the Mount Sinai Medical Center. Ende is a former biotech analyst with Merrill Lynch. In addition, Dennis M. Fenton, a 25-year veteran of Amgen, was also appointed as a director. Fenton was executive vice president of operations when he retired from Amgen in 2008. The new additions bring Genzyme’s board from 10 to 13 members.

Erwan Martin has been named CFO and Emmanuel Conseiller has been appointed vice president, R&D of Genomic Vision (Paris). Before joining the company, Martin served as CFO and an executive board member at Cytomics Pharmaceuticals. Conseiller joined Genomic Vision in October 2009 after serving as a senior manager at Sanofi-Aventis from 1992 to 2009.

Lukas Utiger, previously COO of Lonza Life Science Ingredients (Basel), has been
named as COO of Lonza Bioscience, located in Walkersville, Maryland, USA. He replaces Anja Fiedler, who resigned for personal health reasons. Stefan Borgas, CEO of Lonza Life Science Ingredients, will take over Utiger’s responsibilities until a successor is named.

Following his recent appointment as CEO of Ark Therapeutics Group (London), Martyn Williams has stepped down from the roles of CFO and company secretary. Succeeding him are David Bowyer as CFO and Edward Bliss as secretary. Bowyer joined Ark in 2004 and was promoted to group financial controller in 2008. Bliss joined Ark from law firm Covington & Burling in 2005 and also holds the post of general counsel.

Gentris (Morrisville, NC, USA) has named Rick Williams as CEO and a member of the board of directors. Working for the past three years at The Hamner Institutes, Williams set up international academic partnerships, a business accelerator and the Hamner-China Biosciences Center. Additionally, he helped to establish the Hamner-University of North Carolina Institute for Drug Safety Sciences as well as the Drug Discovery Center of Innovation, a virtual drug discovery network funded by the North Carolina Biotechnology Center.

Jack L. Wyszomierski has been elected to the board of directors of Athersys (Cleveland). From 2004 to 2009, Wyszomierski served as executive vice president and CFO of VWR International. From 1982 to 2004, he held positions of increasing responsibility at Schering-Plough, culminating with his appointment as executive vice president and CFO in 1996. Two Athersys directors, Jordan S. Davis and William C. Mulligan, did not stand for re-election.

Privately held Cyntellect (San Diego) has appointed Saiid Zarrabian as president and CEO, replacing Fred Koller, who has assumed the position of chief technology officer.
Zarrabian has experience spanning the biotech, pharma and chemical sectors, previously serving as president and COO of Synomyx, COO of Pharmacopeia and president and COO of Molecular Simulations. He currently serves on the boards of Ambit Biosciences and eMolecules.