BIOMARKERS IN DRUG DEVELOPMENT A Handbook of Practice, Application, and Strategy Edited by MICHAEL R. BLEAVINS, Ph.D., DABT Michigan Technology and Research Institute Ann Arbor, Michigan
CLAUDIO CARINI, M.D., Ph.D., FRCPath Fresenius Biotech of North America Waltham, Massachusetts
MALLÉ JURIMA-ROMET, Ph.D. MDS Pharma Services Montreal, Quebec, Canada
RAMIN RAHBARI, M.S. Innovative Scientific Management New York, New York
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Biomarkers in drug development : a handbook of practice, application, and strategy / [edited by] Michael R. Bleavins . . . [et al.].
p. ; cm.
Includes index.
ISBN 978-0-470-16927-8 (cloth)
1. Biochemical markers. 2. Drug development. I. Bleavins, Michael R.
[DNLM: 1. Biomarkers, Pharmacological. 2. Drug Design. 3. Drug Discovery. QV 744 B6154 2009]
R853.B54B5645 2009
615'.10724—dc22
2009021627

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS

PREFACE

PART I  BIOMARKERS AND THEIR ROLE IN DRUG DEVELOPMENT

1  Biomarkers Are Not New
   Ian Dews

2  Biomarkers: Facing the Challenges at the Crossroads of Research and Health Care
   Gregory J. Downing

3  Enabling Go/No Go Decisions
   J. Fred Pritchard and Mallé Jurima-Romet

PART II  IDENTIFYING NEW BIOMARKERS: TECHNOLOGY APPROACHES

4  Imaging as a Localized Biomarker: Opportunities and Challenges
   Jonathan B. Moody, Philip S. Murphy, and Edward P. Ficaro

5  Protein Biomarker Discovery Using Mass Spectrometry–Based Proteomics
   Joanna M. Hunter and Daniel Chelsky

6  Quantitative Multiplexed Patterning of Immune-Related Biomarkers
   Dominic Eisinger, Ralph McDade, and Thomas Joos

7  Gene Expression Profiles as Preclinical and Clinical Cancer Biomarkers of Prognosis, Drug Response, and Drug Toxicity
   Jason A. Sprowl and Amadeo M. Parissenti

8  Use of High-Throughput Proteomic Arrays for the Discovery of Disease-Associated Molecules
   Douglas M. Molina, W. John W. Morrow, and Xiaowu Liang

PART III  CHARACTERIZATION AND VALIDATION

9  Characterization and Validation of Biomarkers in Drug Development: Regulatory Perspective
   Federico Goodsaid

10  Fit-for-Purpose Method Validation and Assays for Biomarker Characterization to Support Drug Development
    Jean W. Lee, Yuling Wu, and Jin Wang

11  Molecular Biomarkers from a Diagnostic Perspective
    Klaus Lindpaintner

12  Strategies for the Co-Development of Drugs and Diagnostics: FDA Perspective on Diagnostics Regulation
    Francis Kalush and Steven Gutman

13  Importance of Statistics in the Qualification and Application of Biomarkers
    Mary Zacour

PART IV  BIOMARKERS IN DISCOVERY AND PRECLINICAL SAFETY

14  Qualification of Safety Biomarkers for Application to Early Drug Development
    William B. Mattes and Frank D. Sistare

15  Development of Serum Calcium and Phosphorus as Clinical Biomarkers for Drug-Induced Systemic Mineralization: Case Study with a MEK Inhibitor
    Alan P. Brown

16  Biomarkers for the Immunogenicity of Therapeutic Proteins and Its Clinical Consequences
    Claire Cornips and Huub Schellekens

17  New Markers of Kidney Injury
    Sven A. Beushausen

PART V  TRANSLATING FROM PRECLINICAL RESULTS TO CLINICAL AND BACK

18  Translational Medicine—A Paradigm Shift in Modern Drug Discovery and Development: The Role of Biomarkers
    Giora Z. Feuerstein, Salvatore Alesci, Frank L. Walsh, J. Lynn Rutkowski, and Robert R. Ruffolo, Jr.

19  Clinical Validation and Biomarker Translation
    David Lin, Andreas Scherer, Raymond Ng, Robert Balshaw, Shawna Flynn, Paul Keown, Robert McMaster, and Bruce McManus

20  Predicting and Assessing an Inflammatory Disease and Its Complications: Example from Rheumatoid Arthritis
    Christina Trollmo and Lars Klareskog

21  Pharmacokinetic and Pharmacodynamic Biomarker Correlations
    J.F. Marier and Keith Gallicano

22  Validating In Vitro Toxicity Biomarkers Against Clinical Endpoints
    Calvert Louden and Ruth A. Roberts

PART VI  BIOMARKERS IN CLINICAL TRIALS

23  Opportunities and Pitfalls Associated with Early Utilization of Biomarkers: Case Study in Anticoagulant Development
    Kay A. Criswell

24  Integrating Molecular Testing Into Clinical Applications
    Anthony A. Killeen

25  Biomarkers for Lysosomal Storage Disorders
    Ari Zimran, Candida Fratazzi, and Deborah Elstein

26  Value Chain in the Development of Biomarkers for Disease Targets
    Charles W. Richard, III, Arthur O. Tzianabos, and Whaijen Soo

PART VII  LESSONS LEARNED: PRACTICAL ASPECTS OF BIOMARKER IMPLEMENTATION

27  Biomarkers in Pharmaceutical Development: The Essential Role of Project Management and Teamwork
    Lena King, Mallé Jurima-Romet, and Nita Ichhpurani

28  Integrating Academic Laboratories Into Pharmaceutical Development
    Peter A. Ward and Kent J. Johnson

29  Funding Biomarker Research and Development Through the Small Business Innovative Research Program
    James Varani

30  Novel and Traditional Nonclinical Biomarker Utilization in the Estimation of Pharmaceutical Therapeutic Indices
    Bruce D. Car, Brian Gemzik, and William R. Foster

31  Anti-Unicorn Principle: Appropriate Biomarkers Don’t Need to Be Rare or Hard to Find
    Michael R. Bleavins and Ramin Rahbari

32  Biomarker Patent Strategies: Opportunities and Risks
    Cynthia M. Bott and Eric J. Baude

PART VIII  WHERE ARE WE HEADING AND WHAT DO WE REALLY NEED?

33  IT Supporting Biomarker-Enabled Drug Development
    Michael Hehenberger

34  Redefining Disease and Pharmaceutical Targets Through Molecular Definitions and Personalized Medicine
    Craig P. Webb, John F. Thompson, and Bruce H. Littman

35  Ethics of Biomarkers: The Borders of Investigative Research, Informed Consent, and Patient Protection
    Heather Walmsley, Michael Burgess, Jacquelyn Brinkman, Richard Hegele, Janet Wilson-McManus, and Bruce McManus

36  Pathodynamics: Improving Biomarker Selection by Getting More Information from Changes Over Time
    Donald C. Trost

37  Optimizing the Use of Biomarkers for Drug Development: A Clinician’s Perspective
    Alberto Gimona

38  Nanotechnology-Based Biomarker Detection
    Joshua Reineke

INDEX
CONTRIBUTORS
Salvatore Alesci, M.D., Ph.D., Wyeth Research, Collegeville, Pennsylvania
Robert Balshaw, Ph.D., Syreon Corporation, Vancouver, British Columbia, Canada
Eric J. Baude, Ph.D., Brinks Hofer Gilson & Lione, P.C., Ann Arbor, Michigan
Sven A. Beushausen, Ph.D., Pfizer Global Research and Development, Chesterfield, Missouri
Michael R. Bleavins, Ph.D., DABT, Michigan Technology and Research Institute, Ann Arbor, Michigan
Cynthia M. Bott, Ph.D., Honigman Miller Schwartz and Cohn LLP, Ann Arbor, Michigan
Jacquelyn Brinkman, M.Sc., University of British Columbia, Vancouver, British Columbia, Canada
Alan P. Brown, Ph.D., DABT, Pfizer Global Research and Development, Ann Arbor, Michigan
Michael Burgess, Ph.D., University of British Columbia, Vancouver, British Columbia, Canada
Bruce D. Car, B.V.Sc., Ph.D., Bristol-Myers Squibb Co., Princeton, New Jersey
Daniel Chelsky, Ph.D., Caprion Proteomics, Inc., Montreal, Quebec, Canada
Claire Cornips, B.Sc., Utrecht University, Utrecht, The Netherlands
Kay A. Criswell, Ph.D., Pfizer Global Research and Development, Groton, Connecticut
Ian Dews, MRCP, FFPM, Envestia Ltd., Thame, Oxfordshire, UK
Gregory J. Downing, D.O., Ph.D., U.S. Department of Health and Human Services, Washington, DC
Dominic Eisinger, Ph.D., Rules Based Medicine, Inc., Austin, Texas
Deborah Elstein, Ph.D., Gaucher Clinic, Shaare Zedek Medical Center, Jerusalem, Israel
Giora Z. Feuerstein, M.D., Wyeth Research, Collegeville, Pennsylvania
Edward P. Ficaro, Ph.D., INVIA Medical Imaging Solutions, Ann Arbor, Michigan
Shawna Flynn, B.Sc., Syreon Corporation, Vancouver, British Columbia, Canada
William R. Foster, Ph.D., Bristol-Myers Squibb Co., Princeton, New Jersey
Candida Fratazzi, M.D., Altus Pharmaceuticals, Inc., Waltham, Massachusetts
Keith Gallicano, Ph.D., Watson Laboratories, Corona, California
Brian Gemzik, Ph.D., Bristol-Myers Squibb Co., Princeton, New Jersey
Alberto Gimona, M.D., Merck Serono International, Geneva, Switzerland
Federico Goodsaid, Ph.D., Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland
Steven Gutman, M.D., M.B.A., University of Central Florida, Orlando, Florida; formerly with the U.S. Food and Drug Administration, Rockville, Maryland
Richard Hegele, M.D., Ph.D., University of British Columbia, Vancouver, British Columbia, Canada
Michael Hehenberger, Ph.D., IBM Healthcare & Life Sciences, Somers, New York
Joanna M. Hunter, Ph.D., Caprion Proteomics, Inc., Montreal, Quebec, Canada
Nita Ichhpurani, B.A., PMP, MDS Pharma Services, Mississauga, Ontario, Canada
Kent J. Johnson, M.D., The University of Michigan Medical School, Ann Arbor, Michigan
Thomas Joos, Ph.D., Rules Based Medicine, Inc., Austin, Texas
Mallé Jurima-Romet, Ph.D., MDS Pharma Services, Montreal, Quebec, Canada
Francis Kalush, Ph.D., U.S. Food and Drug Administration, Silver Spring, Maryland; formerly with USFDA, Rockville, Maryland
Paul Keown, M.D., D.Sc., MBA, University of British Columbia, Vancouver, British Columbia, Canada
Anthony A. Killeen, M.D., Ph.D., University of Minnesota, Minneapolis, Minnesota
Lena King, Ph.D., DABT, CanBioPharma Consulting, Inc., Guelph, Ontario, Canada
Lars Klareskog, M.D., Ph.D., Karolinska Institute, Stockholm, Sweden
Jean W. Lee, Ph.D., Amgen, Inc., Thousand Oaks, California
Xiaowu Liang, Ph.D., Antigen Discovery, Inc., Irvine, California
David Lin, B.MLSc., University of British Columbia, Vancouver, British Columbia, Canada
Klaus Lindpaintner, M.D., M.P.H., F. Hoffmann–La Roche AG, Basel, Switzerland
Bruce H. Littman, M.D., Translational Medicine Associates, Stonington, Connecticut
Calvert Louden, Ph.D., Johnson & Johnson Pharmaceuticals, Raritan, New Jersey
J.F. Marier, Ph.D., FCP, Pharsight, A Certara Company, Montreal, Quebec, Canada
William B. Mattes, Ph.D., DABT, The Critical Path Institute, Rockville, Maryland
Ralph McDade, Ph.D., Rules Based Medicine, Inc., Austin, Texas
Bruce McManus, M.D., Ph.D., University of British Columbia, Vancouver, British Columbia, Canada
Robert McMaster, D.Phil., University of British Columbia, Vancouver, British Columbia, Canada
Douglas M. Molina, Ph.D., Antigen Discovery, Inc., Irvine, California
Jonathan B. Moody, Ph.D., INVIA Medical Imaging Solutions, Ann Arbor, Michigan
W. John W. Morrow, Ph.D., Antigen Discovery, Inc., Irvine, California
Philip S. Murphy, Ph.D., GlaxoSmithKline Research and Development, Uxbridge, Middlesex, UK
Raymond Ng, Ph.D., University of British Columbia, Vancouver, British Columbia, Canada
Amadeo M. Parissenti, Ph.D., Laurentian University, Sudbury, Ontario, Canada
J. Fred Pritchard, Ph.D., MDS Pharma Services, Raleigh, North Carolina
Ramin Rahbari, M.S., Innovative Scientific Management, New York, New York
Joshua Reineke, Ph.D., Wayne State University, Detroit, Michigan
Charles W. Richard, III, M.D., Ph.D., Shire Human Genetic Therapies, Cambridge, Massachusetts
Ruth A. Roberts, Ph.D., AstraZeneca Research and Development, Macclesfield, UK
Robert R. Ruffolo, Jr., Ph.D., Wyeth Research, Collegeville, Pennsylvania
J. Lynn Rutkowski, Ph.D., Wyeth Research, Collegeville, Pennsylvania
Huub Schellekens, M.D., Utrecht University, Utrecht, The Netherlands
Andreas Scherer, Ph.D., Spheromics, Kontiolahti, Finland
Frank D. Sistare, Ph.D., Merck Research Laboratories, West Point, Pennsylvania
Whaijen Soo, M.D., Ph.D., Shire Human Genetic Therapies, Cambridge, Massachusetts
Jason A. Sprowl, Ph.D., Laurentian University, Sudbury, Ontario, Canada
John F. Thompson, M.D., Helicos BioSciences, Cambridge, Massachusetts
Christina Trollmo, Ph.D., Karolinska Institute and Roche AB Sweden, Stockholm, Sweden
Donald C. Trost, M.D., Ph.D., Analytic Dynamics, Niantic, Connecticut
Arthur O. Tzianabos, Ph.D., Shire Human Genetic Therapies, Cambridge, Massachusetts
James Varani, Ph.D., The University of Michigan Medical School, Ann Arbor, Michigan
Heather Walmsley, M.A., Lancaster University, Bailrigg, Lancaster, UK
Frank L. Walsh, Ph.D., Wyeth Research, Collegeville, Pennsylvania
Jin Wang, M.S., Amgen, Inc., Thousand Oaks, California
Peter A. Ward, M.D., The University of Michigan Medical School, Ann Arbor, Michigan
Craig P. Webb, Ph.D., Van Andel Research Institute, Grand Rapids, Michigan
Janet Wilson-McManus, M.T., B.Sc., University of British Columbia, Vancouver, British Columbia, Canada
Yuling Wu, Ph.D., Amgen, Inc., Thousand Oaks, California
Mary Zacour, Ph.D., BioZac Consulting, Montreal, Quebec, Canada
Ari Zimran, M.D., Gaucher Clinic, Shaare Zedek Medical Center, Jerusalem, Israel
PREFACE
The impact of biomarker technologies and strategies in pharmaceutical development is still emerging but is already proving to be significant. Biomarker strategy forms the basis for translational medicine and for the current industry and regulatory focus to improve success rates in drug development.

The pharmaceutical industry faces greater challenges today than at any time in its history: an ever-increasing expectation of safer, more efficacious, and better understood drugs in the face of escalating costs of drug development and increasing duration of clinical development times; high rates of compound failure in phase II and III clinical trials; remaining blockbuster drugs coming off patent; and many novel but unproven targets emerging from discovery. These factors have pressured pharmaceutical research divisions to look for ways to reduce development costs, make better decisions earlier, reassess traditional testing strategies, and implement new technologies to improve the drug discovery and development processes.

There is consensus that biomarkers are valuable drug development tools that enhance target validation, thereby helping us better understand the mechanisms of action and enabling earlier identification of compounds with the highest potential for efficacy in humans. These important methods are also essential for eliminating compounds with unacceptable safety risks, enabling the concept of “fail fast, fail early,” and providing more accurate or complete information regarding drug performance and disease progression.

At the same time that pharmaceutical scientists are focusing on biomarkers in drug discovery and development, clinical investigators and health care practitioners are using biomarkers increasingly in medical decision making and diagnosis. Similarly, regulatory agencies have recognized the value of biomarkers to guide regulatory decision making about drug safety and efficacy. The magnitude and seriousness of the U.S. Food and Drug Administration (FDA) commitment to biomarkers is reflected in its Critical Path initiative. In recent years, several pharmacogenomic tests have been incorporated into product labels and implemented in clinical practice to improve the risk–benefit ratio
for patients receiving certain drug therapies (e.g., 6-mercaptopurine, irinotecan, warfarin). Agencies such as the FDA and European Medicines Agency have taken a leadership role in encouraging biomarker innovation in the industry and collaboration to identify, evaluate, and qualify novel biomarkers. Moreover, a biomarker strategy facilitates the choice of a critical path to differentiate products in a competitive marketplace.

In recent years, the topic of biomarkers has been featured at many specialized scientific meetings and has received extensive media coverage. We, the coeditors, felt that a book that approached the topic with an emphasis on the practical aspects of biomarker identification and use, as well as their strategic implementation, was missing and essential to improve the application of these approaches. We each have experience working with biomarkers in drug development, but we recognized that the specialized knowledge of a diverse group of experts was necessary to create the type of comprehensive book that is needed. Therefore, contributions were invited from authors who are renowned experts in their respective fields. The contributors include scientists from academia, research hospitals, biotechnology and pharmaceutical companies, contract research organizations and consulting firms, and the FDA. The result is a book that we believe will appeal broadly to pharmaceutical research scientists, clinical and academic investigators, regulatory scientists, managers, students, and all other professionals engaged in drug development who are interested in furthering their knowledge of biomarkers.

As discussed in Part I, biomarkers are not new: They have been used for hundreds of years to help physicians diagnose and treat disease. What is new is a shift from outcome biomarkers to target and mechanistic biomarkers, the availability of “omics,” imaging, and other technologies that allow collection of large amounts of data at the molecular, tissue, and whole-organism levels, and the use of data-rich biomarker information for “translational research,” from the laboratory bench to the clinic and back. The chapters in Part II highlight several important technologies that affect drug discovery and development, the conduct of clinical trials, and the treatment of patients. In Part III we’ve invited leaders from industry and regulatory agencies to discuss the qualification of biomarker assays in the fit-for-purpose process, including perspectives on the development of diagnostics. The importance of statistics cannot be overlooked, and this topic is also profiled in this part with a practical overview of concepts, common mistakes, and helpful tips to ensure credible biomarkers that can address their intended uses.

Parts IV to VI present information on concepts and examples of utilizing biomarkers in discovery, preclinical safety assessment, clinical trials, and translational medicine. Examples are drawn from a wide range of target-organ toxicities, therapeutic areas, and product types. We hope that by presenting a wide range of biomarker applications, discussed by knowledgeable and experienced scientists, readers will develop an appreciation of the scope and breadth of biomarker knowledge and find examples that will help them in their own work.
Part VII focuses on “lessons learned” and the practical aspects of implementing biomarkers in drug development programs. Many pharmaceutical companies have created translational research divisions, and increasingly, external partners, including academic and government institutions, contract research organizations, and specialty laboratories, are providing technologies and services to support biomarker programs. This is changing the traditional organizational models within industry and paving the way toward greater collaboration across sectors and even among companies within a competitive industry. Perspectives from contributing authors representing several of these different sectors are presented in this part, as well as a legal perspective on potential intellectual property issues in biomarker development.

The book concludes with Part VIII on future trends and developments, including developments in data integration, the reality of personalized medicine, and the addressing of ethical concerns. The field of biomarkers in drug development is evolving rapidly, and this book presents a snapshot of some exciting new approaches. By utilizing the book as a source of new knowledge, or to reinforce or integrate existing knowledge, we hope that readers will benefit from a greater understanding and appreciation of the strategy and application of biomarkers in drug development and become more effective decision makers and contributors in their own organizations.

Michael R. Bleavins
Claudio Carini
Mallé Jurima-Romet
Ramin Rahbari
PART I BIOMARKERS AND THEIR ROLE IN DRUG DEVELOPMENT
1

BIOMARKERS ARE NOT NEW

Ian Dews, MRCP, FFPM
Envestia Ltd., Thame, Oxfordshire, UK
INTRODUCTION

The word biomarker in its medical context is a little over 30 years old, having first been used by Karpetsky, Humphrey, and Levy in the April 1977 edition of the Journal of the National Cancer Institute, where they reported that the “serum RNase level … was not a biomarker either for the presence or extent of the plasma cell tumor.” Few new words can have proved so popular—a recent PubMed search lists more than 370,000 publications that use it! Part of this success can no doubt be attributed to the fact that the word gave a long-overdue name to a phenomenon that has been around at least since the seventh century b.c., when Sushruta, the “father of Ayurvedic surgery,” recorded that the urine of patients with diabetes attracted ants because of its sweetness. However, although the origins of biomarkers are indeed ancient, it is fair to point out that the pace of progress over the first 2500 years was somewhat less than frenetic.
UROSCOPY

Because of its easy availability for inspection, urine was for many centuries the focus of attention. The foundation of the “science” of uroscopy is generally attributed to Hippocrates (460–355 b.c.), who hypothesized that urine was a
filtrate of the “humors,” taken from the blood and filtered through the kidneys, a reasonably accurate description. One of his more astute observations was that bubbles on the surface of the urine (now known to be due to proteinuria) were a sign of long-term kidney disease. Galen (a.d. 129–200), the most influential of the ancient Greco-Roman physicians, sought to make uroscopy more specific but in reality added little to the subject beyond the weight of his reputation, which served to hinder further progress in this as in many other areas of medicine. Five hundred years later, Theophilus Protospatharius, another Greek writer, moved things one step nearer to the modern world when he investigated the effects of heating urine and hence established the world’s first medical laboratory test. He discovered that heating urine from patients with symptoms of kidney disease caused cloudiness (in fact, the precipitation of proteins). In the sixteenth century, Paracelsus (1493–1541) in Switzerland used vinegar to bring out the same cloudiness (acid, like heat, will precipitate proteins). Events continued to move both farther north and closer to modernity when in 1695 Frederick Deckers of Leiden in the Netherlands identified this cloudiness as resulting from the presence of albumin. The loop was finally closed when Richard Bright (1789–1858), a physician at Guy’s Hospital in London, made the connection between proteinuria and autopsy findings of abnormal kidneys. The progress from Hippocrates’ bubbles to Bright disease represents the successful side of uroscopy, but other aspects of the subject now strike us as a mixture of common sense and bizarre superstition. The technique of collecting urine was thought to be of paramount importance for accurate interpretation. In the eleventh century, Ismail of Jurjani insisted on a full 24-hour collection in a vessel that was large and clean (very sensible) and shaped like a bladder, so that the urine would not lose its “form” (not at all sensible). His advice to keep the sample out of the sun and away from heat continues, however, to be wise counsel. Gilles de Corbeil (1165–1213), physician to King Philip Augustus of France, recorded differences in sediment and color of urine which he related to 20 different bodily conditions. He also invented the matula, or jorden, a glass vessel through which the color, consistency, and clarity of the sample could be assessed. Shaped like a bladder rounded at the bottom and made of thin clear glass, the matula was to be held up in the right (not the left) hand for careful inspection against the light. De Corbeil taught that different areas of the body were represented by the urine in different parts of the matula. These connections, which became ever more complex, were recorded on uroscopy charts that were published only in Latin, thus ensuring that the knowledge, and its well-rewarded use in treating wealthy patients, was confined to appropriately educated men. To further this education, de Corbeil, in his role as a professor at the Medical School of Salerno, set out his own ideas and those of the ancient Greek and Persian writers in a work called Poem on the Judgment
of Urines, which was set to music in order that medical students could memorize it more easily. It remained popular for several centuries.
BLOOD PRESSURE

One of the first excursions away from urine in the search for markers of function and disease came in 1555 with the publication of a book called Sphygmicae artis iam mille ducentos annos perditae & desideratae Libri V by a physician from Poznań in Poland named Józef Struś (better known by his Latinized name, Iosephus Struthius). In this 366-page work, Struthius described placing increasing weights on the skin over an artery until the pulse was no longer able to lift the load. The weight needed to achieve this gave a crude measure of what he called “the strength of the pulse” or, as we would call it today, blood pressure. Early attempts at quantitative measurement of blood pressure had to be conducted in animals rather than human subjects because of the invasiveness of the technique. The first recorded success with these techniques dates from 1733, when the Reverend Stephen Hales, a British veterinary surgeon, inserted a brass pipe into a horse’s artery and connected the pipe to a glass tube. Hales observed the blood rising in the tube and concluded not only that the rise was due to the pressure of the blood in the artery but also that the height of the rise was a measure of that pressure. By 1847, experimental technique had progressed to the point where it was feasible to measure blood pressure in humans, albeit still invasively. Carl Ludwig inserted brass cannulas directly into an artery and connected them via further brass pipework to a U-shaped manometer. An ivory float on the water in the manometer was arranged to move a quill against a rotating drum, and the instrument was known as a kymograph (“wave-writer” in Greek). Meanwhile, in 1834, Jules Hérisson had described his sphygmomètre, which consisted of a steel cup containing mercury, covered by a thin membrane, with a calibrated glass tube projecting from it. The membrane was placed over the skin covering an artery and the pressure in the artery could be gauged from the movements of the mercury into the glass tube. Although minor improvements were suggested by a number of authors over the next few years, credit for the invention of the true sphygmomanometer goes to Samuel Siegfried Karl Ritter von Basch, whose original 1881 model used water in both the cuff and the manometer tube. Fifteen years later, Scipione Riva-Rocci introduced an improved version in which an inflatable bag in the cuff was connected to a mercury manometer, but neither of these early machines attracted widespread interest. Only in 1901, when the famous American surgeon Harvey Cushing brought back one of Riva-Rocci’s machines on his return from a trip to Italy, did noninvasive blood pressure measurement really take off.
Sphygmomanometers of the late nineteenth century relied on palpation of the pulse and so could only be used to determine systolic blood pressure. Measurement of diastolic pressure only became possible when Nikolai Korotkoff observed in 1905 that characteristic sounds were made by the constriction of the artery at certain points in the inflation and deflation of the cuff. The greater accuracy allowed by auscultation of these Korotkoff sounds opened the way for the massive expansion in blood pressure research that characterized the twentieth century.
IMAGING

To physicians keen to understand the hidden secrets of the human body, few ideas can have been more appealing than the dream of looking through the skin to examine the tissues beneath. The means for achieving this did not appear until a little over a century ago, and then very much by accident. On the evening of November 8, 1895, Wilhelm Roentgen, a German physicist working at the University of Würzburg, noticed that light was coming from fluorescent material in his laboratory and worked out that this was the result of radiation escaping from a shielded gas discharge tube with which he was working. He was fascinated by the ability of this radiation to pass through apparently opaque materials and promptly set about investigating its properties in more detail. While conducting experiments with different thicknesses of tinfoil, he noticed that if the rays passed through his hand, they cast a shadow of the bones. Quick to see the potential medical uses for his new discovery, Roentgen immediately wrote a paper entitled “On a new kind of ray: a preliminary communication” for the Würzburg Physical Medical Society, reprints of which he sent to a number of eminent scientists with whom he was friendly. One of these, Franz Exner of Vienna, was the son of the editor of the Vienna Presse, and hence the news was published quickly, first in that paper and then across Europe. Whereas we are inclined to believe that rapid publication is a feature of the Internet age, the Victorians were no slouches in this matter, and by January 24, 1896 a reprint of the Würzburg paper had appeared in the London Electrician, a major journal able to bring details of the new invention to a much wider technical audience. The speed of the response was remarkable. Many physics laboratories already had gas discharge tubes, and within a month physicists in a dozen countries were reproducing Roentgen’s findings. Edwin Frost produced an x-ray image of a patient’s fractured wrist for his physician brother, Gilmon Frost, at Dartmouth College in the United States, while at McGill University in Montreal, John Cox used the new rays to locate a bullet in a gunshot victim’s leg. Similar results were obtained in cities as far apart as Copenhagen, Prague, and Rijeka in Croatia. Inevitably, not everyone was initially quite so impressed; The Lancet of February 1, 1896 expressed considerable surprise that the
Belgians had decided to bring x-rays into practical use in hospitals throughout the country! Nevertheless, it was soon clear that a major new diagnostic tool had been presented to the medical world, and there was little surprise when Roentgen received a Nobel Prize in Physics in 1901. Meanwhile, in March 1896, Henri Becquerel, professor of physics at the Muséum National d’Histoire Naturelle in Paris, while investigating Roentgen’s work, wrapped a fluorescent mineral, potassium uranyl sulfate, in photographic plates and black material in preparation for an experiment requiring bright sunlight. However, a period of dull weather intervened, and prior to actually performing the experiment, Becquerel found that the photographic plates were fully exposed. This led him to write: “One must conclude from these experiments that the phosphorescent substance in question emits rays which pass through the opaque paper and reduce silver salts.” Becquerel received a Nobel prize, which he shared with Marie and Pierre Curie, in 1903, but it was to be many years before the use of spontaneous radioactivity reached maturity in medical investigation in such applications as isotope scanning and radioimmunoassay. The use of a fluoroscopic screen on which to view x-ray pictures was implicit in Roentgen’s original discovery and soon became part of the routine equipment not only of hospitals but even of shoe shops, where large numbers of children’s shoe fittings were carried out in the days before the true dangers of radiation were appreciated. However, the greatest value of the real-time viewing approach only emerged following the introduction of electronic image intensifiers by the Philips company in 1955. Within months of the introduction of planar x-rays, physicians were asking for a technique that would demonstrate the body in three dimensions. This challenge was taken up by a number of scientists in different countries, but because of the deeply ingrained habit of reviewing only the national, not the international, literature, these workers remained ignorant of each other’s progress for many years. Carl Mayer, a Polish physician, first suggested the idea of tomography in 1914. André-Edmund-Marie Bocage in France, Gustav Grossmann in Germany, and Alessandro Vallebona in Italy all developed the idea further and built their own equipment. George Ziedses des Plantes in the Netherlands pulled all these strands together in the 1930s and is generally considered the founder of conventional tomography. Further progress had to wait for the development of powerful computers, and it was not until 1972 that Godfrey Hounsfield, an engineer at EMI, designed the first computer-assisted tomographic device, the EMI scanner, installed at Atkinson Morley Hospital, London, an achievement for which he received both a Nobel prize and a knighthood. Parallel with these advances in x-ray imaging were ongoing attempts to make similar use of the spontaneous radioactivity discovered by Becquerel. In 1925, Hermann Blumgart and Otto Yens made the first use of radioactivity as a biomarker when they used bismuth-214 to determine the arm-to-arm
circulation time in patients. Sodium-24, the first artificially created biomarker radioisotope, was used by Joseph Hamilton to investigate electrolyte metabolism in 1937. Unlike x-rays, however, radiation from isotopes weak enough to be safe was not powerful enough to create an image merely by letting it fall on a photographic plate. This problem was solved when Hal Anger of the University of California, building on the efficient gamma-ray capture system using large flat crystals of sodium iodide doped with thallium developed by Robert Hofstadter in 1948, constructed the first gamma camera in 1957. The desire for three-dimensional images that led to tomography with x-rays also influenced radioisotope imaging and drove the development of single-photon-emission computed tomography (SPECT) by David Kuhl and Roy Edwards in 1968. Positron-emission tomography (PET) also builds images by detecting energy given off by decaying radioactive isotopes in the form of positrons that collide with electrons and produce gamma rays that shoot off in nearly opposite directions. The collisions can be located in space by interpreting the paths of the gamma rays, and this information is then converted into a three-dimensional image slice. The first PET camera for human studies was built by Edward Hoffman, Michael Ter-Pogossian, and Michael Phelps in 1973 at Washington University. The first whole-body PET scanner appeared in 1977. Radiation, whether from x-ray tubes or from radioisotopes, came to be recognized as having dangers both for the patient and for personnel operating the equipment, and efforts were made to discover media that would produce images without these dangers. In the late 1940s, George Ludwig, a junior lieutenant at the Naval Medical Research Institute in Bethesda, Maryland, undertook experiments using industrial ultrasonic flaw-detection equipment in an attempt to determine the acoustic impedance of various tissues, including human gallstones surgically implanted into the gallbladders of dogs. His observations were detailed in a 30-page project report to the Naval Medical Research Institute dated June 16, 1949, now considered the first report of its kind on the diagnostic use of ultrasound. However, a substantial portion of Ludwig’s work was considered classified information by the Navy and was not published in medical journals. Civilian research into what became the two biggest areas of early ultrasonic diagnosis—cardiology and obstetrics—began in Sweden and Scotland, respectively, both making use of gadgetry initially designed for shipbuilding. In 1953, Inge Edler, a cardiologist at Lund University, collaborated with Carl Hellmuth Hertz, a graduate student in the department of nuclear physics who was familiar with using ultrasonic reflectoscopes for nondestructive materials testing, and together they developed the idea of using this method in medicine. They made the first successful measurement of heart activity on October 29, 1953 using a device borrowed from Kockums, a Malmö shipyard. On December 16 of the same year, the method was used to generate an echo encephalogram. Edler and Hertz published their findings in 1954.
At around the same time, Ian Donald of the Glasgow Royal Maternity Hospital struck up a relationship with boilermakers Babcock & Wilcox in Renfrew, where he used their industrial ultrasound equipment to conduct experiments assessing the ultrasonic characteristics of various in vitro preparations. With fellow obstetrician John MacVicar and medical physicist Tom Brown, Donald refined the equipment to the point where it could be used successfully on live volunteer patients. These findings were reported in The Lancet on June 7, 1958 as “Investigation of abdominal masses by pulsed ultrasound.” Nuclear magnetic resonance (NMR) in molecules was first described by Isidor Rabi in 1938. His work was followed up eight years later by Felix Bloch and Edward Mills Purcell, who, working independently, noticed that magnetic nuclei such as hydrogen and phosphorus, when placed in a magnetic field of a specific strength, absorb radio-frequency energy, a situation described as being “in resonance.” For the next 20 years NMR found purely physical applications in chemistry and physics, and it was not until 1971 that Raymond Damadian showed that the nuclear magnetic relaxation times of different tissues, especially tumors, differed, thus raising the possibility of using the technique to detect disease. Magnetic resonance imaging (MRI) was first demonstrated on small test tube samples in 1973 by Paul Lauterbur, and in 1975 Richard Ernst proposed using phase and frequency encoding and the Fourier transform, the technique that still forms the basis of MRI. The first commercial nuclear magnetic imaging scanner allowing imaging of the body appeared in 1980 using Ernst’s technique, which allowed a single image to be acquired in approximately 5 minutes. By 1986, the imaging time was reduced to about 5 seconds without sacrificing too much image quality. In the same year, the NMR microscope was developed, which allowed approximately 10-μm resolution on approximately 1-cm samples. In 1993, functional MRI (fMRI) was developed, thus permitting the mapping of function in various regions of the brain.
ELECTROCARDIOGRAPHY

Roentgen’s discovery of x-rays grew out of the detailed investigation of electricity that was a core scientific concern of the nineteenth century, and it is little surprise that investigators also took a keen interest in the electricity generated by the human body itself. Foremost among these was Willem Einthoven. Before his day, although it was known that the body produced electrical currents, the technology was inadequate to measure or record them with any sort of accuracy. Starting in 1901, Einthoven, a professor at the University of Leiden, conducted a series of experiments using a string galvanometer. In his device, electric currents picked up from electrodes on the patient’s skin passed through a thin filament running between very
strong electromagnets. The interaction of the electric and magnetic fields caused the filament or “string” to move, and this was detected by using a light to cast a shadow of the moving string onto a moving roll of photographic paper. It was not, at first, an easy technique. The apparatus weighed 600 lb, including the water circulation system essential for cooling the electromagnets, and was operated by a team of five technicians. Over the next two decades Einthoven gradually refined his machine and used it to establish the electrocardiographic (ECG) features of many different heart conditions, work that was eventually recognized with a Nobel prize in 1924. As the ECG became a routine part of medical investigations it was realized that a system that gave only a “snapshot” of a few seconds of the heart’s activity could be unhelpful or even misleading in the investigation of intermittent conditions such as arrhythmias. This problem was addressed by Norman Holter, an American biophysicist, who created his first suitcase-sized “ambulatory” monitor as early as 1949, but whose technique is dated in many sources to the major paper that he published on the subject in 1957, and other authors cite an even later, 1961 publication.
HEMATOLOGY

The scientific examination of blood in order to learn more about the health of the patient from whom it was taken can be dated to 1674, when Anthony van Leeuwenhoek first observed blood cells through his newly invented microscope. Progress was at first slow, and it was not until 1770 that leucocytes were discovered by William Hewson, an English surgeon, who also observed that red cells were flat rather than spherical, as had earlier been supposed. Association of blood cell counts with clinical illness depended on the development of a technical method by which blood cells could be counted. In 1852, Karl Vierordt at the University of Tübingen developed such a technique, which, although too tedious for routine use, was used by one of his students, H. Welcher, to count red blood cells in a patient with “chlorosis” (an old word for what is probably our modern iron-deficiency anemia). He found, in 1854, that an anemic patient had significantly fewer red blood cells than did a normal person. Platelets, the third major cellular constituent of blood, were identified in 1862 by a German anatomist, Max Schultze. Remarkably, all these discoveries were made without the benefit of cell staining, an aid to microscopic visualization that was not introduced until 1877 in Paul Ehrlich’s doctoral dissertation at the University of Leipzig. The movement of blood cell studies from the research laboratory to routine support of patient care needed a fast automatic technique for separating and counting cells, which was eventually provided by the Coulter brothers, Wallace and Joseph. In 1953 they patented a machine that detected the change in electrical conductance of a small aperture as fluid containing cells was drawn through.
Cells, being nonconducting particles, alter the effective cross section of the conductive channel and so signal both their presence and their size. An alternative technique, flow cytometry, was also developed in stages between the late 1940s and the early 1970s. Frank Gucker at Northwestern University developed a machine for counting bacteria in a laminar stream of air during World War II and used it to test gas masks, the work subsequently being declassified and published in 1947. Louis Kamentsky at IBM Laboratories and Mack Fulwyler at the Los Alamos National Laboratory experimented with fluidic switching and electrostatic cell detectors, respectively, and both described cell sorters in 1965. The modern approach of detecting cells stained with fluorescent antibodies was developed in 1972 by Leonard Herzenberg and his team at Stanford University, who coined the term fluorescence-activated cell sorter (FACS).
BLOOD AND URINE CHEMISTRY

As with hematology, real progress in measuring the chemical constituents of plasma depended largely on the development of the necessary technology. Until such techniques became available, however, ingenious use was made of bioassays, developed in living organisms or preparations made from them, to detect and in some cases quantify complex molecules. A good example of this is the detection of human chorionic gonadotrophin (hCG) in urine as a test for pregnancy. Selmar Aschheim and Bernhard Zondek in Berlin, who first isolated this hormone in 1928, went on to devise the Aschheim–Zondek pregnancy test, which involved five days of injecting urine from the patient repeatedly into an infantile female mouse which was subsequently killed and dissected. The finding of ovulation in the mouse indicated that the injected urine contained hCG and meant that the patient was pregnant. In the early 1940s, the mouse test gave way to the frog test, introduced by Lancelot Hogben in England. This was a considerable improvement, in that injection of urine or serum from a pregnant woman into the dorsal lymph sac of the female African clawed frog (Xenopus laevis) resulted in ovulation within 4 to 12 hours. Although this test was known to give a relatively high proportion of false negatives, it was regarded as an outstanding step forward in diagnosis. One story from the 1950s recounts that with regard to the possible pregnancy of a particular patient, “opinions were sought from an experienced general practitioner, an eminent gynecologist, and a frog; only the frog proved to be correct.” Pregnancy testing, and many other “biomarker” activities, subsequently moved from out-and-out bioassays to the “halfway house” of immunological tests based on antibodies to the test compound generated in a convenient species but then used in an ex vivo laboratory setting, and in 1960 a hemagglutination inhibition test for pregnancy was developed by Leif Wide and Carl Gemzell in Uppsala.
Not all immune reactions can be made to modulate hemagglutination, and a problem with the development of immunoassays was finding a simple way to detect whether the relevant antibody or antigen was present. One answer lay in the use of radiolabeled reagents. Radioimmunoassay was first described in a paper by Rosalyn Sussman Yalow and Solomon Berson published in 1960. Radioactivity is difficult to work with because of its safety concerns, so an alternative was sought. This came with the recognition that certain enzymes which react with appropriate substrates (such as ABTS or 3,3′,5,5′-tetramethylbenzidine) to give a color change could be linked to an appropriate antibody. This linking process was developed independently by Stratis Avrameas and G. B. Pierce. Since it is necessary to remove any unbound antibody or antigen by washing, the antibody or antigen must be fixed to the surface of the container, a technique first published by Wide and Porath in 1966. In 1971, Peter Perlmann and Eva Engvall at Stockholm University, as well as Anton Schuurs and Bauke van Weemen in the Netherlands, independently published papers that synthesized this knowledge into methods to perform enzyme-linked immunosorbent assay (ELISA). A further step toward physical methods was the development of chromatography. The word was coined in 1903 by the Russian botanist Mikhail Tswett to describe his use of a liquid–solid form of a technique to isolate various plant pigments. His work was not widely accepted at first, partly because it was published in Russian and partly because Arthur Stoll and Richard Willstätter, a much better known Swiss–German research team, were unable to repeat the findings. However, in the late 1930s and early 1940s, Archer Martin and Richard Synge at the Wool Industries Research Association in Leeds devised a form of liquid–liquid chromatography by supporting the stationary phase, in this case water, on silica gel in the form of a packed bed and used it to separate some acetyl amino acids derived from wool. Their 1941 paper included a recommendation that the liquid mobile phase be replaced with a suitable gas that would accelerate the transfer between the two phases and provide more efficient separation: the first mention of the concept of gas chromatography. In fact, their insight went even further, in that they also suggested the use of small particles and high pressures to improve the separation, the starting point for high-performance liquid chromatography (HPLC). Gas chromatography was the first of these concepts to be taken forward. Erika Cremer working with Fritz Prior in Germany developed gas–solid chromatography, while in the UK, Martin himself cooperated with Anthony James in the early work on gas–liquid chromatography published in 1952. Real progress in HPLC began in 1966 with the work of Csaba Horváth at Yale. The popularity of the technique grew rapidly through the 1970s, so that by 1980, this had become the standard laboratory approach to a wide range of analytes. The continuing problem with liquid or gas chromatography was the identification of the molecule eluting from the system, a facet of the techniques that was to be revolutionized by mass spectrometry.
The foundations of mass spectrometry were laid in the Cavendish Laboratories of Cambridge University in the early years of the twentieth century. Francis Aston built the first fully functional mass spectrometer in 1919 using electrostatic and magnetic fields to separate isotope ions by their masses and focus them onto a photographic plate. By the end of the 1930s, mass spectrometry had become an established technique for the separation of atomic ions by mass. The early 1950s saw attempts to apply the technique to small organic molecules, but the mass spectrometers of that era were extremely limited by mass and resolution. Positive theoretical steps were taken, however, with the description of time-of-flight (TOF) analysis by W. C. Wiley and I. H. McLaren and quadrupole analysis by Wolfgang Paul. The next major development was the coupling of gas chromatography to mass spectrometry in 1959 by Roland Gohlke and Fred McLafferty at the Dow Chemical Research Laboratory in Midland, Michigan. This allowed, for the first time, an analysis of mixtures of analytes without laborious separation by hand. This, in turn, was the trigger for the development of modern mass spectrometry of biological molecules. The introduction of liquid chromatography–mass spectrometry (LC-MS) in the early 1970s, together with new ionization techniques developed over the last 25 years (i.e., fast particle desorption, electrospray ionization, and matrix-assisted laser desorption/ionization), has made it possible to analyze almost every class of biological compound right up into the megadalton range.
FASHIONABLE “OMICS”

In Benet Street, Cambridge, stands a rather ordinary pub which on Saturday, February 28, 1953, enjoyed 15 minutes of fame far beyond Andy Warhol’s wildest dreams. Two young men arrived for lunch and, as James Watson watched, Francis Crick announced to the regulars in the bar that “we have found the secret of life.” The more formal announcement of the structure of DNA appeared in Nature on April 25 in a commendably brief paper of two pages with six references. Watson and Crick shared a Nobel prize with Maurice Wilkins, whose work with Rosalind Franklin at King’s College, London had laid the groundwork. Sadly, Franklin’s early death robbed her of a share of the prize, which is never awarded posthumously. Over the next two decades a large number of researchers teased out the details of the genetic control of cells, and by 1972 a team at the Laboratory of Molecular Biology of the University of Ghent, led by Walter Fiers, were the first to determine the sequence of a gene (a coat protein from a bacteriophage). The same team followed up in 1976 by publishing the complete RNA nucleotide sequence of the bacteriophage. The first DNA-based genome to be sequenced in its entirety was the 5386-base-pair sequence of bacteriophage
Φ-X174 elucidated by Frederick Sanger in 1977. The science of genomics had been born. Although the rush to sequence the genomes of ever more complex species (including humans in 2001) initially held out considerable hope of yielding new biomarkers, focus gradually shifted to the protein products of the genes. This process is dated by many to the introduction in 1977 by Patrick O’Farrell at the University of Colorado in Boulder of two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). The subject really took off in the 1990s, however, with technical improvements in mass spectrometers combined with computing hardware and software to support the extremely complex analyses involved. The next “omics” to become fashionable was metabolomics, based on the realization that the quantitative and qualitative pattern of metabolites in body fluids reflects the functional status of an organism. The concept is by no means new, the first paper addressing the idea (but not using the word) having been “Quantitative Analysis of Urine Vapor and Breath by Gas–Liquid Partition Chromatography” by Robinson and Pauling in 1971. The word metabolomics, however, was not coined until the 1990s.
THE FUTURE

Two generalizations may perhaps be drawn from the accelerating history of biomarkers over the last 2700 years. The first is that each new step depends on an interaction between increasing understanding of the biology and technical improvement of the tools, leading to a continuous spiral of innovation. The second is the need for an open but cautious mind. Sushruta’s recognition of the implications of sweet urine has stood the test of time; de Corbeil’s Poem on the Judgment of Urines has not. The ultimate fate of more recent biomarkers will only be revealed by time.
2

BIOMARKERS: FACING THE CHALLENGES AT THE CROSSROADS OF RESEARCH AND HEALTH CARE

Gregory J. Downing, D.O., Ph.D.
U.S. Department of Health and Human Services, Washington, DC
INTRODUCTION Across many segments of the biomedical research enterprise and the health care delivery sectors, the impact of biomarkers has been transforming in many ways: from business and economics to policy and planning of disease management. Progress in basic discovery research has been profound worldwide, with the intertwining of innovative technologies and knowledge providing extensive and comprehensive lists of biological factors now known to play integral roles in disease pathways. These discoveries have had a vast impact on the pharmaceutical and biotechnology industries, with tremendous growth in investment in biomarker research reaching into the laboratory technology and services sector. These investments have spawned new biomedical industry sectors, boosted the roles of contract research organizations, supported vast new biomarker discovery programs in large corporate organizations, and prompted the emergence of information management in research. Similarly, growth in academic research programs supporting biomarker research has greatly expanded training capacity, bench and clinical research capacity, and infrastructure, while fueling the growth of intellectual property.
By many reports, private-sector applications of biomarkers in toxicity and early efficacy trials have been fruitful, sharpening decision-making priorities and introducing greater efficiency in early- to mid-stage medical product development. Despite the heavy emphasis in privately and publicly funded research, the impact of biomarkers on clinical practice interventions remains difficult to quantify. Development costs remain high for many drugs, and the number of new chemical entities reaching the marketplace has remained relatively low compared to prior years and to the expectations raised by the robust research expansion of the 1980s and 1990s. Industry concerns about the sustainability of research and development programs have grown against the backdrop of the clinical challenges that attend biomarker applications in clinical trials. Because evidence development has been relatively slow, the clinical implications of disease markers have taken much longer to discern than many had predicted. Establishing a translational research infrastructure to verify and validate the clinical value of biomarkers, both as disease endpoints and as independent measures of health conditions, has proven challenging. The lack of an equivalent of the clinical trial infrastructure for biomarker validation and diagnostics has slowed progress relative to therapeutic and device development. Evidence development processes and evaluations for biomarkers have only now begun to emerge, and their adoption in clinical practice measures has not yet matured. For some, the enthusiasm and the economic balance sheets have not been squared: the hoped-for clinical measures have been viewed by some as moderately successful and by others as bottlenecks in the pipelines of therapeutic and diagnostic development.
BRIEF HISTORY OF BIOMARKER RESEARCH, 1998–2008: THE FIRST DECADE During the last decade of the twentieth century, biomedical research underwent one of the most dramatic periods of change in its history. Influenced by a multitude of factors, some scientific, others economic, and still others of policy, new frontiers of science emerged as technology and knowledge converged and diverged, bringing new discoveries and hope to the forefront of medicine and health. These capabilities rested on a generation’s worth of science that brought the foundation for a molecular basis of disease, recombinant DNA technology, into the mainstream of biomedical research. Innovative applications of lasers, novel medical imaging platforms, and other advanced technologies began to yield a remarkable body of knowledge that provided unprecedented opportunities for discovery of new approaches to the management of human health and disease. Here we briefly revisit a part of the medical research history that led to the shaping of new directions that is, for now, captured simply by the term
biomarker, a biological indicator of health or disease. In looking backward to the 1980s and 1990s and the larger scheme of health care, many new challenges were being faced. The international challenges and global economic threats posed by human immunodeficiency virus (HIV) and AIDS provided the impetus for one of the first steps in target-designed therapies and the use of viral and immune indicators of disease. For the first time, strategically directed efforts in discovery and clinical research paradigms were coordinated at the international level using clinical measures of disease at the molecular level. Biological measures of viral load, CD4+ T-lymphocyte counts, and other parameters of immune function and viral resistance became a mainstay of privately and publicly funded discovery and translational research and development. Regulatory authority was put in place to allow “accelerated approval” of medical products using surrogate endpoints for health conditions with grave mortality and morbidity. Simultaneously, clinical cancer therapeutics programs had some initial advances with the use of clinical laboratory tests that aided in the distinction between responders and nonresponders to targeted therapies. The relation between the Her2/neu tyrosine kinase receptor in aggressive breast cancer and response to trastuzumab (Herceptin) [1] and, similarly, the association of imatinib (Gleevec) responsiveness with the presence of the Philadelphia chromosome translocation involving the BCR/Abl genes in chronic myelogenous leukemia [2] were among the cases in which targeted molecular therapies were based on a biomarker test as a surrogate endpoint for patient clinical response. These cases represented the entry point of pharmaceutical science moving toward co-development, using diagnostic tests to guide selection of therapy around a biomarker. Diverse changes were occurring throughout the health care innovation pipeline in the 1990s. The rise of the biotechnology industry became an economic success story underpinned by successful products in recombinant DNA technology, monoclonal antibody production, and vaccines. The device manufacturing and commercial laboratory industries became major forces. In the United States, the health care delivery system underwent changes with the widespread adoption of managed care programs, and an effort at health care reform failed. For U.S.-based academic research institutions, it was a time of particular tumult for clinical research programs, which were often supported through clinical care finances and downsized in response to financial shortfalls. At a time when scientific opportunity in biomedicine was, arguably, reaching its zenith, there were cracks in the enterprise that was responsible for advancing basic biomedical discovery research to the clinic and marketplace. In late 1997, the director of the National Institutes of Health, Harold Varmus, met with biomedical research leaders from academic, industrial, governmental, and clinical research organizations, technology developers, and public advocacy groups to discuss mutual challenges, opportunities, and responsibilities in clinical research. In this setting, some of the first strategic considerations regarding “clinical markers” began to emerge among stakeholders
in clinical research. From a science policy perspective, steps were taken to explore and organize information that brought to light the need for new paradigms in clinical development. Some of these efforts led to the framing of definitions of terms to be used in clinical development, such as biomarkers (a characteristic that is measured and evaluated objectively as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention) and surrogate endpoints (a biomarker that is intended to substitute for a clinical endpoint and is expected to predict clinical benefit or harm, or lack of benefit or harm, based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence), and to descriptions of the information needs and strategic and tactical approaches needed to apply them in clinical development [3]. A workshop was held to address statistical analysis, methodology, and research design issues in bridging empirical and mechanism-based knowledge in evaluating potential surrogate endpoints [4]. In-depth analyses examined information needs, clinical training skills, database issues, regulatory policies, technology applications, and candidate disease conditions and clinical trials that were suitable for exploring biomarker research programs. As a confluence of these organizational activities, in April 1999 an international conference was hosted by the National Institutes of Health (NIH) and the U.S. Food and Drug Administration (FDA) [5]. The leadership focused on innovations in technology applications, such as multiplexed gene analysis using polymerase chain reaction technologies, large-scale gel analysis of proteins, and positron-emission tomography (PET) and magnetic resonance imaging (MRI). A summary analysis was crafted for candidate markers in a wide variety of disease states, and a framework was formed for multiple disease-based public–private partnerships in biomarker development. A series of research initiatives supported by industry, NIH, and FDA were planned and executed in the ensuing months. New infrastructure for discovery and validation of cancer biomarkers was put in place. Public–private partnerships for biomarker discovery and characterization were initiated in osteoarthritis, Alzheimer disease, and multiple sclerosis. Research activities on toxicology markers for cardiovascular disease and on metabolism by renal and hepatic transformation systems were initiated by the FDA. These events did not yield a cross-sector strategic action plan, but did serve as a framework for further engagement across governmental, academic, industrial, and nongovernmental organizations. Among the breakthroughs was the recognition that new statistical analysis methods and clinical research designs would be needed to address multiple variables measured simultaneously and to conduct meta-analyses across clinical studies to comprehend the effects of a biomarker over time and its role as a reliable surrogate endpoint. Further, it was recognized that there would be needs for data management, informatics, clinical registries, and repositories of biological specimens, imaging files, and common reagents. Over the next several years, swift movement across the research and development enterprise was under way.
TABLE 1  Major Scientific Contributions and Research Infrastructure Supporting Biomarker Discovery

Human Genome Project
Mouse models of disease (recombinant DNA technology)
Information management (informatics tools, open-source databases, open-source publishing, biomarker reference services)
Population-based studies and gene–environment interaction studies
Computational biology and biophysics
Medical imaging: structural and functional
High-throughput technologies: in vitro cell-based screening, nanotechnology platforms, molecular separation techniques, robotics, automated microassays, high-resolution optics
Proteomics, metabolomics, epigenomics
Pharmacogenomics
Molecular toxicology
Genome-wide association studies
Molecular pathways, systems biology, and systems engineering
Biomarker research in the 1990s and the early years of the twenty-first century was driven by the rapid pace of genome mapping and the falling cost of large-scale genomic sequencing technology, both propelled by the Human Genome Project. A decade later, it is now apparent that biomarker research in the realm of clinical application has acquired a momentum of its own and is self-sustaining. The major schemes for applications of biomarkers can be described in a generalized fashion in four areas: (1) molecular target discovery, (2) early-phase drug development, (3) clinical trials and late-stage therapeutic development, and (4) clinical applications for health status and disease monitoring. The building blocks for biomarker discovery and early-stage validation over the last decade are reflected in Table 1. Notably, completion of the international Human Genome Project entailed vast investment in technology, database development, training, and infrastructure that has since been applied throughout industry to clinical research applications.
SCIENCE AND TECHNOLOGY ADVANCES IN BIOMARKER RESEARCH In the past decade of biomarker research, far and away the most influential driving force was completion of the Human Genome Project in 2003. The impact of this project on biomarker research has many facets beyond establishment of the reference data for human DNA sequences. This mammoth undertaking, initiated in 1990, led to the sequencing of the nearly 25,000 human genes and made them accessible for further biological study. Beyond this and the other species genomes that have been characterized,
human initiatives to define individual differences in the genome provided some of the earliest large-scale biomarker discovery efforts. The human haplotype map (HapMap) project defined differences in single-nucleotide polymorphisms (SNPs) in various populations around the world to provide insights into the genetic basis of disease and into genes that have relevance for individual differences in health outcomes. A collaboration among 10 pharmaceutical companies and the Wellcome Trust Foundation, known as the SNP consortium, was formed in 1999 to produce a public resource of SNPs in the human genome [6]. The SNP consortium used DNA resources from a pool of samples obtained from 24 people representing several racial groups. The initial goal was to discover 300,000 SNPs in two years, but the final results exceeded this: 1.8 million SNPs had been released into the public domain by the end of 2002, when the discovery phase was completed. The SNP consortium was notable because it served as a foundation for further cross-industry public–private partnerships that would be spawned as a wide variety of community-based efforts to hasten the discovery of genomic biomarkers (see below). The next phase of establishing the basic infrastructure to support biomarker discovery, particularly for common chronic diseases, came in 2002 through the International HapMap Project, a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States [7]. A haplotype is a set of SNPs on a single chromatid that are associated statistically. This rich resource not only mapped over 3.1 million SNPs, but established additional capacity for identifying specific gene markers in chronic diseases and represented a critical reference set enabling population-based genomic studies that could establish a gene–environmental basis for many diseases [8]. Within a short time of completing the description of the human genome, a substantial information base was in place to enable disease–gene discoveries on a larger scale. This approach of referencing populations to the well-described SNP maps is now the major undertaking for defining gene-based biomarkers. In recent years, research groups around the world have rapidly been establishing genome-wide association studies to identify specific gene sets associated with a wide range of chronic diseases. This new era in population-based genetics began with a small-scale study that led to the finding that age-related macular degeneration is associated with a variation in the gene for complement factor H, which produces a protein that regulates inflammation [9]. The first major implication in a common disease was revealed in 2007 through a study of type II diabetes variants [10]. The pace of discovery of disease gene variants has been rapid: at the time of this writing, within 18 months of that study, 18 disease gene variants associated with defects in insulin secretion had been identified [11]. The rapid growth in genome-wide association studies (GWASs) is identifying a large number of multigene variants that are leading to subclassification
of diseases with common phenotype presentations. Among the databases being established to give researchers public access to these association studies is dbGaP, the database of genotypes and phenotypes. The database, which was developed and is operated by the National Library of Medicine’s National Center for Biotechnology Information, archives and distributes data from studies that have investigated the relationship between phenotype and genotype, such as GWASs. At present, dbGaP contains 36 population-based studies that include genotype and phenotype information. Worldwide, dozens if not hundreds of GWASs are under way for a plethora of health and disease conditions associated with genetic features. Many of these projects are collaborative, involve many countries, and are supported through public–private partnerships. An example is the Genetic Association Information Network (GAIN), which is making genotype–phenotype information publicly available for a variety of studies in mental health disorders, psoriasis, and diabetic nephropathy [12]. For the foreseeable future, substantial large-scale efforts will continue to characterize disease states and catalog genes associated with clinically manifested diseases. As technology and information structures advance, other parameters of genetic modification represent new biomarker discovery opportunities. The use of metabolomics, proteomics, and epigenomics in clinical and translational research is now being actively engaged. A new large-scale project to sequence human cancers, the Cancer Genome Atlas, is focused on applying large-scale biology in the hunt for new tumor genes, drug targets, and regulatory pathways. This project is focused not only on polymorphisms but also on DNA methylation patterns and copy numbers as biomarker parameters [13]. Again, technological advances are providing scientists with novel approaches to inferring sites of DNA methylation at nucleotide-level resolution using a technique known as high-throughput bisulfite sequencing (HTBS). Large-scale initiatives are also under way to bring a structured approach to relating protein biomarkers to disease conditions. Advances in mass spectrometry, protein structure resolution, and bioinformatics for archiving protein-based information, and the formation of worldwide teams devoted to disease proteomes, have solidified in recent years. Although at a more nascent stage of progress in disease characterization, each of these emerging new fields is playing a key role complementary to biomarker discovery in genetics and genomics. Supporting this growth in biomarker discovery is massive worldwide investment over the last 10 years by public and private financiers, which has spawned hundreds of new commercial entities. Private-sector financing for biomarker discovery has become a major component of biomedical research and development (R&D) costs in pharmaceutical development. Although detailed budget summaries have not been established for U.S. federal funding of biomarker research, in a recent survey by McKinsey and Co., biomarker R&D expenditures in 2009 were estimated at $5.3 billion, up from $2.2 billion in 2003 [14].
POLICIES AND PARTNERSHIPS Although progress in biomarker R&D has accelerated, the clinical translation of disease biomarkers as endpoints in disease management and as the foundation for diagnostic products has faced more extensive challenges. A broad array of international policy actions over the past decade has moved to facilitate biomarker discovery and validation (Table 2). In the United States, the FDA has taken a series of actions to facilitate the application of biomarkers in drug development and their use in clinical practice for diagnostic and therapeutic monitoring. A voluntary submission process for genomic data from therapeutic development was initiated by the pharmaceutical industry and the FDA in 2002 [15]. This program has yielded many insights into the role of drug-metabolizing enzymes in the clinical pharmacodynamic parameters of biomarkers in drug development. In July 2007, guidelines for the use of multiplexed genetic tests in clinical practice to monitor drug therapy were issued by the FDA [16]. More recently, the FDA has begun providing label requirements indicating those therapeutic agents for which biomarker assessment can be recommended to avoid toxicity and enhance the achievement of therapeutic responses [17]. In 2007, Congress authorized the establishment of a private–public resource to support collaborative research with the FDA. One of the major obstacles to clinical genomic research expressed over the years has been a concern that research participants may be discriminated against in employment and provision of health insurance benefits as a result of the association of genetic disease markers. After many years of deliberation, the U.S. Congress passed legislation known as the Genetic Information Non-discrimination Act of 2008, preventing the use of genetic information to deny employment and health insurance. The past decade has seen many new cross-organizational collaborations and organizations developed to support biomarker development. For example,
TABLE 2  Major International Policy Issues Related to Biomarker Research

Partnerships and collaborations: industry, team science
Expanded clinical research capacity through increases in public and private financing
Open-source publishing, data-release policies
Standards development and harmonization
FDA Critical Path Initiative
Regulatory guidances for medical product development
Biomarkers Consortium
Evidence-based medicine and quality measures of disease
International regulatory harmonization efforts
Public advocacy in medical research
Genetic Information Non-discrimination Act of 2008 (U.S.)
the American Society for Clinical Oncology, the American Association for Cancer Research, and the FDA established collaborations in workshops and research discussions regarding the use of biomarkers for ovarian cancer as surrogate endpoints in clinical trials [18]. The FDA Critical Path Initiative was developed in 2006, with many opportunities described for advancing biomarkers and surrogate endpoints in a broad range of areas for therapeutic development [19,20]. Progress in these areas has augmented industry knowledge of the application of biomarkers in clinical development programs and fostered harmony with international regulatory organizations in the ever-expanding global research environment. This program has been making progress on expanding the toolbox for clinical development; many of its components foster development and application of biomarkers. As an example of international coordination among regulatory bodies, the FDA and the European Medicines Agency (EMEA) recently worked together for the first time to develop a framework allowing submission, in a single application to the two agencies, of the results of seven new biomarker tests that evaluate kidney damage during animal testing of new drugs. The new biomarkers are KIM-1, albumin, total protein, β2-microglobulin, cystatin C, clusterin, and trefoil factor-3, replacing blood urea nitrogen (BUN) and creatinine in assessing acute toxicity [21]. The development of this framework is discussed in more detail by Goodsaid in Chapter 9. In 2007, Congress passed legislation that formed the Reagan–Udall Foundation, a not-for-profit corporation to advance the FDA’s mission to modernize medical, veterinary, food, food ingredient, and cosmetic product development, accelerate innovation, and enhance product safety. Another important policy development occurred with the establishment of the Critical Path Institute in 2006 to facilitate precompetitive collaborative research among pharmaceutical developers. In working closely with the FDA, these collaborations have focused on toxicology and therapeutic biomarker validation [22]. In 2007, a new collaboration building on public–private partnerships with industry was formed to develop clinical biomarkers. In 2006, the Biomarkers Consortium was established as a public–private initiative with industry and government to spur biomarker development and validation projects in cancer, central nervous system, and metabolic disorders in its initial phase [23]. These programs all support information exchange and optimize the potential to apply well-characterized biomarkers to facilitate pharmaceutical and diagnostic development programs. Other policies that are broadening the dissemination of research findings relate to the growing movement toward open-source publishing. In 2003, the Public Library of Science began an open-source publication process that provides instant access to publications [24]. Many scientific journals have moved to make their archives available 6 to 12 months after publication. In 2008, the National Institutes of Health implemented a policy that requires publications resulting from U.S. federally funded scientific research to be made publicly accessible within 12 months of publication [25]. All of these policy actions are
favoring biomarker research by accelerating the transfer of knowledge from discovery to development. New commercial management tools have been developed to provide extensive descriptions of biomarkers and their state of development. Such resources can help enhance industry application of well-characterized descriptive information and increase the efficiency of research by avoiding duplication and establishing centralized credentialing of biomarker information [26]. New business models are emerging among industry and patient advocacy organizations to increase the diversity of financing options for early-stage clinical development [27]. Private philanthropy, with patient groups playing key roles, is supporting proof-of-concept research and target validation, with the expectation that these targeted approaches will lead to commercial interest in therapeutic development. Patient advocacy foundations are supporting translational science in muscular dystrophy, amyotrophic lateral sclerosis, juvenile diabetes, multiple myeloma, and Pompe disease, often in partnership with private companies [28,29].
CHALLENGES AND SETBACKS Although biomarker R&D has accelerated, as noted above, the clinical translation of disease biomarkers as endpoints in disease management and as the foundation for diagnostic products has faced more extensive challenges [30]. For example, we have not observed a large number of surrogate endpoints emerging as clinical trial decision points. Notable exceptions include imaging endpoints, which have grown substantially in number. In most cases, biomarkers are being applied in therapeutic development to stratify patients into subgroups of responders, to aid in pharmacodynamic assessment, and to identify early toxicity indicators in order to avoid late-stage failures. There are difficulties in aligning biomarker science with clinical outcome parameters to establish clinical value in medical practice decision making. As applied in clinical practice, biomarkers have their most anticipated applications in pharmacotherapeutic decisions on treatment selection and dosing, in risk assessment, and in stratification of populations for disease preemption and prevention. In the United States, challenges to the marketplace are presented by the lack of extensive experience with pathways for medical product review and with reimbursement systems that establish financial incentives for biomarker development as diagnostic assays. Clinical practice guidelines for biomarker application in many diseases are lacking, leaving clinicians uncertain about what roles biomarker assays play in disease management. In addition, few studies have been done to evaluate the cost-effectiveness of including biomarkers and molecular diagnostics in disease management [31]. The lack of these key pieces in a system of modern health care can cripple plans for integrating valuable technologies into clinical practice.
Scientific setbacks have also occurred across the frontier of discovery and development. Among the notable instances was the use of pattern recognition of tandem mass spectrometric measurements of blood specimens in ovarian cancer patients. After enthusiastic support for applying the approach in clinical settings, early successes were erased when technical errors and study design issues led to faulty assumptions about the findings. Across clinical development areas, deficiencies in clinical study design have left initial study findings unconfirmed, often due to overfitting in small sample populations and improper control of selection and design bias [32]. Commercial development of large-scale biology companies has also struggled in some ways to identify workable commercial models. Initial enthusiasm early in the decade about private marketing of genomic studies in disease models faltered as public data resources emerged. Corporate models for developing large proteomic databases faltered owing to a lack of distinct market value, little documented clinical benefit, and wide variability in the quality of clinical biospecimens. Evidence to support the clinical utility of many biomarkers as clinical diagnostics is difficult to develop, as the clinical trial infrastructure needed to validate candidate biomarkers for clinical practice has not yet been established. An obstacle to this has been access to well-characterized biospecimens coupled with clinical phenotype information. This has led to calls for centralized approaches to biospecimen collection and archiving to support molecular analysis and biomarker research [33]. Furthermore, the wide variety of tissue collection methods and of DNA and protein preparation for molecular analysis has been at the root of many problems of irreproducibility. Standards development and best practices have been represented as cornerstones in facilitating biomarker validation [34,35]. Similarly, reacting to the lack of reproducibility of findings in some studies, proposals have been made for standards in study design for biomarker validation for risk classification and prediction [36].
LOOKING FORWARD The next decade of biomarker research is promising, with a push toward more clinical applications to be anticipated. Key factors on the horizon that will be integral to clinical adoption are summarized in Table 3. The confluence of basic and translational research has set the stage for personalized medicine, a term of art used widely now, which indicates that health care practices can be customized to meet specific biological and patient differences. The term was not part of the lexicon in 1997 but speaks to the consumer-directed aspects of biomedical research. Genomic services offered to consumers have emerged with the use of GWASs, although clinical value and impact are not known. It is clear that the emergence of biomarkers in an ever-changing health care delivery system will in some fashion incorporate the consumer marketplace.
TABLE 3  Looking Ahead: Implementing Biomarkers in Clinical Care

Intellectual property policy
Phenotypic disease characterization
Clinical translation: biomarker validation and verification
Clinical trials stratification based on biological diversity
Surrogate endpoints: managing uncertainty and defining boundaries in medical practice
Data-sharing models
Dynamic forces in industry financing
Co-development of diagnostics and therapeutics
Clinical infrastructure for evidence development and clinical utility of diagnostics
Health information exchange and network services
Consumer genomic information services
Prospects for biomarkers to continue to play a major role in the transformation of pharmaceutical industry research remain high as new technology platforms, bioinformatics infrastructure, and credentialed biomarkers evolve. The emergence of a clearer role for federal regulators and increased attention to appraisal of the value of genomic-based diagnostics will help provide guideposts for investment and a landscape for clinical application. One can anticipate that genomics in particular will probably provide clinical benefit in chronic diseases and disorders, where multiple biomarker analyses reflect models and pathways of disease. An emerging clinical marketplace is evolving for the development and application of biomarker assays as clinical diagnostics. The pathway for laboratory-developed tests will probably evolve to include FDA oversight of certain tests of added complexity, in which multiple variables are integrated into index scores to assist in therapeutic selection. Clinical practice guidelines are beginning to emerge for the inclusion of biomarkers to guide stratification of patients and therapeutic decision making. The early impact of these approaches is now evident in oncology, cardiovascular disease, infectious disease, and immune disorders. Advancing clinical biomarkers to become a mainstay for improving the safety and quality of health care remains many years away. Systematic clinical evaluation processes for diagnostics and targeted molecular therapies have not yet been firmly established. The use of electronic health information, together with information from health plans, longitudinal data collection, and randomized clinical trials, will need integration and coordination for effective implementation in medical decision making. In 2008, the first impact was felt of over-the-counter or electronically available consumer services based on genetic tests. Utilizing public and private genome-wide association databases, private-sector companies have coupled powerful search engines with family history information and SNP analysis to develop consumer services that identify possible health risks. Although
the medical benefit of such services remains undocumented, the successful entry of several services and the growth of online commercial genomic services indicate interest among health-conscious citizens in understanding inherited disease risk. Another noteworthy factor that will probably play an important role in the next decade of clinical biomarker adoption is the development of standards and interoperability specifications for the health care delivery system and consumers. Interoperable health record environments will probably provide more flexibility and mobility of laboratory information and support consumer empowerment in prevention and disease management. One of the most important and sweeping challenges with biomarkers is the development of intellectual property policies that balance opportunity and entrepreneurship with unmet market needs and clinical value. Because single gene mutations or simple protein assays are unlikely, by themselves, to constitute discoveries equivalent to diagnostic tests or new clinical markers, the converging circles of technologies and knowledge will require new approaches to management if the combined technologies are to be brokered as real value in health care. Indeed, the broader importance of this challenge is underscored by Alan Greenspan in noting that “arguably, the single most important economic decision our lawmakers and courts will face in the next twenty-five years is to clarify the rules of intellectual property” [37]. Overall, a decade’s worth of work has charted a robust and vibrant course for biomarkers across the biomedical research and development landscape. Clinical applications of biomarkers in medical practice are coming more into focus through diagnostics and molecularly targeted therapies, but a long period of time may pass before biomarker-based medicine becomes a standard in all areas of health care practice.
REFERENCES
1. Ross JS, Fletcher JA, Linette GP, et al. (2003). The Her-2/neu gene and protein in breast cancer 2003: biomarker and target of therapy. Oncologist, 8:307–325.
2. Deininger M, Druker BJ (2003). Specific targeted therapy of chronic myelogenous leukemia with imatinib. Pharmacol Rev, 55:401–423.
3. Biomarkers Definitions Working Group (2001). Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther, 69:89–95.
4. De Gruttola VG, et al. (2001). Considerations in the evaluation of surrogate endpoints in clinical trials: summary of a National Institutes of Health Workshop. Control Clin Trials, 22:485–502.
5. Downing GJ (ed.) (2000). Biomarkers and Surrogate Endpoints: Clinical Research and Applications. Elsevier Science, Amsterdam.
6. Sachidanandam R, Weissman D, Schmidt S, et al. (The International SNP Map Working Group) (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409:928–933.
7. The International HapMap Consortium (2003). The International HapMap Project. Nature, 426:789–796.
8. The International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449:851–861.
9. Klein RJ, Zeiss C, Chew EY, et al. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308:385–389.
10. Sladek R, Rocheleau G, Rung J, et al. (2007). A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature, 445:881–885.
11. Perry JR, Frayling TN (2008). New gene variants alter type 2 diabetes risk predominantly through reduced beta-cell function. Curr Opin Clin Nutr Metab Care, 11:371–378.
12. GAIN Collaborative Research Group (2007). New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet, 39(9):1045–1051.
13. Collins FS, Barker AD (2007). Mapping the cancer genome: pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci Am, 296:50–57.
14. Conway M, McKinsey and Co. (2007). Personalized medicine: deep impact on the health care landscape. http://sequencing.hpcgg.org/PM/presentations/Tue_08_Conway_061120%20Michael%20Conway%20Harvard%20Pers%20Med%20Presentation.pdf (accessed Sept. 5, 2008).
15. Orr MS, Goodsaid F, Amur S, Rudman A, Frueh FW (2007). The experience with voluntary genomic data submissions at the FDA and a vision for the future of the voluntary data submission program. Clin Pharmacol Ther, 81:294–297.
16. FDA (2007). Guidance for Industry and FDA Staff: Pharmacogenetic tests and genetic tests for heritable markers. http://www.fda.gov/cdrh/oivd/guidance/1549.html.
17. Frueh FW, et al. (2008). Pharmacogenomic biomarker information in drug labels approved by the United States Food and Drug Administration: prevalence of related drug use. Pharmacotherapy, 28:992–998.
18. Bast RC, Thigpen JT, Arbuck SG, et al. (2007). Clinical trial endpoints in ovarian cancer: report of an FDA/ASCO/AACR Public Workshop. Gynecol Oncol, 107(2):173–176.
19. FDA (2007). The critical path to new medical products. http://www.fda.gov/oc/initiatives/criticalpath/report2007.html.
20. FDA (2004). Challenge and opportunity on the critical path to new medical products. http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html (accessed Aug. 23, 2008).
21. FDA (2008). European Medicines Agency to consider additional test results when assessing new drug safety. http://www.fda.gov/bbs/topics/NEWS/2008/NEW01850.html.
22. Woosley RL, Cossman J (2007). Drug development and the FDA’s Critical Path Initiative. Clin Pharmacol Ther, 81:129–133.
23. The Biomarkers Consortium (2008). On the critical path of drug discovery. Clin Pharmacol Ther, 83:361–364.
24. Public Library of Science. http://www.plos.org.
25. National Institutes of Health (2008). Revised policy on enhancing public access to archived publications resulting from NIH-funded research. NOT 08–033. http://grants.nih.gov/grants/guide/notice-files/not-od-08-033.html.
26. Thomson Reuters. BIOMARKERcenter. http://scientific.thomsonreuters.com/products/biomarkercenter/ (accessed Sept. 23, 2008).
27. Kessel M, Frank F (2007). A better prescription for drug-development financing. Nat Biotechnol, 25:859–866.
28. PricewaterhouseCoopers (2007). Personalized medicine: the emerging pharmacogenomics revolution. Global Technology Centre, Health Research Institute, San Jose, CA.
29. Trusheim MR, Berndt ER, Douglas FL (2007). Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Discov, 6(4):287–293.
30. Phillips KA, Van Bebber S, Issa A (2006). Priming the pipeline: a review of the clinical research and policy agenda for diagnostics and biomarker development. Nat Rev Drug Discov, 5(6):463–469.
31. Phillips KA, Van Bebber SL (2004). A systematic review of cost-effectiveness analyses of pharmacogenomic interventions. Pharmacogenomics, 5(8):1139–1149.
32. Ransohoff DF (2005). Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer Inst, 97:315–319.
33. Ginsburg GS, Burke TW, Febbo P (2008). Centralized biospecimen repositories for genetic and genomic research. JAMA, 299:1359–1361.
34. National Cancer Institute (2007). National Cancer Institute best practices for biospecimen resources. http://biospecimens.cancer.gov/practices/.
35. Thomson Reuters (2008). Establishing the standards for biomarkers research. http://scientific.thomsonreuters.com/pm/biomarkers_white_paper_0308.pdf (accessed Sept. 4, 2008).
36. Pepe MS, Feng Z, Janes H, Bossuyt PM, Potter JD (2008). Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. J Natl Cancer Inst, eprint, Oct. 8.
37. Greenspan A (2007). The Age of Turbulence: Adventures in a New World. Penguin Press, New York.
3 ENABLING GO/NO GO DECISIONS J. Fred Pritchard, Ph.D. MDS Pharma Services, Raleigh, North Carolina
Mallé Jurima-Romet, Ph.D. MDS Pharma Services, Montreal, Quebec, Canada
UNDERSTANDING RISK There is no question that most people who know something about our industry consider developing a drug product as a risky business. Usually, they mean that the investment of time and money is high while the chance of a successful outcome is low compared to other industries that create new products. Yet the rewards can be great, not only in terms of monetary return on investment (ROI) but also in the social value of contributing an important product to the treatment of human disease. Risk is defined as “the possibility of loss or injury” [1]. Therefore, inherent in the concept is the mathematical sense of probability of occurrence of something unwanted. Everyday decisions and actions that people take are guided by conscious and unconscious assessments of risk, and we are comfortable with compartmentalized schemes where we sense that a situation is very high, high, medium, low, or very low risk. We often deal with the concept of a relative risk (e.g., in comparison to other options). Some risks can be defined in more absolute terms, such as some type of population measure based on trend analysis of prior incidence statistics (e.g., current risk of postmenopausal Caucasian women in the United States being diagnosed with breast cancer).
These types of population-based risk data, while often much debated in the scientific and popular press, do affect decision making at the individual level. In contrast to an individual’s skill at assessing risks, decisions and actions taken during drug development require objective and systematic risk assessment by groups of people. There are many stakeholders involved in the development of a drug product. These include the specialists who do the science required in drug development. Also included are the investors and managers who make decisions about how limited resources will be used. Clinical investigators who administer the drug and the patients who agree to participate in clinical trials are stakeholders, as are the regulators and institutional review boards (IRBs) or ethics committees that approve use of an experimental drug in humans. Each stakeholder has his or her unique perspective of risk. The prime focus of some is the business risk involved, including how much work and money is invested to progress the drug at each phase of development. On the other hand, IRBs, regulators, investigators, and patients are concerned primarily with the safety risk to the patient. Figure 1 depicts the major factors that contribute to three different perspectives of risk, broadly classified as risk to patient, business risk, and risk of therapeutic failure. Investigators and patients are concerned primarily with safety and efficacy factors associated with a particular therapy. Although regulators are concerned primarily with patient safety, regulations do affect how much development will be required for a particular drug candidate. The business risk can be greatly affected by the investment and development partners involved and the expectations set with owners and managers. Potential competitors in the marketplace affect risk, as does the available pool of money for development.
Figure 1 Major factors affecting three main perspectives of risk in drug development: business risk, risk to patient, and risk of therapeutic failure.
The common factor for all three risk perspectives is the novelty of the product. Drug candidates that are directed at new therapeutic targets, while offering hope of improved efficacy, require more interactions with regulators and investigators, adding to development expense. In addition, because the target is unproven, there is a greater relative risk of therapeutic failure compared to proven pharmacological targets of disease intervention. Therefore, when attempting to express the risks involved in developing a drug, it is important to understand the varying perspectives of each major group of stakeholders. Before risk assessment can be fully appreciated in the context of decision making in drug development, it must be balanced with the perceived benefits of the product. For example, the patient and physician would tolerate a high level of risk to the patient if the potential benefits offered by a novel therapeutic are for a life-threatening disease for which there is no effective treatment. In a similar way, a high degree of business risk may be tolerated by investors in novel drug candidates if the return on investment, should the product succeed, would be very high. Like risk, benefits are also in the eye of the beholder. Ideally, on many occasions during drug development, each stakeholder is asked to assess his or her view of risk versus benefit based on current data. This assessment becomes part of the decision-making process that drives drug development in a logical and, hopefully, collaborative way. Effective decision making requires integrating these varying risk–benefit assessments in a balanced way. The decision gate approach is a useful way to integrate these needs into the drug development process.
DECISION GATES Drug development is a process that proceeds through several high-level decision gates, from identification of a potential therapeutic agent through to marketing a new drug product [2]. The decision gate is an apt analogy: for a drug candidate to progress further in drug development, it must meet a set of criteria that have been agreed to by the decision makers before they will open the gate. It is “go/no go” because the future product life of the drug hangs in the balance. Once a new drug candidate has progressed through a decision gate, the organization should be committed to expending even greater resources (money and scientists’ time) on the studies that address the criteria for the next decision gate along the development path. Disciplined planning and decision making are required to leverage the value of the decision gate approach. An example of a decision grid applicable to the early phase of drug development is depicted in Table 1. Initially, a clear set of questions at each decision gate needs to be agreed upon and understood by the decision makers. In later stages of drug development these questions and criteria are often presented in a target product profile (TPP), which is, in essence, a summary of a drug development plan described in terms of labeling concepts. The FDA has formalized the agency’s expectations for a TPP in a guidance document [3].
TABLE 1  Decision Gate Grid Defining Criteria Required to Move Through the Decision Gate “Safe to Give to Humans?”

Best achievable (enhance investment):
  Efficacy: Rodent model: ED90 < 0.3 mg/kg; human receptor: IC50 < 1 μM
  Animal Safety: Rodent: NOAEL > 50 mg/kg; dog: NOAEL > 50 mg/kg; no genotoxicity; no CV effects
  PK/ADME: Dog half-life > 8 h; dog BA > 90%; no active metabolites; no CYP2D6 inhibition
  CMC: API stable for at least 6 mo; cost of goods for API < $10 K/kg; GMP tablet
  Phase I Study Design: Cohorts of healthy normal subjects: staggered (SD to MD) dose escalation; normal safety monitoring

Base case (invest in next phase):
  Efficacy: Rodent model: ED90 < 1 mg/kg; human receptor: IC50 < 10 μM
  Animal Safety: Rodent: NOAEL > 10 mg/kg; dog: NOAEL > 10 mg/kg; no genotoxicity; no CV effects < 100 mg/kg
  PK/ADME: Dog half-life > 4 h; dog BA > 50%; <10% active metabolites; CYP2D6 inhibition: Ki > 10 μM
  CMC: API stable for at least 3 mo; cost of goods for API < $20 K/kg; GMP tablet
  Phase I Study Design: Cohorts of healthy normal subjects: sequential dose escalation; intensive CV monitoring

Minimum required (proceed with caution):
  Efficacy: Rodent model: ED90 < 2 mg/kg; human receptor: IC50 < 50 μM
  Animal Safety: Rodent: NOAEL > 5 mg/kg; dog: NOAEL > 5 mg/kg; equivocal genotoxicity at highest exposure; no CV effects < 30 mg/kg
  PK/ADME: Dog half-life > 2 h; dog BA > 30%; <30% active metabolites; CYP2D6 inhibition: Ki > 1 μM
  CMC: API stable for at least 3 mo; cost of goods for API < $30 K/kg; GMP tablet
  Phase I Study Design: Patients: multisite dose escalation; intensive CV monitoring; FDA review after each dose escalation
Many companies start evolving a TPP even at the earliest go/no go decision gates of drug development. In this way, a common way of thinking is preserved throughout the life of the product. The development program plan is assembled by determining what studies need to be done, and how they should be designed, to provide information critical to answering the key questions at each go/no go decision gate. The value of a solid drug development plan based on a decision gate approach can be leveraged only if there is discipline in the decision-making process. What method will be used to make a decision: a majority vote of a committee or a single decider who is advised by others? Each stakeholder needs to understand clearly his or her role in making the decision at each gate. Is he or she a decider, a consultant, or just someone who needs to know what the decision is in order to do the job effectively? Go/no go decisions need to be made when all information required to answer the key questions is available. Go/no go decisions should not be revisited. If this occurs, there is either a lack of decision discipline or key information was missing when the original decision was made, representing a failure in planning. Some go/no go decision gates require agreement to proceed by different groups of stakeholders. This is particularly true when the question “Is the drug candidate safe to give to humans?” is addressed. The sponsor must decide whether to file an investigational new drug (IND) application based on the data and information collected so far from animal tests and in vitro assays. The same data are also evaluated by the regulators (e.g., the FDA), who must have time (30 days) to object if they feel the safety of the subjects in the first few clinical trials will be compromised unduly. Finally, the data are reviewed again by the IRB, which looks specifically at how safety will be evaluated and managed in the clinical trials and which represents the interests of the volunteers. It is as if the gate has three locks, each with a key owned by a separate entity that decides independently whether to open its lock. One cannot pass through the gate until all three locks are open.
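To make these mechanics concrete, a decision grid such as Table 1 can be thought of as structured data against which a project team checks a candidate's data package. The short Python sketch below is purely illustrative; the parameter names, tiers, and threshold values are hypothetical stand-ins, not criteria from any actual development program.

# Minimal sketch: a go/no-go decision grid as data, checked against a
# candidate's data package. All names and thresholds are hypothetical.
GATE_CRITERIA = {
    "efficacy_rodent_ED90_mg_per_kg": {"best": 0.3, "base": 1.0, "minimum": 2.0},
    "pk_dog_half_life_h":             {"best": 8.0, "base": 4.0, "minimum": 2.0},
    "safety_rodent_NOAEL_mg_per_kg":  {"best": 50.0, "base": 10.0, "minimum": 5.0},
}
LOWER_IS_BETTER = {"efficacy_rodent_ED90_mg_per_kg"}  # for ED90, smaller values are better

def classify(parameter, value):
    """Return the best tier ('best', 'base', or 'minimum') the value satisfies, or None."""
    for tier in ("best", "base", "minimum"):
        threshold = GATE_CRITERIA[parameter][tier]
        meets = value <= threshold if parameter in LOWER_IS_BETTER else value >= threshold
        if meets:
            return tier
    return None

def gate_decision(candidate):
    """'go' only if every parameter meets at least the minimum tier."""
    tiers = {p: classify(p, v) for p, v in candidate.items()}
    return ("go" if all(tiers.values()) else "no go"), tiers

decision, detail = gate_decision({
    "efficacy_rodent_ED90_mg_per_kg": 0.8,   # meets the base tier
    "pk_dog_half_life_h": 5.2,               # meets the base tier
    "safety_rodent_NOAEL_mg_per_kg": 12.0,   # meets the base tier
})
print(decision, detail)

In practice the criteria are richer and the judgment is made by people rather than a script; the point of the sketch is only that agreeing on the grid, and on what counts as meeting it, before the data arrive makes the eventual go/no go call objective and auditable.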
ROLE OF BIOMARKERS IN DISCOVERY AND PRECLINICAL DEVELOPMENT Less than 10% of drug candidates that enter clinical testing result in a marketed product [4]. This means that at some point during clinical development over 90% of drug candidates fail. If failure occurs late in clinical development, up to hundreds of millions of dollars and a great deal of time and effort will have been invested, with little or no return. Proper incorporation of biomarkers in drug development strategy enables the concept “fail fast, fail early.” Early failure actually lowers overall risk because it enables one to move resources and utilize available patients on other promising therapies. Biomarker data can be critical to a go/no go decision. By definition, biomarkers may reflect the biology or the progression of disease and/or the effect
of drug treatment. Therefore, information provided by properly selected biomarkers can greatly influence the decision whether or not to progress through a go/no go decision gate. The challenge is to identify relevant biomarkers early enough to implement them for go/no go decisions at the critical early stages of the discovery–development process. Biomarkers in discovery are valuable tools for understanding the pathobiology of a disease and the pharmacology of a target and/or compounds (hits or leads) under investigation. Moreover, even at this very early stage, a biomarker may already be identified as a potential clinical safety or efficacy biomarker. Decision gates in discovery are frequently directed toward selection of a lead or a small number of leads to take forward to preclinical development, and prioritization of those leads. For this purpose, a number of commonly employed screening assays can generate useful biomarker data. For example, membrane permeability coefficients and metabolic stability data from Caco-2 and hepatocyte screening assays, respectively, are predictive biomarkers of in vivo intestinal absorption and liver metabolism, and can be used to weed out compounds unlikely to have acceptable pharmacokinetic (PK) properties for the clinical use intended. During preclinical development, the focus of studies is on revealing potential toxicity of the drug candidate. Accordingly, there is interest in identifying and monitoring safety biomarkers that can be used for go/no go decisions. Particularly for classes of compounds whose development has been plagued by unusual findings in animals that have not translated into human risk (e.g., fibrates and peroxisome proliferation in rodents), safety biomarkers that translate well across species are highly desirable but not always easy to find. In both discovery and preclinical development, the abundance of new technologies (high-content screening, imaging, genomics, proteomics, metabonomics, and systems biology, to name a few) has opened the door to the identification of theoretically innumerable novel biomarkers, along with an expectation that this will result in enhanced efficiency and better decision making. However, the technologies, and the biomarkers that they produce, have to be evaluated as to how they provide insight into the biology and pharmacology being investigated, and how they can be used effectively by scientists and managers in decision making. As discussed in Chapter 4, the increased volume and complexity of data require sophisticated data analysis and informatics infrastructure to be in place before the new knowledge generated can be applied meaningfully to decision making.
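As a rough illustration of how such screening data can feed lead prioritization, the sketch below applies simple cutoffs to hypothetical Caco-2 permeability and hepatocyte stability results. The compound identifiers, field names, and threshold values are invented for illustration; real programs set cutoffs from validated assay performance and the intended clinical use.

# Minimal sketch: weeding out leads with poor predicted absorption or
# metabolic stability. All identifiers and thresholds are hypothetical.
leads = [
    {"id": "CMP-001", "caco2_papp_1e6_cm_per_s": 25.0, "hepatocyte_t_half_min": 90},
    {"id": "CMP-002", "caco2_papp_1e6_cm_per_s":  0.4, "hepatocyte_t_half_min": 120},
    {"id": "CMP-003", "caco2_papp_1e6_cm_per_s": 12.0, "hepatocyte_t_half_min": 8},
]
MIN_PERMEABILITY = 1.0    # apparent permeability below this suggests poor intestinal absorption
MIN_STABILITY_MIN = 30    # a very short hepatocyte half-life suggests rapid hepatic clearance

def passes_pk_screen(lead):
    return (lead["caco2_papp_1e6_cm_per_s"] >= MIN_PERMEABILITY
            and lead["hepatocyte_t_half_min"] >= MIN_STABILITY_MIN)

shortlist = [lead["id"] for lead in leads if passes_pk_screen(lead)]
print(shortlist)  # ['CMP-001']: CMP-002 fails the permeability cutoff, CMP-003 the stability cutoff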
ROLE OF BIOMARKERS IN EARLY CLINICAL RESEARCH Biomarkers may be the only way to address a decision gate that asks the question: Is there evidence that the drug is working in humans using the same
mechanism of action defined by animal studies? This question is answered by a clinical proof-of-concept study: often, a phase IIa study in a small number of patients. However, occasionally, clinical proof of concept can be addressed during phase I development if the mechanism of action can be demonstrated in healthy volunteers (e.g., effects on blood pressure, body weight, lipids). The changes in the data collected during this study need not achieve statistical significance; rather, there needs to be enough indication of potential efficacy to convince decision makers to open the gate and spend the resources to progress the drug further. Often, the minimal result for progression can be defined ahead of time, making the actual decision-making process much easier, more objective, and transparent. Early in clinical development, multiple biomarker assays or technologies may be utilized in order to understand more fully the safety and actions of the drug in humans. One may use a collection of biomarker methods, most of which are not likely to be fully validated surrogate markers of effect. For early decision making, it is not necessary to have results that meet the same standards of accuracy and quality as those used for drug approval. Indeed, it is during these early trials that promising tests for use in later clinical development or marketing are first identified from among the pool of available experimental biomarkers. Resources can then be put toward further developing the selected biomarker methodology to meet the quality standards expected for continued future use. Knowledge of the biology of the disease may be evolving at the same time as the drug is being developed. Therefore, new biomarkers may need to be added during the development process as the science evolves. It should not be assumed that the "tried and true" biomarkers are the only ones worthy of consideration or that a biomarker that was used to obtain marketing approval in the past is the one that should be used today. Good science supported by convincing data can persuade stakeholders that a novel biomarker is better than one already established.
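The notion of a minimal result for progression defined ahead of time can be made concrete with a simple prespecified decision rule. The sketch below is illustrative only: the biomarker readout, the 10% minimal effect, the one-sided confidence level, and the subject data are assumptions, not values from this chapter.

```python
# Minimal sketch of a prespecified go/no-go rule for a proof-of-concept biomarker readout.
# Threshold, confidence level, and data are hypothetical.
import math
import statistics

def go_no_go(changes, min_effect=10.0, z_one_sided=1.282):
    """'GO' if the lower bound of a one-sided ~90% confidence interval for the mean
    biomarker change exceeds a prespecified minimal effect (same units as the data)."""
    n = len(changes)
    mean = statistics.mean(changes)
    sem = statistics.stdev(changes) / math.sqrt(n)   # normal approximation for brevity
    lower_bound = mean - z_one_sided * sem
    return ("GO" if lower_bound > min_effect else "NO GO"), mean, lower_bound

# Hypothetical percent reductions in a target-engagement biomarker for 12 subjects:
changes = [18, 25, 9, 30, 22, 14, 27, 11, 19, 23, 16, 21]
decision, mean, lb = go_no_go(changes, min_effect=10.0)
print(f"mean change = {mean:.1f}%, one-sided 90% lower bound = {lb:.1f}% -> {decision}")
```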
ROLE OF BIOMARKERS IN PHASE III AND POSTMARKETING DECISIONS Traditional clinical trial endpoints, such as morbidity and mortality, often require extended time frames and may be difficult to evaluate. Fully validated surrogate markers can be used as primary endpoints for pivotal phase IIb and III safety and efficacy trials. Lowering blood pressure and lowering LDL cholesterol are both examples of surrogate biomarker endpoints upon which several drugs have been approved, because it has been demonstrated that effects on these markers directly affect morbidity and mortality several years later.
New technologies are now rapidly emerging that show similar promise in providing efficacy signals earlier in the treatment of certain slowly progressing diseases. Imaging-based biomarkers are providing objective endpoints that may be confidently evaluated in a reasonable time frame. Imaging techniques tend to be expensive, but can be cost-effective when used in well-defined situations where subjective assessment has been the only approach available. As discussed in greater detail elsewhere in this book, examples of therapeutic areas where imaging is reshaping clinical trial design include Alzheimer disease, pain, and osteoarthritis. In affected joints, as well as in tissues expressing specific receptors, magnetic resonance imaging, x-ray computed tomography, positron-emission tomography, and single-photon-emission computed tomography imaging are delivering new information to clinicians and researchers. Patient selection is another area where decisions are made using biomarker information. For example, during the development of Herceptin, patients who tested negative for the Her2Neu receptor responded poorly to drug treatment, whereas good effects were observed in patients who expressed the receptor. This is consistent with the mechanism of action of the drug and resulted in regulatory approval for patients positive for Her2Neu. The outcome was that a diagnostic test was required before this drug could be marketed effectively. This biomarker now drives the treatment decisions of breast cancer oncologists around the world. Identifying potential responders or nonresponders prior to phase III clinical trials can have a profound effect on the size of a clinical trial. Eliminating nonresponders can dramatically reduce the intersubject variability in response, thereby reducing the number of subjects in each group of a pivotal clinical trial required to demonstrate effect. The result is a substantial saving in both time and cost. However, if the biomarker is novel, a diagnostic will need to be developed along with the drug before one can market the product effectively.
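The effect of eliminating nonresponders on trial size can be seen from the standard two-sample sample-size formula, n per arm = 2(z_alpha + z_beta)^2 sigma^2 / delta^2: for a fixed treatment effect, the required number of subjects scales with the square of the intersubject standard deviation. The numbers below are hypothetical, chosen only to illustrate the scaling; in practice, enrichment often increases the observed effect size as well, which shrinks the trial further.

```python
# Back-of-envelope illustration (hypothetical numbers) of trial-size reduction when
# biomarker-based selection lowers intersubject variability in the response.
import math

def n_per_arm(sigma: float, delta: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Subjects per arm for a two-sample comparison of means:
    n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
    (two-sided alpha = 0.05 and 80% power with the default z values)."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# All-comers population: nonresponders inflate the response SD for the same target effect.
print(n_per_arm(sigma=40.0, delta=15.0))   # ~112 subjects per arm
# Biomarker-selected responders: same target effect, smaller SD.
print(n_per_arm(sigma=25.0, delta=15.0))   # ~44 subjects per arm
```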
SUMMARY Biomarkers have the potential to provide valuable information that can be translated into knowledge to aid decision making at all stages of the drug discovery–development process. Different stakeholders have different perspectives on risk associated with development and clinical use of a drug. Although biomarkers cannot eliminate the inherent challenges to integrating risk assessment in this complex environment, the associated data can be evaluated objectively. As informatic tools evolve to analyze more efficiently and effectively the large data sets associated with the new biomarker technologies, decision making will, in turn, improve. Regardless of the technologies employed and the complexity of data, a disciplined planning and decision-making process is essential to successful drug development.
REFERENCES
1. Merriam-Webster (1988). Webster's Ninth New Collegiate Dictionary. Merriam-Webster, Inc., Springfield, MA.
2. Pritchard JF, Jurima-Romet M, Reimer MLJ, Mortimer E, Rolfe B, Cayen MN (2003). Making better drugs: decision gates in non-clinical drug development. Nat Rev Drug Discov, 2:542–553.
3. FDA (2007). Guidance for Industry and Review Staff: Target product profile—a strategic development process tool. http://www.fda.gov/cder/guidance/6910dft.htm.
4. Kola I, Landis J (2004). Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov, 3:711–715.
PART II IDENTIFYING NEW BIOMARKERS: TECHNOLOGY APPROACHES
4 IMAGING AS A LOCALIZED BIOMARKER: OPPORTUNITIES AND CHALLENGES Jonathan B. Moody, Ph.D. INVIA Medical Imaging Solutions, Ann Arbor, Michigan
Philip S. Murphy, Ph.D. GlaxoSmithKline Research and Development, Uxbridge, UK
Edward P. Ficaro, Ph.D. INVIA Medical Imaging Solutions, Ann Arbor, Michigan
INTRODUCTION Medical imaging is well known for its ability to produce pictures of patient anatomy for the diagnosis of disease, but perhaps more significant for drug development is its ability to produce images of physiological and physicochemical processes in a living subject. It is the latter that gives rise to the concept of imaging biomarkers, which we may define as the subset of all biomarkers that utilize medical imaging technologies to acquire data. Medical imaging technologies include a range of modalities used in a routine clinical setting: x-ray computed tomography (CT), positron-emission tomography (PET), single-photon-emission computed tomography (SPECT), magnetic resonance imaging (MRI), ultrasound, and optical imaging. In its 2006 Critical Path Opportunities Report, the U.S. Food and Drug Administration (FDA) defined biomarkers as "measurable characteristics that
reflect physiological, pharmacological, or disease processes in animals or humans. Changes in biomarkers following treatments reflect the clinical response to the product. Biomarkers can reduce uncertainty by providing quantitative predictions about performance” [1]. In this chapter we will discuss imaging as a localized biomarker. Although imaging biomarkers hold great potential for accelerating drug discovery and development, in practice they are often difficult and costly to implement for specific applications, may be used inappropriately in a development setting, or may fail to provide a robust answer to the development questions being asked. We focus our discussion on understanding the opportunities for using imaging biomarkers as tools in drug development while recognizing the current challenges and how efforts to overcome these challenges are evolving. We begin by presenting a brief overview of biological imaging. Then we discuss the general characteristics and context of an imaging biomarker along with key advantages and limitations of imaging biomarkers. We place the definition of a biomarker given above into the context of biological imaging in general and diagnostic medical imaging in particular. We follow this with an overview of the scope of imaging in drug development, highlight specific examples of established and emerging imaging biomarkers from oncology that have found utility in drug development and illustrate considerations for imaging biomarker selection. A key issue in the implementation and use of imaging biomarkers is the challenge of standardization which we then examine in some detail. We conclude by discussing current efforts to address the unique needs of imaging biomarker development.
OVERVIEW OF IMAGING IN BIOLOGICAL RESEARCH From their inception, the various noninvasive imaging modalities have been used widely in basic biological research. The broad utility of imaging in biological research reflects the breadth of fundamental physical processes that underlie all imaging technologies. The same types of probes and methods of detection that are used in biochemistry and in molecular and cellular biology [e.g., optical, nuclear (gamma emission), nuclear magnetic resonance (NMR) spectroscopy, x-ray] also form the basis of imaging technologies. When the electron microscope was introduced in the 1930s, the morphology of submicroscopic cellular structures that had only been inferred by indirect methods such as x-ray diffraction or structural chemistry could be visualized directly for the first time in electron micrographs [2]. By analogy, the development of medical imaging technologies (e.g., SPECT [3], PET [4,5], CT [6], and MRI [7,8]) has provided the ability to visualize in vivo biological processes previously accessible only by indirect or invasive means. As the science of biological imaging has grown, the development of new imaging applications has been driven by the unique ability of noninvasive imaging to reveal information
about anatomical structure and physiological function as well as various disease states in living subjects. It will be helpful to review briefly the use of imaging in biological research in order to understand the context and source of imaging biomarkers. We describe some common clinical imaging technologies and then highlight some of the image-based methods for measuring biological function that have been proposed in the literature, providing a background for detailed examples. Several examples of biological imaging applications in drug discovery are also provided in a recent monograph [9]. Clinical Imaging Technologies Although we limit our discussion here to nuclear PET and SPECT, CT, and MRI modalities, the following applies equally well to other imaging technologies, including ultrasound, which is widely used clinically, and various forms of optical imaging which are becoming more prevalent [10]. An accessible general introduction to imaging technologies is available [11], as is a comprehensive treatment of the theoretical basis of imaging [12]. CT, the most commonly used clinical imaging modality, provides threedimensional images of anatomical structure. The main source of contrast in CT images arises from the differential attenuation of x-rays by tissues in the body. In CT images, mineralized tissues appear brighter due to higher attenuation of the x-rays, and soft tissues have intermediate intensity due to lower attenuation. Because of limited contrast differences between soft tissues, an injectable iodine-based contrast agent is often necessary to better highlight anatomical lesions. Despite this limitation, CT provides the highest spatial resolution (less than 0.5 mm), and acquisition of CT data is straightforward and rapid. MRI may be characterized as a large family of methods to detect molecular- and cellular-dependent image contrasts through the manipulation of nuclear spin magnetic resonance during data acquisition. The proton nuclei of hydrogen in water and lipids are the most commonly imaged, although in principle the NMR basis of detection permits imaging or spatially localized spectroscopy of other nuclei, such as 13C, 19F, 23Na, and 31P. The sensitivity of MRI is relatively poor, generally requiring tens of micromolar to millimolar volumes of nuclei for detection. However, excellent soft tissue contrast with high spatial resolution (ca. 1 mm) may be obtained. Acquisition of MRI data is controlled by a pulse sequence, which is a computer program that orchestrates the transmission of radio-frequency pulses and magnetic field gradients, as well as the reception of radio-frequency signals from the subject [13]. Vendor-supplied pulse sequences provide a broad range of available acquisition methods; however, additional pulse sequences and improved methods are constantly being introduced in MRI research. While MRI applications tend to be focused on the seemingly limitless variations in MR acquisition methods, data acquisition in nuclear imaging (PET
and SPECT) is relatively straightforward, being based on either detection of single gamma photons (SPECT) or coincident collinear pairs of gamma photons (PET). Image contrast in nuclear imaging relies on the use of targeted biomolecular probes or tracers labeled with radionuclides to detect in vivo molecular processes and biological function. Thus, the scope of radionuclide imaging is limited primarily by the ability to efficiently radiolabel biologically relevant molecules, and applications are centered primarily on the skill and creativity of the radiochemist. In general, the sensitivity of nuclear imaging methods is the highest of all clinical modalities, with the picomolar sensitivity of PET being about two to three orders of magnitude higher than that of SPECT [14]. The low intrinsic resolution of PET (ca. 5 mm) and SPECT (ca. 15 mm) has been mitigated somewhat through the development of multimodality image acquisition combining PET/CT or SPECT/CT [15], and more recently PET/MRI [16], which provide high-resolution anatomical images registered with functional nuclear images. In all imaging modalities, the resulting images must be reconstructed from the data acquired using computational methods of varying complexity [17,18], and additionally, in some cases, a series of images may be used to calculate parametric images representing image-based measures corresponding to particular biological functions. Image-Based Measures of Biological Function Many imaged-based measures of biological and physiological function have arisen in imaging and medical research. For example, the Center for Biomarkers in Imaging (http://www.biomarkers.org) makes available an online catalog of approximately 350 image-based biological measures that have been investigated primarily at Massachusetts General Hospital in Boston. Apart from differences in modality, such image-based measures may be classified into three general types according to the way in which image contrast is generated: by endogenous biophysical mechanisms, exogenous contrast agents, or molecular probes [19]. For example, endogenous contrast can be produced by differences in MRI relaxation times such as T1, T2, T2*, macromolecular cross-relaxation, or transport mechanisms such as blood flow, perfusion, and diffusion. These contrast mechanisms are often enhanced by the use of exogenous contrast agents in CT and MRI as well as ultrasound. Table 1 lists some examples of endogenous and contrast agent–enhanced biological measures using MRI. Numerous other MRI approaches providing imagebased measures of disease processes are available [20], as well as MRI-based measures specific to diseases of the brain [21]. Imaging with the use of molecular probes, known as molecular imaging [22], also involves the injection of an exogenous agent to provide image contrast. In this case the imaging agent is designed for cellular uptake or binding with a high affinity for a specific in vivo molecular target or process [23]. Molecular imaging is able to provide new information on disease processes not available
TABLE 1 Examples of Image-Based Measures of Biological Function in MRI Using Endogenous Mechanisms or Exogenous Contrast Agents

MRI Imaging Method^a | Image-Based Measure or Parameter | Biological Function or Parameter | Reference
DSC bolus tracking | T2- and T2*-weighted intensity changes | Relative cerebral blood volume, relative cerebral blood flow, mean transit time | [149]
BOLD | T2- and T2*-weighted intensity correlation with stimulus | Task-induced functional activation in the brain | [195]
ASL | Difference between spin-inverted and control images | Blood flow | [149]
Chemical shift encoding | Phase-shifted images from water–fat chemical shift | In vivo water distribution, in vivo fat distribution | [196]
T2-weighted MRI | Integral increase in T2 values | Peripheral edema | [70]
T2-weighted MRI | Weighted sum of biexponential T2 values | Liver iron concentration | [197]
MRTI | Temperature-dependent proton resonance frequency shift | In vivo temperature distribution | [198]
CEST/PARACEST | Amide proton transfer | Spatial distribution of pH | [199,200]
CEST | Amide proton transfer | Glycogen concentration | [201]

a DSC, dynamic susceptibility contrast; BOLD, blood-oxygenation-level-dependent; ASL, arterial spin labeling; MRTI, magnetic resonance temperature imaging; CEST, chemical exchange saturation transfer.
from other methods. For example, molecular imaging was used recently to establish the relationship between inflammation and microcalcification in early-stage atherosclerotic plaques [24]; it has been used to visualize the in vivo spatial distribution and time course of vascular endothelial growth factor receptor expression in a model of peripheral arterial disease [25]; and it has been combined with conditional disease models using transgenic mice and genetic profiling to yield a powerful system for in vivo preclinical study of disease processes and potential therapeutic targets [26–29]. The most common molecular imaging applications utilize radiolabeled tracers for detection by the nuclear modalities. However, many nonnuclear approaches have also been developed with optical [10], MRI [20,22], and ultrasound modalities [30,31]. The majority of such image-based measures are distinguished from direct imaging of anatomy by being connected indirectly to the biology. A biological,
biophysical, or molecular model is introduced to provide a bridge from the specific image contrast to the physiology, and a correlation is typically established using invasive (nonimaging) or histological measures. In the case of molecular imaging, the model is represented by the underlying signal generation and targeting mechanisms of the probe, which may be verified independent of the imaging experiment. In other cases, the model is implicit and an empirical correlation provides only a phenomenological link between the image-based measurement and a physiological parameter or process. The invasive or histological correlate often provides an initial rationale for the imaging approach, and may guide development of a model and subsequent interpretation of the measurements. It is helpful to think of such methods as image-based noninvasive assays, in the sense that the measurements extract or infer some biological, physiological, or molecular characteristic from the subject as a whole. Although not always clinically diagnostic for disease, such image-based assays can often provide the ability to monitor disease progression [19]. The associated biological or biophysical models provide widely varying degrees of linkage between measurement and biology. Molecular imaging provides perhaps the most direct linkage since the characteristics of target, signal generation, and the probe's affinity for the target can be determined empirically, leading to relative confidence in the functional or molecular basis of the image-based signal. The use of [18F]fluorodeoxyglucose, a glucose analog, as a radiotracer in PET imaging relies on the empirical facts of glucose uptake and phosphorylation by metabolically active cells. In other cases, a single image-derived parameter corresponds to a complex of multiple functional parameters. For example, kinetic modeling of dynamic contrast-enhanced (DCE) MRI data produces a model parameter, Ktrans, that can be influenced by several factors, such as blood flow, vascular permeability, and vascular surface area, potentially complicating interpretation of this image-based measure. Even when an explicit biophysical model is available, as is the case for magnetization transfer MRI [32], it may be impractical to acquire the image-based measurements necessary to apply that model. In this particular case, a related phenomenological model has proven useful to relate a simpler measurement (the magnetization transfer ratio) to the diffuse macromolecular changes and demyelination associated with multiple sclerosis [33]. The lack of specificity of some image-based measures warrants caution in their use prior to adequate assay validation. Early ex vivo NMR measurements of bulk T1 and T2 relaxation of tumors [34] were initially believed to distinguish malignancy from nonmalignancy, and even inspired development of the first whole-body MRI system [35]. However, these results have not subsequently been validated in vivo, although several other MRI methods for characterizing tumors and evaluating therapy have since been developed [36]. Anatomical and structural imaging can also provide functional measures if appropriate image quantification is applied, such as measuring tumor size or volume, which leads to an image-based measure of tumor growth.
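To make the Ktrans example concrete, the sketch below fits a commonly used compartmental (Tofts-type) model, Ct(t) = Ktrans * integral of Cp(tau) * exp[-kep(t - tau)] dtau, to a synthetic tissue enhancement curve. It is a minimal sketch under stated assumptions: the biexponential population arterial input function, the noise level, and the "measured" curve are all simulated, and real DCE-MRI analysis involves additional steps (signal-to-concentration conversion, baseline T1 mapping, motion handling) that are omitted here.

```python
# Minimal sketch: estimating a Ktrans-type parameter from a (synthetic) DCE-MRI curve
# with a Tofts-type model. AIF parameters, noise, and data are assumptions.
import numpy as np
from scipy.optimize import curve_fit

t = np.linspace(0.0, 6.0, 120)                                  # time in minutes
cp = 3.99 * np.exp(-0.144 * t) + 4.78 * np.exp(-0.0111 * t)     # assumed population AIF (mM)

def tofts(t, ktrans, kep):
    """Ct(t) = Ktrans * discrete convolution of the AIF with exp(-kep * t)."""
    dt = t[1] - t[0]
    return ktrans * np.convolve(cp, np.exp(-kep * t))[: len(t)] * dt

# Simulate a "measured" tissue curve with noise, then fit Ktrans and kep back out of it.
rng = np.random.default_rng(0)
measured = tofts(t, 0.25, 0.8) + rng.normal(0.0, 0.02, t.size)
(ktrans_fit, kep_fit), _ = curve_fit(tofts, t, measured, p0=(0.1, 0.5), bounds=(0.0, 5.0))
print(f"Ktrans ~ {ktrans_fit:.2f} per min, kep ~ {kep_fit:.2f} per min")
```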
Despite the exploratory nature of many imaging research applications, the diversity of image-based functional measures often furnishes much of the interest and enthusiasm for potential imaging biomarkers. However, numerous additional factors must be addressed to establish the reliability, validity, and suitability of a particular image-based measurement as a tool for evaluating novel therapeutics.
ANATOMY OF AN IMAGING BIOMARKER Like the drug development process itself, developing, implementing, and using imaging biomarkers is a highly collaborative multidisciplinary effort. As depicted in Figure 1, an imaging biomarker may be defined conceptually by two aspects: image acquisition and image analysis. Image acquisition refers to all activities and technologies related and leading to the acquisition of image data, such as imaging hardware operation, maintenance and quality control, subject positioning, contrast agent or radiotracer preparation, injection protocols, and physiological monitoring. Image analysis includes all postacquisition activities, such as image reconstruction, processing, quantification, and interpretation. These two aspects of an imaging biomarker require significant
[Figure 1 schematic: contributing disciplines (biomedical engineering, physiology, mathematics, molecular biology, pharmacology, statistics, imaging physics, radiochemistry, nuclear engineering, medicine, computer science) feed into the two defining aspects of an imaging biomarker, image acquisition and image analysis (reconstruction, processing, interpretation).]
Figure 1 An imaging biomarker may be defined by its methods for image acquisition and analysis. Both aspects require input from numerous scientific and engineering disciplines.
input from a diverse array of scientific and engineering disciplines, including biology, chemistry, physics, biomedical and nuclear engineering, computer science, and mathematics as well as medicine. Each of these specialties has several critical roles in the implementation and use of imaging biomarkers: • Biologist/physiologist/molecular biologist/pathologist: help establish the biological relationship between the image-based measurement and the physiological function or disease process; help establish the rationale for studying particular image-based measures; and help interpret the quantitative results. • Pharmacologist/physician/pathologist: help establish the clinical relationship between the imaged-based measurement and the relevant therapeutic outcomes during treatment; and help understand all possible actions of the treatment (beneficial and deleterious) [37] to aid in the interpretation of imaging results. • Biochemist/radiochemist: provide understanding of physiological processes and molecular targets at the biochemical level; and design, synthesize, characterize, and test optical, magnetic, and molecular probes and radiotracers. • Mathematician/statistician/computer scientist: provide the fundamental mathematical, statistical, and quantitative basis for image acquisition, reconstruction, and analysis; and provide software tools for performing many tasks associated with image acquisition, reconstruction, quantification, analysis, visualization, and interpretation. • Imaging physicist/biomedical and nuclear engineer/technologist: provide the physical basis for imaging technologies, signal generation and detection, hardware design, and optimization; provide the technological expertise for isotope production to be used for generating radiotracers; and operate the imaging equipment, performing image acquisition, processing, and analysis tasks. Essential Characteristics of Imaging Biomarkers Earlier we highlighted the use of imaging in biological research. In such studies, when a correlation is demonstrated between an image-based measurement and a particular physiological characteristic or disease process, it is tempting to identify that measurement as an imaging biomarker. However, this is not always useful. Here we identify several additional characteristics of imaging biomarkers that will generally become essential as an image-based functional measure progresses through the process of validation and clinical qualification (Figure 2). Scientific Characteristics For an image-based measurement to be an effective tool in drug development, the “model” linking the measurement to a
[Figure 2 schematic: image-based measure → (assay validation, via a biological model) → biological function or disease process → (clinical qualification, via clinical trial) → clinical endpoint.]
Figure 2 Two-step process for imaging biomarker validation/qualification. Assay validation establishes the link between the image-based measure and a specific biological function or disease process by way of a biological, biophysical, or molecular “model.” This includes characterizing the measurement’s intra- and interobserver and test–retest variability (measurement reliability), as well as correlating the measurement to an established (nonimaging or invasive) standard. The goal of the second step, clinical qualification, is to establish empirically, by clinical trials, the relationship between the image-based functional measure and a relevant clinical endpoint or outcome. (See insert for color reproduction of the figure.)
biological or disease process must be characterized, which is termed assay validation. This process includes: • Characterizing the validity of the model by correlation with an accepted standard • Characterizing the reliability of the measurement: • Intraobserver variability • Interobserver variability • Test–retest variability • Defining the conditions under which the image-based measurement will result in the expected model validity and measurement reliability: • Biological conditions • State and stage of disease • Class of therapy • Optimizing the image acquisition and analysis systems for the task of parameter estimation as opposed to classification Basic research studies of image-based biological measures may provide only partial validation by attempting to establish the validity of the model and occasionally reporting some measure of reliability in a limited sample size,
often in preclinical studies. Additional development may be necessary to fully characterize and optimize the measure in both animal models and in humans. Like the underlying image-based methodology, the imaging biomarker is necessarily quantitative, which requires parameter estimation from the images, a process that should aim to be as efficient and objective as possible. Clinical Characteristics In addition to the scientific concerns of assay validation, several clinical issues must be addressed. The relationship of an image-based measure of biological function to a specified clinical endpoint must be established empirically through clinical trials, which may be termed clinical qualification (Figure 2). Clinical qualification seeks to: • Define a relevant clinical endpoint or outcome • Establish empirically the relationship between the image-based measure and the clinical endpoint • Characterize the propagation of measurement uncertainty to uncertainty in clinical outcome • Establish the safety of the imaging procedure These aims are often hindered by the need for large amounts of data to support clinical qualification. The image-based measure may be more sensitive to biological changes than the corresponding clinical endpoint, reflecting fluctuations that are not clinically relevant; or the clinical endpoint might be somewhat subjective, subject to large variability, or require lengthy trials. Additional Criteria Apart from the scientific and clinical issues of biomarker qualification, further logistical and economic concerns must be addressed as an imaging biomarker progresses through clinical qualification. • The necessary acquisition hardware, methods, and image analysis tools must be readily available. • The image acquisition and analysis methods must be standardized across hardware platforms and imaging centers. • The imaging biomarker must be cost-effective and efficient compared to other available imaging and nonimaging biomarkers with similar purpose.
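As a concrete illustration of the reliability characterization called for under assay validation above (intra- and interobserver and test–retest variability), the sketch below computes two commonly reported test–retest metrics, the within-subject coefficient of variation and the Bland-Altman repeatability coefficient, from a small paired data set. The data are hypothetical; the formulas are the standard ones for paired replicates.

```python
# Minimal sketch (hypothetical data): test-retest reliability metrics for a quantitative
# imaging readout, as might be reported during assay validation.
import numpy as np

test   = np.array([1.21, 0.87, 1.55, 0.99, 1.32, 1.10, 0.76, 1.44])   # visit 1 values
retest = np.array([1.15, 0.91, 1.62, 1.04, 1.25, 1.18, 0.80, 1.38])   # visit 2, same subjects

diff = test - retest
subject_means = (test + retest) / 2.0

# Within-subject SD from paired replicates: sd_w^2 = mean(diff^2) / 2
sd_w = np.sqrt(np.mean(diff ** 2) / 2.0)
wcv = sd_w / np.mean(subject_means)          # within-subject coefficient of variation
rc = 1.96 * np.sqrt(2.0) * sd_w              # Bland-Altman repeatability coefficient

print(f"wCV = {100 * wcv:.1f}%, repeatability coefficient = {rc:.2f}")
# A change smaller than the repeatability coefficient in an individual subject cannot be
# distinguished from measurement noise with 95% confidence.
```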
biomarkers are those for which sufficient clinical evidence exists to qualify the biomarker for use in drug development; and clinical diagnostic surrogate biomarkers are those that have been validated and qualified to such an extent as to be “integrated into an approved/licensed therapeutic regimen” and are able to “assure/improve safety and/or efficacy” [38]. Imaging biomarkers are useful tools for a variety of tasks in addition to drug development. Therefore, another perspective, which perhaps includes a wider spectrum of methods, is to stage imaging biomarkers according to their intended use [39]. There are at least four distinct categories of use for imaging biomarkers in the wider sense: 1. Elucidate biological function and disease processes in vivo (basic biological research). 2. Evaluate novel therapies in vivo for industry decision making (drug development—industry decisions). 3. Evaluate novel therapeutics in vivo for government licensure (drug development—government registration). 4. Enhance routine diagnosis of disease with better disease management, prognosis, and therapeutic decision making (clinical practice). The four stages correspond roughly to the FDA imaging biomarker classes described above in terms of the level of validation and qualification within each stage (Table 2). They also parallel the conceptual biomarker hierarchy
TABLE 2 Classification of Imaging Biomarkers According to Their Intended Use^a and the Corresponding Stage of Validation and Qualification

Stage | Intended Use | Assay Validation | Clinical Qualification | Additional Criteria^b | Potential Impact^c
1 | Basic research | Partial | No | No | *
2 | Drug development: industry decisions | Yes | Partial | No | **
3 | Drug development: government registration | Yes | Yes | Partial | ***
4 | Clinical practice | Yes | Yes | Yes | ****

a Intended uses are "backward compatible," in that an imaging biomarker that achieves a given stage of validation and qualification will fulfill the requirements for all preceding uses. Note that regulatory scrutiny would be limited to imaging biomarkers at stages 3 and 4.
b Nonscientific and nonclinical criteria, including logistical and economic issues such as availability of technology, standardization of methodology, and cost-effectiveness.
c At each development stage, the potential impact of an imaging biomarker on drug discovery and development and on human health.
suggested by Frank et al. [40] using the body of evidence supporting each class: plausibility, correlation/association, predictive power, and cause. The categories of “probable valid biomarker” and “known valid biomarker” suggested by the FDA [41] would be contained in stage 2, while stage 3 corresponds to “surrogate endpoints” [39]. These four categories of use emphasize that there may be an appropriate context and use of imaging biomarkers at each stage of validation or qualification. Basic biological research provides fundamental knowledge of disease processes as well as a ready source for new biomarker discovery; the use of imaging biomarkers in clinical practice clearly overlaps with the routine use of imaging in clinical diagnosis. The significance of these potential imaging biomarker uses is that they can be leveraged to facilitate further development and adoption of imaging biomarkers as drug development tools, particularly as surrogate endpoints for regulatory decision making. Imaging Biomarkers and Diagnostic Imaging The foremost application of medical imaging is the routine clinical diagnosis of disease. Historically, the development of imaging systems capable of scanning humans followed shortly after the initial technological development of the various imaging modalities [3,5–7,35,42–44]. Moreover, medical imaging applications emphasizing medical diagnosis have driven much of the technological improvements to the present day. Although the primary diagnostic purpose of medical imaging overlaps considerably with imaging biomarker applications, there are important distinctions between the context and goals of their use [45]. Diagnostic imaging is generally focused on detection of lesions or abnormality, and image assessments are usually made in a subjective, qualitative manner relying on the training and experience of a medical professional. In this case, the primary goal is the medical management of individual patients. By contrast, imaging biomarkers are focused on the quantitative characterization of disease which is generally already known to be present, with the aim of objectively measuring effects related to a medical treatment or intervention. Typically, serial measurements from one or more follow-up imaging sessions are compared to a baseline measurement, and measurements are collected from patient groups drawn from well-defined patient populations. Why is it important to draw a distinction between imaging biomarkers and diagnostic imaging? It is widely recognized that the most reliable and objective approach to assess image quality, and thus optimize the imaging process, is, first, to specify the purpose or task for which the image is produced, and second, to determine quantitatively how well that task is performed [46]. Two general categories of tasks that relate to medical imaging are classification tasks, which are the typical diagnostic tasks of deciding whether a patient has or does not have disease, and estimation tasks, which involve estimating quantities of interest from medical images [46]. An example of a classification task
would be an oncologist using images to detect or locate abnormal lesions, and attempting to answer the questions: Are the lesions malignant or nonmalignant, and if malignant, what is the grade or stage of the cancer? Examples of estimation tasks would be a cardiologist estimating the extent and severity of a myocardial perfusion defect or an oncologist estimating the metabolic uptake of [18F] fluorodeoxyglucose in a tumor. Diagnostic imaging platforms and analysis tools are by definition concerned with classification (detection) tasks and have been uniformly optimized for those purposes. It should be clear that imaging biomarkers, being inherently quantitative, are concerned with either estimation tasks or combined classification/estimation tasks [47,48]. Thus, the requirements and standards for diagnostic imaging systems are somewhat different from that of imaging biomarkers. For example, the demands of image quantification and the extraction of objective image features for imaging biomarkers place strict limits on image quality that may differ from that required for diagnostic purposes, where the experience and skill of the clinician may compensate for typical nonidealities in the images. Evidently, qualitative visual image evaluation suffices for a majority of diagnostic imaging purposes. On the other hand, objective quantitative image assessment is essential for imaging biomarkers, and image acquisition and analysis systems must be optimized for this purpose. Such optimization may involve differences in hardware design, evaluation of image quality control, image reconstruction, processing and quantification methods, and software. Nuclear Cardiology as an Imaging Biomarker SPECT myocardial perfusion imaging (MPI) is a nuclear cardiology procedure developed for the clinical diagnosis of coronary artery disease [49]. The imaging procedure is based on cellular uptake of a gamma-emitting radiopharmaceutical (such as thallium-201 or various technetium-99m-labeled agents) in proportion to the regional blood flow in the myocardium [50]. The imaging procedure is typically performed twice: first, with the subject at rest, and second, after inducing hyperemia through exercise or pharmacological stress. The resulting images provide a relative measure of myocardial perfusion and allow the identification of perfusion defects. In general, perfusion defects may represent regions of possible ischemia (reversible defects) or necrotic tissue resulting from infarction (scar or fixed defects). Quantitative analysis of rest and stress images provides estimates of the perfusion defect extent and severity as well as discrimination between ischemic and scar regions [51]. In addition, numerous other quantitative measures of cardiac function can be derived when the images are acquired with electrocardiogram (ECG) gating. Such image-based measures play a critical role in guiding subsequent medical intervention and clinical management of coronary artery disease. As a mature imaging technology, SPECT MPI has been developed and improved continuously for nearly 25 years, and apparently has many characteristics of a qualified stage 4 imaging biomarker:
• Extensive validation of image-based quantitative measures of myocardial perfusion (perfusion defect type, extent, and severity) • Extensive clinical qualification for the diagnosis of coronary artery disease • Standardized protocols for image acquisition and reporting • Standardized automated image processing and quantification methods [52] • Validated imaging hardware and software tools readily available [53] • Cost-effective imaging procedures used widely in clinical practice In addition to the availability of standardized image quantification, an important aspect of SPECT MPI that sets it apart from many other diagnostic imaging applications is the routine use of normal subject databases. Imagebased assessment of abnormal perfusion is done by computer-automated comparison of perfusion parameters for a given patient with normal limits defined by databases of subjects with a low likelihood of coronary artery disease [51]. Moreover, the accuracy of automated assessment is comparable to human visual interpretation of the images, but with potentially lower variability [54]. Despite these characteristics and its broad acceptance in diagnostic imaging, SPECT MPI appears to have only limited utility currently as a tool for drug development. As a diagnostic tool, SPECT MPI technology and standards have been optimized for the clinical diagnostic setting. In particular, recent use of SPECT MPI for serial imaging of patients has highlighted limitations and the need for further research to optimize SPECT MPI for serial image assessment [55,56]. Furthermore, despite the availability of more objective quantitative measures of myocardial perfusion deficit, recent clinical trials that have used SPECT MPI evaluation as a primary endpoint have been compelled to use visual categorical interpretations and semiquantitative measures for drug registration [57,58]. We may speculate that these limitations are due in part to imaging systems that have been optimized for the classification task rather than the estimation task required of imaging biomarkers in drug development. For example, the quantitative thresholds that have been established for the use of normal subject databases in SPECT MPI software have generally been optimized by reference to expert visual reads in the single study setting, which may not be optimal for quantitative evaluation of serial studies. Thus, SPECT MPI technology illustrates the subtle, yet important differences between conventional diagnostic imaging and imaging biomarkers. Key Advantages of Imaging Biomarkers As a tool in drug development, imaging biomarkers share several general advantages among all imaging modalities. Imaging biomarkers provide the unique ability to visualize anatomy as well as to spatially localize physiological and molecular function in vivo. Additionally, most imaging modalities have the capability to acquire repeated or temporally resolved images which allow a study of various dynamic biological processes. In vivo image-based evalua-
tion of disease processes or drug targets occurs in their respective biological microenvironments, accounting for both systemic and local modulators. As in vivo assays, imaging biomarkers also share with in vitro biomarkers the ability to determine drug–target interactions, to help optimize drug dose, and to evaluate longitudinally the response to therapy in vivo [59]. A broad range of sensitivity and spatial resolution is also available, providing great flexibility in the choice of applications. High-resolution CT and MRI anatomical imaging biomarkers can evaluate gross therapeutic effects at the level of organ systems; nuclear, CT, and MRI perfusion imaging biomarkers can probe functional effects at a regional tissue and vascular level; nuclear, MRI, and optical molecular imaging biomarkers can provide readouts down to the level of cellular and molecular effects. This range can sometimes be an advantage when considering the process of clinical qualification. For example, although molecular imaging applications may attain the level of detail necessary to test for proof of mechanism, clinical qualification of imaging biomarkers will probably be facilitated by coarser biological “resolution” at the macroscopic level of organ systems which may more readily be related to a clinical endpoint. In addition to the clinical relevance of the various imaging modalities, there is also the possibility for translation of preclinical imaging applications to the clinic, which provides a potential for continuity of readout parameters between preclinical safety and efficacy studies in animal models and eventual clinical evaluations in humans. Furthermore, a majority of hospitals and medical centers (often, the sites participating in later-phase clinical drug trials) typically have direct access to clinical imaging centers or imaging equipment. Longitudinal assessment of individual subjects provides a potential for reduced statistical variability, and consequently a reduction in the number of study subjects. Clearly, this can benefit drug development through faster studies and reduced costs; however, in practice, many additional factors can influence the net cost savings and efficiency achieved in using an imaging biomarker. These factors are related to the implementation of imaging biomarkers and must be evaluated carefully for each imaging biomarker application. Limitations of Imaging Biomarkers There are generally three categories of imaging biomarker limitations. First, there are limitations that are likely to be addressed in the near future by the accelerating rate of imaging biomarker development. The complex and multidisciplinary nature of image acquisition hardware and analysis methods has hindered the translation of promising image-based functional measures into validated and qualified imaging biomarkers. Similarly, a lack of standardization has also impeded imaging biomarker development, since variations in methodologies, as well as hardware platforms with differing capabilities, have made it difficult to compare and evaluate potential imaging biomarkers across sites. Compared to in vitro biomarkers, it may be costly and time consuming
to acquire the clinical evidence necessary to establish a biological relationship between an imaging biomarker and a particular disease state. All of these factors have limited the introduction of imaging biomarkers into clinical practice for patient management, diminishing the incentive to commercialize such technologies. Additionally, despite the impressive preclinical accomplishments of molecular imaging to date, expansion of such applications to the clinic has been limited by the fact that relatively few molecular imaging agents have been approved for clinical use in humans [22,60]. However, these limitations are starting to be addressed by novel approaches to imaging biomarker development, broad collaborations to standardize technology, and federal initiatives to foster the development of imaging agents and increase the availability of component tools and methods. In addition, radioisotope production for some important radionuclides, such as 18F, 11C, 13 N, and 15O, although complex and expensive, is beginning to be addressed by recent innovations in cyclotron technology [61]. Second, there are limitations in the underlying imaging technologies and methodologies that we might expect to be addressed by ongoing basic research and technology development already occurring independent of imaging biomarker development. The widespread diagnostic use of imaging motivates continual research and vendor improvements in hardware, software, and acquisition methods. Such improvements have recently included faster imaging and hardware improvements for MRI, innovations that reduce radiation dose in CT, new hardware designs that improve sensitivity and resolution of nuclear imaging modalities, and new capabilities afforded by combining modalities, as in SPECT/CT, PET/CT, and most recently, PET/MRI. Furthermore, many image analysis approaches and tools have been borrowed from ongoing computer vision research. All such advances have the potential to benefit imaging biomarkers’ performance; however, careful attention must be paid to the differing criteria for system optimization between diagnostic and biomarker applications. Finally, there are inherent limitations in imaging biomarkers that are not likely to be fully mitigated by future development. Within the spectrum of potential imaging biomarkers, those that rely on endogenous contrast mechanisms have the advantage of avoiding the added complexity and safety concerns introduced by the use of exogenous imaging agents. However, such image-based functional measures are often inherently nonspecific, relying on biological models in which the connection between measurement and function may not be interpreted easily or directly. In addition, there is a growing perception that the increased use of nuclear and CT modalities, which utilize ionizing radiation, presents a public health concern [62,63]. Diagnostic imaging has been guided by the principle that the risks of ionizing radiation should always be weighed against the benefits of the imaging procedure to the patient [64]; however, in some cases, radiation doses from medical tests may exceed limits known to increase the risk of cancer, particularly in the case of serial imaging [64]. Although progress has recently been made in reducing radiation
doses in CT procedures [65], the image quality of CT depends inherently on a nonnegligible radiation exposure. Additional advantages and limitations specific to different imaging modalities and methods are discussed further in a later section. Importance of Context: Disease, Therapeutic Strategy, Development Stage, and Alternative Approaches The development or implementation of any particular imaging biomarker must be preceded by consideration of the current research environment, the local availability of technology and resources, and an attempt to evaluate the relative utility, efficiency, and cost-effectiveness of alternative approaches. Local implementation of an imaging biomarker is a time-consuming process, even for methods validated and published previously. Commercially available imaging systems are generally tailored toward diagnostic rather than quantitative applications, and as a consequence, the implementation of an imaging biomarker often involves applying customized acquisition, radiotracer synthesis, or analysis methods that must be locally optimized and compared to published results. In a preclinical setting, the implementation process places great demands on the local availability of resources such as MRI pulse sequences and acquisition methods, radiotracer production facilities, appropriate animal models, and image analysis tools, among many others. In a clinical research setting, an imaging biomarker may have to be implemented across multiple centers. An overriding concern in the usability of any particular imaging biomarker once implemented is the overall throughput that can be achieved [66]. Alongside these practical concerns are various scientific questions about which technique among several will be optimal. Within any one imaging biomarker application, there may be multiple methods for acquiring similar data, and multiple approaches for image reconstruction, processing, and quantification. Similarly, there may be multiple imaging or in vitro biomarkers that address the same biological function or clinical endpoint. Such choices typically have to account for incomplete data comparing the methods and limited available resources. When considering the use of imaging biomarkers, the dominant focus is often on the technology, implementation issues, or necessary tools. Although implementation choices must be weighed carefully, it is important to emphasize that imaging biomarkers, like any other biomarker, should be specified in terms of the disease context and therapeutic strategy, as well as the development stage and questions to be answered. Defining the clinical context of a given imaging biomarker is part of the process of biomarker validation, which answers the question: What are the biological conditions, disease states, and therapies under which the use of the imaging biomarker will result in the empirically determined reliability and validity? For example, the location of a tumor may limit our ability to obtain a robust quantitative assessment of treatment effects due to excessive respiratory or cardiac motion. An imaging
biomarker that is validated and qualified for a given tumor type and therapeutic class cannot be assumed to be validated and qualified for tumors of other types or anatomical locations or other therapies without additional validation data. Conversely, in drug development, time efficiency and the value of investing in imaging technology resources can be maximized by selecting and implementing cross-mechanism imaging biomarkers [67]. For example, multiple imaging biomarkers may share a common MRI pulse sequence or acquisition method, such as T2*-weighted dynamic imaging for dynamic susceptibility contrast (DSC) bolus tracking and blood-oxygenation-level-dependent (BOLD) functional MRI, or T2 mapping for measuring peripheral edema or liver iron concentration. Similarly, a custom PET radiotracer may be developed for a specific development program but may have utility in other therapeutic areas [45]. Such coordination requires significant planning and broad alignment across discovery and development groups and therapeutic areas, which may be difficult to achieve in large organizations.
SCOPE OF IMAGING BIOMARKERS IN DRUG DEVELOPMENT Parallel developments in our knowledge and understanding of biology and disease, and advances in biomedical technologies, have dramatically increased our ability to discover and develop new therapies as well as to evaluate the biological effects of disease interventions. Imaging technologies have played an important role across the spectrum of drug discovery and development [9,20]. Apart from basic biological research, the roles of imaging biomarkers in drug development generally fall into two categories: supporting internal decision making, and to support the demonstration of efficacy in pivotal clinical trials for registration. Within the category of imaging biomarkers used for decision making in drug development, there are three fundamental applications that imaging biomarkers have in common with in vitro biomarker technologies: confirming that a drug candidate hits the intended biological target, testing whether hitting the target, alters the disease process (proof of mechanism), and testing whether altering the disease process affects the clinical status of the patient (proof-of-therapeutic concept) [40,45]. Imaging biomarkers have provided early critical information at each of these development stage gates through in vivo molecular imaging studies of competitive binding or receptor occupancy [45], imaging of target distribution and radiolabeled drug distribution [68,69], lead optimization, and preclinical and early clinical efficacy testing [19,70]. Other tasks in early clinical development that imaging biomarkers have addressed include appropriate dose and regimen selection and in vivo pharmacokinetic/pharmacodynamic studies [71,72], patient stratification, and study population enrichment [40]. Importantly, imaging biomarkers for internal industry decision making have demonstrated value for drug development without comprehensive vali-
dation or clinical qualification [67]. Indeed, many such biomarkers may be proprietary and specific to a particular development program, therapeutic target, or mechanism [45]. However, achieving a stage of assay validation appropriate to a given task and the associated risk is a crucial but often timeconsuming process. The use of novel imaging biomarkers is most effective if development and characterization activities, such as PET radiotracer development or MRI pulse sequence and method development, are begun as early as possible and ideally in tandem with the corresponding drug development program [45]. Although imaging biomarkers have proven useful in many aspects of drug development without being fully clinically qualified, there is renewed interest from multiple stakeholders to develop qualified surrogate endpoints and expedite the lengthy and costly drug approval process [39]. In addition to the welldocumented difficulties of clinical qualification [45], the potential impact of such tools for regulatory decision making will be limited unless they achieve some measure of standardization and cost-effectiveness, and become widely accepted and widely available. In the next section we examine in some detail a few of the imaging biomarkers that are currently, or are poised to become, such high-impact tools for drug development in oncology. PROFILES OF IMAGING BIOMARKERS FROM ONCOLOGY Morphological Evaluation of Tumor Response There is a broad range of imaging biomarkers applicable to oncology drug development [59]. This spans exploratory imaging methodology that may contribute to early development decision making in small studies (e.g., DCEMRI assessment of anti-angiogenics [73]) to robust and broadly applicable assessments of disease burden (e.g., RECIST (response evaluation criteria in solid tumors) [74]). RECIST provides a framework to use imaging (predominantly CT) to document baseline disease burden and define change over time. It could be argued that RECIST should not be considered a biomarker since the aim of RECIST is to provide a broad categorization of response. However, the quantitative components of RECIST, the lesion size measurements, are measurable characteristics reflecting disease and response to treatment, and it therefore seems reasonable to consider it alongside other imaging biomarkers. Furthermore, understanding the attributes of RECIST will help inform the development of other imaging-derived biomarkers: (1) how it was developed; (2) how it became an international standard of assessment; (3) how it addresses the challenges of multisite application, including multiple modalities, acquisition parameters, interpreting radiologists; and (4) how it is implemented into large-scale clinical trials. Background In the early 1980s the World Health Organization (WHO) published recommendations aimed at standardizing tumor response assess-
The recommendations described minimum requirements for reporting response assessments using clinical, radiological, biochemical, and pathological factors. In terms of radiological assessment, this was to be performed on single or multiple lesions using a bidimensional approach to estimate tumor size. Further, clinical or radiological assessment could be used to review new lesions, which would lead to disease progression irrespective of other changes. Four groups were described: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD). This thresholding into discrete response categories was subsequently applied widely and continues to this day. Despite the longevity of this simple classification scheme, it was noted early that there are significant advantages to reporting continuous tumor size estimates, “especially in the phase II setting” [76]. Although the broad principles were widely adopted, subsequent modifications based on different expert group preferences and tumor-specific requirements resulted in heterogeneous use of the WHO criteria, as described in the commentary by Michaelis and Ratain [77]. It was apparent that a new standard was required for measuring response of solid tumors in clinical trials. This occurred against a backdrop of dramatic change in standard imaging methodology and application (including improvements in CT, ultrasound, MRI, and PET) that offered potentially many new or improved ways in which response could be evaluated. With such a diversity of potential assessments, the new criteria had to address specifically how lesion sizes were to be estimated, in order to avoid continual diversification of applied methodology. A collaboration between the European Organization for Research and Treatment of Cancer, the National Cancer Institute, and the National Cancer Institute of Canada culminated in publication of the new guidelines in 2000 [78]. The new guidelines maintained the classification scheme of CR, PR, SD, and PD but made a number of important changes: adoption of a simplified tumor size estimation (i.e., longest diameter), a definition of how many lesions (target lesions) should contribute to the size estimation, a description of how nontarget lesions should be assessed, and a description of which imaging methodologies should be considered for response assessment. RECIST has been adopted widely and continues to contribute to many endpoints in cancer clinical trials, including progression-free survival.
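To make the quantitative core of the guidelines concrete, the target-lesion arithmetic can be sketched as follows. This is a simplified illustration using the widely quoted thresholds (a decrease of at least 30% from baseline for partial response, an increase of at least 20% from the smallest recorded sum for progression); it deliberately ignores nontarget lesions, new lesions, measurability rules, and confirmation scans, and the function name is illustrative only.

def recist_target_lesion_response(baseline_sum_mm, nadir_sum_mm, current_sum_mm):
    """Classify target-lesion response from sums of longest diameters (mm).

    Simplified sketch of the thresholds discussed in the text: CR =
    disappearance of all target lesions, PD = >=20% increase from the
    smallest (nadir) sum, PR = >=30% decrease from baseline, SD = neither.
    Nontarget lesions, new lesions, and confirmation rules are ignored here.
    """
    if current_sum_mm == 0:
        return "CR"
    if current_sum_mm >= 1.2 * nadir_sum_mm:   # progression checked first
        return "PD"
    if current_sum_mm <= 0.7 * baseline_sum_mm:
        return "PR"
    return "SD"

# Example: baseline 100 mm, nadir 100 mm, follow-up 65 mm -> "PR"
print(recist_target_lesion_response(100.0, 100.0, 65.0))

Framing the criteria this way makes clear that the continuous measurement underlying RECIST is the sum of lesion diameters; the categorical response is a thresholded summary of that biomarker.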
Limitations
Tumor-Specific Limitations
As with all such generalized criteria, there are instances when the approach becomes problematic. Since the publication of the original RECIST guidelines, many subsequent publications have detailed the limitations of applying RECIST to certain tumor types and certain therapies, such as cytostatic agents, as well as general comparisons of RECIST with the WHO criteria [79]. The review by Therasse et al. in 2006 [80] documents the tumor-specific issues pertinent to RECIST. It is clear that, given the challenges associated with different tumor types, modifications to the RECIST framework may provide benefits. Examples include the evaluation of gastrointestinal stromal tumors (GISTs) treated with imatinib. It was found that response by RECIST did not translate sufficiently into survival. However, when response was defined as a 10% reduction in tumor size or a 15% reduction in contrast-enhanced CT density, better linkage with outcome was observed [81,82]. Another example is that of mesothelioma, where Byrne and Nowak describe modifications to the RECIST criteria [83] that improve correlation with both survival and lung function.
Limitations of Pure Radiological Assessment
RECIST offers a generalized framework for defining response in clinical trials that has been integral to many successful drug development programs. However, there are many instances where radiological evaluation alone is limited and does not typify decision making in clinical practice. Often, clinical decision making will be based on multiple factors (such as CA-125 for ovarian cancer), and clinical trial assessments may need to incorporate nonradiological parameters alongside radiological assessment.
Centralized Radiological Review
One of the recommendations in the RECIST guidelines is that response assessments should be based on independent assessment, and this is often considered a requirement for regulatory submission. Central and independent image review aims to reduce the variability of assessment and to remove or reduce the potential for investigator-introduced bias. Central review of high volumes of imaging data from hundreds of sites across many countries does come with high complexity and high cost to the study sponsor. Furthermore, how do we deal with disagreements between the central reviewer and the site-based assessment? The site-based radiologist is likely to be performing RECIST differently from the central reviewer: selecting a different number of target lesions, defining the longest diameter of a lesion differently, judging nontarget lesions differently, and interpreting equivocal new lesions differently. When there is subjectivity associated with the application of criteria such as RECIST, it is important to identify the sources of variability associated with image interpretation so that different results from different readers can be understood.
Unidimensional Evaluations
A frequently raised issue with RECIST, and a cause of discordance with WHO, is the reliance on a single-dimensional size estimation. Schwartz et al. [84,85] discussed this issue in relation to esophageal cancer. With multidetector CT technology and MRI, a three-dimensional lesion assessment can be performed and offers a theoretical advantage in terms of likely sensitivity to change and avoidance of lesion eccentricity problems. However, one of the most significant advantages of RECIST is the simplicity of the imaging required. If three-dimensional imaging were required
in a 100-site clinical trial, there would be many challenges to ensuring that it is done to a level that would allow robust volumetric assessments with acceptable variability. Further, the additional radiology review time (central and site based) required to outline multiple lesions on multiple slices would be substantial. It is likely that three-dimensional assessments will have a clear benefit in a number of applications, especially where deployment of a standardized three-dimensional acquisition can be achieved readily. A significant amount of data is required, however, to demonstrate that the theoretical advantages translate broadly into a measure more predictive of response and that the practical issues can be overcome for multisite clinical trials.
Imaging Technology
The ability to define meaningful response is strongly influenced by the methodology available and the mass of data that links a method with clinical outcome. RECIST focuses on morphological assessment with CT, since the technology is available across all clinical trial centers, provides a relatively consistent acquisition, and is used most widely across the majority of cancers. However, it is recognized that technologies such as FDG-PET, as indicators of metabolic response, could provide improved response assessments in the future. Owing to the tumor-specific utility of metabolic imaging, it is likely that approaches such as FDG-PET will become very important for some indications, but not broadly applicable. In lymphoma, for example, FDG-PET is an integral component of the current recommendations for response evaluation [86].
Continual Development of Response Criteria
Response criteria will need to evolve continually with imaging technologies, whether new molecular imaging approaches become available or there are broad improvements in standard morphological assessment with CT or MRI. After extensive meta-analyses, and in response to changes in imaging practice since RECIST was first published, modifications were proposed in 2009 in criteria termed RECIST 1.1 [202]. One challenge in applying the new criteria will be to ensure consistency within any given clinical trial. Since oncology trials may often take years to complete, there will be an overlap between use of the original RECIST and the new guidelines.
Learning from RECIST Development
There are many important lessons to learn from the way in which RECIST has evolved to become an integral component of many clinical trials involving solid tumors:
• Developing a consensus around response criteria takes considerable effort, requiring multidisciplinary input and broad representation from stakeholders.
• Criteria should be sufficiently broad that revision from year to year is not required if, for example, imaging technologies change.
• To demonstrate a robust relationship to clinical outcome, very significant data sets are required. The new RECIST modifications use data from over 6500 subjects.
• Response criteria applicable to large multicenter trials need to be pragmatic to ensure utility across many different imaging platforms and alignment with country-specific practices and radiologist preferences.
Although RECIST may seem far removed from the concept of an imaging biomarker, it represents the most widely used imaging-based clinical trial methodology in oncology. For this reason, an understanding of how it has evolved and how it is deployed practically is important to consider when other techniques are being developed.
[18F]FDG-PET
The increased glucose metabolism associated with many cancers has been established for some time, and its origin and consequences remain areas of continued discussion [87]. [18F]fluorodeoxyglucose (FDG) PET ([18F]FDG-PET) exploits this general phenomenon, facilitating the widespread application of a molecular imaging technique for oncology. FDG is taken up by cells via glucose transporters (Glut-1), phosphorylated by hexokinase, and becomes trapped. The rate of uptake is related to glucose utilization, and therefore an [18F]FDG-PET examination interrogates glucose uptake and subsequent metabolism. The [18F]FDG-PET study therefore broadly defines tumor metabolic rate and is also considered to reflect viable cell fraction, although the relationship is not necessarily straightforward [88].
The extensive review by Gambhir et al. outlines the broad use of [18F]FDG-PET for diagnosis, staging, assessing recurrence, and monitoring response [89]. It is interesting to note the varied sensitivity for detecting primary cancers relative to the generally high avidity of metastases. A more recent review by Kelloff et al. [90] provides an extensive survey of [18F]FDG-PET applied to clinical practice and oncology drug development. Although the role in all tumor types is yet to be established, such reviews emphasize the high availability and broad applicability of [18F]FDG-PET, both of which are important defining points in its role in drug development.
Application of FDG-PET to Drug Development
As explained by Kelloff et al. [90], current data suggest that [18F]FDG-PET response is likely to be sufficiently predictive of clinical benefit. Such surrogacy status would broaden the impact of [18F]FDG-PET as a provider of key trial endpoints, from increased support of proof of concept in phase II studies to the potential to provide a true surrogate endpoint for phase III studies. Although full validation studies are yet to be completed, data from some indications (NSCLC [91] and lymphoma, for example) suggest that [18F]FDG-PET can be used successfully to define response to approved therapies
and predict endpoints such as overall survival and progression-free survival [92–94]. Sarcoma is also an indication where [18F]FDG-PET response provides a better prediction of histopathological response to neoadjuvant therapy than does tumor size change [95]. Published and ongoing studies of response with approved therapies are likely to provide a significant body of data for understanding the scope of [18F]FDG-PET-based endpoints in clinical trials of many tumor types. With an increased understanding of [18F]FDG-PET responses to standard chemotherapies will come an accelerated application to support the clinical development of novel therapeutics. The demonstration by Stroobants et al. [96] that PET provides an early indicator of metabolic response to imatinib mesylate in subjects with soft tissue sarcoma was a convincing example of how metabolic response can significantly precede any morphological change: response could be identified as early as 48 hours after beginning treatment in the subgroup contributing this early time point. Additional data supported the sensitivity of [18F]FDG-PET for defining treatment response in gastrointestinal stromal tumors (GISTs) [97,98]. A review of the rapidly evolving role of [18F]FDG-PET for GISTs is given by Van den Abbeele [99], who emphasizes that the GIST experience points toward more personalized assessment of response to targeted therapy.
Method Standardization
Consistent acquisition and analysis of [18F]FDG-PET data within clinical trials is paramount to successful interpretation of the trial data. [18F]FDG-PET analysis can simply be undertaken by visual inspection of the PET images. Although this is commonly done in clinical practice, it is often too limited for clinical trials; there are, however, some instances where visual inspection is considered sufficient, such as the response criteria for lymphoma [86]. Generally, a more quantitative assessment is considered necessary for applying [18F]FDG-PET to clinical trials. The most comprehensive quantification of FDG uptake is achieved by acquiring a dynamic scan with simultaneous assessment of tracer concentration in arterial blood; subsequent Patlak analysis (and other methods) can yield metabolic rate constants. The dynamic scanning, arterial line, and complex analysis are not suitable for all clinical trials. The most widely used approach for [18F]FDG-PET analysis in clinical trials is a semiquantitative approach providing the standardized uptake value (SUV). This approach simply normalizes tumor uptake to injected dose and body weight, and the acquisition procedures are favorable to a busy PET center. In 1999 the EORTC (European Organization for Research and Treatment of Cancer) PET study group provided recommendations on how PET should be performed for studying clinical response [100]. These recommendations emphasized the need for consistent measurement, analysis, and interpretation of response. In 2006, Shankar et al. [101] provided recommendations on how [18F]FDG-PET should be applied to study therapeutic response in National Cancer Institute trials. Other groups from both industry and academia have defined protocol standards for clinical trials [102,103].
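For orientation, the body-weight-normalized SUV that such protocols aim to standardize can be computed as in the following sketch. The function and the example numbers are illustrative only; real protocols additionally fix the uptake period, scanner calibration, and region-of-interest definition discussed below.

import math

def suv_body_weight(tumor_kbq_per_ml, injected_dose_mbq, body_weight_kg,
                    minutes_post_injection, half_life_min=109.8):
    # SUV = tissue activity concentration / (injected activity / body weight),
    # with the injected activity decay-corrected to the imaging time and a
    # tissue density of 1 g/mL assumed, so the result is dimensionless.
    injected_kbq = injected_dose_mbq * 1000.0
    decayed_kbq = injected_kbq * math.exp(
        -math.log(2) * minutes_post_injection / half_life_min)
    return tumor_kbq_per_ml / (decayed_kbq / (body_weight_kg * 1000.0))

# Example: 20 kBq/mL in tumor, 370 MBq injected, 70-kg patient, 60-min uptake
print(round(suv_body_weight(20.0, 370.0, 70.0, 60.0), 2))  # roughly 5.5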
Other publications provide important insight into the practical issues associated with [18F]FDG-PET in clinical trials. For example, Weber [104] describes many pitfalls of seemingly straightforward SUV analyses, including errors associated with incorrect scanner calibration and a varied period between injection and imaging. Furthermore, Westerterp et al. [105] describe the impact of image-processing and ROI definition procedures on the SUV analysis. Their work demonstrates that, despite the conceptual simplicity of SUV, attention needs to be paid to every component in the imaging chain to ensure successful measurement in a multicenter setting. Also important may be tumor-type-specific nuances of the PET approach, as described by Lind et al. [106], where specific technical considerations are important to optimize sensitivity for detecting disease during follow-up of breast cancer. The published recommendations and further practical insights can be used to form an [18F]FDG-PET protocol that defines unambiguously how the patient is to be prepared, how the images are to be acquired, and how the images are to be processed and reported for a given clinical trial. Although there are still multiple standards that could be adopted for [18F]FDG-PET, the attention focused on standardization encourages wide adoption of quantitative [18F]FDG-PET. Many investigating sites now have experience of clinical trial imaging using [18F]FDG-PET and consistently demonstrate a willingness to follow a standard protocol. Given the varied approaches by which analysis could be performed, centralized review of PET data for SUV analysis is preferable. This is achieved either by using a contract research organization with expertise in quantitative imaging or by coordination via an academic network of centers in which one center takes the lead role in analysis.
In addition to SUV analyses in the study of treatment-induced change, it is important to note the use of [18F]FDG-PET with visual interpretation for screening bone lesions, where it may have sensitivity advantages over bone scintigraphy [107]. Such bone lesion screening remains important for assessments of disease burden in many clinical studies. In some instances it may replace the need for scintigraphy where whole-body [18F]FDG-PET is undertaken as a matter of routine. However, given the high cost of PET relative to bone scintigraphy, it remains uncertain what the role of [18F]FDG-PET will ultimately be in this setting.
New Tracers
Whereas [18F]FDG-PET remains a developing tool in terms of its role in oncology drug development, PET has the potential to do much more [108]. Specifically, FDG has significant limitations in a number of settings. Kelloff et al. [109] review the extensive array of molecular imaging probes able to define molecular targets and interrogate processes relevant to oncology drug development. The tracers in development (including probes for SPECT) facilitate the study of proliferation, apoptosis, hypoxia, and angiogenesis, and can probe androgen and estrogen receptors. These molecular imaging platforms will provide an array of tools applicable to different stages of drug development, including:
1. Biodistribution and in vivo concentration. Studies of radiolabeled drugs to interrogate in vivo concentration and distribution [110], and in particular the concept of microdosing, in which subtherapeutic doses of labeled drugs can be given, hold great promise [111].
2. Receptor analyses. Understanding receptors pertinent to cancer growth continues to be of interest, particularly receptors highly relevant to therapeutic intervention (e.g., estrogen; see Katzenellenbogen et al. [112], and for HER2 expression, [113]). Mankoff et al. [114] review the potential of tumor receptor imaging with an emphasis on the complementary role that imaging can play vis-à-vis biopsy by studying receptor expression heterogeneity.
3. Proliferation. Labeled thymidine analogs offer a route to noninvasive assessment of tumor proliferation [115]. Of the potential agents, [18F]FLT (3′-deoxy-3′-fluorothymidine) has been established as a suitable agent for wide use [116]. Reports have demonstrated key advantages of [18F]FLT-PET over [18F]FDG-PET in a number of tumor types [117–119]. The role of [18F]FLT-PET continues to grow, and increased availability of the tracer is likely to further establish the technique within clinical drug development.
4. Apoptosis. A noninvasive probe of apoptosis would undoubtedly provide general insight into early therapeutic success, both for clinical assessments and for drug development decision making. Annexin-V has received considerable attention as an apoptosis probe labeled with either 18F [120] or 124I [121]. Different approaches to measuring apoptosis are also in development, as described in the review of apoptosis tracer labeling strategies by Lahorte et al. [122]. It is clear that a robust apoptosis imaging agent will play an important role in clinical drug development.
5. Angiogenesis. The specificity and sensitivity of currently available surrogates of angiogenic activity (e.g., dynamic contrast-enhanced MRI) are limited. PET tracers targeting receptors up-regulated with angiogenesis, such as the integrin αvβ3, offer attractive approaches to the study of angiogenesis [123], with close linkage to the tumor biology. Such agents are currently in clinical development for cancer imaging [124,125] and can be used for quantitative assessment of integrin levels in tumors [126]. Given the preponderance of antiangiogenic targets in different stages of clinical development, PET tracers able to study change in this process remain highly attractive.
The diversity of labeling strategies means that many drugs and additional tracers will become available to complement the already developing array of PET tracers. Some of these agents (such as probes of proliferation and apoptosis) are clearly broadly applicable and could follow the route that [18F]FDG-PET has followed, to wide clinical and drug development use. The cost–benefit
relationship of such agents is clear. Tracer development for specific drug programs (e.g., labeling a drug for biodistribution studies) may be costly in terms of finance, time, and resource, but if the biodistribution analysis identifies an important safety signal, for example, the investment is likely to be rewarding. Molecular imaging development with required multidisciplinary teams and substantial infrastructure remains a costly endeavor. However, relative to potentially late-failing drug development programs the investment can be justified in many instances.
Dynamic Contrast-Enhanced MRI
Dynamic contrast-enhanced MRI (DCE-MRI) is a functional imaging technique performed by rapidly acquiring multiple T1-weighted MR images from a given tissue during injection of a paramagnetic contrast agent. The consequent tissue signal changes are quantified, and parameters with biological linkage are extracted. A review of the experiment, with a focus on the analysis required, is given by Parker and Padhani [127]. Further, standardized terminology and definitions for DCE-MRI-derived parameters have been proposed by Tofts et al. [128]. Parameters typically extracted and used in the study of pharmacological response include the following (a compact statement of how they are related is given after the list):
• Ktrans (min−1): the volume transfer constant between plasma and the extravascular extracellular space. The parameter is related to both flow and vascular permeability and, as such, a change cannot be interpreted precisely. However, successful pharmacological intervention is expected to reduce this parameter through a change in flow, permeability, or both.
• ve (unitless): the volume fraction of the extravascular extracellular space. This is assumed to relate to interstitial space, although the relationship remains complex and posttreatment changes are not predictable.
• kep (min−1): the rate constant between the extravascular extracellular space and the plasma.
• IAUC: the initial area under the curve, typically over the first 60 s. This is a semiquantitative expression of the initial contrast enhancement, which requires no data modeling. The relationship to physiology is complex [129], although both flow and permeability are likely to affect the parameter, and successful antiangiogenic treatment is expected to reduce the value.
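In the terminology of Tofts et al. [128], one commonly used form of the underlying compartmental relationship (the standard Tofts model, stated here without a plasma volume term) and a typical definition of IAUC are, approximately,

\[
C_t(t) = K^{\mathrm{trans}} \int_0^{t} C_p(\tau)\, e^{-k_{\mathrm{ep}}(t-\tau)}\, d\tau,
\qquad k_{\mathrm{ep}} = \frac{K^{\mathrm{trans}}}{v_e},
\qquad \mathrm{IAUC}_{60} \approx \int_0^{60\,\mathrm{s}} C_t(t)\, dt,
\]

where C_t(t) is the tissue contrast agent concentration and C_p(t) is the plasma (arterial input) concentration, with time measured from the arrival of contrast. The exact conventions, such as the integration window for IAUC, vary between studies and should be taken from the protocol in use.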
In the recommendations by Leach et al. [130], Ktrans and IAUC are proposed as the primary DCE-MRI endpoints, with ve and kep as secondary endpoints. As emphasized in the review by O’Connor et al. [131], DCE-MRI has been used widely for the study of angiogenesis inhibitors. Since this class of agents has received considerable attention across industry, interest in DCE-MRI has become widespread. The review by Jayson and Waterton [132] outlines the progress in use of the technique for drug development, focusing on fundamental aspects such as design of the protocol, measurement validity, and reproducibility. Work focusing on DCE-MRI as a pharmacodynamic endpoint has been complemented by a developing understanding of the technique for more routine clinical use, for prognostication, for example.
The use of DCE-MRI offers significant advantages over other methods able to interrogate aspects of tissue vascularity. With careful implementation, multisite standardization can be achieved, and sufficient reproducibility relative to typical pharmacodynamic effects can be realized. However, it could be argued that the angiogenesis inhibitors currently in development or already marketed could have proceeded without DCE-MRI in their development programs. So what does DCE-MRI add to the development paradigm, and why, if such a localized indicator of antiangiogenic activity exists, does it not form a fundamental step in the development of such agents? First, there are technical challenges to successful implementation that form impediments to broader use. Second, there is still insufficient understanding of the relationship between given DCE-MRI parameters and both histopathology and clinical outcome. These uncertainties lead to difficulties in interpreting DCE-MRI studies. For example, how much of a change in a DCE-MRI parameter is enough to predict benefit?
Method Standardization and Deployment Challenges Site Limitations DCE-MRI remains a specialized technique with limited routine application, and therefore many radiology departments will have limited experience with its use. It is important to consider the practicalities of consistent DCE-MRI implementation across multiple sites required for a given clinical trial. The first step requiring consideration is the selection of sites capable of performing DCE-MRI by adhering to a common acquisition for the study. One factor to consider is what experience site personnel have in performing DCE-MRI. However, this is not to say that sites with few experienced personnel are not capable of undertaking DCE-MRI, but the users must be committed to adhering to a prescribed protocol. Statistically, it may be better to have fewer subjects scanned at a small number of sites where the imaging can be tightly controlled versus many sites with greater diversity of equipment and experience. A pharmaceutical company considering multiple studies could consider developing a network of specialized imaging sites able to perform consistent imaging and where good communication between sites can help optimize imaging. Implementing DCE-MRI Within the Clinical Protocol A great variety of options exist regarding incorporating DCE-MRI within the clinical protocol. Some of the most important factors to consider are:
• Is a repeat baseline examination necessary to establish methodological reproducibility?
• When should imaging be performed after administration of therapy?
• What are the primary DCE-MRI endpoints?
These questions should be answered by considering what is known about the expected pharmacology of the drug and what has been done previously with similar agents. Important practical parameters include how many examinations will be acceptable to the subject (with a careful review of other required procedures) and whether close time points present scheduling challenges to the radiology department.
It has been recommended that reproducibility examinations be built into the study where possible in the form of a repeat baseline [130]. This would typically be done by performing two examinations within a week prior to the beginning of therapy. An example of a repeat baseline analysis is that by Galbraith et al. [133], where repeat examinations performed one week apart provided confidence intervals for Ktrans and IAUC90 (64% and 61%, respectively). Another example is that by Roberts et al. [134], where investigators assessed repeat examinations from data across two centers in multiple tumor types, comparing compartmental modeling with model-free analysis. The 95% confidence intervals for Ktrans and IAUC60 were calculated to be 35% and 55%, respectively. Clear differences in reproducibility are apparent from the published examples and are expected to arise from differences in acquisition, modeling techniques, and assumptions in the model. This gives credence to the idea that a reproducibility assessment is often important, and that if performed it must be representative of subsequent study conditions. Although changes following administration of drug are often large relative to the methodological variability, in many instances it may be important to minimize the variability as much as possible. In a publication by Parker et al. [135] it is demonstrated that significant improvement in the reproducibility of Ktrans and other parameters can be attained by using a population-based arterial input function. Such improvements in reproducibility are likely to be important. In a study where dose is varied, using DCE-MRI to understand the lower limit of biological effectiveness could be important, and the measurement reproducibility will have a direct impact on the ability to define subtle change at low doses.
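Repeat-baseline statistics of the kind quoted above are typically derived from paired measurements analyzed on a log scale. A minimal sketch of one common way to do this is shown below; the function name and the example values are hypothetical, and the cited studies should be consulted for the exact statistical treatment they used.

import numpy as np

def repeatability_limits(baseline1, baseline2):
    """Within-subject variability from paired baseline measurements.

    Works on log-transformed parameter values (e.g., Ktrans): returns the
    within-subject coefficient of variation and the approximate 95% range
    outside which a change in an individual patient is unlikely to be
    explained by measurement variability alone. Illustrative sketch only.
    """
    d = np.log(np.asarray(baseline2, dtype=float)) - np.log(np.asarray(baseline1, dtype=float))
    sd_within = np.sqrt(np.sum(d ** 2) / (2 * len(d)))   # within-subject SD on log scale
    wcv = np.sqrt(np.exp(sd_within ** 2) - 1)            # within-subject CV
    lo = np.exp(-1.96 * np.sqrt(2) * sd_within) - 1      # lower 95% limit for change
    hi = np.exp(+1.96 * np.sqrt(2) * sd_within) - 1      # upper 95% limit for change
    return wcv, lo, hi

# Hypothetical paired Ktrans values (min−1) from five patients
k1 = [0.12, 0.25, 0.08, 0.31, 0.18]
k2 = [0.10, 0.28, 0.09, 0.27, 0.20]
print(repeatability_limits(k1, k2))  # roughly: wCV ~9.5%, limits ~ -23% to +30%

For these illustrative values, a posttreatment fall in Ktrans would need to exceed roughly 23% before it could be distinguished confidently from measurement noise in an individual patient, which is exactly why the reproducibility of the specific implementation matters when subtle, low-dose effects are of interest.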
Published DCE-MRI studies show a range of selected posttherapy measurement time points [131]. Between different drugs there will be significant differences in dosing, pharmacokinetic properties, and mechanism, likely to lead to varied time scales of DCE-MRI response. Therefore, there is no consistent formula for selecting posttreatment time points. However, where studies do include an early time point (such as one or two days after therapy has begun), a large response is generally seen that is similar in magnitude to that at subsequent time points. The data by Morgan et al. [136], Mross et al. [137], and Thomas et al. [138] emphasize the early response (day 2) that can be measured consistently between studies and that then persists (out to day 28). It would thus seem that in most instances an early time point provides significant data with which to evaluate vascular response.
Choosing a Standardized Protocol
A number of acquisition and analysis options exist for performing DCE-MRI. For any one study, consistent application across the patients being investigated is an obvious requirement to keep variability minimized. However, for a given company there are clear benefits in maintaining a consistent acquisition and analysis protocol across multiple studies: a reduced operational burden in developing acquisition guidelines, and the ability to compare one study with the next, potentially comparing the magnitude of response between different drugs, doses, or dosing schedules. There are also benefits to developing consistent practices for both academic- and industry-sponsored DCE-MRI-based studies:
• Increased awareness of a standard protocol, minimizing the complexity encountered by radiology departments in dealing with many different DCE-MRI protocols for different studies.
• The potential to perform meta-analyses on data from multiple studies to query relationships between DCE-MRI response and clinical outcome.
Leach et al., on behalf of Cancer Research UK, proposed recommendations for performing DCE-MRI for studying antiangiogenic and antivascular therapies [130]. Their recommendations provide an academic and industry perspective on the key attributes of a standardized DCE-MRI acquisition. Owing to differences in the way the technique is implemented by different scanners, different acquisition protocols will often be required from one center to the next. However, provided that cross-site implementation is undertaken carefully, a DCE-MRI measurement at one center should be comparable with that taken at another. MRI technology changes at a significant pace, evidenced by multichannel technology and the availability of scanners operating at static magnetic field strengths of 3 T and above. Standardizing measurements from one study to the next is a challenge against this background of technological evolution. When a new technology (e.g., 3 T MRI) is introduced, a careful comparison with existing technology is required to ensure equivalence of measurement.
Responsibilities for Deployment and Centralized Analysis
Ideally, multiple sites with quantitative imaging expertise will be selectable to ensure that significant recruitment goals can be achieved in a reasonable period of time. Often this may not be the case, and sites with limited imaging experience are required. For this reason, deployment of the DCE-MRI acquisition is complex, costly, and time consuming, requiring the following activities: review of the adequacy of scanner specifications, implementation of the MRI sequences required, scanner performance evaluation, training of site technologists,
instructions to upload the data in a timely manner on scan completion, robust quality control steps, and centralized analysis. Often, a third-party company, typically a contract research organization with imaging expertise, can be given responsibility for these activities. Alternatively, one clinical site in a network can be responsible for defining the acquisition protocol and communicating imaging details to partnering sites. The latter is particularly applicable where all sites have expertise in quantitative imaging. Whatever the approach selected, it is important that the data acquired be transferred readily (ideally via electronic transfer) to a central analysis laboratory so that any important deviations from the acquisition protocol can be identified and future consistency ensured.
Owing to the diversity of software platforms able to analyze DCE-MRI and the options available for analysis, it is imperative that data sets be reviewed centrally using a consistent process. This will ensure that variability in the analysis is kept to an absolute minimum. Typically, DCE-MRI parameters (Ktrans, IAUC, and others) are derived for each analyzable lesion following pixel-by-pixel analysis. A mean value for each lesion typically provides the simplest output, with more advanced analyses (such as heterogeneity analyses) potentially able to provide further insight into the data.
Toxicity of Gadolinium-Based Contrast Agents
Data show that administration of gadolinium-based contrast media is associated with a risk of developing nephrogenic systemic fibrosis (NSF) [139–141]. Although understanding is still limited, it appears that risk factors for NSF include reduced kidney function; multiple exposures to gadolinium-based chelates; the use of linear chelates, indicating a potential relationship to thermodynamic stability [142]; and the presence of major tissue injury, termed pro-inflammatory conditions [143]. It is clear that much is still to be understood about NSF and the risk factors involved. However, based on existing data it is prudent to minimize the risk of NSF for all subjects being exposed to gadolinium as part of a DCE-MRI examination or other MRI study. DCE-MRI studies used for drug development differ from clinical practice and should approach the issue with particular caution, owing to the following factors:
• The need for multiple injections of gadolinium-based contrast agents within a short period of time, particularly if a repeat baseline and early posttreatment follow-up are required.
• For clinical diagnostic purposes the justification for gadolinium in high-risk subjects can be clearly defined, but for a patient in a clinical drug study the benefits of receiving multiple examinations are less clear.
The safety of the patient is paramount, even as acquisition of data to assist the development of novel therapies remains important. A conservative approach is recommended to minimize risk. This could include:
• Minimizing the number of DCE-MRI examinations while still being able to provide an understanding of the pharmacology
• Use of a single-dose gadolinium agent
• Careful monitoring of renal function and setting boundaries below which subjects should not be studied
• Use of contrast agents that, based on current data, appear to have the lowest association with NSF
• Continual review of emerging data, with regular revision of guidelines as necessary
DCE-MRI for Drug Development Decision Making
As reviewed by O’Connor et al. [131], DCE-MRI has been used extensively to study multiple antiangiogenic and antivascular therapeutics. Many of the DCE-MRI experiments performed to date have been undertaken on drugs still in development. DCE-MRI certainly appears to be a methodology applied consistently within antiangiogenic development programs. Examples include the evaluation of AG-013736 by Liu et al. [144], where a relationship was observed between drug exposure and change in both Ktrans and IAUC from baseline to day 2. DCE-MRI was also incorporated into phase I investigations of the kinase inhibitor (VEGFR, PDGFR, FGFR) BIBF 1120 [145,146]. Both studies reported DCE-MRI findings at baseline, early (2 days [145] and 3 days [146]), and later time points (28 days [145] and 30 days [146]). Although DCE-MRI was obtained at multiple drug doses, it is not clear whether there was any dose–response relationship to support a dose decision for future studies.
A methodology able to define dose versus vascular response for antiangiogenic therapies should in theory play a significant role in many drug development programs. There do remain, however, a number of significant challenges. First, most published studies in which DCE-MRI has been applied to novel therapies have been undertaken on subjects with a range of tumor types, typical of the phase I setting. This creates methodological challenges likely to add to measurement variability (and measurement failure in a percentage of patients), and the biological response will be less uniform than in a study of a single tumor type. Another challenge is the interpretation of the data. How much of a change in Ktrans is sufficient to predict subsequent treatment success? Without data linking DCE-MRI parameter change with metrics of clinical outcome in large studies, across different tumor types and different therapies, it is not possible to establish a broad threshold of, say, the Ktrans change required to predict therapeutic success. If this could be realized, a firm understanding of radiological effectiveness from DCE-MRI could be used to inform dose decisions for a drug development program. Until such data are available, DCE-MRI is likely to be used in a supporting role, contributing to dose decisions, the evaluation of early signals of efficacy, and the understanding of scheduling. The value that DCE-MRI brings, however, would increase significantly, and the implementation cost would clearly be justified, if there were greater understanding of different levels of DCE-MRI response.
Collins [147] has critiqued how imaging (both PET and MRI) contributed to decision making in the development of combretastatin A4 phosphate. Such analyses are important in order to understand how imaging has actually contributed to decision making, the many instances when it has not, and what can be done to enhance the role of imaging in clinical drug development.
Future Improvements in the Use of DCE-MRI
As described previously, published recommendations will help to encourage consistent DCE-MRI measurements. These should enable pharmaceutical companies and academic groups to focus on the challenges of drug development rather than on methodological optimization. However, the current recommendations remain appropriately broad, since they must be applicable to scanners from different manufacturers. Greater harmonization of methods between scanner manufacturers would considerably benefit the DCE-MRI experiment, lowering the barrier to implementation and enabling broader use in drug development. Increased standards of measurement would deliver more studies that could be subjected to meta-analysis: for example, to study DCE-MRI response versus outcome. With such relationships understood, the role of DCE-MRI in early drug development can potentially move from an interesting secondary objective to delivering primary study goals.
Clinical trials are also likely to benefit increasingly from the multiparameter assessments possible within a single MR scan session. In a study of the pan-VEGF receptor tyrosine kinase inhibitor AZD2171 in glioblastoma patients, Batchelor et al. [148] demonstrated that an extensive MRI evaluation provides significant benefit over DCE-MRI alone. The study incorporated lesion volume and apparent diffusion coefficient measurements, together with DCE-MRI-derived indices, including Ktrans and relative vessel size estimations. Such approaches are likely to be necessary to fully interpret the complex changes occurring following antiangiogenic therapy and to promote understanding of drug scheduling and of how optimally to combine different therapies. Improved standardization will both enhance the ability to incorporate DCE-MRI into drug development and promote understanding of how different DCE-MRI response levels should be interpreted.
Apparent Diffusion Coefficient MRI A number of other image-based functional measures are being investigated as potential imaging biomarkers in ongoing exploratory research. Here we briefly highlight one additional cancer application. Diffusion MRI is an imagebased method used to measure the molecular diffusion of water molecules in biological tissues, and derives from the well-established NMR method of
molecular diffusion measurement [149]. The parameter measured, the apparent diffusion coefficient (ADC), reflects the diffusivity or mobility of water molecules and is termed apparent because of the undetermined effects of perfusion, multiple tissue water compartments, restricted diffusion due to cellular and extracellular structures, and tissue anisotropy [149]. Measurement of tissue water diffusion has proven to be remarkably versatile for characterizing tissue structure, as well as tissue changes due to pathology or the effects of therapy. Diffusion MRI has been well studied in the clinical evaluation of ischemic stroke [150]. In oncology, the application of diffusion MRI has focused on the image-based measurement of tumor ADC to detect early changes associated with treatment response [151]. ADC-MRI has been applied with some preliminary success in preclinical and clinical research as an imaging biomarker of treatment response in cancers of the brain [152,153], head and neck [154], cervix [155], breast [156], and prostate [157]. A related MRI technique, diffusion tensor imaging (DTI), has been used for noninvasive characterization of tissue anisotropy, structural integrity, and connectivity in myocardium [158] and white matter tracts in the brain [159]. In particular, the latter application has been useful in preoperative planning for brain tumor patients [160]. The concept of using ADC-MRI to evaluate the effects of therapy in solid tumors is based on the general observation of an increase in tumor ADC after cytotoxic treatments in several tumor types [161–163]. Lyng et al. [163] demonstrated in four different melanoma xenograft models that tumor ADC is inversely proportional to viable tissue cell density. It is thought that an increase in tumor ADC upon treatment reflects the development of necrosis through increased cell membrane permeability and extracellular fraction, and decreased cellularity which ultimately results in greater water mobility [164]. In some tumor types such as intracranial gliomas, an early transient decrease in ADC has also been observed prior to subsequent ADC elevation, which may be due to treatment-induced cellular swelling [152,165]. These changes generally precede gross tumor regression, so that changes in tumor ADC appear to predict such regression. ADC-MRI offers some key advantages as a potential imaging biomarker of tumor treatment response. ADC-MRI does not require the injection of exogenous contrast agents, since it relies on the endogenous contrast mechanism of water diffusion. In addition, as an MRI-based method it does not involve exposure to ionizing radiation. Moreover, the ADC is an absolute biophysical quantity which, in principle, should be comparable across MRI hardware platforms. However, varying acquisition methodologies currently lead to some variability across hardware platforms and between different imaging centers, which could be addressed through better standardization of acquisition and processing protocols. A primary limitation currently with ADC-MRI is the enhanced sensitivity of diffusion MRI to patient motion, such that involuntary or unrelated physiological motion can produce quantitative artifacts. A variety of acquisition
and image-processing techniques have been devised to correct or minimize such image artifacts (e.g., [165–167]), which must be addressed and optimized in the process of standardizing the imaging protocol. An additional concern is the fact that tumor ADC response to therapy can also reflect formation of edema secondary to treatment [151]. Recently developed voxel-based statistical classification approaches have improved the robustness of ADC-MRI against such potentially confounding factors [152,153]. Although ADC-MRI appears to be a promising imaging biomarker for oncology treatment monitoring, further clinical studies are needed to fully characterize the validity and reliability of ADC-MRI for different tumor types and therapeutic classes, as well as to compare the relative value of ADC-MRI with similar measures of tumor viability, such as FDG-PET. Selecting the Right Technique Very often, similar measurements of biological and disease processes are provided by multiple imaging modalities or even distinct approaches within a single modality. In selecting the right technique for a given problem, it is important to consider the range of possible biomarkers (both imaging and nonimaging), as well as the varying technical requirements and available resources, any significant differences between hardware platforms, and the limitations of specific modalities. Here we illustrate the considerations for selecting a particular imaging biomarker for the problem of assessing tumor vascularity. Example: Studying Aspects of Tissue Vascularity Let us assume that we wish to evaluate the tumor response to a new antiangiogenic agent for the purposes of defining the lower limit of biological effectiveness across a dose range. What parameters need consideration when deciding on the appropriate methodology? The advantages and disadvantages of techniques are seldom reviewed comprehensively, and more specifically, there are few studies actually comparing methodologies for their potential to study therapy. De Langen et al. provide an insightful comparison of [15O]H2O-PET versus DCE-MRI for tumor blood flow measurements [168]. The review focuses on both the practical aspects of the two techniques and the ability of the different techniques to probe flow. The conclusion from the comparison is that both approaches are viable methods for the study of flow, yet important differences need to be considered for each study. Goh and Padhani compare the merits of DCE-CT versus DCEMRI in the study of tumor angiogenesis [169]. Again, it is concluded that both techniques should be considered, but a number of factors should determine selection, including drug mechanism of action, tumor location, patient characteristics, and available infrastructure and expertise [169]. The potential techniques able to define tumor vascular characteristics include the following:
Dynamic Contrast-Enhanced CT • Endpoints. DCE-CT can measure blood flow, blood volume, permeability, and mean transit time. • Tumor localization. Multidetector CT technology enables a significant coverage of the tumor during the dynamic assessment, overcoming previous limitations of the approach. Furthermore, CT offers some key advantages when imaging certain anatomical regions, as summarized by Goh and Padhani [169]. • Clinical trial practicalities. The technique is widely available, although it is preferable that sites have previous expertise in dynamic CT. However, training imaging centers with a standard protocol is achievable. Also, since CT is generally required for RECIST assessment of disease burden, the use of CT for a pharmacodynamic investigation may result in fewer scanning sessions. • Risks. Dynamic CT may add a significant radiation dose burden that may become problematic in many patient populations. This is particularly true if these subjects are already receiving regular CT scans for standard radiological assessment, perhaps also with nuclear medicine investigations accumulating a significant dose. This is likely to limit the number of time points that are measurable within a short clinical trial, particularly compared to DCE-MRI and microbubble ultrasound. • Quantification. The straightforward relationship between CT contrast agent concentration and tissue enhancement is often cited as a significant advantage that DCE-CT has over DCE-MRI, making absolute quantification of perfusion achievable. For analysis, the availability of commercial software on the scanning platform holds a significant advantage relative to other methods, although centralized analysis is often required anyway, so this advantage may be limited. In summary, DCE-CT is a widely available, robust methodology for providing perfusion quantification [170]. However, the associated radiation dose is likely to limit the use of CT in certain patient populations and certainly keep the number of time points low. Dynamic Contrast-Enhanced MRI • Endpoints. DCE-MRI typically provides indices relating to permeability and flow (Ktrans and IAUC) in addition to indices relating to the tissue microenvironment (ve). • Tumor localization. The localization can be similar to CT in that a single level is typically scanned using multiple slices. However, coronal scanning, for example, may facilitate greater coverage if multiple lesions
require measurement. Compared to DCE-CT, DCE-MRI is not optimal for all lesions and may often suffer from artifacts rendering measurement impossible (e.g., in the mediastinum). • Clinical trial practicalities. The equipment required to perform DCEMRI is widely available, yet the expertise in quantitative MRI measurement is limited. This can often be overcome by working closely with the imaging center and providing rapid feedback on data generated. In many instances, willingness of a site to adhere to a consistent imaging protocol is preferable to a site with extensive experience, insisting that the measurement be performed a particular way. The number of time points available from DCE-MRI is one advantage, with many publications demonstrating five measurements or more are feasible. However, recent concerns regarding NSF are likely to limit the number of time points, especially in patient populations with compromised renal function. • Risks. The only significant risk to the subject is from the exposure to gadolinium-containing contrast media. These risks can be minimized, but in patient populations with compromised renal function an alternative imaging technique may be deemed appropriate. • Quantification. With the lack of consistent and widely available image analysis platforms for DCE-MRI, central analysis of multicenter data is paramount. A significant limitation is the inability to define flow separately from permeability. Although Ktrans and IAUC may provide useful general indicators of vascular pharmacology, interpretation can be challenging. Without the radiation dose burden and the ability to perform multiple posttherapy time points, DCE-MRI remains attractive for many studies. However, the challenge to implement across multiple centers with the associated cost and time should not be overlooked. Furthermore, ambiguity of the imaging parameters needs to be considered carefully to ensure that DCE-MRI will contribute sufficient insight into the pharmacology. Microbubble Ultrasound • Endpoints. The reported endpoints are varied and include parameters such as area under the enhancement curve, arrival time, time to peak enhancement, and counting of identified vessels. • Tumor localization. Bone and air attenuation sets significant limits on accessibility of this technique to some lesions. This is a fundamental limitation that should be considered when establishing the appropriate patient population to study. • Clinical trial practicalities. Ultrasound remains widely available, although centers experienced in quantitative microbubble techniques remain limited. Furthermore, differences in techniques at those centers need
harmonizing before a clinical trial can begin. Several repeat measurements are possible [171]. This could allow studies in multiple lesions or repeat measurement in the event of failure or repeated measurements averaged to minimize variability. The technique is likely to be favorable to the patient, requiring relatively quick examination times, and comfort during the examination is likely to be high. The low cost may also be an important factor, and if implemented appropriately would constitute only a small fraction of a total study budget. A frequently quoted criticism of quantitative ultrasound is the operator dependence of the measurement. However, it has been shown that with careful implementation, such dependence can be controlled [172]. How this would translate into large multicenter studies, however, is yet to be established. • Risks. No significant risks are expected with repeated microbubble ultrasound investigations. Microbubble contrast media are considered generally safe, with serious adverse events rarely observed [173]. • Quantification. Analysis is currently limited to empirical analyses of the microbubble dynamics. As such, there is likely to be significant variability of measurement from one scanner to the next where different performance characteristics and software implementations will affect measurement. A cross-site calibration exercise combined with central analysis could overcome many of the limitations. Understanding the relationship of the ultrasound parameters to the biology may be challenging. Microbubble ultrasound remains a quick, low-cost, highly accessible methodology for gauging vascular change [174]. Further understanding is needed of the multisite performance together with more insight into the biological linkage. For single or small multisite studies it could provide a useful method to enable many repeat measurements over a short period of time to understand relationships between pharmacokinetics and pharmacodynamic response. [15O]H2O-PET • Endpoints. Flow and distribution volume are typically derived. Since the tracer is rapidly diffusible, permeability is not extracted. • Tumor localization. The poor spatial resolution of PET will limit the study of small lesions. Even if such lesions are identifiable, partial volume effects are likely to make robust quantification challenging [175]. Careful screening of subjects is required to ensure that there exist, for example, at least 2-cm lesions prior to entering into the study. • Clinical trial practicalities. Owing to the short half-life of 15O (2 minutes), an onsite cyclotron is required to generate the tracer and facilitate rapid administration. Even sites with a cyclotron may not produce [15O]H2O
routinely. If this technique is required at multiple sites, the infrastructure and expertise of potential sites must be assessed carefully. One benefit of the short half-life is that repeat examinations can be performed within 10 minutes of each other. This allows either repeat perfusion assessments early after therapy or use of correlative imaging such as [18F]FDG-PET without interference. Since an arterial input function needs to be defined for quantification, rapid arterial blood sampling is required during the study. This adds significant complexity onto an imaging examination, and the practicalities need consideration when selecting this technique. • Risks. Although the effective radiation dose is lower than an extensive CT scan or an [18F]FDG-PET scan, the additional radiation dose this brings to a subject who has been scanned extensively should be considered carefully. This is particularly true where there may be no direct benefit to the subject of such a pharmacodynamic endpoint versus a diagnostic CT or PET scan. • Quantification. A significant body of work exists on quantification of dynamic PET data, particularly from the brain and heart. Adaptation of such analyses can facilitate robust measurement of tumor pathophysiology [176]. Owing to different implementation options, central analysis is required if data are obtained from multiple centers. PET provides a robust technique to quantify absolute tumor perfusion, and the quantification has many advantages over other methodologies. However, the limited availability across sites, the relatively complex patient setup, and limited performance in small lesions need careful consideration. As outlined above, there exist many practical and basic differences between measurement techniques able to study different aspects of the same phenomenon. All decisions will be dependent on the study question being asked. For example, if analysis of lung tumor perfusion across multiple centers is required, DCE-CT is likely to be an appropriate technology. If multiple time points are required to assess relative vascular change of liver lesions, microbubble ultrasound may be an appropriate methodology. The following parameters will define the appropriateness of the various techniques: the ability to perform centralized analysis; the patient characteristics (radiation dose considerations and sensitivity to both MR and CT contrast media), the number of subjects and study centers required for delivery of the primary study endpoint, the number of time points required to fully assess the pharmacodynamics, the drug mechanism, and the probable importance of perfusion analysis versus a measure also sensitive to permeability. There is no single technique ideally suited to address all questions in all types of clinical trials. DCE-MRI has tended to receive considerable attention for studies of anti-angiogenics. This is probably because although it has
limitations in terms of quantification, it provides robust, relatively high-resolution measurement in a variety of tumor types, and multiple repeat measurements are possible in most patient populations. Despite these advantages, the limitations of DCE-MRI should be evaluated and all measurement options considered.
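To make the quantification step noted above for [15O]H2O-PET more concrete, the sketch below illustrates how absolute tumor perfusion might be estimated by fitting the standard one-tissue compartment (Kety) model to a dynamic tissue curve and an arterial input function. This is a minimal, hypothetical example: the sampling grid, curve shapes, and function names are illustrative assumptions, not the implementation of any particular site or vendor.

```python
# Minimal sketch: estimating perfusion F (mL/min/mL) and distribution volume Vd
# from a dynamic [15O]H2O-PET tissue curve with the one-tissue compartment model
#   Ct(t) = F * Ca(t) (*) exp(-(F/Vd) * t)     ((*) denotes convolution)
# All data below are simulated; real studies use a measured arterial input function.
import numpy as np
from scipy.optimize import curve_fit

t = np.linspace(0.0, 4.0, 121)            # minutes, dense sampling grid
dt = t[1] - t[0]
ca = 50.0 * t * np.exp(-2.0 * t)          # hypothetical arterial input function (kBq/mL)

def one_tissue_model(t, flow, vd):
    """Tissue curve from convolving the input function with the impulse response."""
    impulse = flow * np.exp(-(flow / vd) * t)
    return np.convolve(ca, impulse)[: len(t)] * dt

# Simulated "measured" tumor curve with noise (true F = 0.6 mL/min/mL, Vd = 0.8 mL/mL)
rng = np.random.default_rng(0)
ct_measured = one_tissue_model(t, 0.6, 0.8) + rng.normal(0.0, 0.3, t.size)

(flow_hat, vd_hat), _ = curve_fit(one_tissue_model, t, ct_measured, p0=[0.3, 0.5],
                                  bounds=([0.01, 0.05], [5.0, 5.0]))
print(f"Estimated perfusion F = {flow_hat:.2f} mL/min/mL, Vd = {vd_hat:.2f} mL/mL")
```

In practice, partial volume correction, delay and dispersion correction of the arterial input function, and appropriate weighting of the fit would also be required, which is part of why central analysis is recommended when data come from multiple centers.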
THE ONGOING CHALLENGE OF IMAGING STANDARDIZATION

Without standardization of imaging methods, the variety of imaging practices, tracers, equipment, and interpretation would lead to variability far greater than any drug-induced change. Since the extent of the drug-induced change is generally unknown before the experiment, the study setup should aim to minimize measurement variability as far as practically possible. Standardization of imaging is required at a number of levels:

1. Clinical protocol. The clinical protocol defines how the imaging endpoint is incorporated into the overall study. For example, when do imaging time points occur relative to drug dosing? What size lesions are expected to be measurable by the technique chosen? Consistency in how imaging is incorporated is certainly required within any drug development program to ensure that cross-comparison of studies can be made.

2. Technique selection. What technique is chosen to deliver a given endpoint? Deciding which techniques would generally be used for a given drug development program, or throughout a development organization, would aid consistency of decision making and, through familiarization, lower implementation costs and resource requirements.

3. Patient preparation. Standardization of patient preparation for an imaging examination includes management within the scanner: fasting requirements, patient position, and cannulation. Patient preparation is particularly important for metabolic imaging (e.g., [18F]FDG-PET) and functional imaging (e.g., DCE-CT), where multiple factors can significantly affect the imaging result (e.g., via modulation of blood glucose level or liver blood flow).

4. Scanner setup. How the imaging technique is implemented on a given scanner, and how this is matched between different scanners at different institutions, constitutes the most significant complexity of multisite imaging. The variety of equipment is growing as manufacturers offer more hardware and software options. This diversification increases the demands of setting up imaging with equivalence between two different scanners.

5. Image analysis and reporting. Generally, the software platforms used by centers (on the scanner, third-party, or site proprietary) are so diverse that data must be analyzed centrally. When this is done, standardization within a study and between studies can readily be controlled.
Unique Challenges to Standardized Imaging

The diversity of imaging hardware creates one of the most significant sources of measurement variability. Seemingly equivalent imaging methods deployed on two different scanners may provide equivalent diagnostic information and images of similar appearance. However, for quantitative imaging, where imaging-derived biomarkers are required, very significant differences in derived values can occur. Also, there may be proprietary algorithms for image reconstruction or subsequent analysis, the details of which are not known and therefore cannot be compensated for. When implementing complex imaging across multiple centers, expert resources are required to (1) compile a site-specific imaging protocol (e.g., detailing acquisition requirements for a given manufacturer), (2) provide training to the relevant site staff likely to undertake the imaging (technologist and radiologist), (3) send standard phantoms to sites and review the data to ensure expected performance and equivalence between sites (see the sketch following this section), (4) ensure that data are received rapidly, preferably via electronic image transfer, so that they can be reviewed quickly and feedback given on issues, and (5) analyze the data and provide the imaging endpoints to the sponsor for statistical analysis.

Achieving Protocol Consensus by Relevant Stakeholders

An effective method of ensuring consistent imaging biomarker application is to develop an agreed protocol among international experts. Examples of this include DCE-MRI and [18F]FDG-PET [101,130]. Developing such recommendations is challenging given the complexity of the methods and the likely disagreement on certain components while imaging technology is evolving rapidly. Such efforts tend to result in general principles that are sufficiently flexible to work across many centers and ensure that similar endpoints are reported from different studies.

How Much Standardization Is Enough?

The degree of standardization needs to be considered for each technique requiring deployment. Standardization also comes at a cost, particularly in terms of resources and time. There should be realistic expectations that imaging cannot, for example, be standardized as comprehensively at 50 centers as it can be controlled at three centers. Some imaging techniques (e.g., [18F]FDG-PET) can be standardized by a well-considered imaging protocol likely to be adhered to by the site imaging experts. However, complex MRI methodologies may require phantoms to be shipped to all participating sites and phantom data to be submitted for central review prior to any patients being scanned. If the budget is limited, it may be preferable to perform consistent imaging on fewer subjects at a small number of centers rather than acquiring data from more subjects without investing in standardization.
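As an illustration of the phantom-based site qualification step mentioned above, the sketch below shows one way a central analysis group might check that phantom measurements from each site fall within a tolerance of a reference value before patients are scanned. The site names, measured values, and tolerance are hypothetical assumptions, not values from any specific study.

```python
# Minimal sketch of central phantom QC across sites (all values hypothetical).
# Each site scans the same standard phantom; the central group compares the
# derived quantitative value against the reference and flags out-of-tolerance sites.
from statistics import mean

REFERENCE_VALUE = 100.0      # expected phantom reading (arbitrary units)
TOLERANCE = 0.05             # accept sites within +/-5% of the reference

phantom_readings = {         # repeated phantom measurements per site
    "site_A": [99.1, 100.4, 98.8],
    "site_B": [106.3, 107.1, 105.9],
    "site_C": [100.9, 99.5, 101.2],
}

for site, readings in phantom_readings.items():
    site_mean = mean(readings)
    deviation = (site_mean - REFERENCE_VALUE) / REFERENCE_VALUE
    status = "qualified" if abs(deviation) <= TOLERANCE else "requires rescan/recalibration"
    print(f"{site}: mean = {site_mean:.1f}, deviation = {deviation:+.1%} -> {status}")
```

In a real deployment the acceptance criteria would be technique specific (e.g., SUV recovery for FDG-PET or T1 accuracy for DCE-MRI) and would be agreed on before site initiation.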
It is important to remember that standardization is never likely to be achieved completely. Standardization of radiology best practice and of contrast media and tracers needs to adapt to the rapid developments in imaging hardware. For example, with the increasing availability of 3T MRI scanners relative to the standard 1.5T scanner, when should 3T be recommended? A reasonable approach is that where data support the use of new hardware for a given application, equivalence should be defined prior to wider adoption.

Mechanisms for Implementing Standardized Imaging

A number of bodies that represent imaging experts or users of imaging endpoints are optimally placed to develop consensus on imaging techniques and to disseminate recommendations. These include government bodies such as the National Cancer Institute [101,177] and Cancer Research UK [130], accreditation bodies such as the American College of Radiology (ACR; http://www.acr.org) and the Intersocietal Accreditation Commission (IAC; http://www.intersocietal.org/intersocietal.htm), as well as the many professional societies related to medical imaging. ACR and IAC have traditionally provided voluntary accreditation for diagnostic imaging providers, although a recent U.S. health insurance quality improvement initiative and changes in Medicare law have mandated accreditation as a condition for insurance reimbursement. Standards have been established by IAC and ACR that specify the qualifications of imaging personnel, hardware quality control, the conduct of imaging studies, image interpretation, and reporting. The scope of these standards is diagnostic imaging in clinical practice, and they are not directly applicable to imaging biomarkers. However, with the development of appropriate standards for quantitative imaging biomarkers, ACR and IAC may be ideally suited to facilitate the accreditation and education of imaging centers for imaging biomarkers. In response to the need for quantitative imaging standards, the Radiological Society of North America (RSNA) has recently initiated the Quantitative Imaging Biomarkers Alliance (QIBA; http://www.rsna.org/Research/qiba_intro.cfm), with representation from pharmaceutical companies, imaging hardware and software vendors, government agencies, imaging societies, RSNA leadership, and clinical trialists [178]. Modeled on the collaboration known as Integrating the Healthcare Enterprise (IHE; http://www.ihe.net/), QIBA aims “to advance quantitative imaging by focusing on standardizing the use of imaging biomarkers in clinical trials,” with the long-term goal of “enhancing the use of quantitative imaging methods in clinical practice.” Initially, planned efforts will address standardizing quantitative anatomical CT, FDG-PET, and DCE-MRI. Although QIBA has been formed under the auspices of RSNA, it seems probable that a dedicated standard-setting organization may eventually be needed for the ongoing development, maintenance, and dissemination of new imaging biomarker standards.
MECHANISMS TO FACILITATE IMAGING BIOMARKER DEVELOPMENT

Despite the potential of imaging biomarkers as unique tools in drug development, the multidisciplinary nature and complexity of these technologies have somewhat hindered biomarker development. In addition, the implementation and use of imaging biomarkers have tended to be isolated to research institutions and centers with the necessary range of resources. The difficulties of imaging standardization and the expense of imaging biomarker qualification have also limited the incentive for commercialization of these technologies. Many stakeholders in government and industry have recognized the inadequacy of traditional research and development approaches as applied to biomarker development. Other biomarkers in use today, such as blood pressure or plasma cholesterol, have taken decades to achieve their current utility [40]. In its Critical Path Initiative the FDA acknowledges the urgent need for new approaches and broad collaboration to address the unique challenges of biomarker development [179]. Two new approaches that are beginning to have a positive impact are public–private partnerships and the open-source development paradigm.

Public–Private Partnerships

The Foundation for the National Institutes of Health (FNIH), in collaboration with the National Institutes of Health (NIH), the FDA, and the Pharmaceutical Research and Manufacturers of America (PhRMA), established in 2006 the Biomarkers Consortium [180], a public–private partnership focused on the discovery, development, and qualification of biomarkers, including imaging biomarkers, through collaborative biomedical research [181]. The Biomarkers Consortium is unprecedented in its size and in its coordination of government, academic, and industry aims for biomedical research focused on new biomarker technologies. Government partners, including the NIH, FDA, and Centers for Medicare and Medicaid Services, contribute scientific and technical expertise as well as financial resources and project management to the consortium. Membership in the consortium is open to nonprofit and for-profit organizations and patient advocacy groups with an interest in helping the consortium reach its goals. Consortium members may nominate subject-area specialists as representatives on steering committees devoted to various therapeutic areas, which are charged with evaluating biomarker project proposals. According to its Web site, “the Biomarkers Consortium wants all members to be actively engaged in this collaborative multi-sector approach to speed the discovery and validation of disease biomarkers and surrogates. Given the expense and logistical challenges of moving biomarker science forward, we’ve adopted this collaborative multi-sector approach that encompasses all relevant stakeholders.”
Following biomarker project approval by the appropriate steering committee, private-sector fundraising for biomarker projects is facilitated by the FNIH. Several large-scale imaging biomarker development and qualification studies have been initiated with assistance from the Biomarkers Consortium. Two studies, to be conducted by the National Cancer Institute, are evaluating the utility of FDG-PET to monitor cytotoxic treatment of non-Hodgkin lymphoma and non-small cell lung carcinoma, providing data to support clinical qualification of FDG-PET as a possible surrogate endpoint [182]. A multicenter MRI study of carotid atherosclerotic plaque is determining the test–retest and intraobserver variability of MRI-based measures of plaque size and composition to facilitate imaging assay validation [183]. The Osteoarthritis Initiative (OAI) is a four-year, multicenter, longitudinal study of the clinical onset and progression of knee osteoarthritis, which seeks to “characterize imaging, biochemical and genetic biomarkers that predict and track the course of [osteoarthritic] disease” [184]. Clinical data and images from almost 5000 patients will be made publicly available for further biomarker research worldwide [185]. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) is the largest project to date, with $25 million of its $60 million funding being raised through the FNIH. ADNI is a five-year study tracking 823 normal, mildly cognitively impaired, and Alzheimer disease patients using MRI and PET as well as laboratory and cognitive tests [186]. As with the OAI study, ADNI clinical data and images have been made available to the general scientific community for continued biomarker research and development [187]. These and other imaging-related projects supported by the Biomarkers Consortium are addressing the need for clinical trials to establish the validity and clinical qualification of imaging biomarkers. Such trials would be too costly and would make little sense for any single organization to pursue, but are made possible through collaborative sharing of the costs and associated risks among consortium members.

Open-Source Collaboration

The success of the open-source paradigm for software development [188] has led to the open-source approach being applied to the development of biomedical informatics tools [e.g., the Open Bioinformatics Foundation (http://www.openbio.org) and the Bioinformatics Organization (http://www.bioinformatics.org/)], as well as to a gradual adoption of “open-source” collaborative models of biomedical research [189]. Key features of the open-source model are: (1) participation is open to anyone with the interest and skills to contribute; (2) development and communication are centralized through an Internet-based organization; and (3) the results are freely shared. Among drug development public–private partnerships, a hybrid approach is emerging in which the knowledge-based aspects of the research follow an open-source model, while “rule-based” aspects (such as toxicology, chemistry manufacturing and controls, and clinical trials) are outsourced [190,191].
Large publicly funded research initiatives are similarly adopting an open-source approach. The cancer Biomedical Informatics Grid (caBIG; https://cabig.nci.nih.gov/) is an ambitious collaboration launched by the National Cancer Institute in 2003 to develop and share computing tools, infrastructure, and data across the spectrum of cancer research [192]. Data sharing follows a federated model [193] and integrates diverse cancer data from basic and clinical sources, as well as cancer imaging and tissue pathology data. A broad set of tools has been developed based on widely available open-source components and standards, including bioinformatics software, image analysis and database tools [194], and a complete clinical trials management system. Sharing of tools and data is facilitated by an open-source grid-based computing infrastructure. Although the focus of the caBIG research enterprise is cancer, the infrastructure and many of the tools are generic and may be useful in other therapeutic areas. One recent example has been the launch of the CardioVascular Research Grid (cvrGRID; http://cvrgrid.org) by the National Heart, Lung, and Blood Institute. Given the complexity and difficulties of imaging biomarker development, it may be advantageous to apply the open-source paradigm to an integrated development effort for imaging biomarkers. As the foundational image-related tools from caBIG reach maturity, they could form the basis of an open-source development model that integrates image acquisition, analysis, protocol specification, cross-platform compatibility, data standards, validation, and qualification. Such an approach could address development and resource hurdles through the open participation of any interested party, standardization of protocols through centralized development and dissemination, and wide availability of common image analysis tools. Although some development issues would still need to be resolved, such as intellectual property protection and regulatory concerns, an open-source model may ultimately be the most efficient approach to developing, maintaining, and delivering the imaging biomarker tools that can affect drug development.
CONCLUSIONS: CHALLENGES AND FUTURE OPPORTUNITIES

One of the most salient features of imaging science, which is relevant to both diagnostic imaging and imaging biomarkers, is that the state of the technology does not stand still. Continuous research and development for more than 30 years has produced a steady stream of new imaging technologies as well as incremental improvements that address the limitations of existing technologies. Development and implementation strategies for imaging biomarkers must account for this continuous change and attempt to balance the needs of standardization with the adoption of new innovative technologies. Like software, imaging biomarkers will probably need to be “versioned” to account for changes to the underlying technologies. However, this will also require a continuous process of development and maintenance, for which an open-source
model may be the most appropriate. Moreover, with the continued introduction of new imaging technologies, in the future we might expect the translation of additional image-based biological measures to promising new imaging biomarkers. This will become increasingly feasible as innovative systems make imaging biomarker development more routine. It seems likely that no one biomarker will be able to adequately capture all of the important aspects of a given disease process or biological response to therapy. A combination of imaging and other biomarkers may be the most effective surrogate endpoint tool for drug development. In addition to the requirements of biomarker validation and qualification, this will require that the component biomarker technologies are cost-effective as a combination. Current collaborative efforts being led by public–private partnerships such as the FNIH and by the NIH and QIBA are starting to address many of the hurdles of imaging biomarker development and standardization. New approaches that allow contributions from all stakeholders will accelerate the work of imaging biomarker validation and qualification, and we might expect that these initiatives will soon lead to broader availability of new imaging biomarker tools for drug development.

REFERENCES

1. FDA (2006). Critical path opportunities report. http://www.fda.gov/oc/initiatives/criticalpath/reports/opp_report.pdf. 2. Frey-Wyssling A (1953). Submicroscopic Morphology of Protoplasm. Elsevier, Amsterdam. 3. Kuhl DE, Edwards RQ (1963). Image separation radioisotope scanning. Radiology, 80:653–662. 4. Wrenn FR, Good ML, Handler P (1951). The use of positron-emitting radioisotopes for the localization of brain tumors. Science, 113:525–527. 5. Brownell GL, Sweet WH (1953). Localization of brain tumors with positron emitters. Nucleonics, 11:40–45. 6. Hounsfield GN (1973). Computerized transverse axial scanning (tomography): 1. Description of system. Br J Radiol, 46:1016–1022. 7. Lauterbur PC (1973). Image formation by induced local interactions: examples employing nuclear magnetic resonance. Nature, 242:190–191. 8. Mansfield P, Grannell PK (1973). NMR diffraction in solids. J Phys C, 6:L422–L426. 9. Rudin M (ed.) (2005). Imaging in Drug Discovery and Early Clinical Trials. Birkhauser, Basel, Switzerland. 10. Weissleder R, Pittet MJ (2008). Imaging in the era of molecular oncology. Nature, 452:580–589. 11. Schaeffter T (2005). Imaging modalities: principles and information content. In Rudin M (ed.), Imaging in Drug Discovery and Early Clinical Trials. Birkhauser, Basel, Switzerland, pp. 15–81.
12. Barrett HH, Myers K (2003). Foundations of Image Science. Wiley–Interscience, Hoboken, NJ. 13. Bernstein MA, King KF, Zhou XJ (2004). Handbook of MRI Pulse Sequences. Academic Press, San Diego, CA. 14. Rahmim A, Zaidi H (2008). PET versus SPECT: strengths, limitations and challenges. Nucl Med Commun, 29:193–207. 15. Zaidi H (ed.) (2005). Quantitative Analysis in Nuclear Medicine Imaging. Springer Verlag, New York. 16. Judenhofer MS, Wehrl HF, Newport DF, et al. (2008). Simultaneous PET-MRI: a new approach for functional and morphological imaging. Nat Med, 14:459–465. 17. Defrise M (2001). A short reader’s guide to 3D tomographic reconstruction. Comput Med Imaging Graphics, 25:113–116. 18. Haacke EM, Brown RW, Thompson MR, Venkatesan R (1999). Magnetic Resonance Imaging: Physical Principles and Sequence Design, Wiley–Liss, New York. 19. Rudin M, Beckmann N, Rausch M (2005). Evaluation of drug candidates: effi cacy readouts during lead optimization. In Rudin M (ed.), Imaging in Drug Discovery and Early Clinical Trials. Birkhauser, Basel, Switzerland, pp. 185–255. 20. Beckmann N (ed.) (2006). In Vivo MR Techniques in Drug Discovery and Development. Informa HealthCare, London. 21. Tofts P (ed.) (2003). Quantitative MRI of the Brain: Measuring Changes Caused by Disease. Wiley, Hoboken, NJ. 22. Jaffer FA, Weissleder R (2005). Molecular imaging in the clinical arena. JAMA, 293:855–862. 23. Massoud TF, Gambhir SS (2003). Molecular imaging in living subjects: seeing fundamental biological processes in a new light. Genes Dev, 17:545–580. 24. Aikawa E, Nahrendorf M, Figueiredo J, et al. (2007). Osteogenesis associates with inflammation in early-stage atherosclerosis evaluated by molecular imaging in vivo. Circulation, 116:2841–2850. 25. Willmann JK, Chen K, Wang H, et al. (2008). Monitoring of the biological response to murine hindlimb ischemia with 64Cu-labeled vascular endothelial growth factor-121 positron emission tomography. Circulation, 117:915–922. 26. Grimm J, Kirsch DG, Windsor SD, et al. (2005). Use of gene expression profiling to direct in vivo molecular imaging of lung cancer. Proc Natl Acad Sci USA, 102:14404–14409. 27. Koo V, Hamilton PW, Williamson K (2006). Non-invasive in vivo imaging in small animal research. Cell Oncol, 28:127–139. 28. Pomper MG, Lee JS (2005). Small animal imaging in drug development. Curr Pharm Des, 11:3247–3272. 29. Singh M, Johnson L (2006). Using genetically engineered mouse models of cancer to aid drug development: an industry perspective. Clin Cancer Res, 12: 5312–5328. 30. Lanza GM, Wickline SA (2003). Targeted ultrasonic contrast agents for molecular imaging and therapy. Curr Probl Cardiol, 28:625–653.
31. Kaufmann BA, Lindner JR (2007). Molecular imaging with targeted contrast ultrasound. Curr Opin Biotechnol, 18:11–16. 32. Henkelman RM, Stanisz GJ, Graham SJ (2001). Magnetization transfer in MRI: a review. NMR Biomed, 14:57–64. 33. Khaleeli Z, Sastre-Garriga J, Ciccarelli O, Miller DH, Thompson AJ (2007). Magnetisation transfer ratio in the normal appearing white matter predicts progression of disability over one year in early primary progressive multiple sclerosis. J Neurol Neurosurg Psychiatry, 78:1076–1082. 34. Damadian R (1971). Tumor detection by nuclear magnetic resonance. Science, 171:1151–1153. 35. Damadian R, Goldsmith M, Minkoff L (1977). NMR in cancer: XVI. FONAR image of the live human body. Physiol Chem Phys, 9:97–100, 108. 36. Sorensen AG (2006). Magnetic resonance as a cancer imaging biomarker. J Clin Oncol, 24:3274–3281. 37. Katz R (2004). Biomarkers and surrogate markers: an FDA perspective. NeuroRX, 1:189–195. 38. Mills G (2005). Biomarker imaging in drug development and licensed products. http://www.fda.gov/CDER/REGULATORY/medImaging/ImagingWorkshop. ppt. 39. Wagner JA, Williams SA, Webster CJ (2007). Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin Pharmacol Ther, 81:104–107. 40. Frank R, Hargreaves R (2003). Clinical biomarkers in drug discovery and development. Nat Rev Drug Discov, 2:566–580. 41. FDA (2005). Guidance for Industry: Pharmacogenomic data submissions. http:// www.fda.gov/CDER/GUIDANCE/6400fnl.pdf. 42. Kuhl D, Edwards R (1970). The Mark III scanner: a compact device for multipleview and section scanning of the brain. Radiology, 96:563–570. 43. Phelps ME, Hoffman EJ, Mullani NA, Higgins CS, Pogossian MMT (1976). Design considerations for a positron emission transaxial tomograph (PETT III). IEEE Trans Nucl Sci, 23:516–522. 44. Mansfield P, Pykett IL, Morris PG (1978). Human whole body line-scan imaging by NMR. Br J Radiol, 51:921–922. 45. Hargreaves R (2008). The role of molecular imaging in drug discovery and development. Clin Pharmacol Ther, 83:349–353. 46. Barrett HH (1990). Objective assessment of image quality: effects of quantum noise and object variability. J Opt Soc Am A, 7:1266–1278. 47. Kupinski MA, Clarkson E, Gross K, Hoppin JW (2003). Optimizing imaging hardware for estimation tasks. In Medical Imaging 2003: Image Perception, Observer Performance, and Technology Assessment, San Diego, CA. SPIE, Bellingham, WA, pp. 309–313. 48. Clarkson E (2007). Estimation receiver operating characteristic curve and ideal observers for combined detection/estimation tasks. J Opt Soc Am A, 24: B91–B98. 49. DePuey EG, Garcia EV, Berman DS (eds.) (2001). Cardiac SPECT Imaging. Lippincott Williams & Wilkins, Philadelphia.
50. Taillefer EG (2001). Radiopharmaceuticals. In DePuey EG, Garcia EV, Berman DS (eds.), Cardiac SPECT Imaging. Lippincott Williams & Wilkins, Philadelphia, pp. 117–152. 51. Van Train CF, Garcia EV, Cooke CD, Areeda JS (2001). Quantitative analysis of SPECT myocardial perfusion. In DePeuy EG, Garcia EV, Berman DS (eds.), Cardiac SPECT Imaging. Lippincott Williams & Wilkins, Philadelphia, pp. 41–64. 52. Lin GS, Hines HH, Grant G, Taylor K, Ryals C (2006). Automated quantification of myocardial ischemia and wall motion defects by use of cardiac spect polar mapping and 4-dimensional surface rendering. J Nucl Med Technol, 34: 3–17. 53. Ficaro EP, Lee BC, Kritzman JN, Corbett JR (2007). Corridor4DM: the Michigan method for quantitative nuclear cardiology. J Nucl Cardiol, 14:455–465. 54. Leslie WD, Tully SA, Yogendran MS, Ward LM, Nour KA, Metge CJ (2004). Automated quantification of 99mTc sestamibi myocardial perfusion compared with visual analysis. Nucl Med Commun, 25:833–838. 55. Iskandrian AE, Garcia EV, Faber T (2008). Analysis of serial images: a challenge and an opportunity. J Nuclear Cardiol, 15:23–26. 56. Slomka PJ, Berman DS, Germano G (2004). Quantification of serial changes in myocardial perfusion. J Nucl Med, 45:1978–1980. 57. Iskandrian AE, Bateman TM, Belardinelli L, et al. (2007). Adenosine versus regadenoson comparative evaluation in myocardial perfusion imaging: results of the ADVANCE phase 3 multicenter international trial. J Nucl Cardiol, 14: 645–658. 58. Udelson JE (2008). Lessons from the development of new adenosine A2A receptor agonists. J Am Coll Cardiol Imaging, 1:317–320. 59. Shields AF, Price P (2007). In Vivo Imaging of Cancer Therapy. Humana Press, Totowa, NJ. 60. Weissleder R (2006). Molecular imaging in cancer. Science, 312:1168–1171. 61. Nutt R, Vento LJ, Ridinger MHT (2007). In vivo molecular imaging biomarkers: clinical pharmacology’s new PET? Clin Pharmacol Ther, 81:792–795. 62. Thompson RC, Cullom SJ (2006). Issues regarding radiation dosage of cardiac nuclear and radiography procedures. J Nucl Cardiol, 13:19–23. 63. Einstein AJ, Henzlova MJ, Rajagopalan S (2007). Estimating risk of cancer associated with radiation exposure from 64-slice computed tomography coronary angiography. JAMA, 298:317–323. 64. Amis ES, Butler PF, Applegate KE, et al. (2007). American College of Radiology White Paper on Radiation Dose in Medicine, 4:272–284. 65. McCollough CH, Bruesewitz MR, Kofler JM (2006). CT dose reduction and dose management tools: overview of available options. Radiographics, 26:503–512. 66. McConville PJ, Moody JB, Moffat BA (2005). High-throughput magnetic resonance imaging in mice for phenotyping and therapeutic evaluation. Curr Opin Chem Biol, 9:413–420. 67. Hargreaves R, Wagner JA (2006). Imaging as biomarker for decision-making in drug development. In Beckmann N (ed.), In Vivo MR Techniques in Drug Discovery and Development. Informa HealthCare, London, pp. 31–44.
68. Bergström M, Långström B (2005). Pharmacokinetic studies with PET. In Rudin M (ed.), Imaging in Drug Discovery and Early Clinical Trials. Birkhauser, Basel Switzerland, pp. 279–317. 69. Willmann JK, van Bruggen N, Dinkelborg LM, Gambhir SS (2008). Molecular imaging in drug development. Nat Rev Drug Discov, 7:591–607. 70. Major TC, Dhamija S, Black N, et al. (2008). The T- and L-type calcium channel blocker (CCB) mibefradil attenuates leg edema induced by the L-type CCB nifedipine in the spontaneously hypertensive rat: a novel differentiating assay. J Pharmacol Exp Ther, 325:723–731. 71. Seddon BM, Workman P (2003). The role of functional and molecular imaging in cancer drug discovery and development. Br J Radiol, 76:S128–S138. 72. Uppoor RS, Mummaneni P, Cooper E, et al. (2008). The use of imaging in the early development of neuropharmacological drugs: a survey of approved NDAs. Clin Pharmacol Ther, 84:69–74. 73. Evelhoch JL (2007). Magnetic resonance measurement of tumor perfusion and vascularity. In Shields AF, Price P (eds.), In Vivo Imaging of Cancer Therapy. Humana Press, Totowa, NJ, pp. 73–84. 74. Gwyther SJ (2007). Anatomical measure of tumor growth with computed tomography and magnetic resonance imaging. In Shields AF, Price P (eds.), In Vivo Imaging of Cancer Therapy. Humana Press, Totowa, NJ, pp. 33–46. 75. Miller AB, Hoogstraten B, Staquet M, Winkler A (1981). Reporting results of cancer treatment. Cancer, 47:207–214. 76. Lavin PT (1981). An alternative model for the evaluation of antitumor activity. Cancer Clin Trials, 4:451–457. 77. Michaelis LC, Ratain MJ (2006). Measuring response in a post-RECIST world: from black and white to shades of grey. Nat Rev Cancer, 6:409–414. 78. Therasse P, Arbuck SG, Eisenhauer EA, et al. (2000). New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst, 92:205–216. 79. Tuma RS (2006). Sometimes size doesn’t matter: reevaluating RECIST and tumor response rate endpoints. J Natl Cancer Inst, 98:1272–1274. 80. Therasse P, Eisenhauer E, Verweij J (2006). RECIST revisited: a review of validation studies on tumour assessment. Eur J Cancer, 42:1031–1039. 81. Benjamin RS, Choi H, Macapinlac HA, et al. (2007). We should desist using RECIST, at least in GIST. J Clin Oncol, 25:1760–1764. 82. Choi H (2008). Response evaluation of gastrointestinal stromal tumors. Oncologist, 13:4–7. 83. Byrne MJ, Nowak AK (2004). Modified RECIST criteria for assessment of response in malignant pleural mesothelioma. Ann Oncol, 15:257–260. 84. Schwartz LH, Curran S, Trocola R, et al. (2007). Volumetric 3D CT analysis: an early predictor of response to therapy. J Clin Oncol (Meet Abstr), 25:4576. 85. Schwartz LH, Colville JAC, Ginsberg MS, et al. (2006). Measuring tumor response and shape change on CT: esophageal cancer as a paradigm. Ann Oncol, 17:1018–1023. 86. Cheson BD, Pfistner B, Juweid ME, et al. (2007). Revised response criteria for malignant lymphoma. J Clin Oncol, 25:579–586.
87. Gillies RJ, Robey I, Gatenby RA (2008). Causes and consequences of increased glucose metabolism of cancers. J Nucl Med, 49:24S–42S. 88. Spaepen K, Stroobants S, Dupont P, et al. (2003). [18F]FDG PET monitoring of tumour response to chemotherapy: Does [18F]FDG uptake correlate with the viable tumour cell fraction? Eur J Nucl Med Mol Imaging, 30:682–688. 89. Gambhir SS, Czernin J, Schwimmer J, Silverman DH, Coleman RE, Phelps ME (2001). A tabulated summary of the FDG PET literature. J Nucl Med, 42:1S–93S. 90. Kelloff GJ, Hoffman JM, Johnson B, et al. (2005). Progress and promise of FDGPET imaging for cancer patient management and oncologic drug development. Clin Cancer Res, 11:2785–2808. 91. Weber WA, Petersen V, Schmidt B, et al. (2003). Positron emission tomography in non-small-cell lung cancer: prediction of response to chemotherapy by quantitative assessment of glucose use. J Clin Oncol, 21:2651–2657. 92. Mikhaeel NG, Hutchings M, Fields PA, O’Doherty MJ, Timothy AR (2005). FDG-PET after two to three cycles of chemotherapy predicts progression-free and overall survival in high-grade non-Hodgkin lymphoma. Ann Oncol, 16:1514–1523. 93. Lin C, Itti E, Haioun C, et al. (2007). Early 18F-FDG PET for prediction of prognosis in patients with diffuse large B-cell lymphoma: SUV-based assessment versus visual analysis. J Nucl Med, 48:1626–1632. 94. Hutchings M, Loft A, Hansen M, et al. (2006). FDG-PET after two cycles of chemotherapy predicts treatment failure and progression-free survival in Hodgkin lymphoma. Blood, 107:52–59. 95. Evilevitch V, Weber WA, Tap WD, et al. (2008). Reduction of glucose metabolic activity is more accurate than change in size at predicting histopathologic response to neoadjuvant therapy in high-grade soft-tissue sarcomas. Clin Cancer Res, 14:715–720. 96. Stroobants S, Goeminne J, Seegers M, et al. (2003). 18FDG-positron emission tomography for the early prediction of response in advanced soft tissue sarcoma treated with imatinib mesylate (Glivec). Eur J Cancer, 39:2012–2020. 97. Jager PL, Gietema JA, van der Graaf WTA (2004). Imatinib mesylate for the treatment of gastrointestinal stromal tumours: best monitored with FDG PET. Nucl Med Commun, 25:433–438. 98. Antoch G, Kanja J, Bauer S, et al. (2004). Comparison of PET, CT, and dualmodality PET/CT imaging for monitoring of Imatinib (STI571) therapy in patients with gastrointestinal stromal tumors. J Nucl Med, 45:357–365. 99. Van den Abbeele AD (2008). The lessons of GIST–PET and PET/CT: a new paradigm for imaging. Oncologist, 13:8–13. 100. Young H, Baum R, Cremerius U, et al. (1999). Measurement of clinical and subclinical tumour response using [18F]fluorodeoxyglucose and positron emission tomography: review and 1999 EORTC recommendations. Eur J Cancer, 35:1773–1782. 101. Shankar LK, Hoffman JM, Bacharach S, et al. (2006). Consensus recommendations for the use of 18F-FDG PET as an indicator of therapeutic response in patients in National Cancer Institute trials. J Nucl Med, 47:1059–1066.
102. Boellaard R, Oyen W, Hoekstra C, et al. (2008). The Netherlands protocol for standardisation and quantification of FDG whole body PET studies in multicentre trials. Eur J Nucl Med Mol Imaging, http://dx.doi.org/10.1007/ s00259-008-0874-2. 103. Hallett WA, Maguire RP, McCarthy TJ, Schmidt ME, Young H (2007). Considerations for generic oncology FDG-PET/CT protocol preparation in drug development. IDrugs, 10:791–796. 104. Weber WA (2005). Use of PET for monitoring cancer therapy and for predicting outcome. J Nucl Med, 46:983–995. 105. Westerterp M, Pruim J, Oyen W, et al. (2007). Quantification of FDG PET studies using standardised uptake values in multi-centre trials: effects of image reconstruction, resolution and ROI definition parameters. Eur J Nucl Med Mol Imaging, 34:392–404. 106. Lind P, Igerc I, Beyer T, Reinprecht P, Hausegger K (2004). Advantages and limitations of FDG PET in the follow-up of breast cancer. Eur J Nucl Med Mol Imaging, 31:S125–S134. 107. Hamaoka T, Madewell JE, Podoloff DA, Hortobagyi GN, Ueno NT (2004). Bone imaging in metastatic breast cancer. J Clin Oncol, 22:2942–2953. 108. Mankoff DA, Eary JF, Link JM, et al. (2007). Tumor-specific positron emission tomography imaging in patients: [18F]fluorodeoxyglucose and beyond. Clin Cancer Res, 13:3460–3469. 109. Kelloff GJ, Krohn KA, Larson SM, et al. (2005). The progress and promise of molecular imaging probes in oncologic drug development. Clin Cancer Res, 11:7967–7985. 110. Gupta N, Price PM, Aboagye EO (2002). PET for in vivo pharmacokinetic and pharmacodynamic measurements. Eur J Cancer, 38:2094–2107. 111. Bergström M, Grahnén A, Långström B (2003). Positron emission tomography microdosing: a new concept with application in tracer and early clinical drug development. Eur J Clin Pharmacol, 59:357–366. 112. Katzenellenbogen JA, Welch MJ, Dehdashti F (1997). The development of estrogen and progestin radiopharmaceuticals for imaging breast cancer. Anticancer Res, 17:1573–1576. 113. Smith-Jones PM, Solit D, Afroze F, Rosen N, Larson SM (2006). Early tumor response to Hsp90 therapy using HER2 PET: comparison with 18F-FDG PET. J Nucl Med, 47:793–796. 114. Mankoff DA, Link JM, Linden HM, Sundararajan L, Krohn KA (2008). Tumor receptor imaging. J Nucl Med, 49:149S–163S. 115. Shields AF, Grierson JR, Kozawa SM, Zheng M (1996). Development of labeled thymidine analogs for imaging tumor proliferation. Nucl Med Biol, 23:17–22. 116. Shields AF, Grierson JR, Dohmen BM, et al. (1998). Imaging proliferation in vivo with [F-18]FLT and positron emission tomography. Nat Med, 4:1334– 1336. 117. Buck AK, Halter G, Schirrmeister H, et al. (2003). Imaging proliferation in lung tumors with PET: 18F-FLT versus 18F-FDG. J Nucl Med, 44:1426–1431. 118. Chen W, Cloughesy T, Kamdar N, et al. (2005). Imaging proliferation in brain tumors with 18F-FLT PET: comparison with 18F-FDG. J Nucl Med, 46:945–952.
119. Smyczek-Gargya B, Fersis N, Dittmann H, et al. (2004). PET with [18F]fluorothymidine for imaging of primary breast cancer: a pilot study. Eur J Nucl Med Mol Imaging, 31:720–724. 120. Grierson J, Yagle K, Eary J, et al. (2004). Production of [F-18]fluoroannexin for imaging apoptosis with PET. Bioconjug Chem, 15:373–379. 121. Glaser M, Collingridge DR, Aboagye EO, et al. (2003). Iodine-124 labelled Annexin-V as a potential radiotracer to study apoptosis using positron emission tomography. Appl Radiat Isotopes, 58:55–62. 122. Lahorte C, Vanderheyden J, Steinmetz N, Wiele C, Dierckx R, Slegers G (2004). Apoptosis-detecting radioligands: current state of the art and future perspectives. Eur J Nucl Med Mol Imaging, 31:887–919. 123. Haubner R, Wester H, Burkhart F, et al. (2001). Glycosylated RGD-containing peptides: tracer for tumor targeting and angiogenesis imaging with improved biokinetics. J Nucl Med, 42:326–336. 124. Kenny LM, Coombes RC, Oulie I, et al. (2008). Phase I trial of the positronemitting Arg-Gly-Asp (RGD) peptide radioligand 18F-AH111585 in breast cancer patients. J Nucl Med, 49:879–886. 125. Beer AJ, Haubner R, Goebel M, et al. (2005). Biodistribution and pharmacokinetics of the ανβ3-selective tracer 18F-Galacto-RGD in cancer patients. J Nucl Med, 46:1333–1341. 126. Zhang X, Xiong Z, Wu Y, et al. (2006). Quantitative PET imaging of tumor integrin ανβ3 expression with 18F-FRGD2. J Nucl Med, 47:113–121. 127. Parker GJM, Padhani AR (2003). T1-W DCE-MRI: T1-weighted dynamic contrast-enhanced MRI. In Tofts P (ed.), Quantitative MRI of the Brain. Wiley, Hoboken, NJ, pp. 341–364. http://dx.doi.org/10.1002/0470869526.ch10. 128. Tofts PS, Brix G, Buckley DL, et al. (1999). Estimating kinetic parameters from dynamic contrast-enhanced T1-weighted MRI of a diffusable tracer: standardized quantities and symbols. J Magn Reson Imaging, 10:223–232. 129. He ZQ, Evelhoch JL (1998). Analysis of dynamic contrast-enhanced MRI in tumors: relationship of derived parameters with physiologic factors. In Proceedings of the International Society for Magnetic Resonance in Medicine, Sydney, Australia. Wiley, New York, p. 1652. http://cds.ismrm.org/ismrm-1998/ PDF6/P1652.PDF. 130. Leach M, Brindle K, Evelhoch J, et al. (2005). The assessment of antiangiogenic and antivascular therapies in early-stage clinical trials using magnetic resonance imaging: issues and recommendations. Br J Cancer, 92:1599–1610. 131. O’Connor JPB, Jackson A, Parker GJM, Jayson GC (2007). DCE-MRI biomarkers in the clinical evaluation of antiangiogenic and vascular disrupting agents. Br J Cancer, 96:189–195. 132. Jayson G, Waterton J (2005). Applications of dynamic contrast-enhanced MRI in oncology drug development. In Baert A, Jackson A, Buckley D, Parker G (eds.), Dynamic Contrast-Enhanced Magnetic Resonance Imaging in Oncology. Springer, New York, 281–298. 133. Galbraith SM, Lodge MA, Taylor NJ, et al. (2002). Reproducibility of dynamic contrast-enhanced MRI in human muscle and tumours: comparison of quantitative and semi-quantitative analysis. NMR Biomed, 15:132–142.
134. Roberts C, Issa B, Stone A, Jackson A, Waterton JC, Parker GJ (2006). Comparative study into the robustness of compartmental modeling and modelfree analysis in DCE-MRI studies. J Magn Reson Imaging, 23:554–563. 135. Parker GJM, Roberts C, Macdonald A, et al. (2006). Experimentally-derived functional form for a population-averaged high-temporal-resolution arterial input function for dynamic contrast-enhanced MRI. Magn Reson Med, 56: 993–1000. 136. Morgan B, Thomas AL, Drevs J, et al. (2003). Dynamic contrast-enhanced magnetic resonance imaging as a biomarker for the pharmacological response of PTK787/ZK222584, an inhibitor of the vascular endothelial growth factor receptor tyrosine kinases, in patients with advanced colorectal cancer and liver metastases: results from two phase I studies. J Clin Oncol, 21:3955–3964. 137. Mross K, Drevs J, Müller M, et al. (2005). Phase I clinical and pharmacokinetic study of PTK/ZK, a multiple VEGF receptor inhibitor, in patients with liver metastases from solid tumours. Eur J Cancer, 41:1291–1299. 138. Thomas AL, Morgan B, Horsfield MA, et al. (2005). Phase I study of the safety, tolerability, pharmacokinetics, and pharmacodynamics of PTK787/ZK222584 administered twice daily in patients with advanced cancer. J Clin Oncol, 23: 4162–4171. 139. Prince MR, Zhang H, Morris M, et al. (2008). Incidence of nephrogenic systemic fibrosis at two large medical centers. Radiology, 248:807–816. 140. Deo A, Fogel M, Cowper SE (2007). Nephrogenic systemic fibrosis: a population study examining the relationship of disease development to gadolinium exposure. Clin J Am Soc Nephrol, 2:264–267. 141. Wertman R, Altun E, Martin DR, et al. (2008). Risk of nephrogenic systemic fibrosis: evaluation of Gadolinium chelate contrast agents at four American universities. Radiology, 248:799–806. 142. Penfield JG, Reilly RF (2008). Nephrogenic systemic fibrosis risk: Is there a difference between Gadolinium-based contrast agents? Semin Dial, 21:129–134. 143. Sadowski EA, Bennett LK, Chan MR, et al. (2007). Nephrogenic systemic fibrosis: risk factors and incidence estimation. Radiology, 243:148–157. 144. Liu G, Rugo HS, Wilding G, et al. (2005). Dynamic contrast-enhanced magnetic resonance imaging as a pharmacodynamic measure of response after acute dosing of AG-013736, an oral angiogenesis inhibitor, in patients with advanced solid tumors: results from a phase I study. J Clin Oncol, 23:5464–5473. 145. Lee CP, Taylor NJ, Attard G, et al. (2006). A phase I study of BIBF 1120, an orally active triple angiokinase inhibitor (VEGFR, PDGFR, FGFR) given continuously to patients with advanced solid tumours, incorporating dynamic contrast enhanced magnetic resonance imaging (DCE-MRI). J Clin Oncol (Meet Abstr), p. 3015. http://meeting.ascopubs.org/cgi/content/abstract/24/18_ suppl/3015. 146. Mross KB, Gmehling D, Frost A, et al. (2005). A clinical phase I, pharmacokinetic (PK), and pharmacodynamic study of twice daily BIBF 1120 in advanced cancer patients. J Clin Oncol (Meet Abstr), p. 3031. http://meeting.ascopubs.org/ cgi/content/abstract/23/16_suppl/3031. 147. Collins JM (2003). Functional imaging in phase I studies: decorations or decision making? J Clin Oncol, 21:2807–2809.
148. Batchelor TT, Sorensen AG, di Tomaso E, et al. (2007). AZD2171, a pan-VEGF receptor tyrosine kinase inhibitor, normalizes tumor vasculature and alleviates edema in glioblastoma patients. Cancer Cell, 11:83–95. 149. Thomas DL, Lythgoe MF, Pell GS, Calamante F, Ordidge RJ (2000). The measurement of diffusion and perfusion in biological systems using magnetic resonance imaging. Phys Med Biol, 45:R97–R138. 150. Warach S, Gaa J, Siewert B, Wielopolski P, Edelman RR (1995). Acute human stroke studied by whole brain echo planar diffusion-weighted magnetic resonance imaging. Ann Neurol, 37:231–241. 151. Chenevert T, McKeever P, Ross B (1997). Monitoring early response of experimental brain tumors to therapy using diffusion magnetic resonance imaging. Clin Cancer Res, 3:1457–1466. 152. Moffat BA, Chenevert TL, Lawrence TS, et al. (2005). Functional diffusion map: a noninvasive MRI biomarker for early stratification of clinical brain tumor response. Proc Natl Acad Sci USA, 102:5524–5529. 153. Hamstra DA, Galbán CJ, Meyer CR, et al. (2008). Functional diffusion map as an early imaging biomarker for high-grade glioma: correlation with conventional radiologic response and overall survival. J Clin Oncol, 26: 3387–3394. 154. Razek AAKA, Megahed AS, Denewer A, Motamed A, Tawfik A, Nada N (2008). Role of diffusion-weighted magnetic resonance imaging in differentiation between the viable and necrotic parts of head and neck tumors. Acta Radiol, 49:364–370. 155. McVeigh P, Syed A, Milosevic M, Fyles A, Haider M (2008). Diffusion-weighted MRI in cervical cancer. Eur Radiol, 18:1058–1064. 156. Lee KC, Moffat BA, Schott AF, et al. (2007). Prospective early response imaging biomarker for neoadjuvant breast cancer chemotherapy. Clin Cancer Res, 13:443–450. 157. Lee KC, Sud S, Meyer CR, et al. (2007). An imaging biomarker of early treatment response in prostate cancer that has metastasized to the bone. Cancer Res, 67: 3524–3528. 158. Helm P, Beg MF, Miller MI, Winslow RL (2005). Measuring and mapping cardiac fiber and laminar architecture using diffusion tensor MR imaging. Ann NY Acad Sci, 1047:296–307. 159. Jellison BJ, Field AS, Medow J, Lazar M, Salamat MS, Alexander AL (2004). Diffusion tensor imaging of cerebral white matter: a pictorial review of physics, fiber tract anatomy, and tumor imaging patterns. Am J Neuroradiol, 25: 356–369. 160. Mori S, Frederiksen K, van Zijl PCM, et al. (2002). Brain white matter anatomy of tumor patients evaluated with diffusion tensor imaging. Ann Neurol, 51: 377–380. 161. Ross BD, Chenevert TL, Kim B, Ben-Yoseph O (1994). Magnetic resonance imaging and spectroscopy: application to experimental neuro-oncology. Q Magn Reson Biol Med, 1:89–106. 162. Zhao M, Pipe JG, Bonnett J, Evelhoch JL (1996). Early detection of treatment response by diffusion-weighted 1H-NMR spectroscopy in a murine tumour in vivo. Br J Cancer, 73:61–64.
163. Lyng H, Haraldseth O, Rofstad EK (2000). Measurement of cell density and necrotic fraction in human melanoma xenografts by diffusion weighted magnetic resonance imaging. Magn Reson Med, 43:828–836. 164. Ross BD, Moffat BA, Lawrence TS, et al. (2003). Evaluation of cancer therapy using diffusion magnetic resonance imaging. Mol Cancer Ther, 2:581–587. 165. Koh D, Collins DJ (2007). Diffusion-weighted MRI in the body: applications and challenges in oncology. Am J Roentgenol, 188:1622–1635. 166. Rohde G, Barnett A, Basser P, Marenco S, Pierpaoli C (2004). Comprehensive approach for correction of motion and distortion in diffusion-weighted MRI. Magn Reson Med, 51:103–114. 167. Pipe JG, Zwart N (2006). Turboprop: improved PROPELLER imaging. Magn Reson Med, 55:380–385. 168. de Langen AJ, van den Boogaart VEM, Marcus JT, Lubberink M (2008). Use of H215O-PET and DCE-MRI to measure tumor blood flow. Oncologist, 13:631–644. 169. Goh V, Padhani A (2006). Imaging tumor angiogenesis: functional assessment using MDCT or MRI? Abdominal Imaging, 31:194–199. 170. Miles KA (2002). Functional computed tomography in oncology. Eur J Cancer, 38:2079–2084. 171. Lamuraglia M, Escudier B, Chami L, et al. (2006). To predict progression-free survival and overall survival in metastatic renal cancer treated with sorafenib: pilot study using dynamic contrast-enhanced Doppler ultrasound. Eur J Cancer, 42:2472–2479. 172. Rouffiac V, Bouquet C, Lassau N, et al. (2004). Validation of a new method for quantifying in vivo murine tumor necrosis by sonography. Invest Radiol, 39:350–356. 173. Jakobsen JÅ, Oyen R, Thomsen HS, Morcos SK (2005). Safety of ultrasound contrast agents. Eur Radiol, 15:941–945. 174. Lassau N, Chami L, Benatsou B, Peronneau P, Roche A (2007). Dynamic contrast-enhanced ultrasonography (DCE-US) with quantification of tumor perfusion: a new diagnostic tool to evaluate the early effects of antiangiogenic treatment. Eur Radiol Suppl, 17:89–98. 175. Soret M, Bacharach SL, Buvat I (2007). Partial-volume effect in PET tumor imaging. J Nucl Med, 48:932–945. 176. Bacharach SL, Libutti SK, Carrasquillo JA (2000). Measuring tumor blood flow with H215O: practical considerations. Nucl Med Biol, 27:671–676. 177. Evelhoch J, Garwood M, Vigneron D, et al. (2005). Expanding the use of magnetic resonance in the assessment of tumor response to therapy: workshop report. Cancer Res, 65:7041–7044. 178. Frank R (2008). Quantitative Imaging Biomarkers Alliance FDG-PET/CT Working Group Report. Mol Imaging Biol, 10:305. 179. Woodcock J (2005). The Critical Path Initiative: one year later. http://www.fda. gov/CDER/REGULATORY/medImaging/woodcock.ppt. 180. The Biomarkers Consortium (2008). http://www.biomarkersconsortium.org. 181. Altar C (2008). The Biomarkers Consortium: on the critical path of drug discovery. Clin Pharmacol Ther, 83:361–364.
182. Foundation for the National Institutes of Health (2008). Biomarkers: FDG-PET lung and lymphoma. http://www.fnih.org/index.php?option=com_content&task= view&id=503&Itemid=638. 183. Foundation for the National Institutes of Health (2008). Biomarkers: carotid MRI reproducibility study. http://www.fnih.org/index.php?option=com_content &task=view&id=489&Itemid=600. 184. Nevitt MC (2007). Osteoarthritis Initiative (OAI): Design, subject characteristics, data and images. http://www.oai.ucsf.edu/datarelease/docs/presentations/ oarsi12062007/Nevitt_OARSI2007.pdf. 185. OAI:Home (2008). http://www.oai.ucsf.edu/datarelease/. 186. Foundation for the National Institutes of Health–Alzheimer’s Disease Neuroimaging Initiative (2008). http://fnih.org/index.php?option=com_content& task=view&id=103&Itemid=227. 187. Alzheimer’s Disease Neuroimaging Initiative LONI–ADNI (2008). http://www. loni.ucla.edu/ADNI/. 188. Maurer SM, Scotchmer S (2006). Open Source Software: The New Intellectual Property Paradigm. National Bureau of Economic Research Working Paper Series, no. 12148. http://www.nber.org/papers/w12148. 189. Rai AK (2004). Open and collaborative research: a new model for biomedicine. http://ssrn.com/paper=574863. 190. Maurer SM (2007). Open source drug discovery: finding a niche (or maybe several). http://ssrn.com/paper=1114371. 191. Munos B (2006). Can open-source R&D reinvigorate drug research? Nat Rev Drug Discov, 5:723–729. 192. Eschenbach ACV, Buetow K (2006). Cancer Informatics Vision: caBIG. Cancer Inf, 2:22–24. 193. Piwowar HA, Becich MJ, Bilofsky H, Crowley RS (2008). Towards a data sharing culture: recommendations for leadership from academic health centers. PLoS Med, 5:e183 EP. 194. Prior FW, Erickson BJ, Tarbox L (2007). Open Source Software Projects of the caBIG In Vivo Imaging Workspace Software Special Interest Group. J Digital Imaging, 20:94–100. 195. Jezzard P, Ramsey NF (2003). Functional MRI. In Tofts P (ed.), Quantitative MRI of the Brain: Measuring Changes Caused by Disease. Wiley, Hoboken, NJ, pp. 415–453. 196. Börnert P, Keupp J, Eggers H, Aldefeld B (2007). Whole-body 3D water/fat resolved continuously moving table imaging. J Magn Reson Imaging, 25:660–665. 197. St Pierre TG, Clark PR, Chua-anusorn W, et al. (2005). Noninvasive measurement and imaging of liver iron concentrations using proton magnetic resonance. Blood, 105:855–861. 198. Stafford RJ, Hazle JD (2006). Magnetic resonance temperature imaging for focused ultrasound surgery: a review. Top Magn Reson Imaging, 17:153–163. 199. Zhou J, Payen J, Wilson DA, Traystman RJ, van Zijl PCM (2003). Using the amide proton signals of intracellular proteins and peptides to detect pH effects in MRI. Nat Med, 9:1085–1090.
200. Woods M, Woessner DE, Sherry AD (2006). Paramagnetic lanthanide complexes as PARACEST agents for medical imaging. Chem Soc Rev, 35:500–511. 201. van Zijl PCM, Jones CK, Ren J, Malloy CR, Sherry AD (2007). MRI detection of glycogen in vivo by using chemical exchange saturation transfer imaging (glycoCEST). Proc Natl Acad Sci USA, 104:4359–4364. 202. Eisenhauer EA, Therasse P, Bogaerts J et al. (2009) New response evaluation criteria in solid tumors: revised RECIST guideline (version 1.1). Eur J Cancer, 45:228–247.
5
PROTEIN BIOMARKER DISCOVERY USING MASS SPECTROMETRY–BASED PROTEOMICS
Joanna M. Hunter, Ph.D., and Daniel Chelsky, Ph.D.
Caprion Proteomics, Inc., Montreal, Quebec, Canada
INTRODUCTION

Recent advances in biological sample preparation, automated sample handling, and sensitive mass spectrometry with a variety of sample ionization sources have catalyzed the emergence of these methods as tools for clinical proteomic analyses [1,2]. Concurrently, proteomic data analysis has been aided by the expansion and improvement of publicly available protein and gene databases. Among the methods reported for quantitative comparison of proteins of biological origin, several techniques based on liquid-phase separation of proteins and peptides rather than gel electrophoresis have been optimized and implemented within the last few years. These include stable isotope labeling as well as label-free methods [3–9]. Direct measurement and comparison of unlabeled peptide ion peak intensities has become the standard, particularly for high-throughput applications. Quantitative, label-free mass spectrometry–based methods for protein expression profiling include those based on pattern recognition [10,11], peptide counting [12–14], and peptide ion intensity or area [4,5,15–20]. Of these, the latter seems to be the most robust and applicable to clinical studies. The correlation between observed peptide intensity differences and protein abundance differences is the fundamental basis of label-free protein expression profiling.
In particular, the relationship between liquid chromatography–mass spectrometry (LC-MS) signals and peptide abundance has been shown to vary linearly with concentration, even in complex samples [5,21–23]. Mass spectrometry–based protein expression profiling, without the use of labels or internal standards, is applicable to a wide range of sample types, including tissues, cells, organelles, and organisms, as well as body fluids such as plasma, cerebrospinal fluid, and urine. It has the additional advantage that comparisons can be made post hoc on a per patient basis, thereby dramatically increasing the granularity of information that can be obtained from the data. Multiple approaches can be taken for the label-free identification of differentially expressed proteins in large sample sets. At Caprion we have developed a specific approach, called CellCarta, which has proven to be very effective and reliable. Features of this platform will be used as an example in describing issues and solutions for biomarker discovery. The platform has low overall variability [<15% coefficient of variation (CV) in the absence of biological variability] and provides an accurate and quantitative measure of the biological modulation of protein abundance. Thus, statistically significant differences in plasma protein abundance can be detected, making this technology highly applicable to biomarker discovery in clinical samples. Plasma markers of disease, drug reversion of disease, predictive markers of drug response, pharmacodynamic markers of drug action, and early markers of both efficacy and toxicity can be detected using this technology. Studies may be conducted in animal models (preclinical) as well as with human clinical samples [24–26]. One requirement for a useful biomarker is that it can be measured in an easily accessible body fluid. In this regard, blood (serum or plasma) is preferred, and it is used routinely for many clinical assays [27]. However, the discovery of biomarkers in blood brings some technical challenges. Plasma has been estimated to contain anywhere from several hundred to several thousand proteins [28,29], whose concentrations are known to vary by up to 10 orders of magnitude. Many of the proteins with desirable biomarker characteristics are expected to be at the low end of the plasma protein dynamic range of concentration, typically at or below 100 ng/mL. Two approaches to this issue are described here. One approach depends on the immunoaffinity-based depletion of some of the most abundant plasma proteins, thus making it possible to see more of the lower-abundance proteins. The second approach is based on identifying and comparing plasma proteins just prior to release from their cells or tissue of origin. While residing in the secretory apparatus, proteins destined for the blood are highly concentrated, thus allowing for a comprehensive picture of tissue-specific markers.
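To illustrate the type of calculation that underlies label-free expression profiling as described above, the sketch below computes per-peptide coefficients of variation from replicate injections and applies a simple two-group comparison to normalized peptide intensities. The data, thresholds, and variable names are hypothetical assumptions for illustration only; they do not describe the CellCarta implementation itself.

```python
# Minimal sketch of label-free differential abundance analysis (hypothetical data).
# Intensities are log2-transformed; peptides with low technical CV in QC replicates
# are compared between case and control samples with a per-peptide t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_peptides = 1000

# Hypothetical QC replicate injections (technical variability only)
qc = rng.normal(loc=20.0, scale=0.15, size=(n_peptides, 4))        # log2 intensities
cv = (2 ** qc).std(axis=1) / (2 ** qc).mean(axis=1)                # CV on the linear scale
reliable = cv < 0.15                                                # e.g., keep CV < 15%

# Hypothetical study samples: 20 controls vs 20 cases, 5% of peptides truly changed
control = rng.normal(20.0, 0.4, size=(n_peptides, 20))
case = rng.normal(20.0, 0.4, size=(n_peptides, 20))
case[:50] += 1.0                                                    # simulated up-regulation

t_stat, p_val = stats.ttest_ind(case, control, axis=1)
log2_fold_change = case.mean(axis=1) - control.mean(axis=1)

hits = np.where(reliable & (p_val < 0.001) & (np.abs(log2_fold_change) > 0.5))[0]
print(f"{reliable.sum()} peptides pass the CV filter; {hits.size} candidate biomarkers")
```

In an actual study, run-to-run normalization, multiple-testing correction, and mapping of peptides to their parent proteins would all be applied before candidates are selected.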
PROTEIN PROFILING PLATFORM FOR BIOMARKER DISCOVERY One of the major challenges of biomarker discovery is the execution of unbiased large-scale studies to avoid the confusion of process artifacts with actual
biomarkers. Unbiased experimental design and study implementation are arguably among the most critical components of a differential protein expression platform [30]. Because subtle errors in a work plan can result in large biases, careful study planning and implementation are necessary in order to discover biomarkers that are robust to population variation, sample collection variability, processing artifacts, and other confounding factors. The first step in setting up a study is to clearly define the questions that must be addressed. From these goals, the minimum number of patients in each condition to be compared can be fixed. Although the group sizes can be small from a purely analytical perspective, the number of samples required is dictated by the possibility of confounding factors within the target group and/or by the need to sample a diverse target population. Multiple comparisons across large sample sets are achievable and fully exploit the analytical platform. In addition, the order in which samples are processed and analyzed must be considered. Depending on the comparisons to be made, samples must be interleaved and/ or randomized between processing steps. Alternatively, the samples to be compared are block randomized, with comparisons made within each block, thus ensuring the greatest processing consistency within a comparison. After samples have been processed, data analysis techniques are employed to ensure that biases have not occurred. If biases appear, appropriate sample pairing and statistical tests are utilized to ensure that the biases do not influence the conclusions drawn from the study. To generate highly reproducible results, stringent control of sample preparation, analysis, and data normalization procedures are required. Low peptide signal intensity variation across study samples is required to detect statistically significant and subtle protein changes against background variation. Protocols followed rigorously, including quality control checks, are necessary throughout the study, from sample collection to data analysis. For example, strictly following a standard operating procedure (SOP) for blood collection and generation of plasma enables blood samples to be collected from multiple sites for the same study. Furthermore, a well-documented sample “chain of custody” is obligatory when handling clinical samples. As part of this documentation process, samples are barcoded, logged into a laboratory information management system (LIMS), and stored in locked freezers. All subsequent steps in sample handling and analysis are monitored and logged, and all data are secured. Thus, SOPs and quality control procedures ensure sample integrity throughout the process. Specific features of the CellCarta work flow are detailed below. Focusing on the Relevant Proteome Without enrichment of a targeted proteome, samples contain far too many proteins to be visualized in a reasonable number of analytical runs, and only the most abundant proteins are detected. Enrichment of subproteomes is necessary to reduce the sample complexity and to decrease the limit of detection. More comprehensive protein identification as well as information on
subcellular localization is obtained by isolating organelles such as plasma membranes, endosomes, phagosomes, mitochondria, endoplasmic reticulum, nuclei, or Golgi apparatus from cells or tissue homogenates. Similarly, blood (plasma or serum) is enriched for lower abundance proteins by the removal of high-abundance proteins. For example, depletion of the 14 most abundant proteins using the Multiple Affinity Removal System (MARS, Agilent Technologies) removes over 96% of total protein and consequently enhances the detection of the remaining lower-abundance components [31,32]. Various depletion approaches are being used to get deeper into the plasma proteome [33]. The key criteria for implementing enrichment or depletion strategies are that they provide minimal contamination of the nontargeted proteins and are highly reproducible. Sample Fractionation Regardless of the origin of the purified protein samples, the complexity is typically too great—both in terms of the number of proteins and the dynamic range of concentration—for direct analysis by mass spectrometry. Reversed-phase liquid chromatography coupled to mass spectrometry (LC-MS) can reproducibly distinguish and quantify over 5000 peptides in an hour-long analytical run. A single LC-MS injection is therefore suitable for samples comprised of some 500 to 1000 proteins. Subproteome enrichment typically reduces the total number of proteins to fewer than 10,000. To characterize these complex samples more comprehensively, a fractionation step must be applied before the LC-MS. At the protein level, separations such as strong anion exchange provide effective fractionation prior to trypsin digestion and LC-MS. Alternatively, the proteins are first proteolyzed to tryptic peptides, with subsequent separation of the peptides by strong cation exchange (SCX) liquid chromatography [34]. The CellCarta work flow employs this method. Differential Protein Expression by LC-MS For optimal matching of peptide signals across many samples, the LC-MS platform must be as reproducible as possible (Figure 1). Technically, the requirements of stable chromatography, high resolution, and mass accuracy may be addressed by a variety of instruments. Peptide profiling takes place on mass spectrometers that are capable of providing sensitive, high-mass-accuracy measurements, such as time-of-flight or high-resolution ion-trapping instruments. Peptide sequencing is performed on quadrupole-time-of-flight instruments such as the QSTAR (Applied Biosystems) or the QTOF (Waters), or on an ion trap such as the LTQ-Orbitrap (ThermoFisher), which are sensitive and can provide high-quality MS/MS spectra. Regardless of the mass spectrometer type, capillary reversed-phase LC columns are coupled to these instruments via an electrospray interface. The CellCarta mass spectrometry platform utilizes capillary liquid chromatography (CapLC) and QTOF LC-MS
Figure 1 Consistent peptide intensity across all samples allows the detection of differentially expressed peptide ions. Shown is a partial view of peptide ion maps (as measured by LC-MS) from the plasma of three individuals. The horizontal axis is chromatographic retention time, the vertical axis is mass-to-charge ratio (m/z), and the peptide ion intensity is denoted by the size and color of the spots. The peptide ion circled shows differential expression across patients and increases in abundance from sample 1 to sample 3. (See insert for color reproduction of the figure.)
systems (Waters). The LC-MS measurements are reliable and reproducible, providing a median CV of peak intensity of 8 to 9% for peptides matched across six replicate injections of a standard sample. To quantify the differentially expressed peptides, a suite of proprietary bioinformatics tools was developed at Caprion and implemented into CellCarta. Matching peptides across large sample sets (peak alignment) is the first step in the differential expression analysis (Figure 2). Algorithms to perform mass-to-charge ratio (m/z) and chromatographic retention-time alignment result in the confident detection of significant peptide intensity differences between samples. Peptide ions are detected and matched across all the samples in the study. Each study peptide is characterized by m/z, charge, retention time, and intensity. Bioinformatics software tools map peptides across all samples, comparing ion intensity for all reproducible peptide ions across all fractions. Those peptides that show a statistically significant differential abundance are targeted for protein identification. Protein Identification Identification of the candidate protein biomarkers is accomplished using two different and complementary tools. First, database searching of LC-MS/MS spectra for peptide identification is accomplished using Mascot (MatrixScience), which searches for the best fit between the recorded spectra and theoretical spectra calculated from tryptic peptides from proteins in the reference
Figure 2 Related peptide ions are clustered across the entire study. High confidence matching is achieved, independent of small variations in mass and retention time detected. Each symbol represents a detected ion in one sample, plotted at the observed mass-to-charge ratio and chromatographic retention time.
database. For human plasma studies, the human International Protein Index (IPI; European Bioinformatics Institute) is searched. For samples not of human origin, other public databases may be used, depending on the species. The search results are parsed based on peptide and protein score thresholds that are determined by setting an acceptable false-positive rate of identification calculated by searching a randomized database [35]. In addition, we have developed novel software that adds a differential abundance correlation filter to mass and retention-time fingerprinting [virtual mass spectrometry (VMS)]. Protein identification using mass and retention-time fingerprinting alone suffers from the high level of mass redundancy for peptides from complex species. VMS incorporates the protein identification and peptide expression resolver (PIPER) filter, which requires all peptides pointing to the same protein to share the same relative intensity across all samples, a logical requirement for association. Application of the PIPER filter to the identified peptides results in false-positive protein identification rates below 5% [36]. This method provides for the identification of proteins where the peptide ions can be detected but are not sufficiently intense to provide an interpretable MS/MS fragmentation pattern [37]. In addition to adding new protein identifications, VMS can also be used to supplement LC-MS/MS by
increasing confidence in low-scoring sequence matches or single-peptide identifications. Thus, the combined use of LC-MS/MS sequencing with VMS greatly expands the ability to identify and quantify novel proteins by expanding the dynamic range (number and concentration) of identifiable proteins as well as the confidence (number of identified peptides per protein) of protein identification. Candidate Biomarker Results Visualization Proteomics analyses can often generate large data sets, including all of the associated peptide sequence and expression data. To enable rapid evaluation and interpretation of results, it is very helpful to organize the output into a searchable database with a specialized but simple user interface. We have created such a tool that is Java-based and accessible on a secure Web portal. The Data Report provides query, filtering, and data analysis capabilities as well as visualization of peptide differential expression and protein identification across all of the samples in the study. Included in the Data Report are listings of all the peptides, protein accession numbers, and gene names identified, along with differential expression information for each. Graphical displays of peptide intensities, differential expression, and MS/MS spectra are also available (Figure 3). In addition, selected data can be exported into standard formats. Candidate Biomarker Panels Historically, circulating biomarkers have been defined by a single analyte that can be used to distinguish given groups of individuals with high specificity and sensitivity (positive and negative predictive values). In the last two decades, the development of high-throughput multivariate analytical platforms, such as protein and gene chips, has enabled the identification and validation of multivariate biomarkers. Due to population variation and the multitude of effects that treatment can have within a person and across the population, it is acknowledged that panels of proteins are likely to produce a biomarker that is significantly more sensitive and specific than any single protein alone [38]. In a typical biomarker discovery study performed on the CellCarta platform, hundreds of proteins are found to be significantly differentially modulated. These proteins are then prioritized into panels of 3 to 10 proteins. This process involves filtering based on several criteria, including diagnostic strength [i.e., the area under the receiver operator characteristic (ROC) curve or AUC] and linear regression to clinical factors. In particular, the panel of proteins must have a high composite AUC in addition to strong individual protein AUC scores. Biomarker panels with high discriminating power (positive and negative predictive values) can thus be composed from the individual biomarker candidates. The number of protein candidates included in the panel
Figure 3 Caprion Data Report user interface. An example Data Report screenshot is shown. The upper left panel contains a summary of the proteomic data for the proteins selected by the user. The upper right panel contains several tabs that permit additional details to be displayed, including a hierarchical clustering of the peptides. The lower left panel contains search options, while the lower right panel contains several tabs that provide additional proteomic data, such as peptide intensities, differential expression, and MS/MS spectra.
represents a trade-off between cost and risk. Although larger protein panels are more costly to verify, they are more resilient to population variability. Prioritized candidate biomarkers are then qualified and verified further using immunoassays such as enzyme-linked immunosorbent assay (ELISA) or mass spectrometry–based assays such as multiple reaction monitoring (MRM) [39,40].
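As a rough illustration of why a panel can outperform any single marker, the following sketch (simulated data; scikit-learn is used only for convenience and is not implied by the chapter) compares individual protein AUCs with the composite AUC of a simple logistic-regression panel score:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated cohort: 40 disease and 40 control samples, five candidate proteins,
# each only weakly discriminating on its own (hypothetical effect sizes).
rng = np.random.default_rng(1)
n = 40
shift = np.array([0.9, 0.7, 0.6, 0.5, 0.4])
X = np.vstack([rng.normal(0, 1, (n, 5)), rng.normal(0, 1, (n, 5)) + shift])
y = np.array([0] * n + [1] * n)

for j in range(X.shape[1]):
    print(f"protein {j + 1}: AUC = {roc_auc_score(y, X[:, j]):.2f}")

# Composite panel score from a logistic regression fit on all five proteins.
panel_score = LogisticRegression().fit(X, y).decision_function(X)
print(f"5-protein panel AUC = {roc_auc_score(y, panel_score):.2f}")
# Scoring on the training data is optimistic; in practice the panel AUC is
# estimated by cross-validation or on an independent verification cohort.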
PROTEIN BIOMARKER DISCOVERY IN BLOOD PLASMA The CellCarta platform enables the measurement of biological changes of interest in human plasma profiling studies for protein biomarker discovery. The following examples illustrate that protein biomarkers can be found in plasma using a highly controlled proteomics technology platform and employing an appropriate study design.
Global Proteomics Applied to Alzheimer Disease A collaboration with Hyman Schipper, a neuropsychologist at the McGill University Memory Clinic, part of the Jewish General Hospital (JGH; Montreal, Canada), was undertaken to investigate the effect of drug treatment on Alzheimer disease (AD) patients. Plasma was collected from patients after obtaining written informed consent and with approval of the research and ethics committee of the JGH. Included in the study were 33 age-matched healthy control, 19 untreated AD, and 25 donepezil-treated AD patients. Each patient was administered the Folstein Mini-Mental State Examination (MMSE), a qualitative test of disease severity [41]. In addition, clinical histories were obtained. In the course of the study, a “global proteomics” approach to data visualization and analysis was developed and implemented. Following unsupervised clustering by multidimensional scaling (MDS), the untreated disease group was found to cluster apart from the healthy group (Figure 4). Subsequently, a centroid (median value in three dimensions) was calculated for each group. The line intersecting the normal and disease group centroids is defined as the
Figure 4 Global proteomics analysis on Alzheimer disease and normal patients. Multidimensional scaling of proteomics data demonstrates the separation of healthy individuals (green spheres) from Alzheimer patients (red spheres). Caprion has defined a disease axis that is used to quantify relative disease state. The axis is a line that passes through the disease and healthy centroids (yellow spheres). Each patient is then positioned on the axis according to its orthogonal intercept. Donepezil-treated Alzheimer patients (purple spheres), as a group, are shifted on the disease axis from the disease group toward the healthy group. (See insert for color reproduction of the figure.)
disease axis (Figure 4). The disease severity for each patient is indicated by the distance between its orthogonal intercept with the disease axis and the normal centroid. The disease severity of each patient, as determined by this approach, correlated well with the patient MMSE score, with a Pearson correlation coefficient of 0.75. When the relative position on the disease axis was matched to relative peptide intensity, 282 peptides were found to correlate with the disease severity profile. The proteins corresponding to these peptides were subsequently identified. Although there is currently no blood-based diagnostic test for AD, these results suggest that a test monitoring blood protein levels could measure AD severity. The global proteomics approach was applied to assess the effect on AD patients of treatment with donepezil. A total of 75 peptides were found to be highly correlated with the disease severity profile of treated versus untreated AD patients. These peptides are a subset of the 282 disease-related peptides. The CellCarta platform can thus be employed as an assay that enables the analysis of drug response for patient segregation. As an extension, pharmacodynamic analyses such as dose optimization and drug efficacy could be performed using this method. This study demonstrates that global proteomics can be an effective approach to pharmacodynamic biomarker identification. Oncology Biomarkers: Blinded Case Study The goal of this second study was to identify circulating markers of ovarian and breast cancer. Two sets of patient samples were examined. The first set was composed of eight samples each from patients with breast cancer or ovarian cancer as well as eight matched healthy subjects. The second set of samples had the same composition, but the identities were blinded to Caprion. All samples were processed concomitantly through the CellCarta platform. Of the 53,628 peptide ions detected across the study, 4089 were differentially modulated reproducibly and significantly (p < 0.005) in one of the three cohorts. These peptides were further analyzed by the bioinformatics tool, multidimensional scaling, to determine the relationship between each of the patients at the global proteomics level. Results, shown in Figure 5, indicate that each of the three groups of samples clusters separately. The samples in each group are therefore more similar to other members of the same group than to members of other groups. This similarity within a group translates to peptides (and proteins) that distinguish and separate each group. The behavior of the peptides that were found to distinguish each of the three groups was analyzed for each of the 24 blinded samples. Each sample was therefore categorized according to which of the three groups it was most closely related. When the samples were unblinded, it was determined that 22 of the 24 assignments were correct. Only one pair, consisting of a normal and a breast cancer plasma sample, was assigned incorrectly. In this example, all differentially expressed peptides were used to discriminate the three groups in order to maximize the ability of the small learning
Figure 5 Sample groups distinguished by differentially expressed peptides. Multidimensional scaling analysis was performed using the intensity values for 4089 differentially expressed peptide ions from 24 samples. Separation along three axes of variance (MDS1 to MDS3) is shown, where each sphere represents a patient sample. The groups are identified by the colors indicated. (See insert for color reproduction of the figure.)
set (first 24 samples) to determine the identity of the blinded sample test set. None of the individual peptide ions could discriminate the three groups very well. However, sets of five or 10 peptides were much more effective, as shown in the receiver operating characteristic (ROC) plots in Figure 6. In the case of multiple peptide analysis, peptides were chosen randomly from the population of differentially expressed peptides in order to provide multiple examples of the range of discrimination possible. The AUC required to fully discriminate groups is a value of 1. This was most frequently achieved as the panel size increased from one to 10 proteins (Figure 6, right panels). To better understand the biology underlying the differentially expressed peptides, their parent proteins were identified. This was accomplished by LC-MS/MS and VMS to identify groups of peptides representing a single protein. This result was further filtered by requiring that the relative intensity of a minimum of three peptide ions be well correlated across all samples. Approximately 200 proteins met these and other strict confidence criteria. Half of these proteins distinguished either breast cancer or ovarian cancer from the other two conditions. The remaining half of the proteins distinguished cancer (breast or ovarian) from normal healthy subjects. These results suggest that circulating blood proteins provide a rich source of high-quality
Figure 6 Multiple peptide panels are effective at discriminating groups. ROC plots show improved performance when going from one to five to 10 peptides. Displayed are ROC curves (left column; true-positive ratio on the y-axis, false-positive ratio on the x-axis) and histograms of the area under the curve (AUC; right column) for single-peptide (top; 1000 peptides), five-peptide (middle; 1000 random combinations), and 10-peptide panels (bottom; 1000 random combinations). In the case of multiple-peptide analysis, peptides were chosen randomly from the population of differentially expressed peptides. The optimal AUC for discriminating groups is a value of 1. This was best achieved as the panel size increased. Median AUC values were 0.78, 0.83,* and 0.94 for the 1-, 5-, and 10-peptide panels, respectively. (*Starting with random sets of 890 peptides, the median AUC of panels of size 5 is under 0.4.) (See insert for color reproduction of the figure.)
disease biomarkers that can be discovered using mass spectrometry–based protein expression profiling.
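The "disease axis" construction used in the Alzheimer study above can be sketched in a few lines. The example below uses simulated three-dimensional MDS coordinates and a simulated severity score standing in for MMSE values (all numbers are illustrative, not study data): each patient is projected onto the line through the healthy and disease centroids, and the resulting positions are correlated with the clinical score.

import numpy as np

# Simulated 3-D MDS coordinates for healthy controls and untreated patients,
# plus a simulated clinical severity score for each patient.
rng = np.random.default_rng(2)
healthy = rng.normal([0.0, 0.0, 0.0], 0.5, (33, 3))
disease = rng.normal([2.0, 1.0, 0.5], 0.5, (19, 3))
severity = disease @ np.array([1.0, 0.5, 0.25]) + rng.normal(0, 0.3, 19)

# Centroid (median in each dimension) of each group; the disease axis is the
# line through the two centroids.
c_healthy = np.median(healthy, axis=0)
c_disease = np.median(disease, axis=0)
axis = c_disease - c_healthy
axis /= np.linalg.norm(axis)

# Position of each patient = scalar projection (orthogonal intercept) onto the axis.
position = (disease - c_healthy) @ axis

r = np.corrcoef(position, severity)[0, 1]
print(f"Pearson r between axis position and severity score: {r:.2f}")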
PROTEIN BIOMARKER DISCOVERY IN THE SECRETORY APPARATUS Secreted Proteins as Biomarkers In the quest for circulating biomarkers of disease or drug response, much effort has been made to avoid the overwhelming effect of high-abundance proteins in the blood that obscure lower abundance and potentially more interesting proteins. Removal of the high-abundance proteins with multiple affinity antibody columns has been very effective in this regard. An alternative approach that has been successful as well is to isolate plasma proteins as they are being released from the source tissues or cells. One common such method involves harvesting proteins released by tissue culture cells into conditioned media [42,43]. Although this approach has been productive, proteins may be lost due to dilution or adherence to the culture container, and it is generally limited to cultured cells rather than whole tissue. Analysis of tissues can provide a more physiological context, as they include primary cells as well as the natural composition of multiple cell types that may modulate secretion. Tissues can also be taken from the host, typically animal models of disease, following drug treatment, in order to assess the impact of treatment on the secretome. This can lead to a better understanding of the mechanism of action of the drug treatment as well as the identification of possible markers of therapeutic response. Proteins released from tissue into culture medium have also been studied [44], but contamination from lysed cells and culture medium can obscure the actual proteins secreted. Reliance on those proteins with a signal sequence as well as isotopic labeling of de novo synthesized proteins from the tissue are usually necessary. An alternative approach to identifying and quantifying secreted proteins relies on the isolation of such proteins directly from the secretory pathway of the cells or tissue. The contents of the Golgi and related secretory vesicles provide a highly concentrated collection of proteins, just prior to their release from the cell [45,46]. This method avoids dilution into the plasma, which would obscure all but the more abundant secreted proteins. Dilution as well as contamination by cell lysis proteins in culture medium is also eliminated as a concern. Tissues can be as easily investigated as cultured cells, with the added advantage that the samples do not need to be viable at the time of analysis. Thus, frozen surgical samples from patients or animal models are amenable to study and comparison, even when samples are collected at different time points. To isolate secretory proteins directly from cell culture or tissue, the homogenized samples are separated on a density gradient to isolate secretory vesicles.
Secreted Proteins Differentially Expressed in Prostate Cancer The prostate secretes a range of proteins related to its function, including proteolytic enzymes, acid phosphatase (ACPP), and prostate-specific antigen (PSA). In prostate tumors, some of these proteins are known to increase or decrease in the amount secreted. To test the application of secretory vesicle isolation to frozen human surgical specimens, samples were obtained from patients with prostate cancer, with institutional review board (IRB) approval and patient consent. Tumor and normal tissue were separated as described [47] and the content of the secretory apparatus was isolated. Proteins were digested with trypsin and analyzed by LC-MS in order to compare expression between each normal and tumor pair from all six patients. Significantly differentially expressed peptides were identified. Four of the best-known prostate cancer–associated secreted proteins were found to be up-regulated, as expected, in the tumors. PSA, ACPP, kallikrein 2 (KLK2), and macrophage migration inhibitory factor (MIF) are all low-abundance proteins in the blood, at levels ranging from 0.2 to 3.5 ng/mL in normal individuals. Detection of these proteins by direct LC-MS analysis of plasma would not be possible, yet was readily accomplished by examination of the secretory pathway. Two of these proteins, PSA and MIF, were also evaluated by commercial ELISA in the plasma of the same patients. The levels observed in the secretory vesicles by unlabeled mass spectrometry were directly comparable to levels in the plasma from the same patients, with correlation coefficients of 0.69 and 0.74, respectively [46]. Thus, analysis of differentially expressed proteins in the secretory apparatus is predictive of relative expression in the plasma. This correlation means that discovery can be done in the highly concentrated milieu of the vesicles, while verification and validation studies can be conducted with more sensitive antibody-based assays directly in blood. In addition to the well-known prostate cancer markers, other known cancer-related proteins were found, for a total of 40 proteins with known cancer association. A further 20 proteins that were not previously known to have a cancer association were identified as differentially expressed in the tumors. The known functions of the proteins typically fit well with their observed expression patterns. For example, of 11 proteins involved with sugar metabolism, all were found to be up-regulated in the tumors. On the other hand, of nine proteins involved with contraction and adhesion, six were found to be expressed at lower levels in the tumor samples than in the matching normal tissue.
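A paired design of this kind is commonly analyzed on log-transformed intensities. The sketch below (simulated values and illustrative thresholds; not the analysis actually used in the study) flags peptides whose tumor/normal difference is consistent across the six patient pairs:

import numpy as np
from scipy import stats

# Simulated log2 peptide intensities for paired tumor and normal secretory-vesicle
# preparations from six patients (rows = peptides, columns = patients).
rng = np.random.default_rng(3)
n_peptides, n_patients = 500, 6
normal = rng.normal(20.0, 1.0, (n_peptides, n_patients))
tumor = normal + rng.normal(0.0, 0.4, (n_peptides, n_patients))
tumor[:40] += 1.0                       # simulate 40 up-regulated peptides (~2-fold)

# Paired t-test per peptide; the mean paired difference is the log2 fold change.
t_stat, p_val = stats.ttest_rel(tumor, normal, axis=1)
log2_fc = (tumor - normal).mean(axis=1)

hits = (p_val < 0.01) & (np.abs(log2_fc) > 0.5)
print(f"{hits.sum()} peptides flagged as differentially expressed")
# A multiple-testing correction (e.g., Benjamini-Hochberg) would normally be applied
# before reporting candidates.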
Proteins Secreted by Visceral Adipose Tissue Visceral adipose tissue is much more than a storage medium for excess energy. This realization came with the discovery in 1994 of leptin and its wide-ranging activities, which include reduction of appetite, angiogenesis, hematopoiesis, and bone formation [48–50]. Visceral adipose tissue is now recognized as an endocrine organ that releases a large number of biologically active molecules
called adipokines, which have pleiotropic effects on a variety of metabolic pathways [48,51]. Proteins released from adipose tissue include regulators of inflammation, inducers of angiogenesis, and modulators of hypertension, among others. Visceral adipose tissue, in particular, has been implicated in the release of proteins responsible for metabolic syndrome and type 2 diabetes [52]. For this reason, evaluation of visceral adipose tissue, either alone or in comparison to subcutaneous adipose tissue, is of great interest [52,53]. As a pilot study of the eventual comparison of proteins secreted from visceral and subcutaneous adipose tissue, visceral adipose tissue was obtained from three obese patients undergoing gastric bypass surgery. The contents of the secretory apparatus were isolated from each subject and analyzed by LC-MS/MS to determine feasibility and yield, as well as to catalog the proteins and verify that known adipocyte-secreted proteins could be found. In a limited analysis, 155 proteins were identified, including many of the well-known adipokines. Among the proteins identified were cytokines (TGFb1; IL-25), acute-phase proteins (SPARC; lipocalin 2/NGAL), proteins that regulate blood pressure (angiotensinogen; ARTS-1), regulators of lipid metabolism (retinol-binding protein 4; lipoprotein lipase), and signaling molecules (galectin-1; prohibitin). Only 65 of the proteins were also found in one of three secreted-protein databases [54–56]. Although this could suggest that some of the proteins are contaminants, it could also point to the possibility that many secreted proteins are missed by conventional discovery methods that typically screen for proteins with an N-terminal signal sequence [44,57]. Some well-known
TABLE 1 Five Proteins Found in the Secretory Apparatus of Human Visceral Adipose Tissue That Are Targets for Therapies in Clinical Trials

Target | Condition | Drug | MoA | Phase
Leucocyte elastase | Acute respiratory distress syndrome | Depelestat | Inhibitor of elastase activity | II
Transforming growth factor-β1 | Pulmonary fibrosis | GC1008 | Neutralizing antibody | I
Clusterin | Lung and prostate cancers | OGX-011 | Inhibition of gene expression | II
Transthyretin (TTR) | Familial amyloidosis | Diflunisal | Stabilization of TTR to prevent formation of amyloid fibrils | III
Pigment epithelium-derived protein | Macular degeneration | AdGVPEDF.11D | Gene therapy | I
secreted proteins, including FGF-1, FGF-2, IL-1, and galectins, do not have a classical N-terminal signal sequence [58]. Direct detection of these nonclassical secreted proteins will be an important approach to expanding on the secretome as well as validating the predictive tools currently under development for these proteins, such as SecretomeP [59]. An important aspect of studying the contents of the secretory apparatus is the ability to identify low-abundance secreted proteins that would be difficult to detect without specific antibodies, such as IL-25, TGFb1, and lipocalin 2. Also important is that the proteins identified have important biological functions that could eventually aid in the discovery of novel approaches to therapeutics. Of interest, therefore, is the observation that five of the proteins identified in the pilot adipose tissue study are current targets of therapies in clinical trials (Table 1). These therapies are not necessarily involved in adipose-related diseases, but still serve to demonstrate that proteins identified by this approach include those that are important to drug discovery.
SUMMARY The global proteomics approach for candidate biomarker discovery using label-free mass spectrometry-based methods is able to detect significantly differentially expressed proteins in a variety of biological systems. An advantage of this technology is that it enables varied study designs involving comparisons between large cohorts or multiple experimental conditions. For example, pharmacodynamic markers of drug treatment, markers of disease severity, or predictive markers of efficacy may be discovered, depending on the study design. Candidate biomarkers are combined into panels having sufficient discriminating power (positive and negative predictive values) to provide meaningful diagnostic or prognostic information and perform better than single-protein biomarkers against broad populations. Consequently, these high-confidence panels provide biomarkers that appear suitable for development into high-throughput screening assays. Multiple approaches to identifying lower-abundance proteins in plasma have demonstrated value, including antibody depletion of high-abundance proteins as well as the analysis of the contents of the secretory apparatus. These two techniques are seen as complementary in terms of the plasma protein concentration range interrogated. Typically, the limit of detection of direct analysis of plasma or serum proteins is approximately 10 to 100 ng/mL, which has proven to be a very productive range for candidate biomarker identification. The blood concentrations of proteins secreted from Golgi lumen are approximately two orders of magnitude below this, allowing for an even more comprehensive view of the low-abundance tissue-specific blood proteome. Further improvements are expected in the future, but clearly the tools now exist to identify novel and useful biomarkers of disease and drug response.
REFERENCES 1. Marko-Varga G, Lindberg H, Löfdahl CG, et al. (2005). Discovery of biomarker candidates within disease by protein profiling: principles and concepts. J Proteome Res, 4:1200–1212. 2. Engwegen JY, Gast MC, Schellens JH, Beijnen JH (2006). Clinical proteomics: searching for better tumour markers with SELDI-TOF mass spectrometry. Trends Pharmacol Sci, 27:251–259. 3. Julka S, Regnier FJ (2005). Recent advancements in differential proteomics based on stable isotope coding. Brief Funct Genom Proteom, 4:158–177. 4. Silva JC, Denny R, Dorschel CA, et al. (2005). Quantitative proteomic analysis by accurate mass retention time pairs. Anal Chem, 77:2187–2200. 5. Wang W, Zhou H, Lin H, et al. (2003). Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem, 75:4818–4826. 6. Wiener MC, Sachs JR, Deyanova EG, Yates NA (2004). Differential mass spectrometry: a label-free LC-MS method for finding significant differences in complex peptide and protein mixtures. Anal Chem, 76:6085–6096. 7. Wang G, Wu WW, Zeng W, Chou CL, Shen RF (2006). Label-free protein quantification using LC-coupled ion trap or FT mass spectrometry: reproducibility, linearity, and application with complex proteomes. J Proteome Res, 5:1214–1223. 8. DeSouza LV, Grigull J, Ghanny S, et al. (2007). Endometrial carcinoma biomarker discovery and verification using differentially tagged clinical samples with multidimensional liquid chromatography and tandem mass spectrometry. Mol Cell Proteom, 6:1170–1182. 9. Wu WW, Wang G, Baek SJ, Shen RF (2006). Comparative study of three proteomic quantitative methods, DIGE, cICAT, and iTRAQ, using 2D gel or LCMALDI TOF/TOF. J Proteome Res, 5:651–658. 10. Kislinger K, Gramolini AO, MacLennan DH, Emili A (2005). Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. J Am Soc Mass Spectrom, 16:1207–1220. 11. Calvo KR, Liotta LA, Petricoin EF (2005). Clinical proteomics: from biomarker discovery and cell signaling profiles to individualized personal therapy. Biosci Rep, 25:107–125. 12. Gao J, Friedrichs MS, Dongre AR, Opiteck GJ (2005). Guidelines for the routine application of the peptide hits technique. J Am Soc Mass Spectrom, 16:1231–1238. 13. Old WM, Meyer-Arendt K, Aveline-Wolf L, et al. (2005). Comparison of labelfree methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteom, 4:1487–1502. 14. Lu P, Vogel C, Wang R, Yao X, Marcotte EM (2007). Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol, 25:117–124. 15. Patil ST, Higgs RE, Brandt JE, et al. (2007). Identifying pharmacodynamic protein markers of centrally active drugs in humans: a pilot study in a novel clinical model. J Proteome Res, 6:955–966.
16. Roy SM, Becker C (2007). Quantification of proteins and metabolites by mass spectrometry without isotopic labeling. Methods Mol Biol, 359:87–105. 17. Finney GL, Blackler AR, Hoopmann MR, Canterbury AD, Wu CC, MacCoss MJ (2008). Label-free comparative analysis of proteomics mixtures using chromatographic alignment of high-resolution LC-MS data. Anal Chem, 80(4):961–971. 18. Prakash A, Piening B, Whiteaker J, et al. (2007). Assessing bias in experiment design for large scale mass spectrometry-based quantitative proteomics. Mol Cell Proteom, 6:1741–1748. 19. Fang R, Elias DA, Monroe ME, et al. (2006). Mol Cell Proteom, 5:714–725. 20. Gaspari M, Verhoeckx KCM, Verheij ER, van der Greef J (2006). Integration of two-dimensional LC-MS with multivariate statistics for comparative analysis of proteomic samples. Anal Chem, 78:2286–2296. 21. Follettie MT, Pinard M, Keith JC Jr, et al. (2006). Organ messenger ribonucleic acid and plasma proteome changes in the adjuvant-induced arthritis model: responses to disease induction and therapy with the estrogen receptor-b selective agonist ERB-041. Endocrinology, 147:714–723. 22. Chelius D, Bondarenko P (2002). Quantitative profiling of proteins in complex mixtures using liquid chromatography and mass spectrometry. J Proteome Res, 1:317–323. 23. Lamontagne J, Butler H, Chaves-Olarte E, et al. (2007). Extensive cell envelope modulation is associated with virulence in Brucella abortus. J Proteome Res, 6:1519–1529. 24. Sonnen JA, Keene CD, Montine KS, et al. (2007). Biomarkers for Alzheimer’s disease. Expert Rev Neurother, 7:1021. 25. Maurya P, Meleady P, Dowling P, Clynes M (2007). Proteomic approaches for serum biomarker discovery in cancer. Anticancer Res, 27:1247. 26. Fu Q, Van Eyk JE (2006). Proteomics and heart disease: identifying biomarkers of clinical utility. Expert Rev Proteom, 3:237. 27. Burtis CA, Aswood EA, Burns DE (2005). Tietz Textbook of Clinical Chemistry. Elsevier Saunders, Philadelphia. 28. Anderson NL, Anderson NG (2002). The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteom, 1:845. 29. Thadikkaran L, Siegenthaler MA, Crettaz D, Queloz PA, Schneider P, Tissot JD (2005). Recent advances in blood-related proteomics. Proteomics, 5:3019–3034. 30. Hendriks MM, Smit S, Akkermans WL, et al. (2007). How to distinguish healthy from diseased? Classification strategy for mass spectrometry-based clinical proteomics. Proteomics, 7:3672–3680. 31. Sitnikov D, Chan D, Thibaudeau E, Pinard M, Hunter JM (2006). Protein depletion from blood plasma using a volatile buffer. J Chromatogn B, 832:41–46. 32. Bjorhall K, Miliotis T, Davidsson P (2005). Comparison of different depletion strategies for improved resolution in proteomic analysis of human serum samples. Proteomics, 5:307–317. 33. Whiteaker J, Zhang H, Eng JK, et al. (2007). Head-to-head comparison of serum fractionation techniques. J Proteome Res, 6:828–836. 34. Sitnikov D, Hunter JM, Hayward C, et al. (2007). Peptide shifter: enhancing separation reproducibility using correlated expression profiles. J Am Soc Mass Spectrom, 18:1638–1645.
35. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LCMS/ MS) for large-scale protein analysis: The yeast proteome. J Proteome Res, 2:43–50. 36. Kearney P, Butler H, Eng K, Hugo P (2008). Harmonizing protein identification with protein expression data. J Proteome Res, 7:234–244. 37. Lekpor K, Benoit MJ, Butler H, et al. (2007). An evaluation of multidimensional fingerprinting in the context of clinical proteomics. Proteom Clin Appl, 1:457–466. 38. Rifai N, Gillette MA, Carr SA (2006). Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol, 24:971. 39. Barnidge DR, Goodmanson MK, Klee GC, Muddiman DC (2004). Absolute quantification of the model biomarker prostate-specific antigen in serum by LC-MS/MS using protein cleavage and isotope dilution mass spectrometry. J Proteome Res, 3:644–652. 40. Bondar OP, Barnidge DR, Klee EW, Davis BJ, Klee GG (2007). LC-MS/MS quantification of Zn-α2 glycoprotein: a potential serum biomarker for prostate cancer. Clin Chem, 53:673–678. 41. Folstein MF, Folstein SE, McHugh PR (1975). Mini-Mental State: A practical method for grading the state of patients for the clinician. J Psych Res, 12:189–198. 42. Kratchmarova I, Kalume DE, Blagoev B, et al. (2002). A proteomic approach for identification of secreted proteins during the differentiation of 3T3-L1 preadipocytes to adipocytes. Mol Cell Proteom, 1:213–222. 43. Chen X, Cushman SW, Pannell LK, Hess S (2004). Quantitative proteomic analysis of the secretory proteins from rat adipose cells using 2D liquid chromatographyMS/MS approach. J Proteome Res, 4:570–577. 44. Alvarez-Llamas G, Szalowska E, de Vries MP, et al. (2007). Characterization of the human visceral adipose tissue secretome. Mol Cell Proteom, 6:589–600. 45. Lee MC, Miller EA, Goldberg J, Orci L, Schekman R (2004). Bi-directional protein transport between the ER and Golgi. Annu Rev Cell Dev Biol, 20: 87–123. 46. Lanoix J, Paramithiotis E (2008). Secretory vesicle analysis for discovery of low abundance plasma biomarkers. Expert Opin Med Diagn, 2(5):475–485. 47. Sircar K, Gaboury L, Ouadi L, et al. (2006). Isolation of human prostatic epithelial plasma membranes for proteomics using mirror image tissue banking of radical prostatectomy specimens. Clin Cancer Res, 12:4178–4184. 48. Yang R, Barouch LA (2007). Leptin signaling and obesity: cardiovascular consequences. Circ Res, 101:545–559. 49. Malendowicz LK, Rucinski M, Belloni AS, Ziolkowska A, Nussdorfer GG (2007). Leptin and the regulation of the hypothalamic–pituitary–adrenal axis. Int Rev Cytol, 263:63–102. 50. Louis GW, Myers MG (2007). The role of leptin in the regulation of neuroendocrine function and CNS development. Rev Endocr Metab Disord, 8:85–94. 51. Lago F, Dieguez C, Gomez-Reino J, Gualillo O (2007). Adipokines as emerging mediators of immune response and inflammation. Nat Clin Pract Rheumatol, 3:716–724.
52. Iannucci CV, Capoccia D, Calabria M, Leonetti F (2007). Metabolic syndrome and adipose tissue: new clinical aspects and therapeutic targets. Curr Pharm Des, 13:2148. 53. Vohl MC, Sladek R, Robitaille J, et al. (2004). A survey of genes differentially expressed in subcutaneous and visceral adipose tissue in men. Obes Res, 12:1217–1222. 54. Secreted Protein Database. http://spd.cbi.pku.edu.cn/. 55. Polanski M, Anderson NL (2006). A list of candidate cancer biomarkers for targeted proteomics. Biomarker Insights, 2:1–48. 56. States DJ, Omenn GS, Blackwell TW, et al. (2006). Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol, 24:333–338. 57. Arnoys EJ, Wang JL (2007). Dual localization: proteins in extracellular and intracellular compartments. Acta Histochem, 109:89–110. 58. Walter N (2005). Unconventional secretory routes: direct protein export across the plasma membrane of mammalian cells. Traffic, 6:607–614. 59. Bendtsen JD, Jensen LJ, Blom N, von Heijne G, Brunak S (2004). Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel, 17(4):349–356.
6
QUANTITATIVE MULTIPLEXED PATTERNING OF IMMUNE-RELATED BIOMARKERS
Dominic Eisinger, Ph.D., Ralph McDade, Ph.D., and Thomas Joos, Ph.D.
Rules Based Medicine, Inc., Austin, Texas
INTRODUCTION The human immune system is a complex cellular and molecular network unlike any other biological system. The complex interplay of multiple immune cells that rid the body of foreign invaders and diseased tissues is governed by a vast number of cell-to-cell communications carried by secreted protein messengers. In this chapter we focus on the expanding utility of measuring these secreted protein messengers as biomarkers for drug development and diagnostic efforts that are relevant to a variety of diseases. Herein we use a broad acronym, IB (immune-related biomarker), to represent protein biomarkers for a variety of immune-related disorders. The IBs most prominently monitored in drug development efforts are the cytokines and chemokines, but they also include acute-phase reactants, tissue remodeling factors, vascular markers, and growth factors. This complexity has led to the rapid adoption of newer multiplexed measurement tools. Key features of these include the quantitation, sensitivity, and, perhaps most important, precision of each measurement. Eicosanoids (prostaglandins, prostacyclins, thromboxanes, and leukotrienes) are also key elements of immune processes but do not fit under the paradigm of protein biomarkers and are not addressed in this chapter.
This ability to measure multiple IBs simultaneously from a small biological sample has ushered in an era of exploration known as biomarker patterning. It would be convenient if the single-biomarker paradigm of the past were to continue unabated. However, the rate of introduction of new U.S. Food and Drug Administration (FDA)–approved diagnostic protein assays has fallen dramatically, to an average of one or fewer per year [1]. This unfortunate trend argues for relying not on a single biomarker but, rather, on a group or set of biomarkers for drug development and diagnostic efforts. The theory of biomarker stacking is that by combining the predictive power of each marker, the optimal sensitivity and specificity of the multiplexed test will be achieved [2]. Rather than a single biomarker fitted to a specific disease or drug effect, a pattern of multiple biomarkers provides stronger predictive value, due to their "stacking" effects. In this chapter we focus on the multiplexed immunoassay measurement technologies as well as some recent and relevant examples of successful biomarker patterning. These approaches are rapidly bringing multiplexed biomarker assays into everyday use in drug development and diagnostics for personalized medicine.
MEASUREMENT PROTEOMICS VS. TRADITIONAL PROTEOMICS The use of multiplexed immunoassays to identify novel biomarker patterns is a form of proteomics. However, it is often confused with traditional proteomics, which typically uses complex separation technologies such as two-dimensional gel electrophoresis and/or mass spectrometry to identify patterns of peaks that are peptides or proteins of unknown origin. That approach provides new targets for future research but lacks the quantitation, and certainly the precision, necessary for clinical use. In contrast, the approach discussed here measures only those analytes for which a specific assay has been developed. It requires specific ligand-binding reagents, generally antibodies, and a physical address to localize each binding reaction. This technique has been described as measurement proteomics, where a restricted set of analytes is measured and the data are reported in absolute terms (e.g., pg/mL) [3]. One caveat of measurement proteomics that must be addressed is the inherent variability among individuals in the expression of different IBs. This variation results from a blend of genetic and environmental factors. Before one can accurately predict the biological response to a disease or its therapy, the natural variance within a population as well as within a person must be determined for the protein analytes in question. This is typically performed by assessing the biomarker patterning data for a control group of people while simultaneously measuring the identical markers in the diseased or drug-treated group. Statistical tools are widely available to use either cutoff values for individual analytes or multivariate analysis for multiple biomarkers in order to achieve a valuable test. Experience has shown that the level of a given protein biomarker or biomarker pattern often correlates with a particular
disease state. In addition, these biomarker levels in serum or plasma have been shown to be stable in a person over time [1,2].
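One simple way to capture that natural variance is a nonparametric reference interval derived from the control group, against which diseased or drug-treated samples are compared. The sketch below uses simulated concentrations for a single hypothetical IB (the values and cohort sizes are illustrative only):

import numpy as np

# Simulated serum concentrations (pg/mL) of one IB in control and treated cohorts.
rng = np.random.default_rng(4)
controls = rng.lognormal(mean=3.0, sigma=0.4, size=120)
treated = rng.lognormal(mean=3.6, sigma=0.4, size=40)

# Nonparametric reference interval: the central 95% of the control distribution.
low, high = np.percentile(controls, [2.5, 97.5])
print(f"reference interval: {low:.1f}-{high:.1f} pg/mL")

elevated = treated > high
print(f"{elevated.sum()} of {len(treated)} treated subjects exceed the upper cutoff")

Multivariate approaches extend the same idea by defining the control region jointly over many analytes rather than one cutoff at a time.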
MEASUREMENT PROTEOMICS PLATFORM TECHNOLOGIES ELISA-Based Immunometric (Sandwich) Immunoassays Many of the key IBs (cytokines and chemokines in particular) exist at relatively low concentrations in serum or plasma, requiring the femtomolar sensitivity provided by the sandwich immunoassay. Sandwich immunoassays get their name from the fact that the antigen is "sandwiched" between two antibodies. This is typically a more sensitive technique than single-antibody competitive assays. Traditional single-plex methods such as the 96-well microtiter plate–based enzyme-linked immunosorbent assay (ELISA) use a capture antibody anchored either hydrophobically or covalently to the polystyrene wall of each well. After binding of antigen from the liquid sample, the assay is completed by adding a solution of a second antibody specific for the same target, thus forming the sandwich. A reporter attached to the second antibody provides the quantitative signal. ELISAs have been the standard in the research lab for decades, while the same sandwich immunoassay technique has been applied in a more industrial fashion in fully automated random access immunoanalyzers such as the Immulite from Siemens, built for FDA-approved assays in the clinical lab.
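Quantitation in a sandwich assay typically comes from interpolating the sample signal on a calibration (standard) curve, and a four-parameter logistic (4PL) model is a common choice for that curve. The sketch below uses hypothetical calibrator values (not tied to any particular kit) to fit a 4PL curve and back-calculate an unknown:

import numpy as np
from scipy.optimize import curve_fit

def four_pl(conc, a, b, c, d):
    # 4PL model: a = response at zero analyte, d = response at saturation,
    # c = inflection concentration, b = slope factor.
    return d + (a - d) / (1.0 + (conc / c) ** b)

# Hypothetical calibrator concentrations (pg/mL) and measured signals.
std_conc = np.array([0.98, 3.9, 15.6, 62.5, 250.0, 1000.0])
std_signal = np.array([0.05, 0.12, 0.40, 1.10, 2.20, 2.80])

(a, b, c, d), _ = curve_fit(four_pl, std_conc, std_signal, p0=[0.02, 1.0, 100.0, 3.0], maxfev=10000)

def back_calculate(signal):
    # Invert the fitted curve to estimate concentration from a sample signal.
    return c * ((a - d) / (signal - d) - 1.0) ** (1.0 / b)

print(f"sample with signal 0.85 is approximately {back_calculate(0.85):.0f} pg/mL")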
Multiplexed Immunometric (Sandwich) Immunoassays The random access immunoanalyzers found in the clinical lab do provide a form of multiplexing by physically splitting the loaded serum or plasma sample into multiple single-plex assays. Although this is an efficient approach for the work flow in a lab, it still requires a minimum sample volume for each assay and does not reduce the cost per assay significantly. True multiplexing, as delivered by today's newer platforms, does provide significant savings in both the sample volume requirements and the cost per assay. Although a number of interesting multiplexed immunoassay platforms have been launched in the last decade, two primary types of platforms have survived as viable commercial entities: planar arrays and microsphere-based arrays.
Planar Arrays Planar microarrays are highly miniaturized and parallelized solid-phase assay systems that use a large number of different capture molecules immobilized in physically addressable locations called microspots. These spots, which typically have diameters of less than 250 μm and are arrayed in rows and columns,
are where the capture antibody for each assay is deposited. This deposition by physical touching with pins or spraying with piezoelectric tips is a technologically demanding step that was pioneered in the gene expression industry for detection of nucleic acids by sequence-specific hybridization [4]. This approach has been used to generate antibody arrays consisting of several hundred capture antibodies by depositing them on nitrocellulose-coated microscope slides. Typically, these large arrays have then been probed either by using a direct labeling approach, where the mixture of antigens is fluorescently labeled, or, conversely, by using a cocktail of second antibodies labeled with a fluorescent tag to complete the sandwich assays. In either case, quantitation was achieved by imaging the spots with a charge-coupled device (CCD) camera. The two limiting features of this approach have been (1) the irreproducibility of the spotting approach, where each new array is considered a new lot of material from a quality control point of view; and (2) the high probability of antibody–antibody interactions when greater than 25 antigen–antibody reactions are performed in a shared physical space. The second is a limiting factor for all multiplexed immunoassays and has been addressed in the three most popular platforms. Microtiter-Based Planar Arrays Glass slide–based immunoassay microarrays have been adapted to the microtiter plate format to increase throughput and to use existing automated solutions. Although this limits the number of microspots available per well, it helps to avoid the problem of antibody–antibody interactions. Such an approach was adapted by Thermo Scientific with their SearchLight Protein Array Technology. On the bottom of each microtiter well, an array consisting of up to 16 microspot features is generated, which allows the measurement of 16 analytes from 50 μL of sample. After incubation with sample, the wells are washed as with an ELISA and the captured antigen is visualized using a secondary antibody labeled with a chemiluminescent probe. Quantitation of captured analytes is provided by a CCD imaging system using a calibration curve set up in parallel wells (www.piercenet.com). Meso Scale Discovery (http://www.meso-scale.com) also uses a microtiter well microarray strategy called Multi-Array. Each well is arranged with 1 to 25 carbon microspots that are in essence miniature electrodes integrated into the bottom of the plate. Onto each carbon electrode the capture antibodies are immobilized, which after multiplexed immunoassay development results in a bound electrochemiluminescent (ECL) label. Quantitation of each assay is performed by passing electrical current through each electrode, producing a luminescent signal that is detected by a CCD camera imaging system. The microtiter plate format allows the use of standard liquid-handling automation with a multiplexed control and calibration strategy. As each Multi-Array plate can be imaged across all 96 wells within a few seconds, the sample throughput of this approach is large and quite scalable. The ECL detection system is also
thought to provide significantly better sensitivity for immunoassays than can be provided by chemiluminescence or fluorescence. The two planar array systems described above are typically operated in a manual mode, although both are capable of interfacing with automated liquid-handling systems. One fully automated planar array system in the marketplace is the Evidence, an automated biochip system developed by Randox Laboratories (http://www.randox.com). It provides multiplexed analysis of miniaturized and parallelized immunoassays in a macroarray format. This macroarray contains 25 features and uses a chemiluminescence-based readout. Several immunoassay panels are available, including fertility, cardiac disease, tumors, cytokines and growth factors, and drug residue panels. Microsphere Arrays Robust and flexible microsphere-based array systems have been developed over the last decade [5]. For the purposes of this review, only one microsphere-based array, developed and commercialized in a manner parallel to the planar arrays discussed previously, has succeeded in garnering the most significant portion of the multiplexed immunoassay market: xMAP from Luminex Corporation (http://www.luminexcorp.com). Luminex technology can perform up to 100 multiplexed, microsphere-based assays in a single reaction vessel by combining optical classification schemes, biochemical assays, flow cytometry, and advanced digital signal-processing hardware and software. Multiplexing is accomplished by assigning each analyte-specific assay a microsphere set labeled with a unique fluorescence signature. To attain 100 distinct microsphere signatures, two fluorescent dyes, red and infrared, are mixed in various combinations using 10 intensity levels of each dye (i.e., 10 × 10). Each batch or set of microspheres is encoded with a fluorescent signature by impregnating the microspheres with one of these dye combinations. After the encoding process, assay-specific capture antibodies are conjugated covalently to each unique set of microspheres. Coupling is performed on large numbers of individual microspheres (10^7 to 10^9 microspheres) simultaneously within each unique set, resulting in low microsphere-to-microsphere variability. After optimizing the parameters of each assay separately, multianalyte profiles (MAP) are performed by mixing up to 100 different sets of the microspheres in a single well of a 96-well microtiter plate. A few microliters of sample is added to the well and allowed to react with the microspheres. The assay-specific capture reagent on each microsphere binds the analyte of interest. A cocktail of assay-specific, biotinylated antibodies is reacted with the microsphere mixture, followed by a streptavidin-labeled fluorescent "reporter" molecule (typically, phycoerythrin). Finally, the multiplex is washed to remove unbound detecting reagents. After washing, the mixture of microspheres is analyzed using a Luminex instrument, which uses hydrodynamic focusing to pass the microspheres in
single file through two lasers. As each individual microsphere passes through the excitation beams, it is analyzed for size, encoded fluorescence signature, and the amount of fluorescence generated in proportion to the analyte. Microsphere size, determined by measuring the 90° light scatter as the microspheres pass through a red diode laser (633 nm), is used to eliminate microsphere aggregates from the analysis. While in the red excitation beam, the encoded red and far-red dyes are excited and the resulting fluorescence signature (ratio 660 nm/720 nm) is filtered, measured using avalanche photodiodes, and classified to a microsphere set. Since each microsphere is encoded with a unique signature, the classification identifies the analyte being measured on that individual microsphere. As the microsphere passes through a green diode-pumped solid-state laser (532 nm), the fluorescence “reporter” signal (580 nm) is generated in proportion to the analyte concentration, filtered, and measured using a photomultiplier tube. Data acquisition, analysis, and reporting are performed in real time on all microsphere sets included in the MAP. A minimum of 50 individual microspheres from each unique set are analyzed and the median value of the analyte-specific, or reporter, fluorescence is logged. Using calibrators and controls of known analyte quantity, sensitive and quantitative results are achieved, with precision enhanced by the redundant oversampling at each data point. xMAP provides several key advantages over the two-dimensional planar arrays. First, consistent coating of the polystyrene microsphere surface with antibodies has been performed in the diagnostic industry for decades, so it is a reproducible technique. Microsphere coating does not suffer from the irreproducibility that plagues planar arrays. Second, the lot size for manufacturing of a microsphere array is generally in the thousands to millions of tests per lot. Low lot-to-lot variability coupled with extremely large manufacturing lots means that the problem of lot variation is greatly diminished. The primary disadvantage of microsphere arrays is throughput, as each sample must be read in turn in the flow system. So far, no group has solved this problem by developing multiple-channel flow systems, but this is the next logical step of the technology.
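The quantitation step just described — taking the median reporter fluorescence of at least 50 microspheres per analyte set and interpolating a concentration from a calibration curve — can be made concrete with a short sketch. The snippet below is a minimal illustration only: the calibrator values and bead readings are invented, and the four-parameter logistic (4PL) model is used simply because it is a common choice for immunoassay standard curves, not because it is any instrument vendor's actual algorithm.

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, a, b, c, d):
    # Four-parameter logistic: a = response at zero dose, d = response at
    # saturation, c = inflection point, b = slope factor.
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    # Back-calculate concentration from a measured fluorescence value.
    return c * (((a - d) / (y - d)) - 1.0) ** (1.0 / b)

# Hypothetical calibrator concentrations (pg/mL) and their median fluorescence
cal_conc = np.array([0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0])
cal_mfi = np.array([22.0, 39.0, 205.0, 1680.0, 8340.0, 13890.0])
params, _ = curve_fit(four_pl, cal_conc, cal_mfi, p0=[20.0, 1.0, 800.0, 15000.0])

# Hypothetical per-bead reporter fluorescence for one analyte in one sample;
# >= 50 beads per set are typically acquired, and the median value is reported.
beads = np.random.default_rng(0).normal(loc=900.0, scale=60.0, size=75)
median_mfi = float(np.median(beads))

print(f"median reporter fluorescence: {median_mfi:.1f}")
print(f"interpolated concentration:   {inverse_four_pl(median_mfi, *params):.1f} pg/mL")
```

Using the median rather than the mean of the individual bead readings makes the reported value robust to the occasional aggregated or misclassified microsphere.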
APPLICATIONS AND EXAMPLES IN DRUG DEVELOPMENT AND DIAGNOSTICS

Whether one uses planar arrays or microsphere arrays, the resulting quantitative, multiplexed data are then mined for distinctive patterns of biomarkers that indicate disease, drug toxicity or efficacy, or some other useful biochemical phenotype. We will not discuss the data mining techniques available but simply describe several examples where data mining of these complex data sets has revealed meaningful biomarker patterns. Typically, data mining is first done using traditional statistics to identify those analytes whose values are significantly different between two experimental groups. This is often done by
determining mean analyte values and applying Student’s t-tests to determine p-values of significance. The next step is often to use more complex multivariate analysis tools, such as principal component analysis, for pattern recognition (a minimal sketch of this two-step workflow appears at the end of this section). The reports of IB monitoring cited most often are related to inflammatory disease. Asthma, rheumatoid arthritis (RA), multiple sclerosis (MS), systemic lupus erythematosus (SLE), chronic obstructive pulmonary disease (COPD), and psoriasis (PS) are thought of as classic inflammatory diseases. In recent years, a host of other diseases have been found to have a significant inflammatory component. These include cardiovascular disease (CVD), central nervous system diseases such as Alzheimer disease, most infectious diseases, and various cancers [6]. IBs are thus very useful for quantifying, characterizing, and monitoring the inflammatory and anti-inflammatory components of a disease in addition to therapeutic benefits during drug trials. Measurement of inflammatory markers is a popular application of multiplexed profiling in drug development and diagnostic efforts. The duration and pattern of cytokine and chemokine expression in serum or plasma have been the most common uses of multiplexed immunoassays. Common inflammatory mediators are often cited (e.g., TNFα, IL-1, IL-6, IL-8, MIP-1) due to the large set of publications describing the association of these classical inflammatory markers with numerous diseases, conditions, and drug treatments. Although important, other IBs that are related directly or indirectly to the disease and/or therapy in question will often add significant prognostic value to biomarker patterning efforts.

Inflammatory Bowel Disease in a Mouse Model

Crohn disease and ulcerative colitis are chronic inflammatory diseases of the gut that cause tremendous human suffering and loss of productivity. There is a clear need for better diagnostic tools that are less invasive or costly than endoscopy and histology. Torrence et al. published a study [7] of an inflammatory bowel disease model in mice in which they identified a serum-based IB pattern using data generated from a microsphere array. These data were generated using serum samples of mice infected with Helicobacter bilis, which causes an inflammation in the gut. These IB patterns were then compared with those of control animals. Samples were sent to Rules-Based Medicine (RBM), Inc., Austin, Texas for biomarker analysis using their RodentMAP (http://www.rbmmaps.com). RBM has an automated testing laboratory that utilizes xMAP technology under regulatory compliance with both good laboratory practices (GLPs) and the Clinical Laboratory Improvement Amendments (CLIA). RBM performed a multianalyte profile consisting of 58 different quantitative immunoassays and reported the data for each analyte in mass per milliliter of sample. Results indicated that serum levels of IL-11, IL-17, interferon gamma-inducible protein (IP-10), lymphotactin, monocyte chemoattractant protein
(MCP-1), and vascular cell adhesion molecule (VCAM-1) were elevated in early disease. In later, more severe disease, IL-11, IP-10, haptoglobin, matrix metalloproteinase-9 (MMP-9), macrophage inflammatory protein 1 alpha (MIP-1α), fibrinogen, immunoglobulin A, apolipoprotein A1, and IL-18 were elevated. All of these biomarkers are considered IBs with the exception of apolipoprotein A1, which is considered a metabolic biomarker. Interestingly, all of the IBs correlated with histopathological scores. Antibiotic treatment of the infected mice both improved the histopathology scores and decreased the mean serum values for most of the IBs. These data suggest that the serum IB patterns could be useful for both diagnosis and prognosis of similar human diseases.

Scleroderma

Scleroderma is an autoimmune disease that manifests itself in the skin, resulting in fibrosis and damage to the vasculature. It is thought that IBs play a major role in the course of the disease and therefore should be good targets for both therapeutic intervention and diagnostic/prognostic development. Recently, Duan et al. [8] compared the serum IB patterns of scleroderma patients with controls using the RBM testing approach, measuring 188 different biomarkers. In addition, this group compared the serum protein expression patterns with the messenger RNA expression of the monocytes and lymphocytes isolated from these patients’ peripheral blood. As in the previous mouse study, the majority of biomarkers that were differentially expressed were IBs. Acute-phase reactants (fibrinogen, haptoglobin, and von Willebrand factor), cytokines (IL-1α, IL-16, soluble tumor necrosis factor receptor 2, and MIP-1α), tissue remodeling proteins (intercellular adhesion molecule type 1 and tissue inhibitor of metalloproteinase type 1), and one metabolic biomarker, apolipoprotein CIII, were all elevated in the scleroderma patient sera compared with normal age-matched controls. The gene expression patterns from the various immune cell populations correlated poorly with the serum IB profiling data. This was not surprising, as these two types of profile often do not correlate well. The reason(s) for the differences between the gene expression and IB profiles are unknown. Results indicated that the pattern of IB elevation in scleroderma could be used as a more specific diagnostic to help the rheumatologist differentiate scleroderma from other autoimmune conditions or skin maladies. More important, the intensity of IB elevation could be used to monitor therapeutic intervention in clinical trials as well as after drugs developed for this indication are approved.

Rheumatoid Arthritis

Rheumatoid arthritis (RA) is an autoimmune disease where the current focus of drug development efforts is devoted to specific blockade of certain cytokine
activities. It is therefore highly likely that IB monitoring will play an important role in RA disease prognosis and drug development programs. A brief review of the cytokine blockade strategies for RA starts with the inflammatory cytokine TNFα, whose blockade was a major therapeutic advancement in the treatment of RA [9]. There are now several TNFα biological drugs on the market (Humira, Abbott; Enbrel, Amgen; Remicade, Centocor) that, in addition to treatment of RA, are also being used off-label for psoriasis and for Crohn disease. Blockade of IL-1 is another option available for RA therapy. Anakinra (Kineret, Amgen), a recombinant interleukin-1 receptor antagonist, is used for treatment of rheumatoid arthritis. The next major advancement in anticytokine therapy for RA appears to be the humanized anti-IL-6 receptor blocking antibody tocilizumab (Actemra, Roche). The acute-phase reactant serum amyloid A was identified in a serum profile that differentiated response from no response to Actemra therapy [10]. Targeting of IL-15 also looks promising for treatment of RA and psoriatic arthritis in addition to pulmonary inflammatory diseases [11]. Targeted cytokine neutralization as a therapeutic option in autoimmune diseases such as RA will probably lead to novel IB patterns with both prognostic and diagnostic value during drug efficacy trials. For example, in a recent study of Sjögren syndrome, a panel of 25 IBs supported diagnosis and disease management in this debilitating condition [12].

Cardiovascular Disease

Over the past 25 years, the number of risk factors for coronary artery disease has increased dramatically. Systemic inflammation and abnormal lipoprotein metabolism are important contributors to the progression of atherosclerotic disease leading to plaque instability [13]. Even so, the vast majority of patients who experience an acute coronary event have no prior symptoms. Further complicating the diagnosis of acute coronary syndrome (ACS) is the frequent occurrence of patients who present with chest pain attributable to completely unrelated events, such as acute gastroesophageal reflux disease. Given the serious life-threatening nature of ACS, an improvement in the early diagnosis of heart attacks would be a major medical advancement. Matrix metalloproteinases (MMPs) are promising IBs for early identification of high-risk patients. Pro-inflammatory cytokines stimulate the secretion of MMPs, and an increase in systemic MMPs has been documented in ACS and correlated with unstable plaque metabolism [13]. In a recent study by Gurbel and co-workers, MMP-2, MMP-3, and MMP-9 were sensitive and specific markers differentiating asymptomatic coronary artery disease (A-CAD) from symptomatic CAD (S-CAD). The stable, quiescent disease of the A-CAD group was documented by angiography, a history of coronary artery bypass grafting, or revascularization (stenting or balloon angioplasty). The progressive S-CAD group of patients was enrolled immediately before coronary intervention and presented with either myocardial infarction or stable angina.
Patients with S-CAD had markedly elevated levels of MMP-2 and MMP-9, whereas patients with A-CAD had significantly greater levels of MMP-3. In addition to the MMPs, other IBs (TIMP-1, C-reactive protein, IL-8, IL-10, RANTES, endothelin, plasminogen activator inhibitor type 1, apolipoprotein C-III, and IL-1α) reinforced the MMP results for distinguishing the two groups [13].

Hepatitis C

Wright and co-workers documented cytokine profiles of patients grouped based on their viral titers before and after standard therapy [14]. They demonstrated the prognostic power of serum cytokine profiling in chronic hepatitis C virus (HCV) infection. The study showed that overall serum cytokine levels were significantly higher in patients than in controls and that the levels dropped significantly after therapy in concert with the viral titer. Distinct cohort subgroups based on changes in viral titers correlated with specific sets of cytokines that decreased in each group. This could support new efforts to stratify patients and to find improved therapies.
Ovarian Cancer

Ovarian cancer has been called the silent killer, as it is often not diagnosed until late in the disease progression, when therapeutic intervention is rarely successful. There is a strong unmet medical need for a sensitive and specific diagnostic test that will detect early-stage ovarian cancer and differentiate it from benign ovarian disease. Bertenshaw et al. recently reported the identification of a biomarker pattern of eight serum analytes that, using a multivariate algorithm, could differentiate patients with stage 2 ovarian cancer from those with benign ovarian disease [15]. In addition to Cancer Antigen (CA)-125, seven IBs (C-reactive protein, epidermal growth factor, IL-10, IL-8, connective tissue growth factor, haptoglobin, and MMP-1) were identified as being of importance to a potential diagnostic. These IBs, as well as CA-125, were identified by comparing the plasma biomarker patterns of samples from ovarian cancers of various stages, common benign gynecological conditions, and age-matched normal controls.
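Across the examples in this section, the analysis typically begins, as noted earlier, with a per-analyte comparison between groups (e.g., Student's t-tests) followed by multivariate pattern recognition such as principal component analysis. The sketch below is a minimal, generic illustration of that two-step workflow on simulated data; the group sizes, analyte count, and significance cutoff are arbitrary assumptions and do not correspond to any of the studies cited above.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_disease, n_control, n_analytes = 30, 30, 58   # e.g., a 58-plex panel
disease = rng.lognormal(mean=1.0, sigma=0.5, size=(n_disease, n_analytes))
control = rng.lognormal(mean=0.8, sigma=0.5, size=(n_control, n_analytes))

# Step 1: per-analyte Student's t-test on log-transformed concentrations
pvals = np.array([
    stats.ttest_ind(np.log(disease[:, j]), np.log(control[:, j])).pvalue
    for j in range(n_analytes)
])
hits = np.where(pvals < 0.05)[0]  # nominally significant analytes
print(f"{hits.size} analytes with p < 0.05 (before multiple-testing correction)")

# Step 2: principal component analysis for pattern recognition
X = np.vstack([disease, control])
scores = PCA(n_components=2).fit_transform(np.log(X))
labels = np.array(["disease"] * n_disease + ["control"] * n_control)
print("group means on PC1:",
      {g: round(scores[labels == g, 0].mean(), 2) for g in ("disease", "control")})
```

In practice the univariate p-values would be adjusted for multiple testing (e.g., by controlling the false discovery rate) before any analyte is declared part of a biomarker pattern.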
EMERGING PARADIGM: EX VIVO IMMUNE BIOMARKER MONITORING WITH WHOLE-BLOOD CELL CULTURES

A fortuitous aspect of applying measurement proteomics to the immune system is the relative ease with which many properties of the immune system can be quantified outside the human body, ex vivo. (Herein, ex vivo refers to
the in vitro culture and ex vivo analysis of the immune system outside the body.) Both innate and adaptive immune system responses in blood leukocytes can be measured with multiplex IB measurements. Indeed, much of our understanding of the immune system has been generated by experimentation on human leukocyte cultures ex vivo. Given the complexity of all the different types of immune cells present in blood, a reductionist approach of isolating subpopulations of leukocytes was invaluable. For example, density gradient centrifugation of blood to obtain peripheral blood mononuclear cells (PBMCs) is a common procedure to obtain cell populations enriched for lymphocytes and monocytes but depleted of granulocytes, platelets, and red blood cells [16]. Ex vivo analysis of the immune system has only recently been deployed in the clinical drug development setting, with the primary applications being safety (immunotoxicity) and pharmacodynamics. An example of an immunotoxicity application is the TeGenero incident of 2006, in which the CD28 superagonistic antibody TGN1412 was administered to healthy phase I volunteers. Within a few hours, a single intravenous dose induced severe lymphopenia and a systemic cytokine storm, leading to multiorgan failure [17]. Clearly, it is preferable to identify the potential for such a catastrophic drug effect during an ex vivo study. An example of a pharmacodynamic application would be quantifying IBs in the culture media that correlate with the drug’s mechanism of action. The utility of ex vivo analysis of the immune system for drug development programs can be divided into two broad categories. Initially, a candidate drug can be tested ex vivo for its effects on the immune system in a resting, inactivated state as well as in an activated state created by the addition of an immune stimulant to mimic inflammation. Later in the drug development process, a candidate drug can be tested in vivo in a trial subject, and the subject’s immune system tested ex vivo with and without immune stimulation. Unfortunately, the historic research methodology of utilizing PBMCs for ex vivo studies is not very adaptable to the clinical setting, nor does it provide a physiologically relevant situation for drug development studies. Compared to isolated PBMC cultures, the whole-blood culture approach has a major advantage: all components of the blood, including cellular (e.g., granulocytes: neutrophils, basophils, eosinophils; lymphocytes, monocytes, NK cells), pseudocellular (e.g., red blood cells, platelets), and subcellular components (e.g., enzyme inhibitors, complement system, kininogens), are present. Therefore, functional assays performed with a whole-blood culture reflect the situation of a person’s whole immune repertoire much better than do PBMC-based cultures [18–20]. In a whole-blood stimulation assay, the leukocytes are at the center of a well-composed concert of regulatory components that relies on an entire series of feedback loops, which modulate the function of the immune system in a positive or negative manner. T-cells, for example, can amplify the activities of monocytes and macrophages, and platelets produce and release precursors of leukotrienes, which are taken up
and processed further to become full-fledged mediators by polymorphonuclear granulocytes [21,22]. Red blood cells, which are not present in PBMC preparations, can produce leukotrienes and modulate the action of chemokines by expressing chemokine receptors on their surface [23]. Plasma proteins such as the complement system and the kininogens form readily available precursors of peptides exhibiting strong immunoregulatory activities [24]. Some of these precursors are cleaved to their active forms only in the presence of enzymes released, for example, by activated polymorphonuclear granulocytes [25]. Using whole-blood cultures, the functionality of the human immune system can be analyzed as closely as possible to the in vivo situation. The difficulty with implementing whole-blood cultures in the clinical setting is the need for a specialized cell culture laboratory and highly trained technicians, a very expensive prospect. Shipping samples to a centralized cell culture facility dramatically reduces the quality and reproducibility of results. Recently, a standardized, fully closed, whole-blood culture system has been developed for the clinical setting that does not require cell culture facilities or specialized personnel [26,27]. The TruCulture system incorporates a 3-mL blood collection tube with cell culture medium to which test substances can be added. Blood is drawn from a subject directly into a collection tube that contains a medium for an immediate and standardized initiation of the whole-blood culture. After incubation in a 37°C heating block, the cells are physically separated from the culture medium by insertion of a valve separator, and the unit is frozen until IBs in the culture medium can be analyzed. This system will greatly aid the routine implementation of ex vivo immune monitoring in clinical trials.
CONCLUSIONS

History has shown repeatedly that the application of a new breakthrough technology is the driving force behind the next phase of meaningful discovery. The development and commercialization of quantitative multiplexed immunoassay platforms is a prime example. The two primary array platforms, planar and microsphere, provide investigators with more comprehensive biomarker data for multiple analytes with the same sensitivity and precision as single-plex testing. The technology has been invaluable to basic research and drug discovery efforts and is now moving rapidly into the clinical arena of drug development, with the goal that useful biomarker patterns will allow more informed drug development decisions to be made. Immune-related biomarker patterning has great potential to improve the success rate of drug approval in a highly cost-effective manner. The efficiency of simultaneous profiling of multiple immune-related biomarkers with very small sample volumes has revolutionized our ability to discover and validate biomarker patterns in order to stratify patients and find markers of efficacy and safety. Central to many diseases are the immune-related biomarkers discussed in this chapter.
The modulation of these biomarkers as drug targets themselves has also become a core theme for many of the autoimmune diseases. Finally, innovative products designed for the clinical trial setting, such as the TruCulture whole-blood culture system, will help bring immune-related biomarker patterning into the everyday realm of clinical drug development and companion diagnostics.
REFERENCES

1. Anderson NL, Anderson NG (2002). The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteom, 1(11):845–867.
2. Bodovitz S, Patterson S (2003). Protein biomarker strategies. Drug Discov World, Fall, pp. 67–78.
3. Heuer J, Cummins D, Edmonds B (2005). Multiplex proteomic approaches to sepsis research: case studies employing new technologies. Expert Rev Proteom, 2(5):669–680.
4. Kricka LJ, Master SR, Joos TO, Fortina P (2006). Current perspectives in protein array technology. Ann Clin Biochem, 43(6):457–467.
5. Templin M, Stoll D, Bachmann J, Joos T (2004). Protein microarrays and multiplexed sandwich immunoassays: What beats the beads? Combin Chem High Throughput Screening, 7(3):223–229.
6. Lucas S, Rothwell NJ, Gibson RM (2006). The role of inflammation in CNS injury and disease. Br J Pharmacol, 147:S232–S240.
7. Torrence A, Brabb T, Bielefeldt-Ohmann H, et al. (2008). Serum biomarkers in a mouse model of bacterial-induced inflammatory bowel disease. Inflamm Bowel Dis, 14(4):480–490.
8. Duan H, Fleming J, Pritchard D, et al. (2008). Combined analysis of monocyte and lymphocyte messenger RNA expression with serum protein profiles in patients with scleroderma. Arthritis Rheum, 58(5):1465–1474.
9. Fogler WE (2008). Treating rheumatoid arthritis with DMARDs and biologics. Drug Discov World, Fall, pp. 15–18.
10. Miyamae T, Malehorn D, Lemster B, et al. (2005). Serum protein profile in systemic-onset juvenile idiopathic arthritis differentiates response versus nonresponse to therapy. Arthritis Res Ther, 7(4):R746–R755.
11. McInnes IB, Gracie JA (2004). Interleukin-15: a new cytokine target for the treatment of inflammatory diseases. Curr Opin Pharmacol, 4(4):392–397.
12. Szodoray P, Alex P, Brun JG, Centola M, Jonsson R (2004). Circulating cytokines in primary Sjogren’s syndrome determined by a multiplex cytokine array system. Scand J Immunol, 59(6):592–599.
13. Gurbel P, Kreutz R, Bliden K, DiChiara J, Tantry U (2008). Biomarker analysis by fluorokine multianalyte profiling distinguishes patients requiring intervention from patients with long-term quiescent coronary artery disease: a potential approach to identify atherosclerotic disease progression. Am Heart J, 155(1):56–61.
14. Wright H, Alex P, Nguyen T, et al. (2005). Multiplex cytokine profiling of initial therapeutic response in patients with chronic hepatitis C virus infection. Dig Dis Sci, 50(10):1793–1803.
15. Bertenshaw GP, Yip P, Seshaiah P, et al. (2008). Multianalyte profiling of serum antigens and autoimmune and infectious disease molecules to identify biomarkers dysregulated in epithelial ovarian cancer. Cancer Epidemiol Biomarkers Prev, 17(10):2872–2881.
16. Jackson A (1990). Basic phenotyping of lymphocytes: selection and testing of reagents and interpretation of data. Clin Immunol Newsl, 10:43–55.
17. Suntharalingam G, Perry MR, Ward S, et al. (2006). Cytokine storm in a phase 1 trial of the anti-CD28 monoclonal antibody TGN1412. N Engl J Med, 355(10):1018–1028.
18. Esteve E, Ricart W, Fernandez-Real JM (2004). Dyslipidemia and inflammation: an evolutionary conserved mechanism. Clin Nutr, 24(1):16–31.
19. Jensen LE, Whitehead AS (1998). Regulation of serum amyloid A protein expression during the acute-phase response. Biochem J, 334(3):489–503.
20. Kemper C, Atkinson JP (2007). T-cell regulation: with complements from innate immunity. Nat Rev Immunol, 7(1):9–18.
21. Danese S, de la Motte C, Reyes BMR, Sans M, Levine AD, Fiocchi C (2004). Cutting Edge: T cells trigger CD40-dependent platelet activation and granular RANTES release: a novel pathway for immune response amplification. J Immunol, 172(4):2011–2015.
22. Murdoch C, Finn A (2000). Chemokine receptors and their role in inflammation and infectious diseases. Blood, 95(10):3032–3043.
23. Pruenster M, Rot A (2006). Throwing light on DARC. Biochem Soc Trans, 34:1005–1008.
24. Bank U, Ansorge S (2001). More than destructive: neutrophil-derived serine proteases in cytokine bioactivity control. J Leukoc Biol, 69:97–206.
25. Ellis TN, Beaman BL (2004). Interferon-gamma activation of polymorphonuclear neutrophil function. Immunology, 112(1):2–12.
26. Schmolz M, Hurst TL, Bailey DM, et al. (2004). Validation of a new highly standardised, lab-independent whole-blood leukocyte function assay for clinical trials (ILCS). Exp Gerontol, 39(4):667–671.
27. http://www.rulesbasedmedicine.com/products-services/TruCulture.asp.
7

GENE EXPRESSION PROFILES AS PRECLINICAL AND CLINICAL CANCER BIOMARKERS OF PROGNOSIS, DRUG RESPONSE, AND DRUG TOXICITY

Jason A. Sprowl, Ph.D., and Amadeo M. Parissenti, Ph.D.
Laurentian University, Sudbury, Ontario, Canada
INTRODUCTION

Measurement of the expression level of specific genetic or protein biomarkers in patient serum or biopsies can be extremely valuable in the diagnosis and treatment of a variety of human neoplasms. Often, the careful and time-consuming evaluation of several biomarkers by highly trained pathologists is required to construct a definitive diagnosis and treatment regimen. The expression level of single or several biomarkers can also be useful in the prediction of patient prognosis or response to chemotherapy, but these approaches generally have only limited accuracy [1–4]. Current studies suggest that no single biomarker will suffice for the accurate prediction of patient prognosis or tumor response to chemotherapy [5–7]. It is therefore thought that the tumor or serum level of large groups of proteins or transcripts is necessary to predict patient prognosis or outcome reliably after chemotherapy. In this chapter we review the current progress that has been achieved in the use of gene profiling to predict patient prognosis and response or toxicity to specific
chemotherapy regimens. Moreover, the impact of experimental design and the approach to data analysis on the ability of gene profiling experiments to reliably identify large groups of genes that can serve as prognostic or predictive biomarkers is discussed. A critical assessment of the prospect for gene profiling experiments to affect the clinical management of cancer patients and drug development is also provided.
PLATFORMS AND TOOLS FOR GENOME PROFILING EXPERIMENTS

The task of gene expression profiling is complex, and knowledge of the platforms and tools used to accomplish such a task is essential. The most popular approaches for profiling of gene expression in biological samples are DNA microarray analysis and quantitative reverse-transcription polymerase chain reaction (Q-PCR). In a typical microarray experiment (reviewed by Villeneuve and Parissenti [8]), RNA is isolated from cells, and mRNAs present within the sample are reverse-transcribed with oligo-dT primers in the presence of fluorescently or radioactively labeled nucleotides to yield a series of labeled cDNA probes. After denaturation, the labeled cDNA probes are hybridized to known denatured PCR products or single-stranded oligonucleotides immobilized in a grid pattern on either glass slides or nylon membranes. After washing to remove unbound probe, the intensity of label associated with the various PCR products or oligonucleotides on the array can be quantified by autoradiography or by using an array scanner. The amount of labeled cDNA hybridizing to a specific gene on the microarray is a measure of its expression (level of transcription). Often, two samples can be compared for differences in gene expression by labeling the cDNA preparations from the two samples with nucleotides conjugated to dyes that fluoresce at different wavelengths (e.g., Cy3 and Cy5 dyes). In addition, cDNAs can be transcribed using three wild-type nucleotides and one aminoallyl-labeled ribonucleotide to generate labeled aminoallyl RNAs as probes [9]. While many investigators use microarrays that are prepared within their laboratories [10–13], a variety of companies now produce microarrays with widely varying numbers of oligonucleotides or PCR products immobilized to them. This creates significant variability across experiments. The oligonucleotides can also be immobilized on microarrays via different methods, including photolithography (Affymetrix) or printing directly on glass slides. This introduces additional variability in microarray data across laboratories, resulting in significant difficulty in identifying common genetic profiles across experiments. Moreover, while standards for the performance of microarray experiments have been established (the Minimum Information About a Microarray Experiment (MIAME) standards [14]), not all microarray studies follow these standards. Some genome profiling studies are also conducted using high-throughput Q-PCR. This approach is highly quantitative, providing a more reliable
measure of gene expression, but the number of genes that can be quantified is limited, since two gene-specific primers are required for each gene whose expression is measured. In addition, gene expression is measured relative to specific reference genes whose expression must also be measured. The choice of reference genes also varies significantly between studies. Multiplexing of reactions, and approaches in which each PCR reaction product has a distinct wavelength for fluorescence emission (e.g., the Luminex system), have helped increase the number of genes whose expression can be profiled by PCR-based approaches. PCR-based methods of genome profiling are quite distinct from those used in microarray analyses, making comparisons between Q-PCR- and microarray-based findings more difficult. Nevertheless, various PCR-based gene profiling methods have been used to identify sets of genes associated with response to chemotherapy or recurrence of disease post-chemotherapy [15–17].
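As noted above, Q-PCR expression values are reported relative to reference genes. One widely used way of doing this is the comparative Ct (2^-ΔΔCt) method, sketched below with invented cycle-threshold values; the gene roles and numbers are purely illustrative, and the method assumes roughly equal amplification efficiencies for target and reference genes.

```python
# Relative expression by the comparative Ct (2^-ddCt) method.
# Ct values are hypothetical; a lower Ct means more transcript was present.
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    d_ct_sample = ct_target_sample - ct_ref_sample          # normalize to reference gene
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator                    # compare to calibrator sample
    return 2.0 ** (-dd_ct)                                   # fold change vs. calibrator

# Example: a target gene measured in tumor vs. normal tissue, normalized to a
# reference gene such as GAPDH (values and gene roles are made up).
fold = relative_expression(ct_target_sample=24.1, ct_ref_sample=18.0,
                           ct_target_calibrator=26.5, ct_ref_calibrator=18.2)
print(f"fold change relative to calibrator: {fold:.2f}")
```

Because the result depends directly on the reference gene, the variation in reference-gene choice between studies mentioned above translates directly into differences in the reported expression values.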
DATA ANALYSIS PITFALLS ASSOCIATED WITH GENE PROFILING EXPERIMENTS

Other difficulties in comparing data across genome profiling experiments relate to the large genetic variation that exists in humans, the design of microarray or Q-PCR experiments, and the widely varying approaches used to analyze data from such experiments. This is particularly the case for DNA microarray studies. While microarray and, in particular, PCR-based analyses are capable of measuring the expression of a large number of genes with acceptable accuracy, gene expression varies widely across patients as well as in the same person over time. In addition, the probability of identifying genes correlating with a specific phenomenon such as chemotherapy drug response is very high when tens of thousands of genes are surveyed. Unfortunately, such correlations are often not replicated in additional independent data sets, suggesting that the false discovery rate is quite high using these approaches. The use of multiple replicate arrays per sample in microarray experiments helps considerably to reduce the false discovery rate, as does the use of multiple independent data sets from highly controlled experiments. In addition to variable gene expression among individuals, microarray procedures often yield noisy data sets with differing efficiencies of labeling and background labeling across arrays. Therefore, a variety of algorithms and statistical approaches have been developed to normalize and analyze microarray data to improve the accuracy of identifying significant changes in gene expression between groups. Statistical tests include the computation of t-tests for each gene and/or the use of SAM (significance analysis of microarrays), an algorithm that permits one to identify significant changes in gene expression after a statistical computation of a false discovery rate based on the data set. Common additional algorithms used for microarray data analysis include weighted voting (WV) [16,18,19], support vector machines (SVM) [16,20,21],
and k-nearest-neighbor (K-NN) [16,22]. The resulting list of differentially expressed genes is commonly quite large, and each gene in this training set should then be validated by several methods, including a “leave-one-out” cross-validation. This involves the removal of one patient sample from the training set and determining whether the remaining gene set is capable of correctly classifying the left-out sample. Once the training set has demonstrated accurate prediction within the samples from which it was derived, the genetic profile should be further tested within a large patient population, ideally outside of that used to create the training set. Should the genetic profile consistently yield high predictive accuracy for the classification of interest, a useful set of biomarkers has been established. However, the use of varying data analysis methods can often result in completely different sets of differentially expressed genes. Consequently, multiple data analysis techniques should be used to obtain a list of genes commonly identified across methods [23–25]. These differences in gene expression can be further validated using other independent approaches, such as Q-PCR and immunoblotting experiments. In addition to the data analysis tools above, clustering algorithms are used routinely to identify sets of genes whose expression is significantly and consistently different between various groups of patients (e.g., between responders and nonresponders to chemotherapy). However, it is important to note that recent findings suggest that unsupervised methods of data analysis, such as data clustering, are often not effective in identifying predictive or prognostic biomarkers. The reason for this may be that cluster analysis does not utilize information related to the original sample grouping, and it is therefore a subjective strategy for data analysis. Furthermore, the number of genes that distinguish between classifications with the use of clustering algorithms is often small, and the distances calculated in cluster analysis may not properly portray the importance of certain genes, while excluding others of great importance [26].
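The leave-one-out procedure described above — remove one sample, rebuild the classifier from the remainder, and check whether the held-out sample is classified correctly — is illustrated by the minimal sketch below. It uses simulated expression data and a k-nearest-neighbor classifier (one of the algorithms mentioned above); the feature counts and class sizes are arbitrary assumptions, not a reproduction of any published study. Note that the gene selection step is repeated inside each fold, since selecting genes on the full data set before cross-validation inflates the apparent accuracy.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
# Simulated log-expression matrix: 40 patients x 500 genes, two response classes
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(0, 1, size=(40, 500))
X[y == 1, :25] += 1.0          # 25 genes carry a real class difference

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # Gene selection repeated inside each fold, using training samples only,
    # to avoid an optimistic bias in the cross-validated accuracy.
    diff = (X[train_idx][y[train_idx] == 1].mean(axis=0)
            - X[train_idx][y[train_idx] == 0].mean(axis=0))
    top = np.argsort(np.abs(diff))[-25:]
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx][:, top], y[train_idx])
    correct += int(clf.predict(X[test_idx][:, top])[0] == y[test_idx][0])

print(f"leave-one-out accuracy: {correct / len(y):.2f}")
```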
GENE PROFILES AS BIOMARKERS FOR THE DIAGNOSIS AND CLASSIFICATION OF HUMAN CANCERS

Gene profiling has proven quite valuable in identifying genes whose expression can be used to diagnose specific cancers, including many that are difficult to distinguish by other methods. For example, Kohlman et al. in 2003 used gene profiling to differentiate between acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) [18]. Bone marrow samples were isolated from 90 patients, 25 of whom had been diagnosed with ALL and 65 with AML. RNA was extracted from the samples and labeled cDNAs were hybridized to Affymetrix U95Av2 and U133 microarrays. The WV algorithm and the leave-one-out cross-validation approach were then used to identify a subset of genes that were expressed differentially between the two leukemias. The expression of 24 genes was then used to accurately identify patients with AML, while 19
genes were sufficient to diagnose ALL. However, upon comparing the genes identified by the two types of microarrays, it was observed that the 24 genes identified by the U133 arrays were different from those identified by the U95Av2 chips. Furthermore, only five genes were in common within the 19-gene set when the two types of array tools were used. These results show the importance of using the same array platforms when comparing across experiments. The utility of the U133 arrays was studied further in the diagnosis of leukemia using 937 patients (892 with clinically relevant leukemia subtypes and 45 nonleukemic patients) [27]. Patients were divided into 13 subgroups of leukemia type, and each subgroup was split equally into training and validation sets. Class-specific gene expression was determined using an SVM approach, with an overall classification accuracy of 95.1% when the top 100 genes for leukemia class discrimination were used. The studies cited above clearly demonstrate the utility of gene profiling for the diagnosis of cancers and suggest that this approach can be used further in the classification of cancers into various subgroups. Sørlie et al.’s study in 2001 further illustrates the latter utility by demonstrating that breast cancers could be classified into a basal epithelial-like group, an ERBB2-overexpressing group, a normal breast-like group, and a luminal epithelial/estrogen receptor–positive group, which could be further divided into at least two additional subgroups, each with a distinct gene expression profile. Interestingly, survival analyses using a subset of uniformly treated patients with locally advanced breast cancer revealed clear differences in outcomes among the various groups. For example, the basal-like group had a poor prognosis, and the two estrogen receptor–positive groups had clear differences in treatment outcome [28]. These findings have been well corroborated in additional experiments performed across array platforms and using RNA extracted from paraffin-embedded tissues [29–31]. Similar approaches and findings have been observed for endometrial cancer, hepatocellular carcinoma, sarcomas, neuroblastoma, and other cancers [32–35].
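A study design like the leukemia classification just described — split each subgroup equally into training and validation sets, select class-discriminating genes, train a support vector machine, and report accuracy on the held-out samples — can be sketched as follows. The data are simulated and the class counts, gene numbers, and gene-ranking criterion are assumptions; this illustrates the general workflow rather than the published analysis.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
n_classes, per_class, n_genes = 4, 40, 2000
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(0, 1, size=(y.size, n_genes))
for c in range(n_classes):                 # give each class its own expression signature
    X[y == c, c * 50:(c + 1) * 50] += 1.5

# Split each class equally into training and validation sets (stratified split)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5,
                                          stratify=y, random_state=0)

# Rank genes by the variance of the class means (computed on training data only)
class_means = np.vstack([X_tr[y_tr == c].mean(axis=0) for c in range(n_classes)])
top = np.argsort(class_means.var(axis=0))[-100:]   # "top 100" discriminating genes

clf = SVC(kernel="linear").fit(X_tr[:, top], y_tr)
print("validation accuracy:", round(accuracy_score(y_va, clf.predict(X_va[:, top])), 3))
```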
GENE PROFILES AS BIOMARKERS OF PROGNOSIS IN CANCER PATIENTS

Despite the problems associated with gene profiling approaches that were noted above, there have recently been significant advances in using gene profiling to predict prognosis in cancer patients. For example, van’t Veer and colleagues isolated 78 tumor core samples at diagnosis from breast cancer patients under the age of 55 [36]. All of these patients were lymph node–negative at the time of diagnosis. By measuring patient outcome over five years (regardless of treatment regimen), it was observed that 34 patients developed distant metastases, whereas 44 patients did not. Microarrays containing 25,000 human genes were prepared by the investigators, to which labeled cDNA samples from the patients and a pooled reference RNA sample were hybridized.
Following supervised data analysis and correct classification by a leave-one-out cross-validation approach, a 70-gene classifier was developed with which disease outcome could be predicted. When used within the 78-patient training set, the genetic profile predicted patient outcome with an accuracy of 83%. In support of the classifier, the 70-gene set predicted with 89% accuracy the prognosis of an additional 19 patients who were not included within the training set. The utility of the van’t Veer genes as prognostic biomarkers has been tested further using a larger study, which included 295 patients [13]. Sixty-one patient samples (from the study in which the 70-gene classifier was developed) were included as a control. Of the patients used, 151 were lymph node–negative and 144 were lymph node–positive. Kaplan–Meier analysis was performed to determine the probability of patients remaining metastasis-free. The 70-gene classifier showed excellent accuracy at predicting prognosis in all patients (p < 0.001), including lymph node–negative patients (who generally have better outcomes). These findings further suggested that the propensity to develop metastases is already defined by gene expression in the primary tumor rather than being acquired late during tumorigenesis. The findings from the two studies clearly indicate that the 70-gene classifier can reliably predict prognosis in breast cancer patients and therefore may be useful in preventing overtreatment in some patients while identifying those who would probably benefit from adjuvant therapy [13]. The 70-gene set (also known as MammaPrint) has recently been approved for use in the United States by the Food and Drug Administration (FDA). The utility of the van’t Veer classifier in predicting prognosis for cancers of tissue origins other than breast is currently unknown. A recent investigation attempted to predict outcome in breast cancer patients using RNA from tumors of 162 patients [11] and from several cell lines to serve as the reference RNA (SW872, WM115, NTERA2, MCF-7, HEPG2, MOLT4, Hs578t, HL60, OVCAR3, COLO205, and RPMI 8226 cells). Both labeled reference and tumor RNAs were hybridized to arrays consisting of 10,368 unique genes. Analysis of the hybridized arrays by SAM, prediction analysis for microarrays (PAM), and the approaches described in the van’t Veer et al. study [36] resulted in the identification of 49 genes that correlated with patient outcome. The gene list, when further reduced to 21 genes by including only those genes observed in all three analyses, displayed 69% accuracy by the leave-one-out cross-validation strategy. Using this gene set, the classifier appeared to predict patient outcome with 65% accuracy using the van’t Veer data and 62% accuracy using data from another study [37]. Considering that chance alone can yield as much as 60% accuracy, the gene set above was not considered validated. Furthermore, the expression of the van’t Veer prognostic genes was unable to classify these patients successfully in terms of disease outcome, though over half of the genes identified in the van’t Veer study were not present on the microarrays used. This underscores the difficulties in comparing across microarray experiments when different array platforms are used.
In addition, successful prediction of prognosis using this subset of the 70-gene classifier may have been increased by dividing patients according to lymph node status, as was done in the van’t Veer follow-up study [13]. While the van’t Veer classifier shows significant promise, it has yet to be used widely in a clinical setting. An alternative gene classifier has undergone considerable validation in a clinical setting. This classifier was developed by selecting 250 candidate genes that showed promise as prognostic biomarkers, based on literature searches and previous microarray experiments [13,19,28,38]. The expression of these genes was then measured in tumor biopsies of 447 patients associated with three independent clinical trials. Analysis of the relationship between the tumor expression of each gene and patient outcome resulted in the identification of 21 prognostic genes, including 16 cancer-related genes and five reference genes. This 21-gene prognostic classifier is now known as the Oncotype DX profile [15]. Interestingly, five of the 16 genes in the Oncotype DX profile were identified previously in the study by van’t Veer et al. [36]. The Oncotype DX genetic profile measures the probability of disease recurrence in patients diagnosed with estrogen receptor–positive, lymph node–negative breast cancer who were treated with tamoxifen. The likelihood of recurrence is measured using a recurrence score, which is calculated by measuring the expression of the 21 genes and converting these values into a score ranging from 1 to 100. A score of <18 indicates low risk of recurrence, while scores of 18 to 30 and ≥31 indicate intermediate and high risk of recurrence, respectively. The Oncotype DX gene set has been validated in three additional clinical trials [15,39,40] and is currently undergoing further validation in a phase III study known as TAILORx. Furthermore, the Oncotype DX gene set has been studied further for its ability to predict factors other than disease recurrence. The recurrence score was recently used to measure recurrence risk in 651 patients who were randomly assigned to treatment with tamoxifen or tamoxifen and chemotherapy [41]. The 10-year distant recurrence rates were determined for the two treatment groups, and it was revealed that low- and intermediate-risk patients (as predicted by the recurrence score) did not benefit from chemotherapy. In contrast, high-risk patients derived great benefit from chemotherapy. Therefore, the Oncotype DX recurrence score appears capable of identifying, within a group of patients who typically have a relatively low risk of disease recurrence, those who will probably benefit from chemotherapy. This, in turn, would enable health care resources to be spent more efficiently by providing chemotherapy only to those who will benefit from treatment.
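The mapping of a continuous recurrence score onto the risk bands described above (<18 low, 18 to 30 intermediate, ≥31 high) is simple to express in code. The sketch below illustrates only the thresholding logic; the actual weighting of the 21 genes that produces the Oncotype DX score is not reproduced here, and the example scores are invented.

```python
def risk_category(recurrence_score: float) -> str:
    """Map a recurrence score (1-100) to the risk bands described in the text."""
    if recurrence_score < 18:
        return "low"
    elif recurrence_score <= 30:
        return "intermediate"
    else:  # 31 and above
        return "high"

# Hypothetical patients with already-computed recurrence scores
for patient, score in {"patient A": 11, "patient B": 24, "patient C": 47}.items():
    print(patient, score, "->", risk_category(score), "risk of recurrence")
```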
CLASSIFICATION OF TOXINS AND PREDICTION OF DRUG TOXICITY USING GENE EXPRESSION PROFILING

Profiling of gene expression in cells is not limited to the diagnosis, classification, or prognosis of disease. One can also profile changes in gene expression
elicited by external agents such as chemotherapy drugs or toxicants. This can provide significant insight into their mechanisms of action. It is theorized that certain families of drugs or toxicants induce similar changes in gene expression in various cell lines, such that gene expression profiles or signatures can be used to classify known and unknown drugs or toxicants. Nuwaysir and colleagues reviewed the potential utility of microarrays in the identification, classification, and characterization of toxicants in toxicology studies [42]. The authors suggested a procedure involving the treatment of specific cell lines or animals with or without a known toxicant, isolation of RNA from the cell lines or animal organs, reverse transcription of the RNAs to create differentially labeled cDNAs, and the profiling of gene expression by microarray analysis. The use of toxicants of a specific class with a known mechanism of action in gene profiling experiments would then allow researchers to reliably identify changes in gene expression associated with a known class of toxicant (a toxicant signature). After a wide variety of toxicant signatures have been established for known toxicants, the toxicant signature for an unknown toxicant could be established and compared to standard toxicant signatures in order to identify and/or classify the unknown toxicant (a simple illustration of this signature-matching idea appears at the end of this section). Moreover, the toxicant signature could provide significant insight into the biochemical mechanisms responsible for its toxicity. Nuwaysir et al. further discussed how microarray technology could be used in toxicology studies involving animal models [42]. Current methods of toxicology measure toxin levels or the levels of specific hepatic enzymes in blood [42,43]. The use of animal models often requires that studies take place over an extensive period of time, particularly for chronic exposure to low levels of toxins. In contrast, gene expression changes typically occur quickly (on the order of hours) and may require only low levels of toxicants. Therefore, specific posttreatment changes in gene expression may be effective measures of exposure to certain toxicants and may provide insight both into the mechanism of toxicity and into effective measures to combat this toxicity. As raised by Nuwaysir and colleagues [42], specific standards would have to be established, including the animal model to be used, the organs to be assessed for changes in gene expression, the concentration of compounds to be administered, and the time following exposure when gene expression will be measured. Such factors will probably be dependent on the toxicant, its properties, and its target organ. Other influences on the degree of toxicity and on gene expression would also have to be minimized, including the age of subjects or animals, variations in diet, and gender. Recently, there have been several attempts to use the approach described above to identify changes in gene expression associated with exposure to known hepatotoxic or nephrotoxic compounds in either human or animal models. Fielden et al. [44] used gene profiling approaches in rats to identify changes in gene expression induced by nephrotoxic agents, after confirmation of renal tubular degeneration by histopathology. By identifying gene profiles associated with the onset of nephrotoxicity in rats, these authors theorized
that drug development could be strongly facilitated by focusing on lead compounds that, when profiled, did not exhibit nephrotoxicity-associated changes in gene expression. In the study, rats were subjected to short, repeated exposures to well-classified toxicants or the vehicle in which they were dissolved. Changes in gene expression induced in the rats by 15 nephrotoxic and 49 nonnephrotoxic compounds were then studied by microarray analysis. Blood samples were assessed for changes in cholesterol or albumin levels, which would be consistent with the onset of nephrotoxicity. Rats were only sacrificed upon detection of nephrotoxicity-associated changes in blood levels of cholesterol or albumin. RNA was isolated from kidneys, and toxicant-induced changes in gene expression were determined for these compounds by cDNA microarray analysis using Amersham Codelink Uniset Rat1 Bioarrays and the sparse linear programming (SPLP) algorithm. Thirty-five genes were identified as changing expression only upon administration of nephrotoxic compounds to rats. Interestingly, within the training set, the 35-gene classifier was found to have 83% accuracy for identifying nephrotoxic compounds with renal tubular degeneration and 94% accuracy for identification of compounds that are not nephrotoxic. The classifier was then cross-validated with an additional 21 independent rodent samples. In these validation experiments, nephrotoxic agents were identified with 79% accuracy and nonnephrotoxic samples with 75% accuracy. Although complete accuracy at identifying nephrotoxic agents was not obtained, this approach would nevertheless help in drug discovery programs to eliminate lead compounds that have a significantly higher likelihood of being nephrotoxic. However, the identity of the 35 genes did not provide significant insight into likely mechanisms for nephrotoxicity. It is also unknown whether the gene expression changes identified in rats were specific to kidney tissue. In addition, it should be noted that there are significant differences between rats and humans in terms of their metabolism of and sensitivity to drugs and toxicants. Thus, while the study did appear to show a significant link between the expression of specific genes and nephrotoxicity, it remains unclear whether rats serve as a good model for nephrotoxicity in humans.
genes were identified whose expression would be expected to be modified by hepatotoxins in humans. The in vivo experiments identified 168 genes whose expression would be expected to change in humans in response to hepatotoxins. Unfortunately, while the in vitro and in vivo data identified real gene classifiers associated with hepatotoxicity, only five genes were identified in both the in vitro and in vivo experiments. Furthermore, none of the common genes identified were within the top 30 changes in gene expression, and validation experiments strongly suggested that both the in vitro and in vivo gene sets were poor predictors of toxicity. To increase the reliability of predictive gene sets and decrease the degree of variability within the data, it is recommended that higher numbers of nontoxic compounds also be monitored for changes in gene expression (as controls) and that associated toxicology databases have better coverage of human studies. Given the findings described above, it appears that significant additional work is required before reliable gene classifiers for the identification of nephrotoxic or hepatotoxic compounds are available. Most toxicology studies are performed in animal model systems due to ethical concerns, and it appears unlikely that such studies can be extrapolated easily to humans (particularly if rodents are used). Human cell lines could be used, but such in vitro experiments are unlikely to be replicated in living human subjects. Nevertheless, the development of gene classifiers for toxicants in nonhuman systems could assist the drug discovery process by avoiding lead compounds that induce changes in gene expression in both rodents and human cell lines that strongly resemble those of known nephrotoxic and hepatotoxic compounds. In addition, gene classifiers for agents capable of inducing toxicity to other organs (e.g., cardiotoxicity) can possibly be developed using this approach.
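The signature-matching idea introduced earlier in this section — compare the expression changes induced by an unknown compound against a library of reference toxicant signatures — can be operationalized in its simplest form by ranking the reference classes by correlation with the unknown profile, as in the sketch below. The signatures here are simulated and the class names are placeholders; real implementations rely on curated toxicogenomic databases and more sophisticated similarity and classification methods, such as the machine-learning approaches described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes = 1000

# Reference signatures: mean log2 fold-change profiles for known toxicant classes
reference = {
    "nephrotoxicant-like": rng.normal(0, 1, n_genes),
    "hepatotoxicant-like": rng.normal(0, 1, n_genes),
    "non-toxic-like":      rng.normal(0, 0.2, n_genes),
}

# Signature of an unknown compound (simulated here to resemble the hepatotoxic class)
unknown = reference["hepatotoxicant-like"] + rng.normal(0, 0.7, n_genes)

# Rank reference classes by Pearson correlation with the unknown signature
ranked = sorted(((np.corrcoef(unknown, sig)[0, 1], name)
                 for name, sig in reference.items()), reverse=True)
for r, name in ranked:
    print(f"{name:20s} r = {r:.2f}")
```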
USE OF GENE PROFILING TO PREDICT DRUG SENSITIVITY OR RESISTANCE

Approaches similar to those described above for toxicants can also be used to classify drugs and their modes of action. Moreover, it may be possible to use genome profiling of patients to identify those who would probably benefit from administration of a particular drug. Considerable emphasis has been placed recently on the identification of genes whose expression in the tumors or host tissues of cancer patients can predict or measure the response to specific chemotherapy agents or regimens. Various families of chemotherapy drugs act by different mechanisms within tumors, such as the inhibition of topoisomerase II or DNA replication [46–49] or the stabilization of microtubules [50]. Disruption of such essential functions would be expected to trigger cell cycle checkpoints and promote cell death mechanisms, resulting in specific changes in gene expression that are characteristic of the administered drug. In addition, since tumors can often be heterogeneous, there may exist within the tumor population variants that before or after drug treatment are better able to
survive in the presence of chemotherapy agents. These or additional variants may also be capable of inactivating specific chemotherapy agents or blocking drug-induced cell death pathways. This would result in drug resistance, which is often observed in cancer patients before or after treatment with chemotherapy agents. By genome profiling of tumor or host tissues from cancer patients, it may be possible to identify which tumors or patients are likely to exhibit resistance to a given chemotherapy agent. Moreover, the genome profile may enable the oncologist to select a chemotherapy agent that has the highest probability of killing a given tumor. The profiling and classification of tumors into specific subtypes (as described above) may aid further in this goal. Chang et al. were the first researchers to use gene profiling successfully to identify genes whose pretreatment expression in tumors could differentiate between responders and nonresponders to docetaxel chemotherapy [10]. Patient tumor size was measured throughout treatment, allowing classification of 24 patients into 11 responders and 13 nonresponders (who had progressive disease or no response to treatment). RNA was isolated from tumors before chemotherapy, amplified, reverse-transcribed to cDNA, labeled with Cy3 or Cy5 fluorescent dyes, and hybridized to HgU95-Av2 Affymetrix microarrays. Student’s t-tests were used to identify 92 genes whose expression was significantly different between responders and nonresponders. Using these genes in a leave-one-out cross-validation approach, the gene set appeared to correctly classify responders and nonresponders with an accuracy of 88% within its training set. The genes identified were found to be involved in various cellular functions, such as cell signaling, immunologic response, DNA damage detection or repair, cell cycle regulation, and tumor suppression. To confirm this gene set, the expression level of 15 differentially expressed genes was determined by reverse-transcription polymerase chain reaction. Thirteen positive correlations were obtained, of which six were found to be highly significant between responders and nonresponders. To test this genetic profile further, RNA was isolated from six additional patients who were not involved in the training set. Quantitation of expression of the 92 genes using the same approach correctly identified all six as drug responders. The accuracy of the 92-gene profile in predicting response to docetaxel chemotherapy was then further evaluated in a follow-up study by the same investigators [51]. The authors observed gene expression of resistant and sensitive tumors after 3 weeks of treatment in a subset of 13 patients. Surprisingly, very few differences in gene expression were observed. Therefore, it would appear that changes in tumor gene expression shortly after administration of docetaxel cannot be used to differentiate between responding and nonresponding tumors. A study by Iwao-Koizumi et al. also used high-throughput Q-PCR to identify genes whose expression could predict response to docetaxel in 44 breast tumors [16]. The expression of 2453 genes was profiled in the tumors relative to control RNA from 78 primary breast cancers. WV, K-NN, and SVM approaches were used in data analyses to identify gene sets, which were then
subjected to leave-one-out verification studies. It was observed that the most consistent results were obtained with the use of the WV algorithm, which identified 85 genes that were capable of identifying response to docetaxel with 80.7% accuracy in verification studies. To provide further evidence of their utility, expression vectors containing cDNAs for a number of the genes associated with resistance to docetaxel chemotherapy were transfected into MCF-7 cells and were found to increase survival in the presence of docetaxel compared to nontransfected controls. This suggests that observations from in vitro experiments using cell lines may have relevance to drug resistance in cancer patients. Interestingly, the genes determined to be associated with docetaxel response in this study differed greatly from those identified in the study by Chang and colleagues [10]. Only three genes were in common between the two studies. A number of differences in the studies may account for this discordance, including different patient populations, widely varying sets of genes being profiled, and differences in the algorithms used in data analysis. Use of a more quantitative method for gene profiling, the larger set of patients, and the higher prediction accuracy would suggest that the 85-gene set identified by Iwao-Koizumi has a higher probability of being replicated in other independent studies. Genome profiling was also used by Cleator et al. to identify genes whose expression is predictive of response to doxorubicin/cyclophosphamide chemotherapy in breast cancer patients [52]. In this study, RNA samples were isolated from 12 patients whose tumors responded to chemotherapy and from six patients whose tumors did not. After amplification and labeling, the RNAs were hybridized to Affymetrix HgU133A arrays. After scanning the arrays, the output data were assessed using a “randomized variance model,” and 253 genes were identified to be differentially expressed between responding and nonresponding tumors. Leave-one-out cross-validation studies were then performed using the training set, and only 67% of samples were correctly classified as responders or nonresponders. This accuracy is only slightly better than chance. Thus, this approach could not be used to identify genes associated with response to doxorubicin/cyclophosphamide chemotherapy, or, more likely, the number of patients in the training set was of insufficient size to reliably identify genetic biomarkers for prediction of response to doxorubicin/ cyclophosphamide chemotherapy. The authors of this study also tested the ability of the Chang 92-gene set for docetaxel responsiveness [10] (described above) to predict response to doxorubicin/cyclophosphamide treatment. The 92-gene profile was unable to distinguish between responders and nonresponders of doxorubicin/cyclophosphamide chemotherapy, suggesting that genes for prediction of response to chemotherapy regimens are probably regimen-specific and not general biomarkers of drug response. A similar approach to that described above was used to identify 74 genes whose expression correlates positively or negatively with response to T/FAC chemotherapy (paclitaxel, followed by 5-fluorouracil, doxorubicin, and cyclophosphamide). Accuracy at measuring response to this regimen was deter-
mined to be 78% [53]. Independent expression of any one of the 74 genes could not be used to predict response to T/FAC chemotherapy. As mentioned previously, false discovery rates are often high in genome profiling experiments, particularly when tens of thousands of genes are surveyed. Thus, in the gene profiling studies above, the genes identified may correlate with drug response, but, in reality, may have no relationship to drug action. This would then explain the many failures that have been observed in identifying genes whose expression correlates with drug response across a wide variety of independent clinical studies. To circumvent this problem, it can often be advantageous to restrict the analysis of genes in patient populations to those that have demonstrated a clear relationship with drug response or resistance in in vitro studies. A number of studies have identified clear differences in gene expression between drug-sensitive and drug-resistant tumor cells. For example, Villeneuve et al. used microarray studies to identify a number of genes whose expression was altered in doxorubicin- or paclitaxelresistant MCF-7 breast tumor cells compared to wild-type cells [12]. The majority of genes (>98%) exhibited no difference in expression between the wild-type and drug-resistant cell lines, confirming the isogenicity of the established cell line. One strength of this study was that 91 to 95% of the differences in gene expression identified by microarray analysis were corroborated through Q-PCR verification experiments. In addition, control cell lines were created to account for differences in gene expression during selection for drug resistance that were simply due to extended propagation of cells in culture. The identities of the genes uncovered in this study were associated with drug transport, drug metabolism, growth promotion, and cell death inhibition, consistent with the types of genes that one might expect to observe in drugresistant cells. Thus, this approach has the prospect of providing significant insight into the many and varied mechanisms by which tumor cells acquire resistance to chemotherapy drugs. The limitation of the study by Villeneuve et al. [12] is that gene expression was compared between single sets of isogenic wild-type and drug-resistant breast tumor cell lines. Although the genes identified may play a role in drug resistance in a single breast tumor cell line, some may have little relevance to drug resistance in other breast tumor cell lines or other cell lines of different tissue origin. In contrast, genes having altered expression in cell lines from several tissues upon acquisition of resistance to a specific drug have a higher likelihood of playing a bona fide role in drug resistance, although probably not in a tissue-specific manner. Györffy et al. profiled gene expression across 30 different cancer cell lines using HGU133 Affymetrix arrays and identified genes whose expression correlated with sensitivity or resistance to 11 different chemotherapy drugs [24]. The accuracy of the gene sets to identify a cell line as sensitive or resistant to a given chemotherapy agent was 86%. Using this approach, 1481 genes were identified as playing a role in resistance to a diverse set of anticancer agents. Not surprisingly, the drug-resistance genes identified were different depending on the chemotherapy agent. Only 67 genes were
observed to be associated with resistance to four or more anticancer agents, and the expression level of only two genes correlated with resistance to all 11 drugs. In a similar study [54], HGU133 Affymetrix arrays were also used to identify genes associated with resistance in gastric cancer cells to three of the agents used in the study by Györffy et al. Despite similar array platforms, the number and identities of genes associated with resistance to 5-fluorouracil, doxorubicin, or cisplatin varied dramatically between the two studies. A number of factors could account for the differences in the gene numbers and identities. Whereas the former study assessed a large number of wild-type and drug-resistant cell lines of varying tissue origin for sensitivity or resistance to 11 chemotherapy agents, the latter study examined only 14 gastric cancer cell lines, of which 10 were resistant to one of three chemotherapy agents. Thus, it is possible that the genes associated with drug resistance in gastric cancer cells are distinct from those that would have relevance across cell lines. Alternatively, variations in experimental design, the number of cell lines, and approaches to microarray data analysis resulted in the identification of different gene sets, which may or may not be relevant to drug resistance.
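The analytical workflow common to the studies above, in which differentially expressed genes are selected with a statistic such as Student's t-test and the resulting signature is then evaluated by leave-one-out cross-validation, can be sketched in a few lines of code. The sketch below is illustrative only: the data are simulated, the nearest-centroid classifier is an arbitrary choice, and it is not a reconstruction of any published pipeline. It also shows why gene selection should be repeated within each cross-validation fold, one of the pitfalls reviewed by Simon et al. [26].

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 5000))      # 24 tumors x 5000 probe sets (simulated)
y = np.array([1] * 11 + [0] * 13)    # 11 responders, 13 nonresponders

def select_genes(X_train, y_train, n_genes=92):
    """Rank genes by Student's t-test p-value and keep the top n_genes."""
    _, p = stats.ttest_ind(X_train[y_train == 1], X_train[y_train == 0], axis=0)
    return np.argsort(p)[:n_genes]

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # Gene selection is repeated inside every fold; selecting on the full data
    # set first would inflate the apparent accuracy (a pitfall noted in [26]).
    genes = select_genes(X[train_idx], y[train_idx])
    clf = NearestCentroid().fit(X[train_idx][:, genes], y[train_idx])
    correct += int(clf.predict(X[test_idx][:, genes])[0] == y[test_idx][0])

print(f"Leave-one-out accuracy: {correct / len(y):.0%}")
```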
STRATEGIES TO IMPROVE THE ACCURACY OF GENE PROFILING IN IDENTIFYING PROGNOSTIC AND PREDICTIVE BIOMARKERS As stated previously, very few gene sets have proven reliable in predicting response to chemotherapy agents across large groups of patients [10,11,16]. Numerous factors may contribute to the poor reproducibility of findings from gene profiling experiments, including differences in the tissues studied, the array platforms, the number and nature of genes on the microarrays, RNA extraction procedures, the number of replicate arrays used in experiments, and the data analysis and verification methods. Simon and colleagues reviewed approaches to improving the accuracy of gene profiling experiments in the identification of reliable predictive or prognostic biomarkers [26]. The first limitation to the reliable identification of predictive and prognostic biomarkers is the size of the training and validation sets, which in many studies is from 30 to 50 patients. Larger data sets of 500 patients or more have a much higher probability than small data sets of identifying useful biomarkers that can be validated across experiments. The use of more than one method for obtaining gene classifiers is also preferred. Moreover, in addition to larger data sets, there should be equal representation between patients from which the positive and negative classifiers will be developed. In addition, a number of studies only validate the ability of a gene set to positively identify patients with a particular attribute, without showing that the classifier does not also score positive in patients without the attribute. Minna et al. [23] discussed further the reproducibility of genome profiling experiments across laboratories in an editorial written in response to a study
by Hsu et al. [25] which identified a set of genes that could serve as a classifier of cisplatin responsiveness. While acknowledging the promising prospects of the study by Hsu and colleagues, Minna et al. commented on what additional information could have been described or collected in this and other microarray experiments. These include the method used for classification of patients, the analytical procedures used in the genome profiling experiments, and access to all microarray data so that it is available for analysis by researchers. The free availability of well-described data sets will enable future researchers to use this information to identify the best analytical methods for microarray data and to compare microarray data analysis tools effectively across experiments. This will enable researchers to determine if future gene sets can accurately predict drug response in groups of patients outside the initial training and validation sets. It will also enable researchers to repeat or apply new analytical procedures to previous data sets in order to make effective comparisons across experiments. Interestingly, it has been shown that with the use of identical gene arrays, extraction methods, and statistical analyses, identical gene signatures can be obtained across microarray experiments, despite differences in operators, sample handling, and time of extraction [55].
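One practical way to address the concern noted above, that a gene set should be shown to classify both patients with and patients without the attribute of interest, is to report sensitivity and specificity rather than accuracy alone. The short sketch below uses made-up confusion-matrix counts purely for illustration.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # correctly identified patients WITH the attribute
    specificity = tn / (tn + fp)   # correctly identified patients WITHOUT it
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# A classifier validated only on its positive calls can look deceptively good:
print(sensitivity_specificity(tp=27, fn=3, tn=5, fp=15))   # sens 0.90, spec 0.25
print(sensitivity_specificity(tp=26, fn=4, tn=18, fp=2))   # sens 0.87, spec 0.90
```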
FUTURE DIRECTIONS There can be little doubt that genome profiling experiments are beginning to yield important new biomarkers of clinical relevance, particularly for the diagnosis of disease states. However, considerable variability exists in the performance of genome profiling experiments across studies, in particular for microarray studies. Differences in array platforms, the number and identities of genes on arrays, sample size, experimental design, methods of patient classification and RNA extraction, and algorithms for data analysis all contribute to the wide variability observed in microarray findings. Only through comparisons across well-characterized microarray experiments can a clear conclusion be made as to the optimal design of a microarray experiment and optimal methods for both data analysis and patient classification. Moreover, by describing array experiments in detail and ensuring full access to microarray data, it should be possible to repeat microarray data analyses using such optimal methods (once defined). In the future, the preferred method of microarray experimental design and microarray data analysis will enable more accurate and reproducible findings to be achieved. This will dramatically increase the capacity of microarray studies to affect the clinical diagnosis and management of cancer patients. Microarray or PCR-based gene profiling experiments are powerful tools which, used optimally will dramatically increase the discovery of gene classifiers for the accurate diagnosis of cancer patients and for the optimal management of their care. This would probably include gene classifiers for the prediction of patient prognosis pre- and posttherapy, the optimal selection of
chemotherapy agents, and the measurement of patient response and/or toxicity to these agents. Moreover, genome profiling is likely to have a major impact on drug discovery programs by enabling investigators to classify and characterize drugs and toxicants, with the goal of optimizing drug structures to achieve maximal therapeutic benefit with minimal toxicity to patients. Gene profiling of patients and their tumors before and after drug treatment may also enable us to tailor our approaches to the management of cancer patients by accurately identifying those that will derive benefit, with minimal toxicity, from a given chemotherapy regimen. For the pharmaceutical industry, this may also permit the introduction or reintroduction of highly effective drugs into the marketplace that exhibit toxicities in rare subsets of patients. Recent evidence confirms that gene profiling experiments are beginning to revolutionize the practice of oncology and that physicians are prepared to embrace their use, provided that the utility of genetic biomarkers identified from gene profiling experiments are widely validated in appropriately conducted clinical trials. This appears to be the case for both Oncotype DX [15,41] and Mammaprint [56] gene sets for prediction of prognosis in women with breast cancer. The Oncotype DX gene expression assay will probably be widely used by breast oncologists for a number of reasons: 1. The recurrence score associated with the gene set used in this assay greatly aids the oncologist in assessing the prognosis of a patient and the need for adjuvant chemotherapy, such as MF/CMF [41]. This is particularly useful for low-risk patients with node-negative, ER-positive tumors, where the physician is likely to recommend adjuvant chemotherapy for patients with high recurrence scores but not for patients with low recurrence scores. 2. Respected clinical organizations such as the American Society of Clinical Oncology [57] and the National Comprehensive Cancer Network have established clear clinical guidelines for use of the Oncotype DX assay in patients with tumors of 0.6 to 1 cm with moderate to poorly differentiated or unfavorable features, and all tumors larger than 1 cm. 3. Gene expression is assessed by a highly quantitative procedure (Q-PCR) conducted centrally in a recognized laboratory following good laboratory practices (GLP) guidelines. 4. There is now evidence that an expanded Oncotype DX assay, which includes estrogen and progesterone receptor scores, provides a quantitative measure of the expression of these receptors with a high degree of concordance with standard immunohistochemical approaches [58]. This provides greater information to the physician on the degree of expression of the receptors and the confidence that can be placed on the diagnosis and treatment plans based on receptor status. Such developments further increase the adoption of this assay by breast oncologists. In addition to the benefits of the Oncotype DX assay, recent studies using the new molecular classifications of breast cancer (based on earlier genome
profiling studies) suggest that responses to specific therapies may vary significantly among the various subgroups. Successful adoption of the Oncotype DX assay into clinical trials and into clinical practice guidelines clearly illustrates that oncologists are willing to accept quantitative gene profiling of patient tumors as an important tool in the clinical management of a cancer patient, but this requires the successful translation of findings in research laboratories into validated robust assays that can be conducted in licensed clinical service laboratories. The support and cooperation of clinical societies is critical to the adoption of gene-based assays into the clinic, including the establishment of clear practice guidelines for the physician.
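As a schematic illustration only, the sketch below shows how a continuous recurrence score is commonly binned into the risk categories that frame the adjuvant chemotherapy discussion. The actual Oncotype DX score is generated centrally by the vendor's laboratory from RT-PCR data; the cut points used here (low below 18, intermediate 18 to 30, high 31 and above) are the commonly cited categories and should be treated as illustrative rather than prescriptive.

```python
def rs_risk_category(recurrence_score: float) -> str:
    """Bin a recurrence score (RS) into the commonly cited risk categories."""
    if recurrence_score < 18:
        return "low risk"
    if recurrence_score < 31:
        return "intermediate risk"
    return "high risk"

for rs in (7, 22, 40):
    print(rs, "->", rs_risk_category(rs))   # low, intermediate, high
```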
REFERENCES 1. Goldhirsch A, Glick JH, Gelber RD, Senn HJ (1998). Meeting highlights: International Consensus Panel on the Treatment of Primary Breast Cancer. J Natl Cancer Inst, 90:1601–1608. 2. Eifel P, Axelson JA, Costa J, et al. (2001). National Institutes of Health Consensus Development Conference Statement: adjuvant therapy for breast cancer, Nov. 1–3. J Natl Cancer Inst, 93:979–989. 3. McGuire WL (1991). Breast cancer prognostic factors: evaluation guidelines. J Natl Cancer Inst, 83:154–155. 4. Bast RC, Ravdin P, Hayes DF, et al. (2001). 2000 update of recommendations for the use of tumor markers in breast and colorectal cancer: clinical practice guidelines of the American Society of Clinical Oncology. J Clin Oncol, 19: 1865–1878. 5. Gottesman MM, Fojo T, Bates SE (2002). Multidrug resistance in cancer: role of ATP-dependent transporters. Natl Rev Cancer, 2:48–58. 6. Jänicke F, Prechtl A, Thomssen C, et al. (German N0 Study Group) (2001). Randomized adjuvant chemotherapy trial in high-risk, lymph node-negative breast cancer patients identified by urokinase-type plasminogen activator and plasminogen activator inhibitor type 1. J Natl Cancer Inst, 93:913–920. 7. van Diest PJ, Michalides RJ, Jannink L, et al. (1997). Cyclin D1 expression in invasive breast cancer: correlations and prognostic value. Am J Pathol, 150:705–711. 8. Villeneuve DJ, Parissenti AM (2004). The use of DNA microarrays to investigate the pharmacogenomics of drug response in living systems. Curr Top Med Chem, 4:1329–1345. 9. ‘t Hoen PA, de KF, van Ommen GJ, den Dunnen JT (2003). Fluorescent labelling of cRNA for microarray applications. Nucl Acids Res, 31:e20. 10. Chang JC, Wooten EC, Tsimelzon A, et al. (2003). Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet, 362:362–369.
11. Korkola JE, Blaveri E, DeVries S, et al. (2007). Identification of a robust gene signature that predicts breast cancer outcome in independent data sets. BMC Cancer, 7:61. 12. Villeneuve DJ, Hembruff SL, Veitch Z, Cecchetto M, Dew WA, Parissenti AM (2006). cDNA microarray analysis of isogenic paclitaxel- and doxorubicin-resistant breast tumor cell lines reveals distinct drug-specific genetic signatures of resistance. Breast Cancer Res Treat, 96:17–39. 13. van de Vijver MJ, He YD, van’t Veer LJ, et al. (2002). A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med, 347:1999–2009. 14. Brazma A, Hingamp P, Quackenbush J, et al. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Geneti, 29:365–371. 15. Paik S, Shak S, Tang G, et al. (2004). A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med, 351:2817–2826. 16. Iwao-Koizumi K, Matoba R, Ueno N, et al. (2005). Prediction of docetaxel response in human breast cancer by gene expression profiling. J Clin Oncol, 23:422–431. 17. Chang JC, Makris A, Gutierrez MC, et al. (2007). Gene expression patterns in formalin-fixed, paraffin-embedded core biopsies predict docetaxel chemosensitivity in breast cancer patients. Breast Cancer Res Treat, 108:233–240. 18. Kohlmann A, Schoch C, Schnittger S, et al. (2003). Molecular characterization of acute leukemias by use of microarray technology. Genes Chromosomes Cancer, 37:396–405. 19. Golub TR, Slonim DK, Tamayo P, et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531–537. 20. Zhao YP, Chen G, Feng B, et al. (2007). Microarray analysis of gene expression profile of multidrug resistance in pancreatic cancer. Chin Medi J, 120:1743–1752. 21. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16:906–914. 22. Pomeroy SL, Tamayo P, Gaasenbeek M, et al. (2002). Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415:436–442. 23. Minna JD, Girard L, Xie Y (2007). Tumor mRNA expression profiles predict responses to chemotherapy. J Clin Oncol, 25:4329–4336. 24. Györffy B, Surowiak P, Kiesslich O, et al. (2006). Gene expression profiling of 30 cancer cell lines predicts resistance towards 11 anticancer drugs at clinically achieved concentrations. Intl J Cancer, 118:1699–1712. 25. Hsu DS, Balakumaran BS, Acharya CR, et al. (2007). Pharmacogenomic strategies provide a rational approach to the treatment of cisplatin-resistant patients with advanced cancer. J Clin Oncol, 25:4350–4357. 26. Simon R, Radmacher MD, Dobbin K, McShane LM (2003). Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst, 95:14–18. 27. Haferlach T, Kohlmann A, Schnittger S, et al. (2005). Global approach to the diagnosis of leukemia using gene expression profiling. Blood, 106:1189–1198.
28. Sørlie T, Perou CM, Tibshirani R, et al. (2001). Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA, 98:10869–10874. 29. Perreard L, Fan C, Quackenbush JF, et al. (2006). Classification and risk stratification of invasive breast carcinomas using a real-time quantitative RT-PCR assay. Breast Cancer Res, 8:R23. 30. Hu Z, Fan C, Oh DS, et al. (2006). The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomi, 7:96. 31. Mullins M, Perreard L, Quackenbush JF, et al. (2007). Agreement in breast cancer classification between microarray and quantitative reverse transcription PCR from fresh-frozen and formalin-fixed, paraffin-embedded tissues. Clini Chem, 53: 1273–1279. 32. Cai B, Liu L, Xi W, et al. (2007). Comparison of the molecular classification with FIGO stage and histological grade on endometrial cancer. Eur J Gynaecol Oncol, 28:451–460. 33. Seike M, Yanaihara N, Bowman ED, et al. (2007). Use of a cytokine gene expression signature in lung adenocarcinoma and the surrounding tissue as a prognostic classifier. J Natl Cancer Inst, 99:1257–1269. 34. Tschoep K, Kohlmann A, Schlemmer M, Haferlach T, Issels RD (2007). Gene expression profiling in sarcomas. Critl Rev Oncol Hematol, 63:111–124. 35. Warnat P, Oberthuer A, Fischer M, Westermann F, Eils R, Brors B (2007). Crossstudy analysis of gene expression data for intermediate neuroblastoma identifies two biological subtypes. BMC Cancer, 7:89. 36. van’t Veer LJ, Dai H, van de Vijver MJ, et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536. 37. Sotiriou C, Neo SY, McShane LM, et al. (2003). Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci USA, 100:10393–10398. 38. Perou CM, Sorlie T, Eisen MB, et al. (2000). Molecular portraits of human breast tumours. Nature, 406:747–752. 39. Cronin M, Sangli C, Liu ML, et al. (2007). Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor-positive breast cancer. Clin Chem, 53:1084–1091. 40. Habel LA, Shak S, Jacobs MK, et al. (2006). A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients. Breast Cancer Res, 8:R25. 41. Paik S, Tang G, Shak S, et al. (2006). Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol, 24:3726–3734. 42. Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA (1999). Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog, 24:153–159. 43. Batt AM, Ferrari L (1995). Manifestations of chemically induced liver damage. Clin Chem, 41:1882–1887. 44. Fielden MR, Eynon BP, Natsoulis G, Jarnagin K, Banas D, Kolaja KL (2005). A gene expression signature that predicts the future onset of drug-induced renal tubular toxicity. Toxicol Pathol, 33:675–683.
45. Martin R, Rose D, Yu K, Barros S (2006). Toxicogenomics strategies for predicting drug toxicity. Pharmacogenomics, 7:1003–1016. 46. Tarr M, van Helden PD (1990). Inhibition of transcription by adriamycin is a consequence of the loss of negative superhelicity in DNA mediated by topoisomerase II. Mol Cell Biochem, 93:141–146. 47. Zijlstra JG, de Jong S, de Vries EG, Mulder NH (1990). Topoisomerases, new targets in cancer chemotherapy. Med Oncol Tumor Pharmacother, 7:11–18. 48. Binaschi M, Bigioni M, Cipollone A, et al. (2001). Anthracyclines: selected new developments. Curr Medi Chem Anticancer Agents, 1:113–130. 49. Galmarini CM, Mackey JR, Dumontet C (2002). Nucleoside analogues and nucleobases in cancer treatment. Lancet Oncol, 3:415–424. 50. Chazard M, Pellae-Cosset B, Garet F, et al. (1994). [Taxol (paclitaxel), first molecule of a new class of cytotoxic agents: taxanes.] Bull Cancer, 81:173–181. 51. Chang JC, Wooten EC, Tsimelzon A, et al. (2005). Patterns of resistance and incomplete response to docetaxel by gene expression profiling in breast cancer patients. J Clin Oncol, 23:1169–1177. 52. Cleator S, Tsimelzon A, Ashworth A, et al. (2006). Gene expression patterns for doxorubicin (Adriamycin) and cyclophosphamide (Cytoxan) (AC) response and resistance. Breast Cancer Res Treat, 95:229–233. 53. Ayers M, Symmans WF, Stec J, et al. (2004). Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer. J Clin Oncol, 22:2284–2293. 54. Kang HC, Kim IJ, Park JH, et al. (2004). Identification of genes with differential expression in acquired drug-resistant gastric cancer cells using high-density oligonucleotide microarrays. Clin Cancer Res, 10:272–284. 55. Kohlmann A, Schoch C, Dugas M, et al. (2005). Pattern robustness of diagnostic gene expression signatures in leukemia. Genes Chromosomes Cancer, 42:299–307. 56. Wittner BS, Sgroi DC, Ryan PD, et al. (2008). Analysis of the MammaPrint breast cancer assay in a predominantly postmenopausal cohort. Clin Cancer Res, 14:2988–2993. 57. Harris L, Fritsche H, Mennel R, et al. (2007). American Society of Clinical Oncology 2007 update of recommendations for the use of tumor markers in breast cancer. J Clin Oncol, 25:5287–5312. 58. Badve SS, Baehner FL, Gray RP, et al. (2008). Estrogen- and progesteronereceptor status in ECOG 2197: comparison of immunohistochemistry by local and central laboratories and quantitative reverse transcription polymerase chain reaction by central laboratory. J Clin Oncol, 26:2473–2481.
8 USE OF HIGH-THROUGHPUT PROTEOMIC ARRAYS FOR THE DISCOVERY OF DISEASEASSOCIATED MOLECULES Douglas M. Molina, Ph.D., W. John W. Morrow, Ph.D., and Xiaowu Liang, Ph.D. Antigen Discovery, Inc., Irvine, California
INTRODUCTION The discovery of biologically relevant disease-state markers, or biomarkers, is a crucial step in the maturation of accurate disease diagnosis and appropriate treatment. With all the advances in modern research, the need for better diagnostics as well as markers of disease is paramount. The management of infection, cancer, and autoimmune disorders could benefit greatly from new biomarkers, or combinations of biomarkers, that could provide quick and accurate identification of the causative agent of disease, disease state, and even the stage of the disease. The lack of progress in identifying such molecules is due primarily to the way that, to date, researchers have approached the problem. When using traditional approaches to the discovery of serodiagnostic antigens, researchers have been limited to the number of molecules they could study. Typically, most laboratories have studied just a handful of proteins at any given time and therefore have only a limited understanding of the pathobiology of disease. Much like the children’s game Flip ‘n’ Match, inves-
Figure 1 High-throughput proteomics screening technique used by the authors. (Workflow: Day 1, PCR and transformation; Day 2, plasmid DNA miniprep; Day 3, protein expression and chip printing; Day 4, chip QC with anti-tag antibody; Day 5, serology; Day 6, data acquisition and serum sample profile.)
tigators have employed the “best guess” approach to which “pieces” (i.e., antigens or proteins) may be most appropriate as disease markers. Such best-guess hypotheses have usually been based on the body of literature available as well as extrapolations from our understanding of the immune response. Much as in the game, once in a while the correct choice is made. However, a much more productive approach would be to view all the pieces simultaneously and assess the complete picture, without prejudice, in the context of the immune response profile. In this way we would be in a much better position to make determinations as to which proteins, or combinations of proteins, would allow us to diagnose and monitor disease accurately. Very recently, such an approach has been made possible, and in this chapter we discuss how researchers have taken advantage of the sequencing boom sparked by the Human Genome Project. We also examine assay miniaturization technology, which has carried over from the nucleic acid microarray field, and how new advances in cloning and gene expression have allowed the use of those resources to profile the immune response to infection, cancer, and autoimmune disease.
OVERVIEW OF THE PRODUCTION AND USE OF MICROARRAYS FOR THE DISCOVERY OF ANTIGENIC BIOMARKERS To take advantage of genomic sequence data for the development of novel therapeutics and clinical diagnostics, high-throughput cloning and screening technologies are required. The authors and their own collaborators have developed and described a gene cloning and expression platform that allows whole proteome microarrays to be constructed from genomic sequence information in a relatively short time. The platform can then be screened with large numbers of well-characterized sera to identify antibody targets. This powerful high-throughput approach lends itself to the rapid identification of reactive antigens on an epidemiological scale, and should lead to the development of improved serodiagnostic tests for many diseases (Figure 1).
PROTEIN MICROARRAYS In the context of the discovery of disease-associated molecules, protein microarrays can be thought of as miniaturized dot blots or highly multiplexed enzyme-linked immunosorbent assays (ELISAs) for serology, and highly multiplexed sedimentation, co-immunoprecipitation assays, or enzyme function assays for protein biochemistry. They allow data collection from hundreds of parallel interaction studies in an area the size of a single well of a 96-well ELISA plate. To achieve this type of miniaturization, the chemistry of the surface being used for spotting the proteins is important. Reagents can
be coated onto activated plastics or glass to accommodate the deposition and subsequent binding of the protein onto the surface. Some examples of the chemistries used to bind proteins to the substrates are described below.
Two-Dimensional Substrates Well-characterized protein-binding substrates, used extensively in other assays, have been transferred or “ported” for use in microarray fabrication. Most of these surface substrates were used extensively in DNA microarray research and have been used successfully with purified protein arrays. There are two principal ways that proteins bind to surfaces: (1) by the formation of covalent bonds with reactive groups on the surface of the slide, and (2) by noncovalent interactions with the slide substrate. In slide surface chemistry, the use of aldehyde and epoxy coatings allows for covalent bond formation between the proteins and the slide, by virtue of the amine groups on amino acids and the activated aldehyde or epoxy groups on the slide [1,2]. The primary amine group reacts with the aldehyde, or epoxy, groups on the slide surface. The electrons on the nitrogen attach to the carbon with a partial positive charge on the reactive group and form a nitrogen–carbon bond. In protein–aldehyde coupling, the attachment is stabilized by a dehydration reaction. Another option for two-dimensional substrates is to coat the slide surface with a molecule that is part of a high-affinity binding interaction. Binding-partner substrates are not as common but are still useful. There are examples of protein-binding surfaces that have been treated with avidin, streptavidin, glutathione, and monoclonal antitag antibodies. Unlike the aldehyde and epoxy examples, these are very specific and require that the appropriate binding partner be coupled to the protein(s) being spotted.
Three-Dimensional Substrates Polyacrylamide-coated slides form a three-dimensional substrate for protein microarrays. Much like the polyacrylamide gels used for electrophoresis, they are reported to provide high probe loading capacity and a hydrophilic, solution phase–like environment, which is thought to preserve protein probe function [3–6]. Consequently, this flexible and versatile platform is well suited for many types of proteins and has been used to develop a variety of assays in microarray format. A three-dimensional hydrogel coating preserves the native three-dimensional structure of proteins, thereby maintaining stability and functionality. The reactive chemistry is stable and remains active even during very long spotting (also known as “printing”) runs. Agarose is an economical, protein-friendly medium that significantly reduces the background of the slide, which can otherwise interfere with signal detection of low-abundance proteins or pep-
tides. There are other proprietary three-dimensional porous polymer surfaces, such as those produced by Full Moon Biosystems, that claim to provide a high level of sensitivity and specificity for fluorescence-based assays. All seem to have similar properties, which include covalent multifunctional bonding sites that provide high binding specificity by covalent attachment to surface functional groups. Nitrocellulose is the best-characterized protein-binding chemistry that has been adapted successfully to microarray fabrication. Nitrocellulose membranes are used for immobilizing a wide variety of molecules, such as nucleic acids in Southern and Northern blots, as well as proteins in Western blots. Nitrocellulose makes an ideal surface for a wide variety of assays, including antibody-based assays, due to its nonspecific affinity for proteins. Nitrocellulose is usually coated onto a nylon backbone to produce membranes. It has also been coated on activated glass microscope slide, and as such has become a very popular tool for the fabrication of protein microarrays. A variety of nitrocellulose-coated slides are currently available which are ideal for microarrays because of their high binding capacity per unit volume. At one end of the spectrum, the nitrocellulose surface may be as thin as 5 μm (Path Slides, GenTel), and at the other, the membrane can be as thick as 150 μm (SuperNitro, Telechem). In the case of nitrocellulose-coated slides, it is critically important to pick the correct thickness for the application. The thickness of the coating chosen will be influenced greatly by the type of microarray printing that is to be used. Contact vs. Noncontact Printing Like DNA microarray fabrication, protein microarray fabrication is dependent on applying the protein onto the substrate reliably and reproducibly. To achieve this objective, two major classes of spotting, or microarray printing, have evolved. Noncontact printing is the mechanism of spotting onto the slide or plate surface without any contact. The volumes range from picoliters up to hundreds of microliters having coefficients of variation (CVs) of less than 10%, depending on the technology used. One such technology, solenoid-based dispensing systems, relies on the rapid opening and closing of valves, which permits the flow of pressurized liquid. Rapid opening and closing of a valve, or pulsing, results in a very small volume being deposited. Leaving the valve open longer results in the deposition of larger volumes. Solenoid-based systems typically spot between 10 nl and hundreds of microliters. Another noncontact technology, piezoelectric-based dispensing systems, can dispense as little as picoliter quantities of liquid. They rely on electrical pulses that create a pressure wave when a piezoelectric element is deformed and squeezes the tubing, the pressure created results in the displacement of a small volume of liquid. By opening the valve at the precise moment that the piezoelectric element is deformed, a small volume of liquid is deposited on the substrate.
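The coefficient of variation quoted for these dispensing systems is simply the standard deviation of replicate spot volumes divided by their mean. A minimal worked example, with hypothetical replicate volumes for a solenoid dispenser, is shown below.

```python
import statistics

# Replicate spot volumes (nanoliters) for a hypothetical solenoid dispenser.
volumes_nl = [10.2, 9.7, 10.5, 9.9, 10.1, 10.4, 9.6, 10.0]
mean_nl = statistics.mean(volumes_nl)
cv_percent = statistics.stdev(volumes_nl) / mean_nl * 100
print(f"mean = {mean_nl:.2f} nl, CV = {cv_percent:.1f}% (spec: < 10%)")
```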
The main drawback of noncontact printing is that it is not the ideal tool for arraying more viscous or concentrated protein mixtures. Cell lysates immediately come to mind when considering heterogeneous materials that may cause problems for noncontact arrayers. There is a high potential for clogging not only the dispensing solenoids, but the tubing as well. Another drawback is that the noncontact dispensing robots available are limited by the number of valves that can be used to array spots onto a slide or plate; thus, throughput becomes a problem. When printing thousands of different samples, the number of solenoids becomes the limiting step. Contact printers address these shortcomings directly. Although there are numerous variants of contact printing technologies, the two most prevalent are the solid pin and split pin methods. Unlike the solenoid- and valve-based systems, pin-based printing systems are not limited by size. Some printers use the traditional 9-mm pin spacing used in 96-well microtiter plates, but the majority use 4.5-mm pin spacing. More recently, printers that use 2.25-mm pin spacing have become available. Depending on the size of the printhead, as many as 256 pins in an 8 × 32 configuration can be fitted when the 2.25-mm spacing is used. This results in 256 samples for every sample uptake, or over 1000 samples with only four sample uptakes. The basics are the same in solid pin and split pin printing. The robot dips the pins into the samples, the pins deposit the sample on a substrate by making contact, the pins re-dip when necessary, and the pins are washed before the next sample. Solid pins must re-dip after every deposit on a substrate; split pins act more like a quill and fill up a reservoir, of which only a fraction is deposited every time the pins make contact. The number of spots that can be made after sample uptake by split pins varies greatly depending on a number of factors. The viscosity of the sample, the substrate type due to varying amounts of adsorption, the humidity, the pin design, and whether or not it has a reservoir are all factors and must be tested for empirically. When making DNA arrays, the samples are more homogeneous; they have similar viscosities, similar concentrations (moles of DNA), and similar molecule substrate interactions. When dealing with protein microarrays these characteristics are seldom the same. Proteins are usually dialyzed after purification with whichever buffer will keep the protein in solution. This can very widely from protein to protein. This will contribute to potential viscosity differences. Protein concentrations can vary greatly depending on the success of expression and subsequent purification of the protein. In case the concentrations are similar, proteins vary in size and there will be a heterogeneous distribution of the number of molecules per spot. The amino acid sequence of the protein(s) will also influence how it interacts with the substrate on which it is being spotted. In addition, the isoelectric point of the proteins will undoubtedly have an effect on how they bind to the substrate, and furthermore, hydrophilic proteins will behave differently than hydrophobic proteins. There is also the problem of protein aggregation, or too high a concentration, clogging a split pin. There are various-sized split pins with chan-
nels up to 400 μm across, which would accommodate most substances. As attempts to make higher-density arrays to increase throughput take place, this option becomes less feasible. When the substrate to be arrayed is not compatible with the split pins necessary to meet the array density required, solid pins must be employed. The need for solid pins is more of an issue when arraying cell lysates, cells, or bacterial culture fractions. Solid pins are a better choice for these situations, but at the expense of throughput. Even the fastest solid pin machine, the Aushon 2470, requires twice the number of pins to outperform the fastest split pin systems, such as the Omnigrid line of microarray printers manufactured by Genomic Solutions. Throughput may or may not be an issue, but when dealing with proteins, most researchers need the process to take as little time as possible.
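The printhead arithmetic described above can be made concrete with a short worked example. The split-pin reservoir capacity assumed in the sketch is hypothetical, since in practice it depends on sample viscosity, the substrate, humidity, and pin design.

```python
# Pins per printhead for the 8 x 32 configuration at 2.25-mm spacing cited above.
pins_per_head = 8 * 32                 # 256 samples deposited per sample uptake
uptakes = 4
print(pins_per_head * uptakes, "samples arrayed after", uptakes, "uptakes")  # 1024

# Solid pins must re-dip for every deposit, whereas a split pin fills a reservoir
# and prints many spots per uptake; the capacity assumed here is hypothetical.
slides_to_print = 50
split_pin_spots_per_uptake = 100
dips_solid = slides_to_print                                    # one dip per slide
dips_split = -(-slides_to_print // split_pin_spots_per_uptake)  # ceiling division -> 1
print("dips per sample:", dips_solid, "(solid) vs", dips_split, "(split)")
```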
GENOME–PROTEOME CONTENT CURRENTLY USED FOR BIOMARKER RESEARCH There is a need for an expressible open reading frame (ORF)eome that can be made into an arrayable proteome in the search for disease biomarkers. The number of proteins that are available for screening limits disease biomarker discovery. For example, in the search for human autoantigens, the availability of proteins to use in screening studies has been a bottleneck. The National Institutes of Health (NIH) has a collection of approximately 29,636 human ORFs that contain 17,526 (source: http://mgc.nci.nih.gov/) nonredundant gene sequences. Although Invitrogen has cloned all the human ORFs into their Gateway system, the creation of an assayable human proteome has been limited by the protein purification process. After almost a decade, the number of human proteins contained on their commercial Human ProtoArray chip remains at 8000 (<30% coverage). Recent work has shown that these ProtoArray human protein chips, although limited in scope, have been used successfully to identify serodiagnostic autoantigens (cancer biomarkers). Hudson et al. profiled serum samples from 30 cancer patients and 30 healthy persons using microarrays containing 5005 human proteins. Ninety-four discriminatory serodiagnostic antigens were identified that exhibited enhanced reactivity with sera from cancer patients relative to control sera [7]. This is a good proof-of-principle experiment that should be expanded to verify that the 30 serum samples used for this experiment are actually representative of the ovarian cancer population as a whole, not just markers for families that are predisposed to this type of cancer but may never develop the disease. An additional control would have included serum samples from the sisters of the patients who did not have ovarian cancer. To assay even a small population with an appropriate number of controls is financially prohibitive using the commercially available human ProtoArray. The arrays are available for $3000 each. For this reason the Human ProtoArray
chip offered by Invitrogen has not been utilized for large-scale screening. Small-scale experiments may be fine for a publication, but to identify robust biomarkers there needs to be a massive screening that allows profiling as many healthy, predisposed, and diseased persons as possible so that the data derived from the experiment are biologically relevant. It is only at this scale that we can truly make any conclusions about the immune response. The reason that we still have not found the ideal biomarkers is due to the complexity of immune response and its variability from person to person. The current human protein chip is being distributed as a consumable with a sizable price tag. To attempt a study of the size required to determine disease-state discriminatory antigens, or groups thereof, there needs to be a higher-throughput, more costeffective method of generating the human proteome chip. Infectious disease biomarker discovery faces challenges as well. Purified proteins of many infectious agents are scarce. Purified protein microarrays of pathogens usually include only a handful of proteins. The lack of content has been the major reason for the slow rate of emergence of new serodiagnostic markers. For example, only 149 out of the 4198 ORFs (3.5%) constituting the Yersinia pestis proteome were used to profile the antibody response to live vaccine [8]. In addition, 156 out of approximately 1000 (15.6%) Chlamydia trachomatis proteins were used to profile the human humoral immune response to urogenital tract infections in human subjects [9]. The ability to express and purify recombinant proteins in the lab limits the amount of unique content available for serological screening. This low percentage of proteome coverage is common, and most laboratories in this field are studying a few antigens at a time from thousands of possible candidates. One of the innovative ways is which researchers have overcome the lack of coverage is to apply the lessons learned from whole-cell Western blots and two-dimensional gels to microarray research. Native protein microarrays are heterogeneous protein pools that have been fractionated using chromatography. Sartain et al. developed a technique for the separation of native Mycobacterium tuberculosis cytosol and culture filtrate proteins that resulted in 960 unique protein fractions that were used to generate protein microarrays. These 960 fractions represented all the expressed proteins, having some proteins represented in different fractions. When these microarrays were used to profile the reactivity of different disease states, various previously characterized proteins as well as some novel proteins were identified from these fractions using mass spectrometry [10]. A similar approach was utilized in a different field to profile the sera of prostate cancer patients and controls. Twodimensional liquid chromatography was used to separate proteins from the prostate cancer cell line LNCaP into 1760 fractions, which were subsequently spotted onto a microarray. The microarrays were probed with serum samples from 25 men with prostate cancer and 25 male controls. Statistical analysis revealed that 38 of the fractions showed significantly more reactivity in the cancer group. Samples were classified with up to 98% accuracy using the same sera [11]. Although a good academic exercise, the protein fraction microarray
needs further studies to uncover the proteins responsible for the reactivity. In the tuberculosis case, mass spectrometry was able to characterize a number of reactive proteins, but the authors conceded that this approach was a stopgap until the complete proteome is cloned, expressed, and arrayed. In the case of prostate cancer, the reactive fractions must be analyzed further to determine which of the many proteins represented in the fractions are the important ones. Another challenge that faces users of protein fraction microarrays is that proteins are not all represented equally in these fractions. Antigens that are potentially extremely good biomarkers may be underrepresented, due to expression levels that are kept low by cell degradation pathways that target these proteins.
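A minimal sketch of the kind of analysis described above, in which fractions (or antigens) whose signal is significantly higher in patient sera than in control sera are flagged, with correction for the large number of parallel tests, is shown below. The data are simulated; the 25-versus-25 design and the 1760 fractions simply mirror the prostate cancer study, and the analysis is not the one used by those authors.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
cancer = rng.lognormal(mean=1.0, sigma=0.4, size=(25, 1760))    # 25 patient sera
control = rng.lognormal(mean=1.0, sigma=0.4, size=(25, 1760))   # 25 control sera
cancer[:, :38] *= 2.0        # pretend 38 fractions are truly more reactive

t, p = stats.ttest_ind(cancer, control, axis=0)
reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
hits = np.where(reject & (t > 0))[0]   # higher in cancer sera, FDR-controlled
print(f"{hits.size} candidate reactive fractions flagged for follow-up")
```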
HIGH-THROUGHPUT GENE CLONING AND EXPRESSION FOR THE FABRICATION OF PROTEOME MICROARRAYS The genomics era ushered in a new type of research. For the first time in history the decoded “blueprint of life” was available for study. The Human Genome Project (HGP) was the first major undertaking of its kind. The HGP was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health that was completed in 2003. The goal was to identify all of the approximately 20,000 to 30,000 genes in human DNA, determine the sequences of the 3 billion chemical base pairs that make up human DNA, and store this information in databases. This initiative sparked the boom in genome sequencing, and today there is a vast amount of publicly available data stored in numerous databases. The genomes of more than 180 organisms have been sequenced since 1995, and the data from these projects are available to the scientific community. The availability of genome sequence data led to the first big boom in microarray research. Researchers used this sequence to create libraries of oligonucleotides representing all the ORFs of the organism being studied. Robotic machines were then used to transfer and arrange nanoliter amounts of thousands of gene sequences representing a cell expression state on a single microscope slide. These high-density, highly organized arrays of DNA are called microarrays. DNA microarrays allowed for the miniaturization and high-level multiplexing of Southern blots. Affymetrix and Agilent Technologies pioneered the use of DNA microarrays for full genome gene expression analysis. Researchers were now able to track the expression of genes in response to various stimuli and incorporated this new tool into their existing research. They could track the synthesis and degradation at the mRNA level, to determine which genes are turned on and which are turned off in a given cell. We now knew the number and sequence of the expressible genes and the expression pattern of the genes in response to a particular stimulus. An area that still needs to be explored, however, is how the proteins perform their functions once expressed. This conundrum has ushered in the age of protein microarrays.
Currently, protein microarrays are being used for a variety of assays. There has been much trial and error when attempting to miniaturize binding assays, enzyme activity assays, and serological assays. The same tools that helped propel the era of the DNA microarray are being applied to protein microarrays. Protein microarrays permit the miniaturization and high-level multiplexing of protein-based assays. Companies like Protometrix cloned, expressed, and purified thousands of proteins to make the Yeast Proteome chip and the Human Partial Proteome chip. The Human chip, now on version 8, is available commercially from Invitrogen, with approximately 8000 human proteins that can be used for screening binding partners, substrates for enzymes, small molecules, and antibody profiling. That is the equivalent of 8000 cosedimentation assays, 8000 phosphorylation assays, or 8000 ELISA assays on a surface area of 3 inches × 1 inch. When all the approximately 29,000 ORFs that have been cloned are eventually expressed and printed on a microarray, all proteins expressed by human cells will be contained on two or three microscope slides and ready for assay. Traditional multistep cloning methods have been a major stumbling block because of their innate low throughput. Starting first by amplifying the desired insert by polymerase chain reaction (PCR), then digesting it and the vector, followed by ligation, subsequent transformation, plating, colony picking, and verification of clones, the volume of work is the rate-limiting step. Technical problems may be encountered in any of the aforementioned steps, and most often than not, researchers spend months cloning a few genes of interest. There are, however, ways to circumvent this laborious cloning methodology when attempting to manufacture a high-content microarray. One way is to take advantage of the vast genomic sequence data and the high-throughput friendly PCR reaction to amplify out all of the ORFs, followed by a second PCR step to make the products transcriptionally active. The secondary PCR step adds a promoter region at the 5′ end and a stop codon at the 3′ end of the PCR product, making a transcriptionally active PCR (TAP) product. The production of TAP fragments for expression can yield a large library of expressible ORFs in a relatively short amount of time. TAP fragments have been reported to express very well in a variety of expression systems [12]. Organisms with large genomes that would otherwise take decades to clone into an expressible form will now only take two rounds of PCR. A recent report by Regis et al. has shown that in the case of Plasmodium falciparum, TAP fragments could be used for a large-scale screening of the humoral immune response and eventual selection of antigens for further study [13]. While providing the throughput, TAP fragments are limited by the amount of product produced during the second PCR step. Recent research has shown that serological screening of cDNA libraries created from cancer cell lines can be used for the discovery of cancer biomarkers. Cancer is the result of a synergistic malfunctioning of multiple signaling pathways. This may or may not be the same set of pathways, proteins, and mutations in every cancer type and tissue, so it becomes even more critical to
cast the widest net to find the most potential biomarkers [14]. Lung cancer cell lines were used to make cDNA libraries that have been used successfully for serological identification of serodominant antigens [16,17]. This is encouraging, but the task at hand with this technology is more daunting. After the creation of the reverse complement of the mRNA, there is still a need to use brute-force cloning techniques to accomplish the task. One must keep in mind that the majority of mRNAs in most tissues are in low abundance, and thus quite a few of the mRNAs will be underrepresented in the cDNA library [15,18]. Extremely large number of clones are required to ensure a good representation of these low-abundance genes. There is also the very real problem of partial cDNA clones, which do not have the complete sequence. If we are fortunate enough to minimize these two issues, there is still the very real bottleneck of screening the clones. High-density gridding is becoming more and more common and makes it possible to screen large amounts of mRNA complementary DNA. This technology is designed largely for gene discovery, but if refined to the point that all mRNA transcripts are represented equally, we may have a tool for more reliable screening of human serum samples. The next rung on the ladder would be the creation of an expressible genome library that contains all the ORFs in the genome. To make the genome library quickly and efficiently, we need to consolidate all the traditional cloning steps into one efficient step. Recombination in vivo provides the all-encompassing single step. Recombination cloning is widely used in research, but for a long time it was thought that bacteria lacked the mechanism to allow for recombination. However, bacterial in vivo homologous recombination has been an efficient and heavily used tool in the genetic field. Recombination-based cloning allows DNA sequences to be inserted or deleted without regard to location of restriction sites [19,20]. One very widely used methodology for recombination cloning is the Gateway system now offered commercially by Invitrogen. The first step to Gateway cloning is inserting the gene of interest into the Gateway entry vector. There are two ways to clone your gene of interest into a gateway entry clone. The first is the standard cut-and-paste protocol using restriction enzymes and ligase. The second way is to create a PCR product with terminal attB sites, using primers containing a 25-base pair attB sequence, plus four terminal G’s. This product will then be inserted in the entry clone using reconstituted recombination machinery. PCR-based inserts can be made using genomic DNA, a cDNA library, or an plasmid clone containing your gene of interest. An entry clone is a vector containing your gene of interest and flanked by Gateway att recombination sites. Bacteriophage lambda att site recombination is a well-characterized phenomenon [21]. Bacteria have a stretch of DNA sequence called att encoded into their genome, and bacteria phage have the same stretch of DNA sequence. When the phage infects a bacterium, the lambda DNA injected recombines with the corresponding bacterial DNA via the att sites in the presence of integration-specific enzymes. The enzyme-assisted recombination results in integration of the phage DNA into the bacterial genome. Using the same reconstituted lambda
Using the same reconstituted lambda att site recombination system, entry clones containing a gene of interest can then be transferred into any expression vector for subsequent protein expression. Although the Gateway system works and increases cloning throughput dramatically, it is not efficient enough to serve as a true high-throughput alternative. In vivo recombination cloning directly into an optimized expressible vector, however, appears to have solved the throughput dilemma. Recent advances in cloning methodologies have made it possible to clone entire bacterial ORFeomes, comprising thousands of genes, in a relatively short time. Figure 2 illustrates the typical results, beginning with the PCR step, followed by in vivo recombination cloning, and checking of the clones by PCR using sequence-specific primers. Specifically, in vivo recombination cloning [22] in Escherichia coli has broken the barriers posed by traditional brute-force cloning methodologies that rely on multiple-step techniques: PCR, restriction enzyme digestion of the PCR product and vector, ligation, and single-colony screening and selection. Available sequence data are used to design 5′ and 3′ gene-specific primers for all ORFs encoded in the genome. The primers contain 53 nucleotides, comprising a 33-nucleotide recombination adapter sequence and a 20-nucleotide gene-specific sequence. The PCR products are then mixed with a T7-based linear expression vector as described previously [22] and transformed into supercompetent DH5α cells (Antigen Discovery Inc.). The transformed cells are grown overnight and checked for turbidity the following day. DNA can then be purified from the mixed cultures using 96-well plate–based mini-prep protocols such as Qiaprep Turbo 96. The plasmid DNA purified from the overnight culture can then be used to express the protein encoded by the insert.
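The primer design described above, a 33-nucleotide recombination adapter fused to a 20-nucleotide gene-specific sequence, lends itself to scripted, genome-wide generation. The following minimal Python sketch illustrates the idea; the adapter sequences and the ORF record are hypothetical placeholders, not the sequences used in the published method.

```python
# Minimal sketch of genome-wide primer design for in vivo recombination cloning.
# The adapter sequences below are stand-ins, not the published adapter sequences.
FWD_ADAPTER = "GATC" * 8 + "G"    # placeholder 33-nt 5' recombination adapter
REV_ADAPTER = "CTAG" * 8 + "C"    # placeholder 33-nt 3' recombination adapter
GENE_SPECIFIC_LEN = 20            # gene-specific portion appended to each adapter

def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_primers(orf_seq: str):
    """Return (forward, reverse) 53-nt primers for one ORF."""
    fwd = FWD_ADAPTER + orf_seq[:GENE_SPECIFIC_LEN]
    rev = REV_ADAPTER + reverse_complement(orf_seq[-GENE_SPECIFIC_LEN:])
    assert len(fwd) == len(rev) == 53
    return fwd, rev

# Example with a made-up ORF sequence:
orfs = {"ORF_0001": "ATGGCTAAGCTTGGATCCAAATTTCCCGGGTACGTAGCTAGCTGA"}
primers = {name: design_primers(seq) for name, seq in orfs.items()}
print(primers["ORF_0001"])
```

In practice, a loop of this kind over every annotated ORF can produce the complete primer set for an ORFeome-scale cloning campaign in minutes.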
Figure 2 Synthesis and verification of clones. Representative images of whole Vv ORFeome PCR (A), cloning (B), and QC-PCR (C).
Proteins are expressed in a coupled in vitro transcription–translation (IVT) reaction in an E. coli–based cell-free expression system such as RTS 100 from Roche. The unpurified proteins can then be printed directly onto nitrocellulose-coated slides, along with a set of negative and positive controls, using a contact microarray printer. It is crucial that the mixture of unpurified proteins be spotted as quickly as possible: the quality of the arrays is inversely proportional to the time elapsed between the end of the reaction and deposition onto the nitrocellulose. Once the unpurified protein mixture is spotted and dry, it is very stable.

In the high-throughput microarray fabrication first reported by Davies et al. in 2005 [22], the cell-free expressed proteins can be detected directly on the chip using antibodies against the N-terminal polyhistidine (polyHis) tag and the C-terminal hemagglutinin (HA) tag. These antibodies were used to monitor expression of the large numbers of parallel reactions. The arrays, probed with a mouse monoclonal antibody raised against the polyHis epitope and a rat monoclonal antibody raised against the HA epitope, are visualized using a fluorescence-based microarray scanner. An example of one such hybridization and scanning output can be seen in Figure 3. The data can then be quantified using a microarray data analysis software package that measures the intensity of the spots on the microarray chip. Each array contains positive control spots printed from serial dilutions of whole immunoglobulin G (IgG).
Figure 3 Quality assessment for protein expression. Images of whole Vv proteome microarray probed with mAb against polyhistidine (A) and hemagglutinin (B) tags. 99.0% of Vv proteins were reactive to the anti-polyhistidine antibody and 88.2% to the anti-hemagglutinin antibody. The images show four dilutions of human IgG (yellow box), positive controls (green box), and six negative controls (mock transcription/translation reaction) (red circles). The remaining spots are Vv proteins. (See insert for color reproduction of the figure.)
Each array also contains no-DNA negative control spots, and the reactivity of these spots is low for both serum samples. The positive and negative controls are used to normalize data from arrays probed on different days, using a modified VSN package in the R statistical environment [23,24]. There are also serially diluted EBNA1 protein control spots, which have been shown to be reactive to varying degrees in different human subjects [25–27]. Once an array has passed quality assurance, it is ready to be used in serological studies.
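The published workflow normalizes arrays probed on different days with a modified VSN package in R [23,24]; the simplified Python sketch below is not that method, but it illustrates the general idea of anchoring every array to its own positive and negative controls, using made-up intensities.

```python
# Simplified illustration of control-based array normalization (not the VSN
# approach cited in the text); all intensity values are hypothetical.
import numpy as np

def normalize_array(spot_signals, neg_control_signals, pos_control_signals):
    """Background-correct against no-DNA spots and scale by the IgG positive controls."""
    background = np.median(neg_control_signals)
    scale = np.median(pos_control_signals) - background
    return (np.asarray(spot_signals, dtype=float) - background) / scale

# Hypothetical raw intensities from two arrays probed on different days:
day1 = normalize_array([5200, 800, 15000], neg_control_signals=[300, 350, 320],
                       pos_control_signals=[20000, 19500, 20400])
day2 = normalize_array([2600, 420, 7600], neg_control_signals=[150, 170, 160],
                       pos_control_signals=[10100, 9800, 10300])
print(day1.round(3), day2.round(3))  # comparable despite the ~2x overall intensity shift
```

Anchoring each array to its on-chip controls in this way makes spot intensities from different print runs and probing days roughly comparable, which is the prerequisite for the serological comparisons that follow.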
USING WHOLE PROTEOME MICROARRAYS TO SCREEN FOR DISEASE-STATE DISCRIMINATORY ANTIGENS

The ability to find biomarkers relies on the ability to profile the immune response to entire proteomes. Antigens that are highly reactive following infection, or that are targets of auto-antibodies, can be used in diagnostics and therapeutics; they are also potential candidates for the development of vaccines. Using the model organism vaccinia virus (Vv) Western Reserve strain, researchers were able to use in vivo recombination cloning to clone 185 Vv proteins, or greater than 90% of the proteome, from genomic DNA into an expressible vector. This proteome vector library was subsequently expressed and arrayed onto nitrocellulose-coated slides. Each slide contains 16 nitrocellulose pads with a Vv proteome array per pad. Each slide is thus capable of producing serological data for 16 serum samples interrogating approximately 200 spots (185 Vv proteins plus controls), yielding 3200 data points per slide, the equivalent of thirty-three 96-well ELISA plates. Davies et al. [22] showed that specific vaccination-state and disease-state profiles are discernible using whole proteome screening. Comparing the pre- and postvaccination data for each species, naive mice, monkeys, and humans have a very different reactivity pattern than vaccinated mice, monkeys, and humans. Figure 4 shows one such example, comparing the hyperimmune serum vaccinia immunoglobulin from Cangene (panel A) and a naive subject (panel B). These results verified that this platform is a rapid way to scan humoral immunity comprehensively from vaccinated or infected humans and animals on a whole-proteome scale [22].

In 2008, Davies et al. further showed the power of the technology for profiling the immune response. Figure 5 is a heat map from an experiment comparing the immune responses to two different strains of vaccinia virus. The U.S. government is considering using modified vaccinia virus Ankara (MVA) as a smallpox vaccine to replace live vaccinia virus, otherwise known as Dryvax, because it is safer. The heat map is a global view that allows comparison of the immune responses elicited by MVA and Dryvax and can give an idea of whether MVA elicits a similar enough response to confer protection.

Barbour et al. recently published an article that takes advantage of this technology to clone and express 1292 Borrelia burgdorferi proteins. Employing enzymeless recombination cloning and E. coli cell-free expression, they produced a protein array representing approximately 80% of the genome.
Figure 4 Naive subject serum and VIG. Images of whole vaccinia virus proteome microarray probed with vaccinia immunoglobulin (Cangene) (A) and serum from a naive individual (B). The IgG control spots (shown in yellow) show signal for both samples; neither sample reacts with the no-DNA control spots (shown in red). The images show four dilutions of human IgG (yellow box), positive controls (green box), and six negative controls (mock transcription/translation reaction) (red circles). The remaining spots are Vv proteins. (See insert for color reproduction of the figure.)
The antibody reactivities of sera from patients with Lyme disease were compared to the antibody reactivities of sera from controls. Overall, approximately 15% of the ORFs of B. burgdorferi on the array elicited an antibody response in humans with natural infections. Among the immunogens, 103 stood out on the basis of statistical criteria [26].

In 2007, the same platform technology allowed fabrication of a Francisella tularensis proteome microarray. The proteome was interrogated with serum samples to find biomarkers specific to tularemia, the disease caused by F. tularensis. This particular study examined the efficacy of subunit vaccines, and the proteome arrays were used to assess the humoral response to different vaccination protocols. The stimulation of a protective immune response against intracellular pathogens using nonreplicating vaccines was investigated; many of today's vaccines still rely on attenuated organisms or a different, less virulent strain. Eyles et al. showed the effect of different adjuvants on the response to the nonreplicating vaccine. The F. tularensis proteome chips were used to differentiate effectively which adjuvant was stimulating an IgG1 or an IgG2 response. An IgG1 response is associated with a TH2-type response and does not confer protection, whereas an IgG2 response is associated with a TH1-type response and has been linked to protective immunity. Furthermore, these data give insight into the protective immune response and have potentially important implications for the rational design of nonliving vaccines for tularemia and other intracellular pathogens.
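As an illustration of how isotype information of this kind can be summarized from array data, the sketch below computes a per-antigen IgG2:IgG1 log-ratio and reports the median skew for each adjuvant group. The signal values are invented, and this is not the analysis pipeline used in the published study.

```python
# Illustrative isotype-skew summary for proteome array data (hypothetical values).
import math

def isotype_skew(igg1_signals, igg2_signals):
    """Median per-antigen log2(IgG2/IgG1); positive values suggest an IgG2 (TH1-type) bias."""
    ratios = sorted(math.log2(g2 / g1)
                    for g1, g2 in zip(igg1_signals, igg2_signals) if g1 > 0 and g2 > 0)
    mid = len(ratios) // 2
    return ratios[mid] if len(ratios) % 2 else (ratios[mid - 1] + ratios[mid]) / 2

# Hypothetical signals for the same antigens under two adjuvant formulations:
adjuvant_a = isotype_skew(igg1_signals=[900, 1200, 400], igg2_signals=[4100, 5000, 1800])
adjuvant_b = isotype_skew(igg1_signals=[3800, 5200, 1500], igg2_signals=[700, 900, 350])
print(f"adjuvant A skew: {adjuvant_a:+.2f}  adjuvant B skew: {adjuvant_b:+.2f}")
```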
[Figure 5 heat maps: panel (A) MVA and panel (B) Dryvax (DVX), with Vv proteins grouped as structural (MV, EV, core, other), regulation (transcription, replication), virulence/host defense, and unknown, across pre- and postvaccination time points for primary and secondary vaccinations.]
Figure 5 Profiling of human antibody responses pre- and postvaccination with MVA (A) and WR (B). For Dryvax responses, primary (n = 13) and secondary (n = 12) infections are shown. (From ref. 25.) (See insert for color reproduction of the figure.)
Recent developments in the fabrication of protein microarrays allow multiplexing of an entire proteome for profiling the immune response. This approach is powerful enough to discover the content needed to finally replace tests currently used to diagnose infectious disease, some of which were developed over a century ago. The diagnosis of tuberculosis (TB) is an excellent example of such an opportunity. TB is a major cause of illness and death worldwide: globally, 9.2 million new cases and 1.7 million deaths from TB occurred in 2006 [28]. Multiple tests must be performed to diagnose TB.
A person suspected of having TB is first tested using the tuberculin skin test (TST), also known as the purified protein derivative (PPD) test, pioneered by the French physician Charles Mantoux in 1907. The TST is used routinely in humans, but the results are subjective and variable, and their interpretation is rarely consistent [29]. Sputum testing identifies M. tuberculosis directly, providing specificity, but current methodologies do not give good sensitivity [30,31]. The gold standard for the diagnosis of TB is bacterial culture, a time-consuming process that requires a dedicated microbiology laboratory and several weeks to obtain results [32]. More novel and sophisticated methods, such as real-time PCR (RT-PCR) and cell-mediated-immunity–based detection assays such as interferon gamma release assays (IGRAs), are accurate but require sophisticated equipment and highly trained personnel [33,34]. The limitations of current TB diagnosis techniques have sparked renewed interest in antibody detection–based tests, which are relatively straightforward. The immune response to TB infection is variable enough that an infected population may not recognize a single antigen [35–40]. Studies that applied previously characterized M. tuberculosis proteins in immunoassays such as ELISAs and microbead suspension assays for the purpose of TB diagnosis have shown the need for a multiplex approach to diagnosing TB [41,42]. There is a dire need for the discovery of an optimal panel of antigens that will allow accurate and reliable diagnosis.

Cancer is another field in urgent need of diagnostics with improved accuracy. It has been known since the 1950s that cancer patients produce auto-antibodies [43]; researchers found that these auto-antibodies were directed against proteins associated with the tumor cells. Auto-antibodies against the tumor suppressor p53 have been found in breast cancer, lung cancer, and ovarian cancer [44–47]. Proteins that are overexpressed in cancer cells also elicit the production of auto-antibodies; for example, the overexpression of c-Myc is linked to the presence of auto-antibodies against c-Myc in lung cancer [48]. There is evidence that the immune system begins to produce auto-antibodies against cancer cell–associated proteins years, perhaps even decades, before the onset of disease [49–51]. There is a dire need for earlier and more accurate cancer detection, and the development of “early warning” tests would improve the prognosis for patients. Serological biomarkers can also be a tool for the development of antibody-based therapeutics to treat cancer. To achieve this objective, it is important to identify sets of cancer-specific antigens, or biomarkers, that can differentiate disease states as well as different forms of cancer.

Serum auto-antibody profiling is a promising approach for early detection and diagnosis of breast cancer. It is becoming increasingly clear that more than one marker will be needed to achieve the early warning system that cancer clinicians so desperately seek, and it is likely that well-characterized panels of biomarkers will be required. Zhong et al. recently tested a breast cancer T7 phage library with healthy control sera and breast cancer sera and concluded that a panel of auto-antibodies, not a single auto-antibody, is required for optimal accuracy of disease-state discrimination [49]. A similar study tested colon cancer sera with a colon cancer T7 phage library.
Ran et al. were able to identify 24 autoantigens associated with colon cancer. A six-marker panel (six antigens) achieved 91.7% sensitivity and 91.7% specificity in the training set of sera, and in a testing set this marker panel predicted 85% of the samples correctly [52]. There are approximately 30,000 ORFs in the NIH human gene collection, representing 17,500 nonredundant genes and variants thereof [53]. Using enzymeless recombination cloning, a human proteome chip could be made with upward of 90% of these ORFs represented. With the ability to assay 27,000 proteins at once, the discovery of new biomarkers for human disease will be limited only by the number of serum samples used to determine disease-state discriminatory sets of antigens and, potentially, antigen panels that differentiate between manifestations of the same disease, such as cancer.

SUMMARY

The high-throughput proteomics approach described here is the natural progression of the genome sequencing projects of the 1990s; instead of sequencing genomes, we are fabricating proteomes. Organisms such as M. tuberculosis, with 4000 ORFs, or the malaria parasites (P. falciparum and P. vivax), with even larger genomes, can now be assayed on a whole-proteome scale. By providing a comprehensive tool to test all the proteins at once, we open the door to a new, unbiased approach to biomarker discovery. Instead of working with a handful of proteins and attaining very limited data sets, we can now look at these organisms from a whole-proteome perspective, allowing the immune system to tell us what is and is not important. Although this approach has been used primarily for infectious disease research, it also holds promise for the discovery and study of biomarkers for human disease.

REFERENCES

1. Redkar RJ, Schultz NA, Scheumann V, et al. (2006). Signal and sensitivity enhancement through optical interference coating for DNA and protein microarray applications. J Biomol Tech, 17(2):122–130. 2. Angenendt P, Glökler J, Sobek J, Lehrach H, Cahill DJ (2003). Next generation of protein microarray support materials: evaluation for protein and antibody microarray applications. J Chromatogr A, 1009(1–2):97–104. 3. Chiari M, Cretich M, Corti A, Damin F, Pirri G, Longhi R (2005). Peptide microarrays for the characterization of antigenic regions of human chromogranin A. Proteomics, 5(14):3600–3603. 4. Brueggemeier SB, Wu D, Kron SJ, Palecek SP (2005). Protein–acrylamide copolymer hydrogels for array-based detection of tyrosine kinase activity from cell lysates. Biomacromolecules, 6(5):2765–2675. 5. Cretich M, Pirri G, Damin F, Solinas I, Chiari M (2004). A new polymeric coating for protein microarrays. Anal Biochem, 332(1):67–74.
6. Brueggemeier SB, Kron SJ, Palecek SP (2004). Use of protein–acrylamide copolymer hydrogels for measuring protein concentration and activity. Anal Biochem, 329(2):180–189. 7. Hudson ME, Pozdnyakova I, Haines K, Mor G, Snyder M (2007). Identification of differentially expressed proteins in ovarian cancer using high-density protein microarrays. Proc Natl Acad Sci USA, 104(44):17494–17499. 8. Li B, Jiang L, Song Q, et al. (2005). Protein microarray for profiling antibody responses to Yersinia pestis live vaccine. Infect Immun, 73(6):3734–3739. 9. Sharma J, Zhong Y, Dong F, Piper JM, Wang G, Zhong G (2006). Profiling of human antibody responses to Chlamydia trachomatis urogenital tract infection using microplates arrayed with 156 chlamydial fusion proteins. Infect Immun, 74(3):1490–1499. 10. Sartain MJ, Slayden RA, Singh KK, Laal S, Belisle JT (2006). Disease state differentiation and identification of tuberculosis biomarkers via native antigen array profiling. Mol Cell Proteom, 5(11):2102–2113. 11. Bouwman K, Qiu J, Zhou H, et al. (2003). Microarrays of tumor cell derived proteins uncover a distinct pattern of prostate cancer serum immunoreactivity. Proteomics, 3(11):2200–2207. 12. Liang X, Teng A, Braun DM, et al. (2002). Transcriptionally active polymerase chain reaction (TAP): high throughput gene expression using genome sequence data. J Biol Chem, 277(5):3593–3598. 13. Regis DP, Dobaño C, Quiñones-Olson P, et al. (2008). Transcriptionally active PCR for antigen identification and vaccine development: in vitro genome-wide screening and in vivo immunogenicity. Mol Biochem Parasitol, 158(1):32–45. 14. McCarrey JR, Williams SA (1994). Construction of cDNA libraries from limiting amounts of material. Curr Opin Biotechnol, 5(1):34–39. 15. Suzuki Y, Sugano S (2001). Construction of full-length-enriched cDNA libraries: the oligo-capping method. Methods Mol Biol, 175:143–153. 16. Ali Eldib AM, Ono T, Shimono M, et al. (2004). Immunoscreening of a cDNA library from a lung cancer cell line using autologous patient serum: identification of XAGE-1b as a dominant antigen and its immunogenicity in lung adenocarcinoma. Int J Cancer, 108(4):558–563. 17. Li HH, Ma LJ, Wang QQ (2004). [Construction and characterization of cDNA library for lung cancer cell line YTMLC-90 isolated from Gejiu, Yunnan province]. Xi Bao Yu Fen Zi Mian Yi Xue Za Zhi, 20(4):465–468. 18. Fulton LL, Hillier LD, Wilson RK (1995). Large-scale complementary DNA sequencing methods. Methods Cell Biol, 48:571–582. 19. Thomason L, Court DL, Bubunenko M, et al. (2007). Recombineering: genetic engineering in bacteria using homologous recombination. Curr Protoc Mol Biol, (1):16. 20. Wendland J (2003). PCR-based methods facilitate targeted gene manipulations and cloning procedures. Curr Genet, 44(3):115–123. 21. Groth AC, Calos MP (2004). Phage integrases: biology and applications. J Mol Biol, 335(3):667–678. 22. Davies DH, Liang X, Hernandez JE, et al. (2005). Profiling the humoral immune response to infection by using proteome microarrays: high-throughput
vaccine and diagnostic antigen discovery. Proc Natl Acad Sci USA, 102(3):547–552. 23. Sundaresh S, Doolan DL, Hirst S, et al. (2006). Identification of humoral immune responses in protein microarrays using DNA microarray data analysis techniques. Bioinformatics, 22(14):1760–1766. 24. Sundaresh S, Randall A, Unal B, et al. (2007). From protein microarrays to diagnostic antigen discovery: a study of the pathogen Francisella tularensis. Bioinformatics, 23(13):i508–i518. 25. Davies DH, Wyatt LS, Newman FK, et al. (2008). Antibody profiling by proteome microarray reveals the immunogenicity of the attenuated smallpox vaccine modified vaccinia virus Ankara is comparable to that of Dryvax. J Virol, 82(2):652–663. 26. Barbour AG, Jasinskas A, Kayala MA, et al. (2008). A genome-wide proteome array reveals a limited set of immunogens in natural infections of humans and white-footed mice with Borrelia burgdorferi. Infect Immun, 76(8):3374–3389. 27. Eyles JE, et al. (2007). Immunodominant Francisella tularensis antigens identified using proteome microarray. Crown Copyright 2007 Dstl. Proteomics, 7(13):2172–2183. 28. WHO (2008). Global Tuberculosis Control: Surveillance, Planning, Financing. WHO, Geneva, Switzerland. 29. Eyles JE, Unal B, Hartley MG, et al. (2007). A systematic review of rapid diagnostic tests for the detection of tuberculosis infection. Health Technol Assess, 11(3):1–196. 30. Gilpin C, Kim SJ, Lumb R, Rieder HL, Van Deun A (2007). Critical appraisal of current recommendations and practices for tuberculosis sputum smear microscopy. Int J Tuberc Lung Dis, 11(9):946–952. 31. Steingart KR, Henry M, Ng V, et al. (2006). Fluorescence versus conventional sputum smear microscopy for tuberculosis: a systematic review. Lancet Infect Dis, 6(9):570–581. 32. Baylan O (2005). [Culture based diagnostic methods for tuberculosis]. Mikrobiyol Bul, 39(1):107–124. 33. Parashar D, Chauhan DS, Sharma VD, Katoch VM (2006). Applications of real-time PCR technology to mycobacterial research. Indian J Med Res, 124(4):385–398. 34. Hoffmann H, Loytved G, Bodmer T (2007). [Interferon-gamma release assays in tuberculosis diagnostics]. Internist (Berl), 48(5):497–498, 500–506. 35. Steingart KR, Ramsay A, Pai M (2007). Commercial serological tests for the diagnosis of tuberculosis: do they work? Future Microbiol, 2:355–359. 36. Steingart KR, Henry M, Laal S, et al. (2007). Commercial serological antibody detection tests for the diagnosis of pulmonary tuberculosis: a systematic review. PLoS Med, 4(6):e202. 37. Bothamley GH (1995). Serological diagnosis of tuberculosis. Eur Respir J Suppl, 20:676s–688s. 38. Raja A, Ranganathan UD, Bethunaickan R (2008). Improved diagnosis of pulmonary tuberculosis by detection of antibodies against multiple Mycobacterium tuberculosis antigens. Diagn Microbiol Infect Dis, 60(4):361–368.
39. Wang EL, Liu WT, Li T (2006). [The serodiagnostic value of antigens secreted from Mycobacterium tuberculosis]. Zhonghua Jie He He Hu Xi Za Zhi, 29(7): 466–469. 40. Perkins MD, Conde MB, Martins M, Kritski AL (2003). Serologic diagnosis of tuberculosis using a simple commercial multiantigen assay. Chest, 123(1):107–112. 41. Khan IH, Ravindran R, Yee J, et al. (2008). Profiling antibodies to Mycobacterium tuberculosis by multiplex microbead suspension arrays for serodiagnosis of tuberculosis. Clin Vaccine Immunol, 15(3):433–438. 42. Verma RK, Jain A (2007). Antibodies to mycobacterial antigens for diagnosis of tuberculosis. FEMS Immunol Med Microbiol, 51(3):453–461. 43. Giraud G, Latour H, Levy A, Puech P, Roujon J (1955). [Cancer of pancreas with auto-antibodies in hemolytic anemia]. Montpellier Med, 48(3):465–467. 44. Bergqvist M, Brattström D, Larsson A, et al. (1998). P53 auto-antibodies in nonsmall cell lung cancer patients can predict increased life expectancy after radiotherapy. Anticancer Res, 18(3B):1999–2002. 45. Angelopoulou K, Diamandis EP (1997). Detection of the TP53 tumour suppressor gene product and p53 auto-antibodies in the ascites of women with ovarian cancer. Eur J Cancer, 33(1):115–121. 46. Regidor PA, Regidor M, Callies R, et al. (1996). Detection of p53 auto-antibodies in the sera of breast cancer patients with a new recurrence using an ELISA assay: Does a correlation with the recurrence free interval exist? Eur J Gynaecol Oncol, 17(3):192–199. 47. Green JA, Mudenda B, Jenkins J, et al. (1994). Serum p53 auto-antibodies: incidence in familial breast cancer. Eur J Cancer, 30A(5):580–584. 48. Yamamoto A, Shimizu E, Sumitomo K, et al. (1997). L-Myc overexpression and detection of auto-antibodies against L-Myc in both the serum and pleural effusion from a patient with non-small cell lung cancer. Intern Med, 36(10):724–727. 49. Zhong L, Ge K, Zu JC, et al. (2008). Autoantibodies as potential biomarkers for breast cancer. Breast Cancer Res, 10(3):R40. 50. Chapman CJ, Murray A, McElveen JE, et al. (2008). Autoantibodies in lung cancer: possibilities for early detection and subsequent cure. Thorax, 63(3):228–233. 51. Chapman C, Murray A, Chakrabarti J, et al. (2007). Autoantibodies in breast cancer: their use as an aid to early diagnosis. Ann Oncol, 18(5):868–873. 52. Ran Y, Hu H, Zhou Z, et al. (2008). Profiling tumor-associated autoantibodies for the detection of colon cancer. Clin Cancer Res, 14(9):2696–2700. 53. Zabarovskii ER, Allikmets RL (1989). [Construction of libraries of structural genes (cDNA) and “jumping” gene libraries in lambda vectors]. Mol Biol (Mosk), 23(5):1205–1220.
PART III CHARACTERIZATION AND VALIDATION
9 CHARACTERIZATION AND VALIDATION OF BIOMARKERS IN DRUG DEVELOPMENT: REGULATORY PERSPECTIVE Federico Goodsaid, Ph.D. Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland
INTRODUCTION Biomarkers can be considered tools to be qualified in close association with individual submissions for drug approvals. The U.S. Food and Drug Administration (FDA) Drug-Diagnostic Co-development Concept Paper [1] described a specific example of this association, where the approval of a drug and a test are closely linked with each other in both their product concepts and the timelines of their regulatory approvals. Several examples of biomarkers approved through this co-development process are shown in the FDA's Table of Valid Genomic Biomarkers in the Context of Approved Drug Labels [2]. Biomarkers in this table include both genetic and translational entries. Genetic biomarkers are often integrated in drug development as clinical or nonclinical markers of drug efficacy or safety for the purpose of patient selection in clinical trials, response prediction through stratification or enrichment, or dose optimization. Translational biomarkers have applications similar to those of genetic biomarkers but may also be useful for response monitoring and as early indicators of toxicity or adverse reactions.
Novel approaches are being tested continuously for the successful integration of biomarkers in drug development. Their applications range from early compound selection through postmarketing applications. However, integration of these novel biomarkers into routine nonclinical and clinical practice, and into regulatory submissions, has often been slow. Hesitation in the application of these tests is often associated with concern not only about how comprehensive the data supporting these applications are but also about the regulatory interpretation of the context of use for these applications [3]. Biomarker tests can be integrated into drug development when we have a consensus about the context in which we are measuring the biomarker and the evidence supporting this measurement. These levels of consensus need to be reflected in the regulatory review of biomarker data.
REGULATORY PATHS IN BIOMARKER EVALUATION AND QUALIFICATION

The path from an exploratory biomarker to a biomarker qualified for a specific application context can be long and unpredictable [4]. Application of these biomarkers requires an objective record of their nonclinical or clinical context and supporting qualification evidence. Information from the development of exploratory biomarkers has been shared between the pharmaceutical industry and the FDA through voluntary exploratory data submissions (VXDSs) [5]. Submissions of exploratory biomarker data have allowed reviewers at the FDA to share with scientists in the pharmaceutical industry study designs, sample isolation and storage protocols, technology platforms, analysis algorithms, biological pathway interpretation, and electronic data submission formats. This experience has been valuable in training our reviewers for the analysis and interpretation of biomarker data. VXDSs have stressed the need for a regulatory path from exploratory biomarkers to biomarkers qualified for a specific context. Such a path is currently being tested at the FDA through a pilot process for biomarker qualification [6]. This process is focused on the specific needs of the regulatory environment to ensure scientifically accurate and clinically (or preclinically) useful decision making.

In the first use of a new joint-agency review process put in place by U.S. and European drug regulators, the FDA and the European Medicines Agency (EMEA) [7] allow drug companies to submit the results of seven new tests as evidence of drug-induced nephrotoxicity. At this time, the qualification of these biomarkers covers voluntary submission of these data for rat studies. On a case-by-case basis, the FDA will also consider possible application of these biomarkers in phase I human trials. The tests measure levels of seven key proteins, or biomarkers, that scientists from the FDA and EMEA believe provide important new safety information about the effect of drugs on the kidney. When reviewing investigational new drug (IND) applications, new drug applications (NDAs), or biologics license applications (BLAs), both regulatory agencies will now consider the test results in addition to blood urea nitrogen (BUN) and creatinine.
The development of the new renal toxicity biomarkers was led by the Predictive Safety Testing Consortium (PSTC) [8], whose members include scientists from 16 pharmaceutical companies. The PSTC was organized and led by the Critical Path Institute [9]. Researchers from Merck and Novartis identified the new biomarkers, tested them to establish their sensitivity, specificity, and positive and negative predictive value, and then shared their findings with the other consortium members for further study. The consortium then submitted applications for their qualification to the FDA and EMEA. This is a unique example of how a group of drug companies can work together to propose and generate qualification data for new safety tests and then present them jointly to the FDA and EMEA for qualification.

The FDA and EMEA laid the groundwork for such joint-agency reviews in 2004 with the development of the VXDS framework. The VXDS review served as the baseline model around which to design the pilot process for biomarker qualification in 2006. A similar biomarker qualification data submission (BQDS) meeting is held in this pilot process to allow an exchange of questions with the sponsor about scientific and clinical information submitted for qualification. The pilot process for biomarker qualification allowed the PSTC to submit a single application for biomarker qualification to both regulatory agencies, and then to meet jointly with scientists from both agencies to discuss it in detail and to address additional scientific questions posed by the regulators. Each regulatory agency reviewed the application separately and made independent decisions on whether it would allow the new biomarkers to be used.

The new biomarkers qualified by the FDA and EMEA are KIM-1, albumin, total protein, β2-microglobulin, cystatin C, clusterin, and trefoil factor-3. Testing for these proteins will help scientists assess whether a drug is likely to cause damage to the kidneys, a toxic side effect of some drugs. At this time, both the FDA and EMEA require drug companies to submit the results of two other tests, BUN and serum creatinine, to show whether such kidney damage has occurred. The seven new tests may provide important advantages over these two tests. For example, in the rat model, once kidney damage has begun to occur, it takes a week before the two current tests can detect it [10]. The new tests are more sensitive and can reveal cellular damage within hours [11]. BUN and serum creatinine show that damage has occurred somewhere in the kidneys, but the new tests can also pinpoint which parts of the kidney have been affected [12]. Although additional studies are needed, the new biomarkers may one day allow promising drugs to advance into clinical trials that otherwise would have been abandoned, because currently there are no tests available to detect early-onset renal injury. The seven new tests were developed and will be carried out initially in rats, but they were selected because other studies have shown that similar biomarkers are produced in human kidney cells [13].
Although initially the FDA and EMEA will consider only data from rat studies, the PSTC will begin work to qualify these biomarkers in human studies. If these studies are successful, the PSTC will present a new application over the next two years seeking acceptance of the human biomarkers. The need for an accurate, comprehensive, and efficient process for biomarker qualification is closely linked with our ability to integrate new biomarkers quickly into drug development and regulatory review. The biomarker qualification pilot process at the FDA is testing the scientific, clinical, and regulatory components of a biomarker qualification process. Experience gained with this pilot process will be useful in the development of a formal regulatory process for biomarker qualification.
EVIDENTIARY RECOMMENDATIONS

The most difficult part of this process will be to define incremental contexts of use and the corresponding evidence with which biomarkers may be qualified. We should not misrepresent the industry goal: qualified biomarkers capable of supporting the clinical management of new drugs that have shown nonclinical or clinical findings of nephrotoxicity at earlier developmental stages. Nor should we misrepresent the goal as far as public health is concerned, which is to obtain better biomarkers of nephrotoxicity for routine clinical use as quickly as the data will allow. Intermediate qualification contexts and data need to be defined so that investment in biomarker qualification studies will be productive both for the clinic and for the pharmaceutical industry. Initial studies proposed by consortia are unlikely to match a clear qualification context for a full clinical application of biomarkers. What intermediate contexts for qualification can we define, and what study characteristics can we propose for qualification in these intermediate contexts of use?

Several authors [14,15] have proposed evidentiary recommendations for biomarker qualification. Unlike the incremental process for biomarker qualification embodied in the pilot process at the FDA [7], papers on evidentiary recommendations often propose all-or-nothing qualification contexts, in which, if the ultimate goal is a clinical qualification, no intermediate qualification contexts are expected to be defined or qualified. This approach is not only time consuming but is also unlikely to encourage the investment needed to generate data for biomarker qualification. At each stage, whether the context of use for a biomarker is in vitro, in a nonclinical animal model, or in the clinic, a company or consortium proposing a qualification will probably seek a quick return once data are available to qualify the biomarker in a specific context of drug development. An effective process for biomarker qualification should therefore include incremental application context steps, so that these incremental steps can quickly benefit the drug development process.
HARMONIZATION

The application of biomarkers to drug discovery and development has the potential to improve the efficiency and speed of bringing more effective and safer new drugs to market. This requires that biomarkers which may be applicable to such uses be qualified for a specific application context. To achieve this, both the process of qualification and the evidentiary criteria and standards for qualification will need to be described and defined. The International Conference on Harmonisation (ICH) E16 guideline [16] is a harmonization effort to define the context, structure, and format of the biomarker qualification submission. It is based on the previous experience of the FDA and EMEA with biomarker qualification. This harmonization effort does not address the evidentiary requirements for biomarker qualification.

The structure, format, and content of a submission of biomarker data for qualification depend on the context in which the biomarker is intended to be used. The first step in drafting a submission for qualification of a biomarker is to determine its context of use, preceding specific decisions on applicable structure and format. The context of use for a biomarker comprises (1) the general area of biomarker application, (2) the specific applications or implementations, and (3) the critical factors that define where a biomarker is to be used and how the information from measurement of this biomarker is to be integrated in drug development and regulatory review. To demonstrate the alignment between proposed context and data, the initial context proposal must be supported by data available at the initial application step or expected to be available throughout the data evaluation process in biomarker qualification. There is a convergent relationship between an initial qualification context and the data supporting it. The initial gap between proposed context and data may need to be filled throughout the qualification process. Initial context proposals, however, should project a significant improvement over currently available biomarkers and/or endpoints.

The context of a biomarker drives the data requirements to demonstrate its qualification for the intended application. The structure of a submission document ensures that the context and data can be submitted in a package that is consistent for consortia submitting qualifications as well as for reviewers in regulatory agencies evaluating a qualification package. The structure of a qualification submission is independent of the context of this submission, but it must also be flexible enough to deal with the specific requirements of each context. On the other hand, the format of the data required to qualify a biomarker may vary significantly with the context in which it is to be used. It is therefore only possible to harmonize general regulatory guidelines on data format for biomarker qualification submissions.
SUMMARY

The qualification of novel biomarkers for drug development and regulatory review is made possible by the development and testing of regulatory mechanisms for biomarker qualification. The urgency of getting more and better biomarkers to improve new drug development is clear to the pharmaceutical industry and regulatory agencies. Harmonized processes for biomarker qualification are being actively developed through ICH harmonization procedures.
REFERENCES 1. FDA. Drug-Diagnostic Co-development Concept Paper. http://www.fda.gov/cder/ genomics/pharmacoconceptfn.pdf (accessed Oct. 19, 2008). 2. FDA. Table of valid genomic biomarkers in the context of approved drug labels. http://www.fda.gov/cder/genomics/genomic_biomarkers_table.htm (accessed Oct. 19, 2008). 3. Wagner JA (2008). Strategic approach to fit-for-purpose biomarkers in drug development. Annu Rev Pharmacol Toxicol, 48:631–651. 4. Rifai N, Gillette MA, Carr SA (2006). Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol, 24(8):971–983. 5. Goodsaid F, Frueh FW (2007). Implementing the U.S. FDA guidance on pharmacogenomic data submissions. Environ Mol Mutagen, 48(5):354–358. 6. Goodsaid F, Frueh F (2006). Process map proposal for the validation of genomic biomarkers. Pharmacogenomics, 7(5):773–782. 7. Goodsaid F, Frueh F (2007). Biomarker qualification pilot process at the US Food and Drug Administration. AAPS J, 9(1):E105–E108. 8. Goodsaid FM, Frueh FW, Mattes W (2008). Strategic paths for biomarker qualification. Toxicology, 245(3):219–223. 9. Anon. (2008). Public consortium efforts in toxicogenomics. Methods Mol Biol, 460:221–238. 10. Duarte CG, Preuss HG (1993). Assessment of renal function-glomerular and tubular. Clin Lab Med, 13:33–52. 11. Vaidya VS, Ramirez V, Ichimura T, Bobadilla NA, Bonventre JV (2006). Urinary kidney injury molecule: 1. A sensitive quantitative biomarker for early detection of kidney tubular injury. Am J Physiol Renal Physiol, 290(2):F517–F529. 12. Zhang J, Brown RP, Shaw M, et al. (2008). Immunolocalization of Kim-1, RPA-1, and RPA-2 in kidney of gentamicin-, mercury-, or chromium-treated rats: relationship to renal distributions of iNOS and nitrotyrosine. Toxicol Pathol, 36(3):397–409.
13. Dieterle F, Maurer E, Suzuki E, Grenet O, Cordier A, Vonderscher J (2008). Monitoring kidney safety in drug development: emerging technologies and their implications. Curr Opin Drug Discov Dev, 11(1):60–71. 14. Altar CA, Amakye D, Bounos D, et al. (2008). A prototypical process for creating evidentiary standards for biomarkers and diagnostics. Clin Pharmacol Ther, 83(2): 368–371. 15. Wagner JA, Williams SA, Webster CJ (2007). Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin Pharmacol Ther, 81(1):104–107. 16. International Committee on Harmonization (ICH) E16. Genomic biomarkers related to drug response: context, structure and format of qualification submissions. http://www.ich.org/cache/html/4773-616-1.html. (accessed Oct. 19, 2008).
10 FIT-FOR-PURPOSE METHOD VALIDATION AND ASSAYS FOR BIOMARKER CHARACTERIZATION TO SUPPORT DRUG DEVELOPMENT Jean W. Lee, Ph.D., Yuling Wu, Ph.D., and Jin Wang, M.S. Amgen, Inc., Thousand Oaks, California
INTRODUCTION

Novel biomarkers have been discovered through recent research in genomic and proteomic studies. Differences in gene expression and protein abundance or modification provide the basis for novel biomarkers to be tested further in in vitro and preclinical disease models. Interactions of putative biomarkers within interwoven pathways, and the drug actions that intervene in them, are relevant to polygenic disorders associated with the complex interplay between genetics, epigenetics, and the environment. Activities in novel biomarker development have increased, especially in diabetes, cancer, rheumatoid arthritis, and cardiovascular disease. In addition to clinical biomarkers proven previously, putative biomarkers have been included in clinical trials of new, mechanism-based drug candidates for exploratory, demonstrative, or characterization applications, in the process of developing mechanism-specific biomarkers toward the ultimate goal of surrogacy (Pepe et al., 2001; Wagner, 2002a; Bjornsson, 2005; Wagner et al., 2007). For example, in addition to the commonly employed measurements of blood glucose, hemoglobin A1c, and circulating insulin, novel biomarkers could be included in diabetes trials, depending on the mechanism of action.
For peroxisome proliferator–activated receptor gamma (PPARγ) agonists in type 2 diabetes, free fatty acids were linked to insulin resistance as a mechanism-specific biomarker, and adiponectin was identified as a proximal biomarker correlated with insulin sensitivity (Berger and Wagner, 2002; Wagner 2002b), while glycosylated hemoglobin (HbA1c) was qualified as a surrogate marker (Krishnamurti and Steffes, 2001). Other mechanisms and the relevant biomarkers (in parentheses), such as lipid and bone metabolism (free fatty acids, leptin, osteocalcin), incretins (glucagon, GLP-1), and inflammation (cytokines, hsCRP, PAI-1, fibrinogen, adhesion molecules), are also involved in the complex disease of diabetes (Lee and Pratley, 2005; Chu et al., 2006; van Doorn et al., 2006; Haffner, 2007; Liu et al., 2007). In addition to circulating prostate-specific antigen (PSA), carcinoembryonic antigen (CEA), and other tumor markers, the mRNA of specific genes, various growth factors, and imaging are also used in clinical trials to monitor drug responses for cancer treatments (Sweep et al., 2003; Rubin, 2004; Quraishi et al., 2007; Yu and Veenstra, 2007; Khatami, 2007).

The use of novel biomarkers has become a prominent component of decision-making processes in drug development. They are used in in vitro and preclinical models and in the early clinical phase for quick-hit and early-attrition decisions (Bloom and Dean, 2003; Bjornsson, 2005; Lee et al., 2005; FDA CDER, 2006). Characterization of a novel biomarker in the translational phase requires data collection to show preclinical sensitivity and specificity, and linkage to clinical outcomes in multiple clinical studies in humans (Wagner et al., 2007). The purposes at this phase are different from those of the exploratory or demonstrative phase. In exploratory and demonstrative studies, pharmacodynamic (PD) correlations are typically unknown, data are used mainly for internal decision making, and the output will generally not be subject to regulatory review. The extent of method validation can thus be limited to a few basic components to expedite the process and preserve resources without unduly affecting commercialization (Lee et al., 2006). In contrast, the purposes of characterization applications are to provide pivotal data to establish linkage to clinical outcome and to monitor patient progress upon treatment (Figure 1). The data are often used for critical decisions (such as supporting dose selection and patient stratification, and demonstration of drug safety or efficacy), submitted for review by regulatory agencies, or used for postmarketing patient monitoring (Kummar et al., 2007; Stoch and Wagner, 2007). In addition, the same method used for characterization is likely to be used in the qualification phase toward surrogacy, confirmed over multiple drugs of similar mechanism, and used during surveillance studies. Therefore, biomarker characterization studies require more rigor in assay method validation, and more traceable and detailed documentation, than previous phases in order to meet the study objectives in a defined context of use.

A biomarker is useful if its concentrations in the disease state are distinct from those of a healthy status (or of disease stabilization, in the case of some oncology trials).
[Figure 1 schematic: stages of biomarker development — Discovery, Demonstration, Characterization, Qualification, Surrogacy — with side boxes listing the bioanalytical method requirements and drug development applications at each stage.]
Figure 1 Path from putative to confirmed mechanism of a novel biomarker through the processes from discovery to surrogacy. The side boxes illustrate the relationship of the processes to drug development applications and the methods required.
The assay performance of the biomarker analytical method must be able to meet this goal of distinguishing disease status and estimating the drug effect. Therefore, contrary to the general perception of many analytical laboratories, the required assay acceptance criteria do not depend solely on the deliverable method's accuracy and precision. Instead, consideration should be given to the assay's suitability for the intended applications based on three major factors: (1) the intended use of the data during various stages of drug development, (2) the nature of the assay methodology and the different types of data that it provides, and (3) the biological variability of the biomarker that exists within and between populations. The first factor helps shape the assay tolerance or acceptance criteria for biomarkers. A fit-for-purpose method validation approach at various phases of biomarker application has been described by Lee and colleagues (2006). In this chapter we focus on how fit-for-purpose method validation and assay application during drug development clinical trials can contribute to biomarker characterization.
GENERAL PROCESSES

Quantification of protein expression is important when considering the translation of a candidate protein biomarker. If there are changes in posttranslational modification of the biomarker, it would be preferable to be able to quantify levels of both protein expression and posttranslational modification, utilizing technologies such as ligand-binding assays (LBAs), enzymatic assays, liquid chromatography–mass spectrometry (LC-MS) in multiple-reaction monitoring mode (MRM-LC-MS), protein assays, and real-time polymerase chain reaction (RT-PCR) genomics. LBAs can be performed in high throughput at a relatively low cost per sample, which is most advantageous for applications through the characterization and qualification phases of novel biomarker development. Therefore, LBAs have been widely used for protein biomarker quantification and will be the typical method discussed in this chapter.

The developmental processes of a novel biomarker and a new chemical drug entity are intertwined (Lee et al., 2003, 2007). Biomarker development may happen concurrently with the development of a single or multiple drug candidates, a refined new chemical or biological entity, or for extended indications and/or additional mechanisms. The progress of a potential biomarker does not always coincide with or parallel that of new drug candidate development. To utilize and integrate information from the development of drugs in therapeutic programs and the development of relevant putative biomarkers, a detailed work plan can be prepared for the novel biomarker, with the study objectives identified in the corresponding study protocols of each drug candidate. Innovative companies often organize biomarker work groups to facilitate timely input and communication among therapeutic areas and supporting teams.

As depicted in Figure 2, a novel target biomarker (BMKa) and/or proximal biomarkers (BMKb) of a specific mechanism can be used for multiple drug candidates to prove the mechanism of action by drug intervention during the exploratory and demonstration phases. If the circulating biomarkers reflect their actions in the target cells and organs, measurements of blood levels of the target and/or proximal biomarkers can be included during clinical trials for further characterization purposes. In addition, distal biomarkers (BMKi,j) of a specific disease indication that have been used for other drug compounds can be applied to a new drug of a different mechanism toward the same indication. In general, distal biomarkers are closer to the disease manifestation relevant to efficacy and to off-target indirect effects, while the target and proximal biomarkers are closer to the direct exposure effect. Thus, the establishment of PD relationships to pharmacokinetics (PK) is usually simpler for the target or proximal biomarkers, while linkage to a clinical outcome may be shown more readily by distal biomarkers. PK-PD modeling is used to correlate dose–exposure effects, and PK-PD models based on various mechanisms of action, direct and indirect drug effects, and reactions of biological cascades have been developed (Mager et al., 2003). For proof of biology, biochemical coverage of biomarkers from target hit to distal effects to clinical outcome would provide a thorough understanding of the drug effect and characterization of a novel biomarker, whether it is a target, proximal, or distal marker. Additionally, off-target negative effects should be monitored carefully with toxicity biomarkers (BMKt). Such information is vital in the selection of lead candidates and appropriate dose ranges for clinical trials (FDA, 2004).
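As one concrete example of the kind of PK-PD model referred to above, the sketch below implements a basic indirect-response model in which drug concentration inhibits production of a circulating biomarker; all parameter values are hypothetical and chosen purely for illustration.

```python
# Minimal sketch of an indirect-response PK/PD model: drug concentration
# inhibits biomarker production. Parameters are hypothetical.
import numpy as np
from scipy.integrate import odeint

def pkpd(y, t, ke, kin, kout, imax, ic50):
    c, r = y                                            # c: drug conc.; r: biomarker level
    dc = -ke * c                                        # one-compartment, first-order elimination
    dr = kin * (1 - imax * c / (ic50 + c)) - kout * r   # inhibition of production
    return [dc, dr]

t = np.linspace(0, 48, 200)            # hours
c0, r0 = 10.0, 100.0                   # initial concentration; baseline biomarker (kin/kout)
sol = odeint(pkpd, [c0, r0], t, args=(0.2, 10.0, 0.1, 0.9, 1.0))
print(f"biomarker nadir ≈ {sol[:, 1].min():.1f} (baseline {r0})")
```

Fitting a model of this form to observed drug concentration and biomarker time courses is one way the dose–exposure–response relationship of a target or proximal biomarker can be characterized.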
[Figure 2 schematic: drug exposure acts through the specific target pathway on the target biomarker (BMKa, BMKa′) and proximal biomarker (BMKb), with downstream distal biomarkers (BMKi,j), biomarkers from other pathways (BMKn,m), and toxicity biomarkers (BMKt), linking direct and indirect, downstream effects to disease outcome.]
Figure 2 On- and off-target biomarkers of a specific mechanism of drug intervention. Possible relationships of biomarkers from a specific pathway to disease outcome. Other pathways may have common distal biomarkers that may be used as a predictive index of the same disease outcome.
The application of a novel biomarker in clinical trials for the purpose of biomarker characterization is often included as a secondary objective in a clinical protocol, while toxicity and commonly recognized efficacy biomarkers may be included among the primary objectives. Characterization of a specific novel biomarker may be carried out across different new chemical entity development programs and several therapeutic areas. Therefore, it is important for the assay laboratory to understand both the “big picture” of the development of a certain novel biomarker and the specific purpose of the application of that biomarker in a clinical trial. The time required for biomarker assay development and method validation, the operational and logistical issues (including preanalytical factors), and the limitations in data interpretation are key elements requiring careful planning and execution to support biomarker development and its applications.
METHOD VALIDATION AND ASSAY APPLICATION IN DRUG DEVELOPMENT

Population Range and Assay Range

In the process of biomarker assay development and validation, samples from the target populations of healthy persons and patients are used to establish expected ranges and biological variations for a given biomarker.
Knowledge of these ranges and of assay performance in these matrices is necessary for a number of reasons. The initial concentration ranges in healthy populations are usually available from the literature or vendor brochures for previously identified biomarkers. During the characterization phase, the range-finding experiments should include sufficient samples from individuals of the target populations (and extended populations, if applicable) to enrich the existing information. Biological variation exists for some biomarkers; it is important to note variable levels over time (e.g., diurnal, seasonal, and food effects) in order to plan the correct conditions and time of sample collection appropriately. In addition, biological variation can come from changes in the binding-protein components of the biological matrices, affecting the biomarker measurement in some patient samples. The effect of matrix from patient samples is discussed further later in the chapter.

The results of the range-finding studies and the expected drug effect on the biomarker are used in the design of method validation and assay application in clinical studies. Knowledge of variation and expected modulation determines the sample size necessary to confidently detect relevant changes in a biomarker after therapeutic intervention. It is desirable to have the biomarker assay range cover the range of the target populations and the expected drug effect; however, this is often not possible. For example, most protein biomarkers are analyzed by LBA, with the assay working range dependent on the antibody reagents used, which is often different from the biological range. Additionally, the biomarker levels in one disease are often different from those in another. For example, inflammatory biomarkers such as IL-6 and TNFα in serum samples from healthy subjects and rheumatoid arthritis patients are in the pg/mL range, whereas those of sepsis patients are in the ng/mL range. Use of these inflammatory biomarkers for sepsis will require dilutions into the working range. Thus, application of these biomarkers to sepsis studies would require dilutional linearity during method validation, and the variability introduced by dilution should be monitored during in-study assays.

Sample Integrity: Sample Collection at the Clinic

Sample integrity means that the measurement of a biomarker after collection, transport, and storage should yield a result as close as possible to that which would be measured in a fresh sample. Sample characteristics can vary over time, in the fed state, or by age- and gender-specific factors, and by the methods used to prepare the sample. Sample handling can alter a sample irrevocably, so care must be used in specimen collection. Delays in specimen processing, addition of anticoagulants and other additives, the type of collection vessel and storage tube, and a myriad of other preanalytical circumstances can affect a biomarker. A careful review of potential confounding variables is essential prior to method validation. Preanalytical variables have been shown to compromise data utility in proteomic biomarker discovery and validation (Ferguson et al., 2007; Banks, 2008).
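As a back-of-the-envelope illustration of the point made earlier in this section, that biological variability drives the sample size needed to detect a relevant biomarker change, the sketch below uses the standard two-sample normal approximation; the effect size and standard deviations are hypothetical.

```python
# Subjects per group needed to detect a biomarker change (two-sample normal
# approximation); effect size and variability values below are hypothetical.
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Subjects per group to detect a mean difference delta with given SD."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z * sd / delta) ** 2)

# Detecting a decrease of 30 units from a baseline of 100 when between-subject
# SD is 40 versus 60:
print(n_per_group(delta=30, sd=40), n_per_group(delta=30, sd=60))  # larger SD -> more subjects
```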
Serum, plasma, and urine are the most common matrices during the characterization phase. For diseases involving the brain or central nervous system, cerebrospinal fluid is commonly used. Accessible cells from sputum, bronchial lavage, and blood are used for receptor functional measurements. The preservation of samples must be defined explicitly in the study protocol. Control and standardization of sample collection are especially necessary to minimize variability across multiple clinical sites. For both tissue collection and ex vivo stimulation assays, it is best to validate sample collection procedures using conditions that closely mimic the planned sampling protocol and to examine sample stability as soon as possible. If sample handling is complicated, it is prudent to establish tolerance boundaries around sample collection and processing for the clinical sites to follow and document. In addition, staff training and oversight procedures are essential for monitoring and maintaining sample integrity. In some biomarker studies, normalization is used to correct for concentration variability. For example, urine creatinine concentration is used to normalize for hydration variability in urine samples. In that case, the assay of the normalizing factor should also be validated for stability and reliability prior to the clinical study. During biomarker characterization, multiple clinical studies from multiple drug candidate development programs may be involved. It is important for the biomarker team to coordinate the study protocols to include standardized procedures for sample collection, shipping, and analysis and for sample storage and disposal. Sample collection and storage stability should be tested by the bioanalytical laboratory to cover the conceivable conditions during in-life study sample analysis. This includes process stability, assessing the impact of short- and long-term storage, and multiple freeze–thaw cycles.

Method Development

What Will Be Measured

Many biomarkers of interest are proteins whose function is affected by changes in disease progression and drug intervention. The ultimate goal of a biomarker assay is typically to evaluate the effect on the biomarker activity in vivo, and a method that measures this activity accurately is most desirable. Direct measurement of changes to target cells, tissues, or organs in response to a drug is rarely feasible, and given current drug potencies, activity assays rarely have adequate sensitivity and precision. Moreover, cell-based and activity assays are usually laborious, low in throughput, and often lack definitive reference standards. LBAs are often chosen for biomarker analysis over activity measurements because of their sensitivity, versatility in application, relatively high throughput, and low cost. Many biomarkers are endogenous proteins that can be physicochemically heterogeneous, with multiple forms present simultaneously. The concentration and biological activity of each component are often unknown and may vary with health status, over time, and between individuals. However, it is
desirable to know, even in the case of a well-defined regulatory peptide, which related species of the analyte exist in the biological matrix of measurement. For example, measurements of proinsulin, insulin, and C-peptide provide different information for clinical interpretation and assessment. For large molecules, the heterogeneity of the molecules poses challenges to identifying the exact species being measured. Therefore, various approaches to designing the analytical method, including the choices of reference standard and critical reagents, are worth considering. LBA methods measure the binding reaction of the mixed components in the matrix, and the majority of the data generated by LBAs are relative quantifications (Lee et al., 2006). Limitations of LBA methods usually include a lack of chemical information and an uncertain correlation of the binding activity with the biological activity. In addition, the reference standard used for data regression may be a recombinant protein or one of several components, which may or may not be representative of the entire endogenous mixture. However, as the purpose during biomarker characterization is to monitor population and longitudinal effects of drug treatment and disease progression, this type of relative quantification is adequately "fit for use."

Reference Standard and Alternative Matrix for Standard Preparation

The reference material may not be fully representative of the endogenous analyte because the endogenous analyte may exist in various forms in the biological matrix. A well-defined reference standard is used as a scalar for the relative measurement of the endogenous species. For example, the recombinant form of a biomarker can be produced and characterized with defined molecular size and purity to enable concentration and molar-equivalence calculations. Documentation of the reference material is necessary for its use. Most biomarkers are endogenous molecules present in variable, measurable amounts in the matrix. It is therefore difficult to find "blank" control matrix for the preparation of reference standards. An initial standard curve can be constructed by spiking the reference standard into a protein buffer solution. An initial screen of multiple matrix lots against the buffer standard curve may identify some blank matrix lots suitable for standard preparation. When that is not possible, an alternative matrix can be prepared by depleting the endogenous analyte using methods such as charcoal stripping, high-temperature incubation, acid or alkaline hydrolysis, or affinity chromatography. Commonly, a protein-containing buffer or the matrix from another species with a non-cross-reactive homolog is used. The use of alternative matrices requires studies of matrix effects and parallelism during method development and validation to understand the impact that differences from patient samples will have on assay results. Parallelism experiments are performed through serial dilutions of a high-concentration sample with the standard matrix. Multiple individual matrix lots (three or more) should be tested to demonstrate lot-to-lot consistency. There are several ways to assess parallelism data (Lee et al., 2006).
Figure 3 Parallelism of three individual matrix lots. Each lot is represented by a separate symbol. The sample was diluted with the standard diluent at dilution factors of 1.5, 2.5, and 5. The concentrations of the diluted and undiluted samples observed were all within the standard curve assay range. The mean of the calculated concentrations (observed concentration times the dilution factor) of each lot was determined. The ratios of the individual calculated concentration over the mean of the lot were plotted against 1/dilution factor. The data showed that the ratios were all within the range 0.8 to 1.2, which was the defined acceptance criterion of the assay. Therefore, the parallelism test results were acceptable for the test lots.
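As a rough illustration of the ratio-based parallelism check described in the Figure 3 caption, the sketch below back-calculates the diluted results for a single matrix lot and applies the 0.8 to 1.2 acceptance range; the concentration readings shown are hypothetical.

```python
# Sketch of the parallelism check illustrated in Figure 3: each diluted result
# is back-calculated (observed concentration x dilution factor) and compared
# with the mean back-calculated concentration of its matrix lot.

def parallelism_ratios(observed, acceptance=(0.8, 1.2)):
    """observed: dict mapping dilution factor -> concentration read off the
    standard curve for one matrix lot. Returns (ratios, pass/fail)."""
    back_calc = {d: c * d for d, c in observed.items()}
    mean_conc = sum(back_calc.values()) / len(back_calc)
    ratios = {d: v / mean_conc for d, v in back_calc.items()}
    ok = all(acceptance[0] <= r <= acceptance[1] for r in ratios.values())
    return ratios, ok

lot_1 = {1: 9.8, 1.5: 6.7, 2.5: 4.1, 5: 2.0}   # dilution factor -> ng/mL read
ratios, ok = parallelism_ratios(lot_1)
for d, r in sorted(ratios.items()):
    print(f"1/dilution = {1 / d:.2f}  ratio = {r:.2f}")
print("parallelism acceptable" if ok else "parallelism failed")
```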
The concentrations observed for diluted and undiluted samples should all be within the standard curve range. An example is depicted in Figure 3: the mean of the calculated concentrations (observed concentration times the dilution factor) of each lot was determined, and the ratios of the individual calculated concentrations over the mean of the lot were plotted against 1/dilution factor. Parallelism was demonstrated for the test lots because the ratio was not affected by the variable amounts of standard matrix introduced by dilution. When parallelism cannot be assessed because samples with sufficiently high concentrations of analyte are unavailable, dilutional linearity should be tested in a manner similar to parallelism, except that high-concentration spiked samples are used in place of the endogenous samples. A failure to demonstrate parallelism should be taken into consideration in data assessment and interpretation. For example, it may mean that the values obtained should be treated as quasi-quantitative rather than as relative quantitative measurements (Lee et al., 2006). In addition, within-subject comparisons over a longitudinal time sequence would then be more useful than comparisons between subjects or populations, which affects the clinical study design. All of these considerations depend on the a priori goals of the study.

Selection of Critical Reagents in Ligand-Binding Assays

For the development of a first-in-class drug compound that targets a novel biomarker, assay reagents will need to be developed for both the drug candidate and the target protein biomarker. In the case of a macromolecular drug, an analog of the drug candidate may often be used as the binding reagent(s). On the other hand,
reagents for most off-target biomarkers may be available from commercial sources. These reagents may be established assay kits [approved by the U.S. Food and Drug Administration (FDA) for diagnostic use] or kits for research use only. For the purpose of characterization of the biomarker of interest, fit-for-purpose method validation should be conducted with sufficient rigor to provide reliable data across multiple clinical studies. This should include assay range finding, accuracy and precision, selectivity, specificity, stability, and robustness (reproducibility) during prestudy method validation. Additional data will be collected from in-study assay performance, change-control method validation, and long-term storage stability. Except in the case of FDA-approved diagnostic kits, many of the assay performance parameters will have to be established by the bioanalytical laboratory. Even for FDA-approved kits, because the application will be for drug development rather than diagnosis, some of the assay performance parameters will need to be established. These often include selectivity against the target patient sample matrices and specificity against the drug compound(s). If the drug is expected to decrease the biomarker to a level lower than the lowest standard of the kit, the method may need to be modified to increase assay sensitivity. Antibody pairs are typically chosen as capture and detection reagents. In general, the more selective antibody would be chosen as the capture reagent, especially if it is more readily available than the other member of the pair. A tertiary detector antibody conjugated to a reporter enzyme such as horseradish peroxidase can be used. Alternatively, a biotinylated detector antibody can be used together with a biotin-binding protein (e.g., an antibiotin antibody or an avidin-type protein) conjugated to a reporter enzyme. The sensitivity of an assay can be increased by varying the number of reporter enzyme molecules on the detection reagents, or by using multivalent strategies to increase the effective signal from each analyte captured. Some assays use receptors or their fragments as binding partners, most often in concert with a specific second antibody. This arrangement can improve selectivity for specific ligands (e.g., a cytokine activated from a latent precursor, or a particular subtype of ligand with binding characteristics distinct from those of its homologs). The binding selectivity of such reagents can offer added biological relevance to quantification and can suggest results that might otherwise be obtained only via functional assays.

Critical Reagents: Characterization, Consistency, Stability, and Documentation

Characterization of a novel biomarker often involves many clinical studies that last for several years. If the studies are to be conducted within the same company or in a joint program among different institutions, plans should be made to provide consistent supplies of the same reference material and assay reagents throughout the studies. If the materials are from in-house sources, sufficient production of the reference standard and primary reagents should be assured. In addition, multiple preparations of labeled
detector reagents should be assessed during method validation. If the materials are from commercial sources, negotiation with the vendor should take place to assure a consistent and sufficient supply of the same batch of material, if possible. Often, multiple batches of materials (reference standards and critical reagents) should be tested during method validation. Documentation should be obtained for the stability of the respective lots over the time span of a clinical study. Documentation of the chain of custody should be similar to that of a good laboratory practices (GLP) study. For late-phase studies using multiple clinical sites, the use of a central sample repository offers numerous advantages in controlling the process of specimen collection and labeling for multiple biomarker assays. In-study sample control (SC) charts such as those shown in Figure 4 can be used for trend analysis of assay performance and stability. Large pools of endogenous samples at low, middle, and high concentrations of a biomarker were prepared during method validation and monitored throughout the in-study phase.
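A minimal sketch of how such sample-control results might be tracked is given below: control limits are derived from validation runs, and later in-study results are flagged when they drift beyond 2 SD (warning) or 3 SD (action). All values are hypothetical, and a production laboratory would normally apply the full set of Westgard rules in its LIMS or statistical software rather than in a script like this.

```python
# Sketch of Levey-Jennings-style monitoring of an endogenous sample control
# (SC): control limits from validation runs, later in-study runs flagged when
# they fall outside 2 SD (warning) or 3 SD (action).
from statistics import mean, stdev

validation_runs = [2.7, 2.9, 2.6, 2.8, 3.0, 2.75, 2.85, 2.65]   # ng/mL
in_study_runs   = [2.8, 2.9, 2.6, 3.3, 3.4, 3.5]                # ng/mL

avg, sd = mean(validation_runs), stdev(validation_runs)
ucl, lcl = avg + 3 * sd, avg - 3 * sd            # action limits
warn_hi, warn_lo = avg + 2 * sd, avg - 2 * sd    # 2 SD warning band

print(f"target {avg:.2f}  UCL {ucl:.2f}  LCL {lcl:.2f}")
for run, value in enumerate(in_study_runs, start=1):
    if not (lcl <= value <= ucl):
        status = "OUT OF CONTROL"
    elif not (warn_lo <= value <= warn_hi):
        status = "warning (beyond 2 SD)"
    else:
        status = "ok"
    print(f"run {run:2d}: {value:.2f}  {status}")
```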
Figure 4 Use of sample controls for trend analysis on variability. Low, middle, and high sample controls of a biomarker were monitored in Levey Jennings control charts. There was a noticeable shift in trend in all the sample control levels after run 27. The average, upper and lower control limits of analytical runs up to run 27 (B) were compared to the overall parameters (A). (See insert for color reproduction of the figure.)
The Levey Jennings control charts showed a noticeable shift in trend at all the SC levels after run 27, indicating a systematic bias. The average and the upper and lower control limits of the analytical runs up to run 27 (right panels) differed from the overall values (left panels). This shift was traced to a change in the commercial kit lot.

Method Validation on Performance Parameters

The foremost assay performance parameters to be established for quantitative methods are precision and accuracy. In addition to these two parameters, sensitivity, selectivity, specificity, stability, and reproducibility should also be demonstrated. These parameters are established during prestudy method validation to demonstrate reliably that the method is fit for the purpose of biomarker characterization over multiple clinical studies. If more than one bioanalytical laboratory is used, each additional laboratory should also demonstrate its ability to perform the method by conducting a full or partial method validation.

Accuracy and Precision

Method performance is consistently better understood and validated through the appropriate use of control samples. To ensure data quality, assay performance is evaluated during method validation with validation samples (VSs) and monitored during sample analysis with quality control samples (QCs). VSs are used in method qualification or validation to define intra- and interrun accuracy and precision, providing data to demonstrate the robustness and suitability of the assay for its intended application. QCs, on the other hand, are essential in determining run acceptability during specimen analysis. However, many laboratories have not made this distinction, using the term QC for both VS (for prestudy method validation) and QC (for in-study run acceptance). For a well-defined biomarker assay, at least six accuracy-and-precision validation runs should be performed to provide statistical data for calculating these assay parameters (DeSilva et al., 2003; Lee et al., 2008). Each run should consist of standards prepared by spiking the reference standard into blank biological matrix or an alternative blank matrix. QC/VS at low, middle, and high concentrations should be prepared in the biological matrix or alternative matrix by spiking the reference standard. In addition, sample controls (SCs) should be prepared by pooling authentic samples at low and high levels of the biomarker to reflect the performance of the endogenous biomarker with respect to assay precision, stability, and reproducibility. At least two sets of QC/VS and SC should be run with the standards in each accuracy and precision experiment. Accuracy and precision can be evaluated from the total error of the VS data from the validation runs, in a way similar to that used when a macromolecular protein drug is the analyte (DeSilva et al., 2003). However, given biological variability and other factors in biomarker research, more lenient acceptance criteria may
be used for biomarker PD studies than for PK studies. Still, it should be recognized that accuracy and precision data for VSs in buffer provide only a relative quantification, which may be quite different from measurements in the authentic matrix. Concentrations of the endogenous SCs are determined from multiple validation runs, and the true values are then defined after sufficient data collection from pre-study and in-study validation. For example, the mean of 30 runs and 2 standard deviations can be used to define the target concentration and acceptable range of the SCs. Because the reference material may not fully represent the endogenous biomarker, the SCs should be used for stability tests (such as minimum tests of exposure to ambient temperature and freeze–thaw cycling). In addition, the SCs can be used for in-study, long-term storage stability analysis and for assessment of lot-to-lot variability of key assay reagents. Regression models are essential for data calculation from sigmoid curves like those in LBAs. The most commonly used four- or five-parameter logistic regressions should be evaluated in conjunction with weighting factors during method development. The final decision on which curve-fitting model to use should rest on which offers the best fit for all the standards in the precision profile. In some cases, a less than optimal fit will suffice if it allows for greater assay sensitivity.

Sensitivity

To provide data for PK/PD studies, assay sensitivity is usually defined by the assay lower limit of quantification (LLOQ), which is the lowest concentration that has been demonstrated to be measurable with acceptable levels of bias, precision, and total error (the sum of bias and precision). However, low-concentration clinical samples may fall below the LLOQ (i.e., the method lacks the required sensitivity). In some instances, the investigator may want to use values below the LLOQ but above the limit of detection (LOD) to obtain a numerical estimate of the changes while recognizing the high variability below the LLOQ. The LOD is often quoted as the analytical "sensitivity" of a commercial diagnostic kit. A common practice for diagnostic kits is to determine the LOD from the concentration extrapolated from a response signal of +3 SD (or −3 SD for a competitive assay) of the mean background signal from 30 or more blank samples. The Clinical and Laboratory Standards Institute (CLSI, formerly the National Committee for Clinical Laboratory Standards) has recommended an approach for determining the LOD (National Committee for Clinical Laboratory Standards, 1999; Tholen et al., 2004). This statistically sound approach evaluates the limit of blank (LOB) in terms of the type I error and the LOD in terms of the type II error, using sufficient numbers of blank and low-concentration samples with normally distributed analyte concentrations. One approach to the calculation is described briefly as follows: the LOB is estimated from the blank result at the 95th percentile position, and the LOD is calculated as

LOD = LOB + cβ · SDs
where SDs is the estimated standard deviation of the low-concentration sample measurements and cβ is a correction factor for the population bias:

cβ = 1.645 / {1 − 1/[4(N − K)]}

Here N is the number of determinations and K is the number of sample sets.
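The sketch below illustrates this type of calculation under the stated assumptions: a non-parametric LOB taken from the blank results and the cβ-corrected standard deviation of a single low-concentration pool. The blank and low-sample values, and the choice of 20 blanks and one pool, are hypothetical; the CLSI EP17 protocol should be consulted for the full experimental design.

```python
# Sketch of a CLSI-style LOB/LOD estimate: LOB from the 95th percentile of
# blank results, LOD = LOB + c_beta * SD of a low-level sample, with
# c_beta = 1.645 / (1 - 1/(4*(N - K))).
from statistics import stdev

blanks = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.4, 1.0, 0.9,
          1.1, 1.3, 0.8, 1.2, 1.0, 1.1, 0.9, 1.2, 1.3, 1.0]      # pg/mL
low_sample = [2.1, 2.5, 1.9, 2.4, 2.2, 2.6, 2.0, 2.3, 2.5, 2.2]  # one low pool

def percentile(values, pct):
    """Simple non-parametric percentile (nearest-rank)."""
    ranked = sorted(values)
    rank = max(1, round(pct / 100.0 * len(ranked)))
    return ranked[rank - 1]

N = len(low_sample)   # number of low-level determinations
K = 1                 # number of low-level sample pools
c_beta = 1.645 / (1.0 - 1.0 / (4.0 * (N - K)))

lob = percentile(blanks, 95)
lod = lob + c_beta * stdev(low_sample)
print(f"LOB = {lob:.2f} pg/mL,  LOD = {lod:.2f} pg/mL")
```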
The LOD determined by this approach avoids the inclusion of analytical false-negative and false-positive values. Because assay conditions vary, each bioanalytical laboratory should determine the LOD using the CLSI approach rather than relying on the LOD stated in a kit brochure. If the LOD is used as the lower limit for data inclusion instead of the LLOQ, the user should justify that choice, be aware of the risk posed by the higher variability in the LOD-to-LLOQ range, and interpret the data with caution.

Selectivity and Matrix Effect

Selectivity is the ability of the assay to discriminate the analyte unequivocally in the presence of components that may be expected to be present in the sample and to alter assay results. Lack of selectivity can result in signal inhibition or enhancement, and the results can appear as false positives or false negatives (an over- or underestimation of the analyte concentration). In general, signal suppression from binding proteins occurs more often in LBAs than enhancement, resulting in a negative bias. Interference in an assay is the product of the concentration of the interfering substance and its cross-reactivity (or binding inhibition). However, the concentration–response relationship of an LBA is nonlinear, and the magnitude of cross-reactivity (or inhibition) is often not evenly distributed over the entire assay range. Thus, a standard curve prepared in a protein buffer solution has higher optical density (OD) responses than one prepared in biological matrices, and the differences in responses vary over the standard curve. Moreover, a standard curve prepared in the biological matrix from one person would have responses different from those of other people. Therefore, the estimate of interference should not simply be extrapolated across the entire assay range or be represented by an IC50 value. Matrix interference should be surveyed in samples from normal and diseased donors, especially in samples from the anticipated patient population. Spike recovery using samples from persons in the target population should be performed. A later selectivity evaluation of an LBA involves a parallelism test of high-concentration endogenous samples from many people, diluted at four or more dilution factors with the standard diluent.

Specificity

Specificity reflects the ability of an assay to distinguish between the analyte of interest and other structurally similar components of the sample. For LBAs, a specificity test should include tests of interference from components that may specifically affect the binding reaction. For example,
interference from the drug compound may alter ligand interactions with the capture reagent. If the ligand reagent of an LBA is the target drug or its analog, the presence of the drug compound at certain concentrations can interfere with the assay. In addition, biomarker homologs or endogenous molecules of the same family may contribute to specificity problems. If reference material of the potential interferent is available, specificity can be tested by spiking various amounts of the test material into VSs at various levels (Lee and Ma, 2007). Specificity for the biomarker in the presence of the drug under study should also be tested, using samples from endogenous pools (such as the SCs) with and without the addition of the drug compound at concentrations spanning the expected therapeutic range. Figure 5 illustrates a specificity experiment for a target biomarker (A) and its proximal biomarker (B), each tested with three sample pools at pM levels against various nanomolar drug concentrations. As expected, the test drug inhibited the quantification of the target biomarker at concentrations greater than 2 nM.
Figure 5 Method validation specificity tests of a target biomarker (A) and a proximal biomarker (B) against a drug compound.
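A minimal sketch of how such a drug-interference (specificity) check might be summarized is shown below; the drug-free result, the spiked results, and the ±25% interference criterion are hypothetical values used only for illustration.

```python
# Sketch of a specificity assessment like that in Figure 5: a biomarker pool
# is re-assayed with the drug candidate spiked at increasing concentrations,
# and the percent change from the drug-free result is compared against a
# hypothetical +/-25% interference criterion.
drug_free_result = 5.2                                   # pM, pool without drug
spiked_results = {0.5: 5.1, 1: 5.0, 2: 4.4, 5: 3.1}      # nM drug -> pM measured

for drug_nm, measured in sorted(spiked_results.items()):
    pct_change = 100.0 * (measured - drug_free_result) / drug_free_result
    flag = "interference" if abs(pct_change) > 25 else "no interference"
    print(f"{drug_nm:>4} nM drug: {measured:.1f} pM ({pct_change:+.0f}%)  {flag}")
```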
For the proximal biomarker, the test drug did not affect the assay at any of the test concentrations, which exceeded the highest expected concentration of the drug in serum. The results illustrate that the drug affected only the assay of its target protein.

Stability

In addition to sample integrity during collection, the stability of protein biomarkers and assay reagents must be established to show that the assay is not compromised by this preanalytical factor. Stability information is gathered during method validation for short-term storage and process stability, and is extended through in-study validation for long-term storage stability. Reagent storage stability is tested on reference material stock solutions and ligand-binding reagent stock solutions if information from the supplier is lacking. Working solutions are prepared from the stored stock solutions, and assay performance is compared with that obtained using freshly reconstituted solutions. Process stability is evaluated over different time periods and under various conditions that mimic possible sample handling in the laboratory, using the results of SCs subjected to stress from multiple freeze–thaw cycles and temperature exposures (e.g., four cycles and exposures of 16 to 72 hours at 2 to 8°C and at ambient temperature). Long-term storage stability testing of the SCs should be started during validation and continued to support the drug development program. The acceptance criteria for storage stability are those established during method validation, set prior to conducting the experiment. For long-term stability evaluation, in addition to comparison against the initial concentration, trend analysis can be more useful than strict application of the acceptance criteria. This is because a kit lot change can shift the mean value of the SCs, and a new preparation of the standard curve can introduce a systematic bias, adding an error component that is not caused by the storage conditions (similar to the situation illustrated in Figure 4).

Assay Application

Before sample analysis begins, a written method standard operating procedure (SOP) must be in place. Often, a validation report should also be issued. Documentation of the sample assay should follow the SOP in a GLP-compliant manner (see the discussion of regulatory issues later in the chapter), with traceable records. The data should be stored in a secure place for auditing.

Standard Operating Procedure

A validated method should have defined procedures, assay performance parameters collected from experiments using the same method during prestudy validation, and acceptance criteria established to reflect the acceptability of the method performance. These elements constitute a method SOP and are discussed in the following section.

Method Procedures
A written method procedure should clearly describe the following:
• The intended purpose of the method
• The principle of the method
• A validation performance summary
• Materials and reagents, including lot numbers and expiration dates
• Solutions preparation
• Standards and QC, SC preparations
• Sample pretreatment (processing)
• Analytical steps
• Data regression
• Acceptance criteria
• References
During the development of a novel biomarker, a series of method modifications can take place. It is important that a specific name (version number) and effective date be assigned to each version of the method used in a clinical study. A historical log should be kept of the modifications made for each version, and its application in specific studies should be identified. If, for whatever reason, a method change has to occur within a clinical study, a crossover comparison of the old and new methods applied to a subset of clinical samples must be designed, with a statistical approach to demonstrate method equivalence.

Regression Model

The response–concentration relationship of an LBA method is nonlinear. Curve fitting requires selection of a regression model based on multiple runs of standard curves from the method validation accuracy and precision experiments. Typically, a four- or five-parameter logistic model will fit most methods, with a choice of weighting factors (such as 1/response or 1/variance). The initial selection is based on the residuals of the standard means. The model can be chosen to obtain the best fit over the entire range, with particular regard to the low end if sensitivity is an issue. The minimal total error of the VS data generated in the validation batches should be used to determine the regression algorithm and weighting. Once the regression model is chosen, it is a basic element of the method that should not be changed during assay implementation. Justification must be given and documented for any regression model change.
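For illustration, the sketch below shows the four-parameter logistic function and the corresponding back-calculation of concentration from a response. The parameter values are hypothetical; in routine use the model is fitted to each run's standards, with weighting, in validated curve-fitting software rather than hand-coded as here.

```python
# Sketch of the four-parameter logistic (4PL) model commonly used to fit LBA
# standard curves, and the back-calculation of concentration from a response.

def four_pl(conc, a, b, c, d):
    """Response predicted for a concentration: a = response at zero analyte,
    d = response at infinite analyte, c = inflection point (EC50), b = slope."""
    return d + (a - d) / (1.0 + (conc / c) ** b)

def back_calculate(response, a, b, c, d):
    """Invert the 4PL to obtain concentration from an observed response."""
    return c * ((a - d) / (response - d) - 1.0) ** (1.0 / b)

params = dict(a=0.05, b=1.2, c=150.0, d=2.0)   # hypothetical OD curve
for conc in (10, 50, 150, 500):
    od = four_pl(conc, **params)
    print(f"{conc:>4} pg/mL -> OD {od:.3f} -> back-calc "
          f"{back_calculate(od, **params):.1f} pg/mL")
```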
Acceptance Criteria

Without sufficient data from healthy and patient samples, the acceptance criteria for a novel biomarker can initially be set according to the assay performance observed during prestudy method validation; the biological data obtained from in-study validation of subject samples can then be used to refine these initial criteria. The process of setting acceptance criteria for a protein biomarker measurement thus follows an evolving path. During the exploratory phase, the acceptance criteria are set according to the initial assay performance of prestudy method validation. After use in pilot studies, an assessment should be made of the suitability of the assay stringency versus the effect observed, and the in-study biological data from subject samples are used to refine the initial criteria. For example, an assay with 50% total error may still be acceptable for detecting a twofold treatment effect observed in an in-study-phase clinical trial. Keeping the same acceptance criteria for a given method may be most convenient for an analytical laboratory; however, for a novel biomarker this may not be the most appropriate choice for all of its applications. One should take into account the intended purpose of the application and the possible outcomes in the analytical phase. For example, the effect in one population or indication may differ from that in another (e.g., a twofold treatment effect may shrink to only 30%), which may require a more stringent method and/or more subjects to provide adequate predictive power in the other population.

Controls in Clinical Studies

For a well-planned assay application, both analytical quality controls and biological sample controls should be available to assess assay variability. Data on analytical variability are gathered from the standards, QCs, and SCs of each analytical run of sample analysis. A common set of SCs (or pools of incurred samples from a previous study) analyzed by multiple bioanalytical laboratories can provide an assessment of assay performance among the laboratories. Variability in supplies of reagents and reference standard materials can be detected by tracking the assay performance of a common set of SCs used within and between studies. An example is shown by the SC chart for a serum C-terminal telopeptide of type I collagen (CTx) assay in Figure 6, using the Westgard rule for monitoring (Westgard, 2003) as well as an a priori acceptance criterion of ±25%. The low and high SCs were prepared by pooling a large volume of authentic samples, and the mean values were established during method validation. Figure 6 shows the Levey Jennings charts of more than 117 analytical runs from five clinical studies (A to F) over more than 2.5 years. During this time span, two reference stock materials and three kit batches were used. The plots show a slight positive bias in the SC values after the initial sequence of studies B and C. In addition, a slight negative bias was observed prior to study A. However, since the SC values were still acceptable within ±25%, no action was taken. Other statistical analyses can be used to assess SC data for other sources of assay variability, such as operator differences and trends in sample storage stability. At the same time, data from predose samples can furnish information on biological variability among subjects and between studies. Figure 7 shows the baseline concentration distribution of TRACP 5b, a bone resorption biomarker, in various patient populations from four clinical studies. Statistical tools can be used to construct distribution charts of the predose samples of each population, and cohorts can be compared. For pilot clinical trials with small cohorts, one should note that the data may not be normally
Figure 6 Levey Jennings plots of low and high sample controls of serum CTx. Concentrations were plotted against run sequences labeled with the specified studies from A to F. UCL, upper control limit; LCL, lower control limit. Sample results exceeding the Westgard 1:2S rule (beyond 2 standard deviations) were marked.
distributed (examples in Figure 7A and C), as compared with the data from late-phase clinical trials (examples in Figure 7B and D). In addition, multiple collections of predose samples from the same subject can provide information on sample collection variability across multiple clinical sites.

Confirming Selectivity in Patient Populations

Biomarkers can reflect the dynamic changes in disease progression and response to therapy, but homeostatic and compensatory biological responses can alter the expected profile in response to drug treatment and concomitant therapies (Heeschen et al., 2003; Vural et al., 2006). In the latter case, biomarker assay selectivity is a critical component for interpreting biomarker responses and clinical data, particularly when a concomitant therapy is a potential interferent in the assay. For example, the bone turnover biomarker serum CTx is being used in clinical studies involving many different populations. During method validation, sera from healthy males and females and from patients with breast cancer, multiple myeloma, osteoporosis, and rheumatoid arthritis were analyzed with and without the addition of CTx at 0.145 ng/mL.
Figure 7 Baseline concentration distribution of TRACP5b in various patient populations from four clinical studies: (A) patients with advanced cancer (n = 105); (B) patients with prostate cancer (n = 1427); (C) postmenopausal patients (n = 222); (D) postmenopausal patients (n = 7828). The curved line represents a fit by lognormal distribution. Outlier box plots are shown in the panels above the histograms. X-axis, concentrations of TRACP5b in ng/mL; Y-axis, relative frequency.
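As a small illustration of summarizing such baseline (predose) data, the sketch below computes a geometric mean and an approximate reference range on the log scale, in keeping with the lognormal fits shown in Figure 7. The values are hypothetical, and formal distributional checks would be applied before any parametric cohort comparison.

```python
# Sketch of summarizing predose (baseline) biomarker values assuming a roughly
# log-normal distribution, as with the fits shown in Figure 7.
from math import exp, log
from statistics import mean, stdev

baseline = [2.1, 3.4, 2.8, 5.6, 4.1, 2.5, 7.9, 3.0, 4.4, 3.6]   # ng/mL

log_values = [log(v) for v in baseline]
geo_mean = exp(mean(log_values))
# Approximate 95% reference-style interval on the log scale, back-transformed:
lo = exp(mean(log_values) - 1.96 * stdev(log_values))
hi = exp(mean(log_values) + 1.96 * stdev(log_values))
print(f"geometric mean {geo_mean:.2f} ng/mL, approx. 95% range {lo:.2f}-{hi:.2f}")
```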
The spike recovery was calculated by subtracting the endogenous concentration of the unspiked sample from that of the spiked sample. The recovery data, expressed as the percent difference from the nominal value, are presented for the various populations in Figure 8. The results show that all normal male and female sera recovered within the acceptance criteria (dashed lines at ±25%). Recoveries were more variable for the patient samples, although most were acceptable. The recovery data set for normal females had the best precision and was used as the control in Dunnett's method for comparison with the other groups. The Figure 8 side panel shows that the circles of all groups overlapped with that of the control group (bold circle). All groups had p-values greater than the α of 0.05; the rheumatoid arthritis group had the smallest p-value (0.0587), possibly due to a slight interference from rheumatoid factor in the samples.
Figure 8 Selectivity test of CTx in sera from healthy and patient populations. The populations included healthy males (normal M, n = 20) and females (normal F, n = 20), and patients with breast cancer (BC, n = 20), multiple myeloma (MM, n = 10), osteoporosis (Osteo, n = 18), and rheumatoid arthritis (RA, n = 10). The age range of the normal males was 45 to 74 and that of the normal females 38 to 67. Serum samples were analyzed without and with the addition of CTx at 0.145 ng/mL. The spike recovery was calculated by subtracting the unspiked sample concentration from the spiked sample concentration for each person. The percent (%) difference from the nominal value of 0.145 ng/mL was calculated, and the data are presented as a box plot for each population group. Each box presents the mean, standard deviations, and range. Side panel: comparisons with a control using Dunnett's method.
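The spike-recovery arithmetic behind Figure 8 can be sketched as follows; the per-sample results and population labels are hypothetical, and the Dunnett's comparison with the control group would be performed in statistical software.

```python
# Sketch of the spike-recovery selectivity calculation underlying Figure 8:
# the endogenous (unspiked) result is subtracted from the spiked result, and
# the recovered amount is expressed as a percent difference from the nominal
# spike (0.145 ng/mL), judged against a +/-25% criterion.
NOMINAL_SPIKE = 0.145   # ng/mL CTx added

samples = [  # (population, unspiked result, spiked result) in ng/mL
    ("normal F", 0.210, 0.362),
    ("normal M", 0.180, 0.310),
    ("RA",       0.330, 0.520),
]

for population, unspiked, spiked in samples:
    recovered = spiked - unspiked
    pct_diff = 100.0 * (recovered - NOMINAL_SPIKE) / NOMINAL_SPIKE
    verdict = "acceptable" if abs(pct_diff) <= 25 else "outside criterion"
    print(f"{population:9s}: recovered {recovered:.3f} ng/mL "
          f"({pct_diff:+.0f}% of nominal)  {verdict}")
```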
Specificity and Sensitivity of Novel Biomarker on Drug Effect

In-study data should be compiled and assessed to determine whether (1) the assay performance of standards and SCs is within the acceptance criteria, (2) patient sample concentrations are in the ranges expected for predose samples, and (3) postdose patient sample concentrations are consistent with the values expected from previous pilot studies (such as the PD profile, dose effect, and recovery to normal after a treatment change, if applicable). Data should be evaluated for false positives (specificity) and false negatives (sensitivity). For example, the variability of the SCs can reflect the assay noise of incurred samples. This variance should be smaller than the difference between diseased and normal subjects (specificity) and than the concentration change due to drug versus placebo effect (sensitivity). Data mining, interpretation, and linkage of biomarker data to clinical outcome are beyond the scope of this chapter. It should be borne in mind that a useful predictive index for a biomarker depends on whether it is used as one of a panel of biomarkers. The statistical power is affected by the sampling numbers (numbers of subjects and time points) and the assay stringency. Therefore, the bioanalytical laboratory should be included in the biomarker team to help interpret the intended data use, the model being tested, and the analytical component required for the desired predictive index.

Regulatory Issues

Regulatory guidance for bioanalytical validation indicates that the paramount objective of a bioanalytical method validation is to ensure
that the assay is reliable for its intended use (ICH Guidelines, 1994; Shah et al., 1992). This philosophical viewpoint is central to many guidance documents, conference reports, and "white papers" on the bioanalytical validation of assays for conventional small-molecule drugs and macromolecular therapeutics (Shah et al., 2000; Code of Federal Regulations, 2001a; FDA, 2001; Findlay et al., 2000; Miller et al., 2001; DeSilva et al., 2003). Currently, two basic bioanalytical approaches are being applied to biomarker validation. One is based on the FDA bioanalytical drug assay guidance and is generally referred to as "GLP-like" (Code of Federal Regulations, 2001a; FDA, 2001). The second is based on the CLSI guidelines and the CLIA regulations (National Committee for Clinical Laboratory Standards, 1999; Code of Federal Regulations, 2001b). Presently, no regulatory guidance has been given concerning what experiments should be performed and what data are necessary and appropriate for biomarker assay validation (Lee et al., 2007). This has led to inconsistent application of validation procedures and acceptance criteria for biomarkers in clinical trials. For biomarker assays to be characterized and used to support drug development, they should be GLP-like (Lee et al., 2005, 2006). However, the guidelines must be flexible, to allow assay- and technology-specific applications to meet specific clinical study objectives on a case-by-case basis. Thus, general guidelines for bioanalytical laboratories should be developed that combine good scientific and clinical sense with the flexibility to develop specific applications (Lee et al., 2007). At the same time, documentation of changes and version control of biomarker methods are important, especially to those who produce elements of the assay or kit. The FDA has set up an Interdisciplinary Pharmacogenomic Review Group (IPRG) to carry out a specific review function for the assessment of biomarker qualification data. The team will evaluate study protocols and review study results for novel biomarkers of drug safety, using appropriate preclinical, clinical, and statistical considerations. The team will then develop recommendations and guidance for the submission of biomarker data, assess the original biomarker context proposal through voluntary data submission (VXDS, where the X indicates a wide range of data sources), and then evaluate the qualification study protocol together with the sponsor to reach a consensus protocol (Goodsaid and Frueh, 2007). Characterization data for novel biomarkers intended to support efficacy can be included in the voluntary data submission category, and discussion with the IPRG will clarify the data requirements for drug labeling.

Documentation

Documents and data on biomarker characterization to support drug development must be kept in a secure place and made accessible for internal or external audits. All processes and data generation should be traceable. It is critically important to define and document the reference standard. Because novel biomarkers do not usually have established reference standards, characterization of the standard material depends on the supplier laboratories, whether commercial or in-house. Documentation such as a certificate or record of analysis and a certificate of stability should be
obtained. If these are not available, at least the source, identity, potency (or concentration), and lot number of the reference material should be documented. A consistent and sufficient supply of the same reference material lot should be reserved and used for the duration of a study and program if interstudy comparisons are intended. As with the reference material, documentation of the source, identity, potency or concentration, and lot number of critical ligand reagents, and of their stability, should be kept. If possible, the same lot of the capture ligand should be used throughout a study. It is prudent to test method robustness using multiple lots of reagents during advanced method validation.

Partial Validation for Change Control

The time span of biomarker characterization can be long. Changes in operators, assay equipment, and reagent lots, and a variety of other unforeseen events, usually occur and will require additional validation. The significance of the alteration to the method determines the extent of validation necessary, leading to cross-validation, partial validation, or complete revalidation of an assay to control for changes during or between studies. A validation plan should be prepared for partial validations, describing the intended purpose of the change, the scope, the experiments to be carried out, the data comparison, and an a priori acceptance criterion for method equivalence. A common set of clinical samples analyzed by both methods can be used to show method equivalence. The sample size should be statistically powered on the basis of the desired acceptance criterion and the method variability. If the method change is expected to be extensive, a new validation should be conducted according to the new analytical procedure, followed by a new written validation report.

CONCLUSIONS AND PERSPECTIVES

In this chapter we have focused on quantitative method validation and assays for biomarker characterization to support drug development using the fit-for-purpose approach. The main processes include defining the purpose of the application; developing, validating, and using the assay in a manner that meets that purpose; and applying appropriate statistical treatment and interpretation to the analytical and clinical data. Because of the heterogeneous nature of many protein biomarkers, discrepancies may exist between the endogenous forms and the reference standard. It is important to understand what is being measured and the clinical meaning of the moieties measured as related to disease progression and drug effect. For example, during the course of development of an LBA method, biophysical methods (such as mass spectrometry coupled to ligand-binding solid-phase extraction) can be used to select reagents and to identify the chemical species being captured on the ligand solid phase. In addition, the blank control matrix used for the standard curve differs from the intended biological matrix. Parallelism is therefore a characteristic that must
be tested using authentic samples from the target population. Another important consideration is that binding of the biomarker to endogenous binding proteins, receptors, and the xenobiotic drug candidate may cause assay anomalies. Different methods may measure different forms of the biomarker, which often leads to confounding data in the literature. Specific binding reagents and processing steps can be designed and manipulated to measure "bound" or "free" species of the biomarker. In contrast to the exploratory phase, biomarker characterization requires greater rigor in a quantitative method. For example, method robustness and long-term storage stability must be established; sufficient and consistent supplies of reagents should be assured to sustain the applications through multiple studies; and the same procedures should be validated and applied for sample collection at the participating clinical sites and for sample assays at the bioanalytical laboratories. Only when a robust method has been established and the assay performance defined is it possible to distinguish biological variability (clinical variance) from analytical variability and to assign it clinical significance. Statistical treatment of biomarker assay performance makes it possible to assess whether a method will meet the study purpose before it is applied. Statistical data from in-study performance continue to add information about the method's appropriateness, especially data from the sample controls, which reveal trends in authentic sample stability and reagent lot variability. Data from multiple studies and drug candidates (from the first-in-class to later backups, or to another candidate in a related pathway) contribute to the accumulated knowledge of the biomarker. Whereas the use of biomarkers to support the development of one or several drug candidates can be controlled within a pharmaceutical company, the broader characterization and qualification of a novel biomarker depends on the concerted efforts of multiple institutions and organizations. The collaborative effort of sharing and using the same method and reagents may be a challenge, and if different methods are used, data interpretation and comparison add complexity to this challenge. For example, meta-analysis of clinical biomarker data from publications has been problematic because the method details and validation parameters needed for proper comparisons were missing. It would be helpful if a consortium organization handled the data repository and communication (such as standardization of publication) for novel biomarkers. This is the goal of the Biomarkers Consortium (launched in October 2006) by the FDA, the Foundation for the National Institutes of Health, and the Pharmaceutical Research and Manufacturers of America, through their various working groups.

Acknowledgments

The authors are grateful to the Biomarker Subcommittee of the Ligand Binding Assay Bioanalytical Focus Group of the American Association of
Pharmaceutical Scientists for ideas and collaborations in the fit-for-purpose method validation concept. We thank Binodh DeSilva, Daniel Burns, Chad Ray, Han Gunn, and Lennie Uy for their contributions to the content and critical review of this manuscript.

REFERENCES

Banks RE (2008). Preanalytical influences in clinical proteomic studies: raising awareness of fundamental issue in sample banking. Clin Chem, 54:16–17.
Berger J, Wagner JA (2002). Physiological and therapeutic roles of peroxisome proliferator-activated receptors. Diabetes Technol Ther, 4:163–174.
Bjornsson TD (2005). Biomarkers: applications in drug development. Eur Pharm Rev, 1:17–21.
Bloom JC, Dean RA (eds.) (2003). Biomarkers in Clinical Drug Development. Marcel Dekker, New York.
Chu CS, Lee KT, Lee MY, et al. (2006). Effects of rosiglitazone alone and in combination with atorvastatin on nontraditional markers of cardiovascular disease in patients with type 2 diabetes mellitus. Am J Cardiol, 97:646–650.
Code of Federal Regulations (2001a). CFR Title 21, vol. 1, Good Laboratory Practice for Nonclinical Laboratory Studies, rev. Apr. 1.
Code of Federal Regulations (2001b). CFR Title 42, vol. 3, Clinical Laboratory Improvement Amendment, rev. Oct. 1.
DeSilva B, Smith W, Weiner R, et al. (2003). Recommendations for the bioanalytical method validation of ligand-binding assays to support pharmacokinetic assessments of macromolecules. Pharm Res, 20:1885–1900.
FDA (2001). Guidance for industry on bioanalytical method validation: availability. Fed Reg, 66:28526–28527.
FDA (2004). Innovation or stagnation? Challenge and opportunity on the critical path to new medical products. http://www.fda.gov/opacom/hpview.html.
FDA (2006). Table of valid genomic biomarkers in the context of approved drug labels. http://www.fda.gov/cder/genomics/genomic_biomarkers_table.htm.
FDA CDER (2006). Using disease, placebo, and drug prior knowledge to improve decisions in drug development and at FDA. Case Studies Across Companies Disease Models at FDA: Overview and Case Studies (Diabetes and Obesity), p. 65.
Ferguson RE, Hoschstrasser DF, Banks RE (2007). Impact of preanalytical variable on the analysis of biological fluids in proteomic studies. Proteom Clin Appl, 1:739–746.
Findlay JWA, Smith WC, Lee JW, et al. (2000). Validation of immunoassays for bioanalysis: a pharmaceutical industry perspective. J Pharm Biomed Anal, 21:1249–1273.
Goodsaid F, Frueh F (2007). Biomarker qualification pilot process at the US Food and Drug Administration. AAPS J, 9:E105–E108.
Haffner SM (2007). Abdominal adiposity and cardiometabolic risk: do we have all the answers? Am J Med, 120(9 Suppl 1):S10–S106.
Heeschen C, Dimmeler S, Hamm CW, Boersma E, Zeiher AM, Simoons ML (2003). Prognostic significance of angiogenic growth factor serum levels in patients with acute coronary syndromes. Circulation, 107:524–530.
ICH Guidelines (1994). Text on Validation of Analytical Procedures, Q2A. International Conference on Harmonization, Geneva, Switzerland.
Khatami M (2007). Standardizing cancer biomarkers criteria: data elements as a foundation for a database. Inflammatory mediator/M-CSF as model marker. Cell Biochem Biophys, 47:187–198.
Krishnamurti U, Steffes MW (2001). Glycohemoglobin: a primary predictor of the development or reversal of complications of diabetes mellitus. Clin Chem, 47:1157–1165.
Kummar S, Kinders R, Rubinstein L, et al. (2007). Opinion: compressing drug development timelines in oncology using phase '0' trials. Nat Rev Cancer, 7:131–139.
Lee JW, Ma H (2007). Specificity and selectivity evaluations of ligand binding assay of protein therapeutics against concomitant drugs and related endogenous proteins. AAPS J, 9:E164–E170.
Lee JW, Smith WC, Nordblom GD, Bowsher RR (2003). Validation of assays for the bioanalysis of novel biomarkers. In Bloom JC, Dean RA (eds.), Biomarkers in Clinical Drug Development. Marcel Dekker, New York, pp. 119–149.
Lee JW, Weiner RS, Sailstad JM, et al. (2005). Method validation and measurement of biomarkers in nonclinical and clinical samples in drug development: a conference report. Pharm Res, 22(4):499–511.
Lee JW, Devanarayan V, Barrett YC, et al. (2006). Fit-for-purpose method development and validation for successful biomarker measurement. Pharm Res, 23(2):312–328.
Lee JW, Figeys D, Vasilescu J (2007). Biomarker assay translation from discovery to clinical studies in cancer drug development: quantification of emerging protein biomarkers. In Hampton GM, Sikora K (eds.), Genomics in Cancer Drug Discovery and Development. Advances in Cancer Research. Elsevier, Amsterdam, pp. 269–298.
Lee JW, O'Brien P, Pan Y, et al. (2008). Development and validation of ligand binding assays for biomarkers. In Khan M, Findlay JWA (eds.), Ligand-Binding Assays: Development, Validation and Implementation in the Drug Development Arena. Wiley, Hoboken, NJ.
Lee YH, Pratley RE (2005). The evolving role of inflammation in obesity and the metabolic syndrome. Curr Diabetes Rep, 5:70–75.
Liu S, Tinker L, Song Y, et al. (2007). A prospective study of inflammatory cytokines and diabetes mellitus in a multiethnic cohort of postmenopausal women. Arch Intern Med, 167:1676–1685.
Mager DE, Wyska E, Jusko WJ (2003). Diversity of mechanism-based pharmacodynamic models. Drug Metab Dispos, 31:510–518.
Miller KJ, Bowsher RR, Celniker A, et al. (2001). Workshop on bioanalytical methods validation for macromolecules: summary report. Pharm Res, 18:1373–1383.
National Committee for Clinical Laboratory Standards (1999). Evaluation of Precision Performance of Clinical Chemistry Devices: Approved Guideline. CLSI Document EP5-A.
Pepe MS, Etzioni R, Feng Z, et al. (2001). Phases of biomarker development for early detection of cancer. J Natl Cancer Inst, 93:1054–1061.
Quraishi I, Rishi M, Feldman M, Wargovich MJ, Weber B (2007). Clinical validation of breast cancer biomarkers using tissue microarray technology. App Immunohistochem Mol Morphol, 15:45–49.
Rubin MA (2004). Using molecular markers to predict outcome. J Urol, 172(5 Pt 2):S18–S21.
Stoch SA, Wagner JA (2007). Biomarker analysis as a decision-making tool in drug discovery and development: implications for peroxisome proliferator–activator receptors. Int J Pharm Med, 21:271–277.
Sweep FC, Fritsche HA, Gion M, Klee GG, Schmitt M (2003). Considerations on development, validation, application, and quality control of immuno(metric) biomarker assays in clinical cancer research: an EORTC-NCI working group report. Int J Oncol, 23:1715–1726.
Shah VP, Midha KK, Dighe S, et al. (1992). Analytical methods validation: bioavailability, bioequivalence, and pharmacokinetic studies. Pharm Res, 9:588–592.
Shah VP, Midha KK, Findlay JWA, et al. (2000). Bioanalytical method validation: a revisit with a decade of progress. Pharm Res, 17:1551–1557.
Tholen DW, Linnet K, Kondratovich M, et al. (2004). Protocols for Determination of Limits of Detection and Limits of Quantitation; Proposed Guideline. NCCLS EP17-P.
van Doorn M, Kemme M, Ouwens M, et al. (2006). Evaluation of proinflammatory cytokines and inflammation markers as biomarkers for the action of thiazolidinediones in type 2 diabetes mellitus patients and healthy volunteers. Br J Clin Pharmacol, 62:391–402.
Vural P, Akgul C, Canbaz M (2006). Effects of hormone replacement therapy on plasma pro-inflammatory and anti-inflammatory cytokines and some bone turnover markers in postmenopausal women. Pharm Res, 54:298–302.
Wagner JA (2002a). Overview of biomarkers and surrogate endpoints in drug development. Dis Markers, 18:41–46.
Wagner JA (2002b). Early clinical development of pharmaceuticals for type 2 diabetes mellitus: from pre-clinical models to human investigation. J Clin Endocrinol Metab, 87:5362–5366.
Wagner JA, Williams SA, Webster CJ (2007). Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin Pharm Ther, 81:104–107.
Westgard JO (2003). Internal quality control: planning and implementation strategies. Ann Clin Biochem, 40:593–611.
Yu LR, Veenstra TD (2007). AACR-FDA-NCI cancer biomarkers collaborative. Expert Rev Mol Diagn, 7:507–509.
11 MOLECULAR BIOMARKERS FROM A DIAGNOSTIC PERSPECTIVE
Klaus Lindpaintner, M.D., M.P.H.*
F. Hoffmann–La Roche AG, Basel, Switzerland
INTRODUCTION
Molecular biomarkers may be useful in a number of applications, including mechanistic studies, preclinical safety and efficacy testing, early clinical pharmacodynamic assessment, dose-finding studies, and stratification for efficacy or safety and dose determination or dose guidance for a marketed pharmaceutical product. Although applications for a marketed drug product will pose the most demanding requirements in terms of the characterization of the test and its development into a commercially available product, all biomarker tests will need to satisfy certain requirements to be useful for derivation of the information they are intended to deliver. A broader definition of the term molecular biomarker comprises any and all in vitro testing for biologically informative molecules in tissues or body fluids, be they for purposes of diagnosing disease, preventive health screening applications, prognostication of the natural history of a disorder, prediction of likely efficacy or safety parameters of a treatment, monitoring of treatment efficacy, or screening health care product manufacture or processing, including blood products.
*This communication represents the author’s personal views and not necessarily those of any of his institutional or corporate affiliations, including F. Hoffmann–La Roche.
For purposes of the discussion below, we focus primarily on the diagnostic perspective as relevant to novel biomarker tests that accompany the development of a drug, in particular those that will require the development of a commercially available diagnostic test. Since such tests, which are often referred to as companion diagnostics, represent the most demanding scenario, they serve as an example for the spectrum of possible considerations that may be applicable to any diagnostic test—even though, in reality, most tests that add value at various points during the discovery or development of a drug will not need to live up to such scrutiny. The fundamental question determining whether a biomarker is useful is the added information content that it conveys regarding the question it is intended to answer, and whether or to what extent this information adds value in the treatment of patients. Determination of these parameters will generally require characterization of the test on multiple levels that may be defined, broadly, as analytical and clinical. Analytical performance includes a number of metrological aspects, and clinical performance is largely subdivided into validation and utility.
ANALYTICAL PERFORMANCE
Describing and characterizing analytical performance is very much oriented toward metrology. The general conventions and standards to consider include precision, trueness, and accuracy, along with a number of other more technical characteristics that describe the performance of a test.

Analytical Precision and Imprecision
Precision is a measure of the random error or variability observed in measurement results and is a product of the sample handling and analytical process. Precision is typically expressed as the standard deviation found in sample results. Precision includes the definitions of repeatability and reproducibility. Repeatability is the precision with which, in a series of measurements carried out in sequence, results of a measurement agree with each other. Reproducibility, on the other hand, means the precision with which measurements of the same standard specimen carried out not in sequence, but under conditions that differ in some specified way (common examples are day to day, or lab to lab), agree with each other. The difference between actual and perfect agreement of these values is referred to as imprecision.

Analytical Trueness and Bias
Trueness describes the closeness of agreement of an average value from a large series of measurements with a “true value” or an accepted reference value. The numerical value that represents the difference between the two is generally referred to as bias.
Bias refers to systematic differences between measurement results and the true value of a parameter that is being measured. Bias in measurement results can be introduced in a number of ways, including through sampling, sample handling, sample preparation, matrix interference, cleanup, and determinative processes.

Analytical Accuracy
Accuracy refers to a composite assessment that comprises both random and systematic influences (i.e., both precision and trueness). Its numerical value is the total error of measurement.

Other Metrics Describing Analytical Performance
Aside from metrics for analytical accuracy, a number of parameters are of importance in defining and/or determining the utility of a particular test:

1. The limit of detection (LOD) is the smallest amount of an analyte that can reliably be detected by an assay, with a stated confidence limit. The definition includes a number of different detection limits that define different properties of the assay, including the lower limit of detection (LLOD), the instrument detection limit (IDL), the method detection limit (MDL), the (lower) limit of quantitation (LOQ or LLOQ), and the practical quantitation limit (PQL). The detection limits are estimated from the mean of a number of repeated measurements of the blank, the standard deviation of the blank measurements, and a defined confidence factor. The PQL is defined simply as about five times the MDL. In practical terms, the lower limit of detection is the lowest level of analyte that can be statistically distinguished from the blank (i.e., from background noise). LLOD is a function of the variability of the blank and the sensitivity of the assay. The LLOD is usually considered to be a value that is 3 standard deviations above the mean of the blank. Using this formula, the chance of misclassification is 7%. If 2 standard deviations are used, the chance of misclassification is 16%. Values below the LLOD should be reported as “less than the LLOD value” rather than as a finite value. The LLOD can commonly be distinguished from an additional variable that is not assay-specific but instrument-specific. Most analytical instruments produce a signal even when a blank (matrix without analyte) is analyzed. This signal is referred to as the instrument detection level or instrument noise level. The IDL is the analyte concentration that is required to produce a signal greater than three times the standard deviation of the noise level. Ideally, this would be equivalent to the assay’s LLOD (as determined under optimal conditions), but is usually somewhat higher.
2. The method detection limit (MDL) is a metric that is similar to the IDL, but is based on samples that have gone through the entire sample preparation scheme prior to analysis, such as extractions, digestions, concentrations or dilutions, or fractionations, as well as interference by other components present in a complex matrix. The recovery of an analyte in an assay is the detector response obtained from an amount of the analyte added to and extracted from the biological matrix, compared to the detector response obtained for the true concentration of the pure authentic standard. Recovery pertains to the extraction efficiency of an analytical method within the limits of variability. Recovery of the analyte need not be 100%, but the extent of recovery of an analyte and of the internal standard should be consistent, precise, and reproducible. Recovery experiments should be performed by comparing the analytical results for extracted samples at three concentrations (low, medium, and high) with unextracted standards that represent 100% recovery.

3. The limit of quantitation (or quantification), also referred to as the lower limit of quantification (LLOQ), is set at a higher concentration than the LLOD; in the statistical method, it generally is defined as 10 standard deviations above the mean blank value, thus presenting a greater probability that a value at the LLOQ is “real” and not just a random fluctuation of the blank reading. The lowest standard on the calibration curve should be accepted as the limit of quantification if the analyte response at the LLOQ is at least five times the response compared to the blank response, and if the analyte peak (the response) is identifiable, discrete, and reproducible with a precision of 20% and an accuracy of 80 to 120%. The LLOQ can differ drastically between laboratories, so another parameter for detection limit is commonly used, the practical quantitation limit (PQL). The PQL is commonly defined as 3 to 10 times the MDL.

4. Selectivity is the ability of an analytical method to differentiate and quantify the analyte in the presence of other components in the sample. For selectivity, analyses of blank samples of the appropriate biological matrix (plasma, urine, or other matrix) should be obtained from a sufficiently large and representative number of sources. Each blank sample should be tested for interference, and selectivity should be ensured at the lower limit of quantification (LLOQ).

5. The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean. It is often reported as a percentage (%) by multiplying the calculation by 100. The coefficient of variation is useful because the standard deviation of data must always be understood in the context of the mean of the data. The coefficient of variation is a dimensionless number, so when comparing between data sets with different units or wildly different means, one should use the coefficient of variation for comparison instead of the standard deviation. (A short computational sketch of these summary statistics follows this list.)
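The blank-based detection limits and the coefficient of variation described above can be computed directly from replicate measurements. The following minimal sketch is illustrative only: the function name and the replicate values are hypothetical, and it simply applies the statistical definitions given in this list (LLOD as the blank mean plus 3 standard deviations, LLOQ as the blank mean plus 10 standard deviations, and CV as the standard deviation divided by the mean).

import statistics

def detection_limits(blank_readings, sample_readings):
    # Blank-based limits, following the statistical definitions above:
    # LLOD = mean blank + 3 SD; LLOQ = mean blank + 10 SD; CV = SD / mean.
    blank_mean = statistics.mean(blank_readings)
    blank_sd = statistics.stdev(blank_readings)
    llod = blank_mean + 3 * blank_sd
    lloq = blank_mean + 10 * blank_sd
    sample_mean = statistics.mean(sample_readings)
    sample_sd = statistics.stdev(sample_readings)
    cv_percent = 100 * sample_sd / sample_mean  # coefficient of variation, in %
    return {"LLOD": llod, "LLOQ": lloq, "CV%": cv_percent}

# Hypothetical replicate readings (arbitrary signal units)
blanks = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
samples = [12.1, 11.8, 12.5, 12.0, 11.6]
print(detection_limits(blanks, samples))

In practice such calculations are embedded in validated laboratory software; the point here is only to make the arithmetic behind the definitions explicit.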
DIAGNOSTIC ACCURACY AND CLINICAL VALIDATION
Diagnostic accuracy, determined by a process commonly referred to as clinical validation, refers to the degree to which results of a test concur with what would be considered the current gold standard of clinical assessment of the question under study. This may be another biomarker (i.e., an established reference test) or—the real gold standard—a clinical outcome or endpoint, such as survival or death. Accuracy can be expressed through sensitivity and specificity, positive and negative predictive values, or positive and negative diagnostic likelihood ratios. Each measure of accuracy should be used in combination with its complementary measure: sensitivity complements specificity, positive predictive value complements negative predictive value, and positive diagnostic likelihood ratio complements negative diagnostic likelihood ratio [1,2]. None of these parameters is intrinsic to the test; all are determined by the clinical context in which the test is employed. A summary of the characteristics, and the strengths and weaknesses of these metrics, is presented in Table 1.

Sensitivity
The sensitivity (“positivity in disease”) of a test is the probability that it will produce a true positive result when used on a diseased population (compared to a reference test or gold standard assessment). For the purpose of biomarkers, this would translate to the probability of, or the frequency with which, a particular qualitative marker, or a certain value of a quantitative marker, is observed in the group of people harboring the condition of interest. After inserting the test results into a table set up like Table 2, the sensitivity of a test can be determined by calculating TP/(TP + FN). High sensitivity corresponds to high negative predictive value and is the ideal property of a rule-out test.

Specificity
The specificity (“negativity in health”) of a test is the probability that a test will produce a true negative result when used on a nondiseased population (as determined by a reference test or gold standard assessment, such as clinical outcome). For the purpose of biomarkers, this would translate to the probability or the frequency with which a particular qualitative marker, or a certain value of a quantitative marker, is not observed in the reference group of nonaffected individuals. After inserting the test results into a table set up like Table 2, the specificity of a test can be determined by calculating TN/(TN + FP). High specificity corresponds to high positive predictive value and is the ideal property of a rule-in test.
TABLE 1 Summary of Characteristics for Various Parameters Used in Describing Clinical Test Performance

Metric | Definition(a) | Strengths | Weaknesses
Accuracy | (TP + TN)/N | Intuitive | Depends on prevalence
Sensitivity | TP/(TP + FN) | Does not depend on prevalence | Applies only to diseased persons
Specificity | TN/(TN + FP) | Does not depend on prevalence | Applies only to nondiseased persons
Positive predictive value | TP/(TP + FP) | Clinical relevance | Depends on prevalence
Negative predictive value | TN/(TN + FN) | Clinical relevance | Depends on prevalence
Positive likelihood ratio | [TP/(TP + FN)]/[FP/(TN + FP)] | Does not depend on prevalence | Applies only to positive tests
Negative likelihood ratio | [FN/(TP + FN)]/[TN/(TN + FP)] | Does not depend on prevalence | Applies only to negative tests
Odds ratio | (TP × TN)/(FN × FP) | Does not depend on prevalence; combines sensitivity and specificity | Values FP and FN errors equally; not intuitive
Area under curve | Area under ROC curve | Does not depend on prevalence; combines sensitivity and specificity | Lack of clinical interpretation

(a) FN, false negative; FP, false positive; N, sample size; ROC, receiver operating characteristic; TN, true negative; TP, true positive.
Positive Predictive Value
The positive predictive value of a test is the probability that a person is diseased when a positive test result is observed. In practice, predictive values should only be calculated from cohort studies or studies that legitimately reflect the number of people in that population who are diseased with the disease of interest at that time [i.e., when the prevalence (or incidence, for predictive tests) of the disorder is known]. This is because predictive values are inherently dependent on the prevalence of disease, called the prior probability. After inserting results into a table set up like Table 2, the positive predictive value of a test can be determined by calculating TP/(TP + FP).
TABLE 2 Basic Model for Clinical Test Performance(a)

New Test Results | Gold Standard Positive | Gold Standard Negative
+ | TP | FP
− | FN | TN

Gold standard: reference test results or clinical outcome.
(a) TP, number of true-positive specimens; FP, number of false-positive specimens; FN, number of false-negative specimens; TN, number of true-negative specimens.
Negative Predictive Value
The negative predictive value of a test is the probability that a person is not diseased when a negative test result is observed. Again, this measure of accuracy should be used only if prevalence is available from the data. After inserting test results into a table set up like Table 2, the negative predictive value of a test can be determined by calculating TN/(TN + FN).
Positive Diagnostic Likelihood Ratios
Diagnostic likelihood ratios (DLRs) are not yet commonly reported in peer-reviewed literature or in marketing information provided by test manufacturers, but they can be a valuable tool for comparing the accuracy of several tests to the gold standard, and they are not dependent on the prevalence of disease [3]. The positive DLR represents the ratio of the odds that a positive test result will be observed in a diseased population to the odds that the same result will be observed in a nondiseased population. After inserting test results into a table set up like Table 2, the positive DLR of a test can be determined by calculating [TP/(TP + FN)]/[FP/(FP + TN)], or it can also be expressed as sensitivity/(1 − specificity). Useful tests will therefore have larger positive DLRs, and less useful tests will have smaller positive DLRs. An example interpretation of a positive diagnostic likelihood ratio equal to 5.0 is that for every 1% of nondiseased subjects that test positive, 5% of the diseased subjects will test positive. DLRs convert pretest into posttest probabilities of a condition of interest, and there is some evidence that they are more intelligible to users.
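All of the quantities defined so far, and the negative likelihood ratio and odds ratio discussed below, follow mechanically from a Table 2-style 2×2 layout. The sketch below is a simple illustration with hypothetical counts; it is not taken from the chapter or from any particular software package.

def two_by_two_metrics(tp, fp, fn, tn):
    # Summary metrics from a Table 2-style 2x2 table (illustrative sketch)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                       # positive predictive value
    npv = tn / (tn + fn)                       # negative predictive value
    dlr_pos = sensitivity / (1 - specificity)  # positive diagnostic likelihood ratio
    dlr_neg = (1 - sensitivity) / specificity  # negative diagnostic likelihood ratio
    odds_ratio = (tp * tn) / (fn * fp)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PPV": ppv, "NPV": npv, "DLR+": dlr_pos, "DLR-": dlr_neg,
            "OR": odds_ratio}

# Hypothetical counts: 80 true positives, 20 false positives,
# 10 false negatives, 90 true negatives
print(two_by_two_metrics(tp=80, fp=20, fn=10, tn=90))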
Negative Diagnostic Likelihood Ratios
The negative DLR represents the ratio of the odds that a negative test result will be observed in a diseased population to the odds that the same result will be observed in a nondiseased population. After inserting the test results into a table set up like Table 2, the negative DLR for a test can be determined by calculating [FN/(TP + FN)]/[TN/(FP + TN)], or as the false negative rate divided by the true negative rate. Useful tests will therefore have negative DLRs close to zero, and less useful tests will have higher negative DLRs. As an example, a negative diagnostic likelihood ratio equal to 2.5 means that for every one false negative, 2.5 true negatives are observed.

Receiver Operating Characteristic
The determination of an “ideal” cutoff value of a quantitative test that delivers continuous values is almost always a trade-off between sensitivity (true positives) and specificity (true negatives). As both change with each cutoff value, it becomes difficult to determine which cutoff is ideal. A graph of sensitivity against (1 − specificity) is called a receiver operating characteristic (ROC) curve. The ROC curve offers a graphical illustration of these trade-offs at each cutoff for any diagnostic test that uses a continuous variable. Ideally, the best cutoff value provides both the highest sensitivity and the highest specificity, easily located on the ROC curve by finding the highest point on the vertical axis and the farthest to the left on the horizontal axis (upper left corner) (Figure 1).
Figure 1 Receiver operating characteristic curve (sensitivity plotted against 1 − specificity).
ROC curves compare sensitivity versus specificity across a range of values of a quantitative biomarker for the ability to predict a dichotomous outcome. When the cutoff value for a continuous diagnostic variable is increased (assuming that larger values indicate an increased chance of a positive outcome), the proportions of both true and false positives decrease. These proportions are the sensitivity and 1 − specificity, respectively. A perfect test would have sensitivity and specificity both equal to 1. If a cutoff value existed to produce such a test, the sensitivity would be 1 for any nonzero values of 1 − specificity. The ROC curve would start at the origin (0,0), go vertically up the y-axis to (0,1), and then go horizontally across to (1,1), as shown in Figure 1 (dotted line). A good test would be somewhere close to this ideal. If a variable has no diagnostic capability, a test based on that variable would be equally likely to produce a false positive or a true positive: sensitivity = 1 − specificity, or sensitivity + specificity = 1. This equality is represented by a diagonal line from (0,0) to (1,1) on the graph of the ROC curve, as shown in Figure 1 (dashed line). The curve for a hypothetical test shown in Figure 1 (solid line) suggests that this marker does not provide a very good indication of the outcome of interest, but that it is better than a random guess. The performance of a diagnostic variable can be quantified by calculating the area under the ROC curve (AUROC) [4]. The ideal test would have an AUROC of 1, whereas a random guess would have an AUROC of 0.5. The AUROC can be calculated as a sum of the areas of trapeziums. For example, in Figure 1, the area under the curve between points (0.2, 0.5) and (0.4, 0.7) is given by (0.4 − 0.2) × (0.5 + 0.7)/2 = 0.12, or in other words, the difference between the x-values multiplied by half the sum of the y-values. In the example shown, the AUROC for the hypothetical marker is 0.68. This is interpreted as the probability that a patient who has the disease has a test value greater than that for a patient who does not. Like all summary measures, however, there are confidence intervals around this value that must also be taken into consideration for proper interpretation. The area under the ROC curve has become a particularly important metric for evaluating diagnostic procedures because it is the average sensitivity over all possible specificities.

Precision of Diagnostic Accuracy
It is important to emphasize that as in other empirical studies, specific values of diagnostic accuracy are merely estimates. Therefore, when evaluations of diagnostic accuracy are reported, the precision of the sensitivity and specificity estimates or likelihood ratios should be stated. If sensitivity and specificity estimates are reported without a measure of precision, clinicians cannot know the range within which the true values of the indices are likely to lie. Evaluations of diagnostic accuracy should therefore be described with the inclusion of confidence intervals. In the case of agreement with (an)other biomarker(s), the driving factor of such an evaluation may be the replacement of an existing, clinically validated test with a new one that may be operationally or economically more attractive, or one that may have greater diagnostic precision.
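The trapezoidal AUROC calculation described above is easy to reproduce for any list of (1 − specificity, sensitivity) points. The sketch below is illustrative only: the coordinates are hypothetical and are not the data behind Figure 1, although they include the segment from (0.2, 0.5) to (0.4, 0.7) used in the worked example.

def auroc_trapezoidal(points):
    # Area under an ROC curve by the trapezoidal rule.
    # `points` are (1 - specificity, sensitivity) pairs sorted by the x-value.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # difference between the x-values times half the sum of the y-values
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical ROC points
roc_points = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.5), (0.4, 0.7), (0.7, 0.9), (1.0, 1.0)]
print(auroc_trapezoidal(roc_points))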
Odds Ratio and Relative Risk
The odds ratio is a way of comparing whether the relative odds of a certain event (e.g., of harboring a certain biomarker) are the same for two groups (e.g., those who have the disease as compared to those who do not). An odds ratio of 1 implies that the presence of the biomarker is equally likely in both groups (i.e., that the biomarker does not distinguish between the two). An odds ratio greater than 1 implies that the biomarker is more frequent or prevalent in the first group, and may thus be a useful test to distinguish between the two groups. From Table 2, the odds ratio would be calculated as (TP × TN)/(FN × FP). Odds ratios are applicable solely to retrospective case–control studies, whereas in prospective studies, randomized clinical trials, or cohort studies, calculation of relative risk is appropriate. The relative risk (sometimes called the risk ratio) compares the probability of disease in the biomarker-positive and biomarker-negative groups, rather than the odds. It is calculated as [TP/(TP + FP)]/[FN/(FN + TN)], the quotient of the positive predictive value and the complement of the negative predictive value. It is applicable only to prospective studies, randomized clinical trials, or cohort studies (i.e., to situations where the incidence/prevalence of the outcome is not biased by retrospective selection). However, in rare conditions (i.e., conditions with low incidence or prevalence), where both TP and FN are very small, the odds ratio asymptotically approaches the relative risk and is therefore sometimes substituted for the latter. For small probabilities, TP and FN are very small compared to TN and FP.

Translating Different Metrics of Information Content
Academic biomarker research tends to be concerned primarily with the statistically determined reliability of a finding. Thus, if the association of a biomarker with a particular phenotype is reproducible, it is interpreted to convey real biological findings that may be of great interest to fundamental understanding of biological mechanisms, even if the magnitude of effect that can be seen under a particular set of experimental constraints is small. Similarly, the Geoffrey Rose paradox [5] explains that true, reproducible observations may be of great importance from the perspective of public health even if the overall magnitude of effect is modest. Thus, targeted modulation of variables that affect disease risk by small amounts may have important effects if applied to large populations and is thus commonly used to guide health policy. Academic epidemiological research evaluating associations of predictors for outcomes often uses odds ratios or relative risk as its primary parameters for reporting results, particularly in the field of genetic association studies. In complex polygenic diseases, the magnitude of effects commonly found for associations
for individual genetic variants generally ranges between odds ratios of 1 and 2. Demonstrating statistical significance for such modest-sized effects usually requires fairly large studies, particularly if multiple comparisons are made (as is the case with genome-wide association studies). On the other hand, when tests are to be applied to clinical decision making in individual patients, information content with regard to the magnitude of association becomes critically important. Only reliable tests with an acceptably low rate of false positive and/or false negative results can responsibly be used in this setting. These performance parameters tend to be gleaned most directly from stating sensitivity and specificity (or positive predictive value and negative predictive value), and these are therefore the parameters commonly referred to in clinical diagnostics. In general, tests are not considered particularly useful if the area under the ROC curve does not exceed 0.8, which translates into balanced sensitivities and specificities of about 0.75. On the other hand, a balanced sensitivity and specificity of 0.75 translates into an odds ratio of about 9; if an effect of this magnitude is indeed present, it will be readily recognizable even in quite modestly sized studies.
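The rule of thumb just cited can be checked directly: for given sensitivity and specificity, the implied diagnostic odds ratio is [sensitivity/(1 − sensitivity)] × [specificity/(1 − specificity)]. The short calculation below is illustrative; the function name is ours.

def diagnostic_odds_ratio(sensitivity, specificity):
    # Diagnostic odds ratio implied by a given sensitivity and specificity
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))

# Balanced sensitivity and specificity of 0.75 gives an odds ratio of 9,
# as stated in the text.
print(diagnostic_odds_ratio(0.75, 0.75))  # 9.0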
CLINICAL UTILITY
The term clinical utility, although used widely, is ill-defined. It is commonly used as a synonym for studies of clinical effectiveness and/or economic evaluations. The most basic definition of clinical utility refers to an estimation of the respective benefits and risks resulting from test use. Risks and benefits are, at this higher-level perspective, to be seen as encompassing both medical and economic connotations and considerations, even though the discussion of benefits and risks is often restricted to the former. As health care payers are becoming increasingly cost-conscious, and reimbursement decisions are being more commonly influenced by medical–economic considerations, clinical utility is quickly becoming the overriding consideration with regard to the introduction of a companion or other diagnostic. This adds significantly to the burden of research and development expenses for diagnostic companies since reliable estimates of clinical utility will usually require prospective, controlled studies in which clinical endpoints are reached and where interventions are or are not guided by testing for the biomarker of interest.

Medical Considerations
On the level of medical considerations, in the narrowest sense of the term, clinical utility refers to the ability of a screening or diagnostic test to prevent or ameliorate adverse health outcomes such as mortality, morbidity, or disability through the adoption of efficacious treatments conditioned on test results. A screening or diagnostic test in isolation does not have inherent
utility; because it is the adoption of therapeutic or preventive interventions that influences health outcomes, the clinical utility of a test depends on effective access to appropriate interventions, or on the way it can beneficially affect the choice of an intervention. This use of the term utility is consistent with standard practice in evidence-based medicine, which focuses on objective measures of health status to evaluate interventions. Clinical utility can more broadly refer to any use of test results to inform clinical decision making. Finally, in its broadest sense, the medical interpretation of clinical utility can refer to any outcome considered important to individuals and their families, as well as to other societal strata.

Medical–Economic Considerations
If economic considerations are included in the definition of clinical utility, the question of whether the actual medical benefit derived from the use of the test indeed results in good value for the cost incurred becomes part of the consideration. Value in this context can be assessed by either cost–benefit or cost-effectiveness/utility analyses. In the case of a cost–benefit analysis, costs and benefits are expressed in the same units of measurement, usually monetary. Cost-effectiveness and cost-utility analyses compare the respective monetary cost of using or not using the test with defined clinical outcomes, either in physical terms or by using an index that also includes quality-of-life aspects. As summarized in Table 3, when considering such cost–benefit analysis to evaluate the utility of a companion diagnostic that aids in the selection of a particular treatment, one would compare the aggregate costs of testing all eligible patients, of treating test-positive patients with the associated therapy, and of treating those that are test-negative with the conventional standard-of-care approach against the cost of using the standard of care, without any testing, on all patients. In a cost–benefit analysis, the net (pecuniary) gain or loss of the two approaches would be calculated based on direct medical and nonmedical, and on indirect (productivity) costs of either alternative. As an example, if the test is intended to increase the efficacy of a particular treatment, the cost–benefit ratio may be advantageous if the savings in productivity losses due to faster recovery from an illness based on the test-guided stratification of treatments offsets the added cost of testing and use of test-directed treatment alternatives (which may have a greater direct medical cost than the standard of care). Similarly, if a test is used to avoid serious adverse effects, the cost–benefit analysis may indicate advantages if the cost savings realized by avoiding the treatment of adverse reactions outweighs the cost of testing. In a cost-effectiveness analysis, typically the overall cost per outcome unit (such as relapses or survival years), adjusted for a global quality of life in the case of a cost-utility analysis, is compared among the treatment alternatives.
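As a purely illustrative sketch of the comparison just described, the aggregate direct cost of a test-then-treat strategy can be set against treating everyone with the standard of care, and an incremental cost-effectiveness ratio computed once outcome data (e.g., QALYs) are available. All figures, prices, and function names below are hypothetical and are not taken from the chapter.

def test_guided_strategy_cost(n_patients, test_cost, p_positive,
                              new_drug_cost, standard_cost):
    # Aggregate direct cost of testing all patients, giving the new therapy
    # to test-positives and the standard of care to test-negatives.
    n_pos = n_patients * p_positive
    n_neg = n_patients - n_pos
    return n_patients * test_cost + n_pos * new_drug_cost + n_neg * standard_cost

def icer(cost_a, qaly_a, cost_b, qaly_b):
    # Incremental cost-effectiveness ratio of strategy A versus strategy B
    return (cost_a - cost_b) / (qaly_a - qaly_b)

# Hypothetical inputs: 1,000 patients, 15% test-positive
cost_test_guided = test_guided_strategy_cost(1000, test_cost=300, p_positive=0.15,
                                              new_drug_cost=30000, standard_cost=5000)
cost_treat_all_standard = 1000 * 5000
# Hypothetical aggregate outcomes (QALYs) for the two strategies
print(icer(cost_a=cost_test_guided, qaly_a=820,
           cost_b=cost_treat_all_standard, qaly_b=780))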
TABLE 3 Cost–Benefit Considerations for Companion Diagnostics(a)

Cost element | Efficacy marker, direct costs | Efficacy marker, indirect costs | Adverse reaction marker, direct costs | Adverse reaction marker, indirect costs
Cost of testing | ↑ | — | ↑ | —
plus Cost of diagnostics-driven therapy in selected group | Novel, presumably more expensive drug ↑ | Faster return to productivity ↓ | — | No productivity loss due to ADR ↓
plus Cost of alternative therapy in de-selected group | Standard of care presumably cheaper ↓ | — | — | —
versus Cost of alternative therapy in all, no testing | — | Later return to productivity ↑ | Cost of treating ADR ↑ | Productivity loss due to ADR ↑

(a) ↑, increased cost; ↓, decreased cost.

Figure 2 Incremental cost-effectiveness ratio, Herceptin/HerCep test: incremental QALYs, incremental cost, and incremental cost per QALY (in 10,000 UK £) for three strategies: (1) no test, conventional chemo-Rx in all; (2) HerCep test in all (positive test: Herceptin + conventional chemo-Rx; negative test: conventional chemo-Rx); (3) no test, Herceptin + conventional chemo-Rx in all.
In the case of a companion diagnostic, the incremental cost of using the test in all patients and then treating only those in whom the test-driven therapy is either effective or safe may result in an acceptable cost-effectiveness ratio (as expressed in cost per quality-adjusted year of life) for an expensive novel treatment, where a less discerning use of the novel treatment may be associated with an unacceptable ratio based on the inclusion of nonresponders. An example of such an analysis was recently conducted for the case of Herceptin and its companion diagnostic test (HerCep test) (Figure 2). In this example
[6], it could be clearly shown that the use of the test in all patients, followed by a targeted prescription of the new drug only to those in whom it was indicated based on the test, showed an acceptable incremental cost-effectiveness ratio, and one that was clearly superior to a more indiscriminate use of the drug. Cost-effectiveness calculations are affected by numerous factors, including the prevalence of a (positive) test result which materially affects the choice of therapy, the test performance as outlined above in terms of analytical performance and clinical validity, the context with regard to indication and intended use (see below), the magnitude of the advantage over conventional treatments that the test-guided alternative therapy provides, by the cost of tests and different treatments, and last but certainly not least, by the reimbursement environment, including pricing policies that will differ from market to market. Yet another way of looking at the question of clinical utility with regard to medical economics would be to calculate the number needed to screen (i.e., the number of patients that need to undergo the diagnostic test to achieve a particular outcome in at least one case). The outcome could either be defined as the prevention of a death or as an adverse event. Such a definition could be reached for both efficacy markers (where the choice of a certain drug based on screening may result in more successful treatment of eligible patients, and thus lives saved as compared with standard of care therapy) and—perhaps more obviously—for safety markers where the result would be the avoidance of an adverse event.
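The number-needed-to-screen idea lends itself to a simple back-of-the-envelope calculation: it is the reciprocal of the absolute gain per tested patient. The sketch below uses entirely hypothetical inputs (marker prevalence and response rates) to count how many patients must be tested for one additional favorable outcome, or one avoided adverse event, relative to untested standard of care.

def number_needed_to_screen(p_marker_positive, p_benefit_if_guided, p_benefit_standard):
    # Patients who must be tested so that test-guided management yields one
    # additional favorable outcome compared with standard care (illustrative).
    # Only marker-positive patients change management, so the absolute gain
    # per tested patient is the prevalence times the per-patient benefit.
    absolute_gain = p_marker_positive * (p_benefit_if_guided - p_benefit_standard)
    return 1 / absolute_gain

# Hypothetical: 20% of patients are marker-positive; among them, test-guided
# therapy succeeds in 60% versus 35% with the standard of care.
print(round(number_needed_to_screen(0.20, 0.60, 0.35)))  # 20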
CONTEXTUAL CONSIDERATIONS FOR COMPANION DIAGNOSTICS’ CLINICAL UTILITY
While clinical performance, expressed as sensitivity and specificity, is of obvious importance for the utility of any test, it is important to recognize that with regard to companion diagnostics, a set of specific considerations apply depending on the intended use of the test. In this respect, both the clinical indication and the magnitude of expected benefit from the treatment, and the intended use to either enhance efficacy or avoid adverse reactions, are variables that will greatly influence the requirements for sensitivity and specificity, respectively, of the desired test (Table 4). Thus, on one end of the spectrum, in life-threatening indications (e.g., oncology, HIV) and in the case of a drug that may be lifesaving, and for which feasible alternatives do not exist, an efficacy marker would be expected to show a high sensitivity (i.e., few false negatives, as they would result in inappropriate withholding of the treatment), but may be acceptable at lower specificity (false positives will result, for an otherwise safe drug, in unnecessary and ineffective treatment).
TABLE 4 Contextual Considerations Regarding Companion Diagnostics(a)

Indication | Efficacy marker, sensitivity | Efficacy marker, specificity | Adverse event marker, sensitivity | Adverse event marker, specificity
Serious or life threatening | High | Low | Low | High
Less serious or trivial | Low | High | High | Low

(a) High, high requirements, important; Low, low requirements, less important.
In contrast, a safety marker in such an indication would be required to demonstrate high specificity (i.e., few false positives, which again would result in unnecessary withholding of the treatment), while sensitivity is less important [use of the marker would still be expected to result in fewer adverse reactions (ARs) than would otherwise be encountered, in indications where ARs are often viewed as necessary trade-offs]. On the other end of the spectrum, in more trivial indications where treatment is less urgent, the requirement for a safety marker would be very high sensitivity, as a false-negative test would result in the occurrence of an AR, which in this setting is much less acceptable than in a life-threatening illness. Specificity is less important, as “wrongful” withholding in a condition of modest severity is clearly less of an issue. Conversely, again, an efficacy marker would be expected to show high specificity: If the added cost of the test is to be acceptable in such a condition, it would presumably need to be justified by cost–benefit considerations, which would be offset by inappropriate inclusion in the treatment group of patients with false-positive test results. One may view these considerations from the standpoint of that old medical principle: first, do no harm. In serious disease, the threat may come primarily from the illness, so harm would more likely result from inappropriate withholding of treatment based on a false test result. In contrast, in less serious disease, harm may, rather, come from inappropriate administration of a treatment, and one would perhaps be more aligned with the Hippocratic principle by avoiding noncritical treatments.
CO-DEVELOPMENT OF A NEW MEDICAL ENTITY AND COMPANION DIAGNOSTIC
If the label of a new medicine is to include the obligatory use of a test, the manufacturer needs to take care that drug and test are developed in tandem and receive regulatory approval in such a fashion that they can reach the market simultaneously. Authorities are paying increasing attention to the challenges that this process may present. The U.S. Food and Drug Administration has issued a concept paper on “Drug–Diagnostic Codevelopment” (http://www.fda.gov/Cder/genomics/pharmacoconceptfn.pdf) that specifically addresses this issue. A number of working groups, including
the International Conference on Harmonization, are also currently addressing aspects related to this issue, from coining unified terminology and language around biomarkers and companion diagnostics, to formal issues relating to regulatory submissions and the evidentiary standards that will be required to accept a companion diagnostic as valid and useful.
CONCLUSIONS
At the transition of a biomarker discovery and assessment program into development of the biomarker as a clinically applicable decision tool, a number of considerations that are not applicable to the discovery stage must come into focus. Many of them represent well-established concepts in analytical metrology but are not necessarily familiar to investigators at the cutting edge of biomedical research. It is critical that these concepts be considered carefully, as this will directly influence the feasibility of any biomarker test that is to be used in the setting of clinical practice. Although many of the considerations are of a technical nature, the ultimate driver of such workability is, of course, rooted in biology and in the information content conveyed by the data.
REFERENCES
1. Jaeschke A, Guyatt GH, Sackett DL (1994). Users’ guides to the medical literature: III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA, 271:389–391.
2. Jaeschke A, Guyatt GH, Sackett DL (1994). Users’ guides to the medical literature: III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for patients? JAMA, 271:703–707.
3. Halkin A, Reichman J, Schwaber M, Paltiel O, Brezis M (1998). Likelihood ratios: getting diagnostic testing into perspective. Q J Med, 91:247–258.
4. Faraggi D, Reiser B (2002). Estimation of the area under the ROC curve. Stat Med, 21:3093–3106.
5. Rose G (1981). Strategy of prevention: lessons from cardiovascular disease. Br Med J, 282:1847–1851.
6. Elkin EB, Weinstein MC, Winer EP, et al. (2004). HER-2 testing and trastuzumab therapy for metastatic breast cancer: a cost-effectiveness analysis. J Clin Oncol, 22(5):854–863.
12 STRATEGIES FOR THE CO-DEVELOPMENT OF DRUGS AND DIAGNOSTICS: FDA PERSPECTIVE ON DIAGNOSTICS REGULATION
Francis Kalush, Ph.D., and Steven Gutman, M.D., M.B.A.*
U.S. Food and Drug Administration, Rockville, Maryland
*The views presented in this chapter do not necessarily reflect those of the U.S. Food and Drug Administration.
INTRODUCTION
The U.S. Food and Drug Administration (FDA) introduced its critical path initiative in 2004 as an effort to stimulate and facilitate the scientific process for the development of new drugs, biological products, and medical devices [1]. The initiative was grounded in the sobering reality that between 1993 and 2003, despite a doubling in research and development spending by both industry and the National Institutes of Health (NIH), the FDA observed a decrease in the number of new drug applications (NDAs) and an erratic but flat curve in the approval of new molecular entities. Of particular concern was the dramatic increase in the developmental cost of new drugs and the high rate of new drug failures occurring relatively late in the developmental life cycle (phase III or phase IV drug failures). The critical path was developed to ensure that FDA fosters a collaborative and robust scientific environment to
help strengthen the assortment of tools that might be used to bring new medical products to the marketplace in an expedited manner. Since its initiation in 2004, a new office, the Critical Path Office, has been created, a comprehensive and ambitious opportunities list has been generated [9], and numerous collaborative projects and initiatives have been born, all with the singular goal of using good science to address the challenge of timely and cost-effective product development. With the unveiling of the complete human genomic map in 2001–2003 [2,3] and the development of sophisticated new technologies for molecular diagnostics (including use of microarrays and bioinformatics to create integrated multiplex tests with complex molecular signals), considerable attention has been focused on biomarkers (especially those with a pharmacogenomics base) to help improve drug candidacy and to streamline drug studies. The introduction of a diagnostic biomarker to refine drug discovery has obvious appeal. Although biomarkers in a broad sense can include both imaging technologies and in vitro diagnostic devices (IVDs), for the sake of this chapter the focus is on the latter. When used clinically, IVDs are subject to the same regulations as those applied to all other medical devices. Both the science and regulation required for proper deployment of these companion diagnostic products are quite different from the science and regulation of drugs. As a result, to bring closely linked products to the market in a timely manner, both FDA and sponsors must work collaboratively to address what may often be dual device and drug regulatory requirements.
FDA REGULATION OF IN VITRO DIAGNOSTIC DEVICES
A medical device is defined by the Federal Food, Drug, and Cosmetic Act [21 United States Code 321(h)] as an “instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including a component part, or accessory which is: recognized in the official National Formulary, or the United States Pharmacopoeia, or any supplement to them, intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease, in man or other animals, or intended to affect the structure or any function of the body of man or other animals, and which does not achieve any of its primary intended purposes through chemical action within or on the body of man or other animals and which is not dependent upon being metabolized for the achievement of any of its primary intended purposes.” In vitro diagnostics (IVDs) are defined as reagents, instruments, and systems intended for use in the diagnosis of disease or other conditions, including a determination of the state of health, in order to cure, mitigate, treat, or prevent disease or its sequelae. Such products are intended for use in the collection, preparation, and examination of specimens taken from the human body [21 CFR 809.3(a)]. IVDs are devices, as defined in section 201(h) of the act, and may also be
biological products subject to section 351 of the Public Health Service Act. While drugs are regulated in the Center for Drug Evaluation and Research (CDER) under the provisions of the act, IVDs are regulated in the Center for Devices and Radiological Health (CDRH) under provisions of the Medical Device Amendments of 1976. Like the regulation of all medical devices, the regulation of IVDs by the FDA is risk-based with devices classified into low-risk (class I, e.g., adjunctive immunohistochemical stains used in conjunction with standard microscopic analysis to subtype tumors), moderate-risk (class II, e.g., prognosis, monitoring in already diagnosed cancer patients), or high-risk (class III, e.g., cancer diagnosis, screening) categories. The FDA regulatory program is comprehensive and includes requirements for registration and listing of products, for high-quality production using good manufacturing practices, and for postmarket reporting of adverse events. For some class I, most class II, and all class III devices, FDA review is required before a new medical device can enter the marketplace.
• Class I devices: typically present minimal potential for harm to the user and the person being tested. They are subject to general controls, which include registration and listing, labeling, and adverse event reporting requirements [section 513(a)(1)(A) of the act]. Most class I devices are exempt from premarket notification (see definition below), subject to certain limitations found in section 510(l) of the act and in 21 CFR 862.9, 864.9, and 866.9. An IVD example of class I devices is complement reagent (21 CFR 866.4100).
• Class II devices: devices for which general controls alone are insufficient to provide reasonable assurance of their safety and effectiveness and for which establishment of special controls can provide such assurances. Special controls may include special labeling, mandatory performance standards, risk mitigation measures identified in guidance, and postmarket surveillance [section 513(a)(1)(B) of the act]. Most class II devices require premarket notification. IVD examples of class II devices include glucose test systems (21 CFR 862.1345), antinuclear antibody immunological test systems (21 CFR 866.5100), and coagulation instruments (21 CFR 864.5400).
• Class III devices: devices for which insufficient information exists to provide reasonable assurance of safety and effectiveness through general or special controls. Class III devices are usually those that support or sustain human life, are of substantial importance in preventing impairment of human health, or which present a potential, unreasonable risk of illness or injury [section 513(a)(1)(C) of the act]. Most class III devices require premarket approval (PMA), defined below. IVD examples of these include automated PAP smear readers, nucleic acid amplification devices for tuberculosis, and total prostate-specific antigen (PSA) for the detection of cancer (21 CFR 866.6010).
Laboratory-developed tests (LDTs), in-house tests, and “home-brew” tests are tests developed in a laboratory for use only in that laboratory. Historically, FDA has applied enforcement discretion to most LDTs and not regulated them actively. Instead, FDA has generally chosen to regulate the components of LDTs, such as instruments and reagents. Considerable discussion has occurred over the past 10 years on the existence of two parallel but noncongruent regulatory pathways to market (IVD “kits” vs. LDTs); these have been the subject of a number of reports, including the Task Force on Genetic Testing and the Secretary’s Advisory Committee on Genetic Testing [4]. The Secretary’s Advisory Committee on Genomics, Health and Society (SACGHS) is in the process of discussing the issue of genetic tests with regard to potential scientific or regulatory gaps and is currently seeking public comment on a draft report to the Secretary of Health and Human Services (HHS) on the oversight of genetic testing [5].

Major Elements of a Submission
Major elements of a submission can be found by browsing the Office of In Vitro Diagnostic Devices and Safety (OIVD) 510(k) data templates posted on the OIVD Web page [6]. These summarize information related to specific cleared products, including the intended use/indications for use, analytical and clinical validation information, device description including platform and software information, information on instrument and software validation when applicable, and labeling (package insert information). IVD labeling is unique among medical devices since it is described specifically in regulation [21 CFR 809.10(b)]. For PMAs, information describing manufacturing, design controls, and adherence to quality system regulations is also needed. The intended use should specify what analyte the test measures, the clinical indication for which the test is to be used, and the target population for which the test is intended. It should also indicate whether the test is qualitative or quantitative. Analytical validation includes precision (repeatability and reproducibility), accuracy, limit of detection, interferences, cross-reactivity, software, performance around the cutoff, carryover, cross hybridization, sample preparation/conditions, and assay limitations. FDA recognizes dozens of standards from the Clinical and Laboratory Standards Institute (CLSI) to assist companies in developing this information. A list of recognized standards [10] may also be found on the OIVD Web page. Clinical validation should be established in appropriate clinical studies to support the indication for use and claims of the device. These studies generally need to be provided by the sponsor or, if available, sponsors may also cite applicable clinical literature. For clinical literature to be acceptable for product clearance or approval, the quality must be carefully assessed and the link between the published studies and the device being reviewed must be carefully established. In instances where tests are developed based on sequential use of
training sets followed by independent test validation, the FDA encourages that rigorous and meticulous attention be paid to the training sets used for new diagnostic markers. However, FDA review is focused on the independent validation that supports the merits of the device itself based on feasibility studies and development of a clear hypothesis in a defined population under established conditions of use. When the system includes software, software documentation including software design, hazard analysis, and complete verification and validation should be provided [11]. Labeling needs to be detailed sufficiently to satisfy the requirements of 21 CFR 807.87(e). Final labeling for in vitro diagnostic devices must comply with the requirements of 21 CFR 809.10 before the device is introduced into interstate commerce. Among the particularly important components of required labeling are an intended use, device description, directions for use, quality control, precautions, warnings and limitations, performance characteristics, interpretation of results, and values expected.
REGULATORY TOOLS AND SOLUTIONS FOR PHARMACOGENOMICS MARKERS
Over the past several years, the Office of In Vitro Diagnostic Devices and Safety (OIVD) has made flexible use of its regulatory tools to help bring important new diagnostics to market. These include the use of the preinvestigational device exemption (pre-IDE process), expedited review, real-time work interactions, and de novo classification of new devices into classes II and I under section 513(f)(2) of the FDCA. The purpose of a pre-IDE protocol review with the FDA is to assist the sponsor and familiarize FDA with cutting-edge technology in advance of the submission. Furthermore, assistance may be needed in defining possible regulatory pathways, study design and analysis of complex data, and statistical approaches before initiation. This is a work process performed by review staff in the Center for Devices and Radiological Health at no cost to sponsors with a 60-day time line. OIVD now processes almost 300 pre-IDEs per year and finds that use of this process decreases uncertainty in the review process, provides an early opportunity for questions and answers, and drives the least burdensome approach for sponsors. Both 510(k)s and PMAs for cutting-edge IVDs may also be designated for expedited review if the device is expected to provide unique public health benefits. Expedited review ensures that at every stage in the review process, the device under consideration goes to the front of the review queue. Even so, the outcome of an expedited review, like the outcome of all reviews, will depend ultimately on the quality of the submission. A subcategory of PMA supplements is eligible for what is known as real-time review, which calls for rapid interactive review between the FDA and sponsors. Although OIVD gets only a small number of real-time applications
per year, it has found rapid interactive review with sponsors to be so successful that it has been applied more broadly to review pre-IDEs and IDEs and processing of some devices subject to expedited review. In some cases, such interaction has facilitated review of cutting-edge new products in two weeks or less. OIVD has also made use of the de novo classification process authorized by section 513(f)(2) of the FDCA. A device that measures a new analyte or has an intended use for which no submission has previously been cleared or classified as a class I or II device is automatically assigned a class III designation, which would require a PMA submission. Under the de novo process, a sponsor that receives a determination that its device is not substantially equivalent may petition for classification directly into class II or class I if it can provide sufficient scientific basis for the FDA to conclude that a reasonable assurance of the safety and effectiveness of that device can be assured without a PMA. In this case, the new device may be tested by the manufacturer in the same way as it would be for a PMA to establish that the test is safe and effective, but the review is performed following the administrative procedures commonly applied to the 510(k) process. As a result, many PMA requirements, including premarket review of manufacturing processes, premarket inspection of manufacturing facilities, and requirements for annual reports, are all not required. Mandatory time lines for review are also shorter (90 vs. 180 days). For de novo classification requests, OIVD has been able to use clinical data and supportive literature information, current U.S. clinical guidelines and practices, as well as product-specific data generated by the sponsor to conclude that classification into class II is appropriate and to establish special controls. This type of classification occurs more quickly than other classification processes and more quickly than review of a PMA, and the newly classified device not only may be legally marketed but becomes the first predicate device for a new classification, creating a means for subsequent similar devices to reach the market through the 510(k) process. Devices to detect cytochrome P450 DNA polymorphisms, cystic fibrosis mutations, West Nile antibody, breast cancer gene expression signatures, and the hemagglutinin 5 (H5) subtype of influenza are among the products cleared using the 510(k) de novo process and the establishment of special control guidances.
REGULATORY FRAMEWORK TO ACCOMMODATE CO-DEVELOPING DRUG DIAGNOSTICS
The lifecycle to obtain marketing approval for the development of a new drug is outlined in FDA law and regulation and progresses through a well-defined multistep process: basic research, prototype design or discovery, preclinical development, clinical development (phases I, II, and III), FDA filing/approval/launch, and postlaunch (phase IV).
Figure 1 (A) Drug and (B) device development pathways. (A) Basic research, prototype design or discovery, preclinical development, clinical development (phases I, II, and III), FDA filing/approval/launch, and phase IV post-launch. (B) Basic research, feasibility studies, analytical validation, clinical validation, qualification of biomarker, and FDA filing/approval/launch.
The development of a new diagnostic also goes through a well-defined multistep process: basic research, feasibility studies, analytical validation, clinical validation, and FDA approval/clearance. The developmental pathways for a drug and device are illustrated in Figure 1. When use of a diagnostic test is identified as important in selecting which patients receive or avoid a new drug or are to be given a higher or lower dose of a drug, the efficacy/safety of the drug becomes inextricably linked to the effectiveness of the diagnostic. Drug performance will only be as good as the ability of the diagnostic to properly select patients for that drug treatment. Drug performance is judged based on an assessment of the drug's ability to meet appropriate clinical endpoints or responses with a reasonable safety profile. Diagnostic studies are based on demonstrating that a test properly identifies an outcome or target of interest. When the diagnostic is used to detect drug response, the outcome of interest becomes the drug response or avoidance of drug toxicity. The test parameters of interest in the context of co-development with a new drug do not differ from those of ubiquitous interest in the study of a new diagnostic. Standardized techniques for establishing test performance have been promoted in the literature and in FDA guidances. Clinical parameters of importance include sensitivity—in this case the ability of the test to identify patients who will exhibit the desired drug response of interest—and specificity—in this case the ability of the test to identify patients who will not demonstrate the drug response of interest. Other useful measures include the predictive value of a positive result (the fraction of test positives who respond), the predictive value of a negative result (the fraction of test negatives who do not respond), and the likelihood ratio of drug response after testing. The hazard ratio (HR) in prospective studies/analysis and the odds ratio (OR) in retrospective analysis could be used when interpreting probabilities in a different context of use.
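To make these definitions concrete, the short Python sketch below computes sensitivity, specificity, the two predictive values, and the positive likelihood ratio from a hypothetical 2×2 table of test result versus drug response; the counts are invented for illustration and are not taken from any actual trial.

```python
# Minimal sketch (not from this chapter): test performance measures computed
# from a hypothetical 2x2 table of diagnostic result versus drug response.

def test_performance(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, NPV, and LR+ for a 2x2 table."""
    sensitivity = tp / (tp + fn)          # responders who test positive
    specificity = tn / (tn + fp)          # nonresponders who test negative
    ppv = tp / (tp + fp)                  # fraction of test positives who respond
    npv = tn / (tn + fn)                  # fraction of test negatives who do not respond
    lr_positive = sensitivity / (1 - specificity)  # likelihood ratio of a positive test
    return sensitivity, specificity, ppv, npv, lr_positive

# Hypothetical counts: 90 responders test positive, 10 test negative;
# 30 nonresponders test positive, 170 test negative.
print(test_performance(tp=90, fp=30, fn=10, tn=170))
```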
CHANGES IN STRATEGIC THINKING
Because there are now two separate medical products under investigation, which may have different life cycles, companies involved in co-development are facing new challenges and may have to create complex arrangements that allow regulatory and business issues to be addressed. FDA is committed to approaching products submitted as part of a co-development paradigm by assuring that good science and appropriate regulatory thresholds are met while also bringing to bear a regulatory framework that is as flexible as possible. Companies may wish to rethink their business models, shift strategies to accommodate new types of products, and change administrative structures to allow drugs and devices to be developed either in parallel or at least in a scientifically streamlined manner. Moreover, companies may be inclined to develop business alliances, seek out business partners and financial solutions to developmental costs that will encourage product development, and demonstrate the proper premium on testing so that value-based reimbursement becomes possible. Some authors have called for health system reforms that promote value-based, flexible reimbursement for innovative diagnostics and therapeutic products, and others have highlighted the potential value of creating stronger economic incentives for the development of personalized medicine [7]. As health care costs continue to skyrocket, the role of FDA regulation appears to be a less important barrier to product development than obtaining reimbursement or widespread clinical acceptance. Increasingly, both third-party payers and health care gatekeepers are demanding studies grounded in evidence-based medicine before making coverage decisions or before choosing to order new tests. There is also heightened interest among third-party payers in performing complex cost-effectiveness analyses to help make payment determinations.
PATHS IN CO-DEVELOPMENT
There are several major paths to co-developing a drug and associated test: (1) drug and test are developed in parallel (true co-development); (2) the drug is developed first, followed by development of the diagnostic test(s) (drug rescue in case safety issues are discovered in phase IV); and (3) an already existing test is used for drug selection (patient enrichment/stratification). Multiple tests may be developed for the selection of a certain drug, either in parallel with the drug or at a later date. Figure 2 illustrates the major drug–diagnostics collaboration scenarios. In the end, regardless of the path taken, co-development results in cross-labeling/relabeling of the drug and/or the diagnostics. FDA drug package inserts frequently cite the relevant in vitro diagnostic test(s), and FDA cleared/approved devices reference the targeted drug(s). Table 1 presents current examples.
[Figure 2 Drug–diagnostics collaboration scenarios. Scenario 1 (co-development): drug development and diagnostics development proceed in parallel. Scenario 2 (drug safety/rescue): the diagnostic is developed after the drug. Scenario 3 (drug efficacy): an already approved diagnostic, or a newly developed diagnostic, is paired with a new drug.]

TABLE 1  Current Pharmacogenomics Examples

Drug                                      Efficacy Test                    Safety Test
Trastuzumab (Herceptin)a                  HER-2/neu IHC and FISHb          —
Imatinib mesylate (Gleevec)a              C-kit IHCb                       —
Cetuximab, Panitumumaba                   EGFR IHCb                        —
Imatinib mesylate (Gleevec)a              bcr/abl or 9:22 translocation    —
Gefitinib                                 EGFR mutations                   —
Irinotecana                               —                                UGT1A1 mutb
6MP and azathioprinea                     —                                TPMT
5-HT3R antagonist, codeine derivative     —                                CYP450(2D6) mutb
Warfarina                                 —                                CYP2C9 & VKORC1b

a FDA drug package insert cites the relevant test.
b FDA cleared/approved device in reference to the drug.
This document is intended primarily to provide direction for the development of diagnostics that are used to select or avoid drugs (predictive tests). A predictive test/biomarker would be defined as a single trait or a signature of traits that separates different populations with respect to the outcome of interest in response to a particular (targeted) treatment (e.g., response to Herceptin, Erbitux, Panitumumab). Predictive tests could be categorized into three classes: (1) a predictive test for efficacy, to identify patients most likely to respond beneficially to a
targeted treatment; (2) a predictive test for safety, to identify patients most likely to respond adversely to a targeted treatment; and (3) a predictive test for dosing, to identify a patient's specific dosing to optimize the benefit of or minimize the risk of a targeted treatment. Following Sargent et al. [8], a predictive biomarker provides information about the effect of treatment and is useful for selecting patients for the drug. In this document, diagnostic tests are assumed to be predictive biomarkers. However, much of the discussion also applies to a prognostic test or biomarker, defined as follows: a prognostic biomarker (e.g., risk for cancer recurrence) is a single trait or a signature of traits that separates different populations with respect to the risk of an outcome of interest in the absence of treatment or despite nontargeted "standard" treatment.
Co-Development and Phase I and II Drug Studies
Ideally, when a drug and a diagnostic are to be used in combination, the development of the drug and diagnostic should be coordinated as much as possible. For the drug candidate, CDER requires phase I studies to demonstrate drug safety and phase II studies to confirm proof of concept, determine dosing, and expand on drug safety. In cases where biomarkers are being studied for possible use with a drug, the diagnostic work is generally exploratory but, ideally, should include initial biomarker validation. Phase I and II drug studies do offer opportunities to gather preliminary data about associations between biomarkers and drug use. This information may include descriptive data on prevalence of the biomarker, association with adverse events or the safety profile being identified with early drug use, and/or information on the possible ability of the test to predict drug response. In addition, the information gathered helps improve benefit–risk analysis at an earlier stage of development. A number of interesting proposals have been made to streamline the path to market by expanding phase II studies to allow better selection and characterization of biomarkers for drug use. Jiang et al.'s [12] proposed design combines a test for overall treatment effect in all randomly assigned patients with the establishment and validation of a cut point for a prespecified biomarker for identifying the sensitive subpopulation. The procedure provides prospective tests of the hypotheses that the new treatment is beneficial for the entire patient population or that it is beneficial for a subset of patients defined by the biomarker. The good news is that this allows an opportunity to introduce a drug with or without a biomarker as circumstances may dictate. However, the bad news is that each separate hypothesis cannot be tested without a p-value penalty, so a recommendation is made that testing for the drug effect in the all-comers population be performed at p = 0.04 and, if needed, follow-up testing in the biomarker-positive population at p = 0.01. Proposals have also been made to find mechanisms of adaptive design applying Bayesian statistics to allow phase II and III studies to be performed in a more seamless manner.
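As a rough illustration of the split-alpha logic described above (and only of that logic, not the full biomarker-adaptive threshold procedure of Jiang et al.), the following Python sketch tests the all-comers effect at p = 0.04 and falls back to the biomarker-positive subgroup at p = 0.01; the simulated data and effect sizes are invented.

```python
# Minimal sketch of the split-alpha idea: overall test at alpha = 0.04,
# then a biomarker-positive subgroup test at alpha = 0.01. Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control_pos = rng.normal(0.0, 1.0, 100)   # biomarker-positive, control arm
control_neg = rng.normal(0.0, 1.0, 100)   # biomarker-negative, control arm
treated_pos = rng.normal(0.6, 1.0, 100)   # biomarker-positive, treated arm
treated_neg = rng.normal(0.0, 1.0, 100)   # biomarker-negative, treated arm

_, p_overall = stats.ttest_ind(np.concatenate([treated_pos, treated_neg]),
                               np.concatenate([control_pos, control_neg]))
if p_overall < 0.04:
    print(f"Overall effect significant (p = {p_overall:.3f}): benefit in all comers")
else:
    _, p_subgroup = stats.ttest_ind(treated_pos, control_pos)
    if p_subgroup < 0.01:
        print(f"Benefit shown only in biomarker-positive patients (p = {p_subgroup:.4f})")
    else:
        print("No significant benefit demonstrated")
```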
[Figure 3 Co-development strategy timing. The drug pathway (basic research; prototype design or discovery; preclinical development; clinical development phases 1–3; FDA filing/approval and launch) runs alongside the diagnostic pathway (prototype assay used; pre-IDE process and IDE review; final locked device; PMA or 510(k) application), with clinical validation and clinical utility established for the stratification marker, OCP contacted to inform it of the co-development plan, a drug/diagnostic labeling consult, and co-approval of the two products. Adapted from the drug–diagnostic co-development concept paper (draft).]
Co-Development and Phase III Drug Studies
If a diagnostic is actually intended for use in patient selection, the safety and efficacy of the drug may in fact depend on the ability of the test to identify patients properly. Clinical trials for drug approval are generally multicenter. Therefore, during co-development the test needs to be standardized and a consensus methodology for test validation among the centers should be established. Ideally, analytical validation should be performed prior to phase III trials, so that clinical validation can be established using an analytically robust test. When the drug approval is contingent on use of the diagnostic, the diagnostic should optimally be studied either at the time of the original phase III study or in a follow-up phase III–like study before drug approval can be obtained. These studies would most commonly be based on the appropriately designed and powered, prospective, blinded, and randomized clinical trials usually associated with drug approvals, and validated in follow-up trials. Co-development strategy timing is illustrated in Figure 3.
Co-Development After Drug Approval
In those cases in which the drug approval is not contingent upon the use of the diagnostic or when changes in drug labeling for improving safety or effectiveness profiles are being requested, studies may be performed during phase IV drug studies or at any interval following drug approval (e.g., postmarket
commitments). In some cases, prospective studies may be needed; in others, use of banked samples from prospective studies, retrospective studies, and/or cross-sectional studies may be sufficient to support diagnostic claims. If it is not possible to conduct coordinated studies of the drug and diagnostic, the phase III clinical trial may represent the only opportunity available to obtain and bank prospectively collected samples for future diagnostic test study. Particular attention should be paid to ensuring that samples are collected properly with appropriate informed consent to support future bridging studies (e.g., platform change), diagnostic studies, and evaluation of the analyte of interest. The samples collected should be banked in storage conditions that maintain sample integrity. Concerns about any bias, missing samples, or selection in accrual of banked samples should be addressed appropriately.
INTERCENTER REVIEW CONSIDERATIONS IN CO-DEVELOPMENT
Overcoming potential regulatory barriers to speed up introduction of innovative technologies and products, while ensuring their safety and effectiveness, is a major challenge. Co-development, to be most efficient, requires coordination and harmonization between FDA reviewing centers. The success of this coordination depends in part on the ability of all stakeholders to foster increased dialogue between the FDA and industry and the ability of the FDA to engage sponsors in collaborative discussion early in the process as the science develops. Co-developed products that would be used together may or may not be combination products as defined in 21 CFR 3.2(e) (see Note 1). FDA anticipates that many therapeutic drug and diagnostic test products will be marketed separately. For the purposes of this document, co-development refers to products that raise development issues that affect both the drug therapy and the diagnostic test, regardless of their regulatory status as a combination product or as a noncombination product. For example, when co-developed products are considered together, unique questions may arise that would not exist for either product alone. Scientific or technological issues for one product alone may be minimal, but they may have substantial implications for the other product. Also, postapproval changes in one may affect the safety and effectiveness of the other. The FDA has recently established the Office of Combination Products (OCP; see Note 2) to work with sponsors and FDA work groups to ensure that proper regulatory tools are applied to products used in combination and to assist in addressing problems that joint reviews may encounter. If the diagnostic becomes integral to approval of the drug, diagnostic approval is needed in parallel with the drug. FDA review centers (CDER and CDRH) are willing to work with single sponsors or collaborating sponsors
(drug–diagnostics companies) to coordinate review processes and have in the past had conjoint teams, tandem panel meetings, and same-day approval of linked products. If the diagnostic has broad use not specifically linked to the drug or if the drug has use not specifically linked to the diagnostic, approvals would follow usual CDER or CDRH procedures and timelines. In some cases, development of a diagnostic may follow development of a drug and be used to refine or improve the safety or effectiveness profile of the drug. CDER and CDRH will work collaboratively with OCP to determine best practices for addressing these situations in a timely manner.
CONSULTATION PROGRAMS
The parallel development of a drug and a diagnostic is a relatively new aspect of drug development and calls for careful coordination. FDA has three mechanisms for early interaction with companies about the use of drug–diagnostic combinations. CDER has developed a new approach for data review referred to as voluntary genomic data submissions (VGDSs). These submissions can be used throughout the development process to present and discuss data with the agency that are not used for regulatory decision making but could have an effect on the overall development strategy. Such data submitted voluntarily will not be used for regulatory decision making by the FDA and are not included in the evaluation of an investigational new drug (IND), investigational device exemption (IDE), or market application. Both CDER and CDRH representatives participate in VGDSs and will work with the sponsor to identify when pre-IDE interactions for development of the test in conjunction with the drug are advisable. CDER has also developed a pre-IND process for early interactions with a sponsor to improve a development program and expedite entry into clinical trials. Pre-IND advice may be requested for issues related to data needed to support the rationale for testing a drug in humans; the design of nonclinical pharmacology, toxicology, and drug activity studies, including design and potential uses of any proposed treatment studies in animal models; data requirements for an investigational new drug (IND) application; initial drug development plans; and regulatory requirements for demonstrating safety and efficacy. Pre-IND interactions should be considered preliminary communications based on early development information and will generally take the form of written comments that may be supplemented by teleconferences or meetings as needed and appropriate. The pre-IDE process is the term CDRH uses for a voluntary system available to sponsors for protocol review and has already been described in this chapter. Additional regulatory tools that CDRH uses to support introduction of new molecular diagnostics and co-development products are expedited review submissions (the front of the queue), the de novo process for regulating new unclassified devices (first of a kind), and real-time reviews, reducing
regulatory working time and improving the quality of results (see the section "Regulatory Tools and Solutions for Pharmacogenomics Markers"). By employing these types of review processes, the intent of CDER and CDRH is to provide useful information to sponsors as they work on co-developed products. The FDA is willing to coordinate work between these reviews as appropriate. The FDA recommends that sponsors of new drug–diagnostic products interact early with the FDA to determine best practice information and current scientific and regulatory thinking.
SUMMARY
One important part of the critical path initiative at the FDA has been growing interest in and attention to biomarkers, particularly with regard to the role they play as tools to aid in the development of drugs and other new therapies. The FDA appreciates that the coordination of reviews across these two product lines may be challenging and has both administrative and regulatory tools in place that can contribute to review successes for these products. Scientific models for optimal evaluation are currently a matter of great interest and discussion at the FDA, and scientists in all three centers regulating human products (CDER, CBER, and CDRH) have had multiple interactions with stakeholders and sponsors of new technologies. It is our belief that the FDA is well positioned to help foster the use of this new technology. We believe that the challenge at hand is taming the science itself and that the FDA can review new products in a nimble manner, assure that they are safe and effective, and bring them to market in a manner consistent with the precepts of the critical path program.
NOTES
1. Under 21 CFR 3.2(e), a combination product is defined to include:
(1) A product comprised of two or more regulated components, i.e., drug/device, biologic/device, drug/biologic, or drug/device/biologic, that are physically, chemically, or otherwise combined or mixed and produced as a single entity;
(2) Two or more separate products packaged together in a single package or as a unit and comprised of drug and device products, device and biological products, or biological and drug products;
(3) A drug, device, or biological product packaged separately that according to its investigational plan or proposed labeling is intended for use only with an approved individually specified drug, device, or biological product where both are required to achieve the intended use, indication, or effect and where upon approval of the proposed product the labeling of the approved product would need to be changed, e.g., to reflect a change in intended use, dosage form, strength, route of administration, or significant change in dose; or
(4) Any investigational drug, device, or biological product packaged separately that according to its proposed labeling is for use only with another individually specified investigational drug, device, or biological product where both are required to achieve the intended use, indication, or effect.
2. Contact information and information on combination products and intercenter review processes may be found on the OCP Web site: http://www.fda.gov/oc/combination.
REFERENCES
1. FDA (2004). Innovation or stagnation: challenge and opportunity on the critical path to new medical products. http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html.
2. Venter JC, Adams MD, Myers EW, et al. (2001). The sequence of the human genome. Science, 291(5507):1304–1351.
3. Lander ES, Linton LM, Birren B, et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822):860–921.
4. http://www4.od.nih.gov/oba/sacgt/reports/oversight_report.pdf.
5. http://www4.od.nih.gov/oba/sacghs/public_comments.htm.
6. http://www.fda.gov/cdrh/oivd.
7. Garrison LP, Austin F (2007). The economics of personalized medicine: a model of incentives for value creation and capture. Drug Inf J, 41.
8. Sargent DJ, Conley BA, Allegra C, Collette L (2005). Clinical trial designs for predictive marker validation in cancer treatment trials. J Clin Oncol, 23(9):2020–2027.
9. http://www.fda.gov/oc/initiatives/criticalpath/reports/opp_report.pdf.
10. http://www.fda.gov/cdrh/oivd/regulatory-standards.html.
11. http://www.fda.gov/cdrh/ode/guidance/337.pdf.
12. Jiang W, Freidlin B, Simon R (2007). Biomarker-adaptive threshold design: a procedure for evaluating treatment with possible biomarker-defined subset effect. J Natl Cancer Inst, 99(13):1036–1043.
13
IMPORTANCE OF STATISTICS IN THE QUALIFICATION AND APPLICATION OF BIOMARKERS
Mary Zacour, Ph.D.
BioZac Consulting, Montreal, Quebec, Canada
INTRODUCTION
Learning how to apply statistical analyses … may take a lifetime of trial (and sometimes error), as it has done in the author's case. There is no evidence that biomedical investigators of the present generation are on a steeper learning curve. [1]
Sitting in the Drill Hall at the University of Toronto with row upon row of shifting, sighing, and coughing co-sufferers, cold fingers clutching one of my two well-sharpened HB pencils while I carefully write out each step of my standard deviation calculation. Statistics 101 Christmas exam. One finger rasping down rows and across columns of tables, p-values, degrees of freedom, these are the memories I share with countless scientists who have studied statistics in preparation for their research careers. Although I left that course and others with respectable marks and a seemingly clear understanding of the null hypothesis and the normal curve, I found myself some years later, scratching my head as I considered a sheet whose bland columns of numbers belied the painstaking hours spent injecting, isolating, pipetting, incubating, and centrifuging, from which they had been distilled. Looked at one way, it seemed
to suggest one thing, but was that the correct approach? Looked at another way, the picture was different. The conclusion to be drawn would influence the direction that the entire project would take. How to apply the right statistical test correctly to a particular research question, given the many choices, assumptions, and caveats, is not always obvious; indeed, "gross misunderstandings of the purpose and functions of statistical analysis are apparent in applications to research grant-giving bodies and ethics committees, in manuscripts submitted to journals and sometimes in published papers" [1]. In fact, there seems to be a growing field of forensic biostatistics, pointing out data analysis errors that have crept into respected publications [2–4]. This is not surprising, considering the advent of genomics, proteomics, and other -omic-style research, in which the ability to produce huge quantities of complex data is outstripping the common biologist's understanding of the mathematical issues concerning their interpretation. Scientists focused on biomarker research in the drug development industry face an additional set of challenges, where the pressures of the business world to use less time and fewer resources, to focus only on the immediate goal, and to quickly release positive findings to investors or other onlookers are often at odds with the expert, thorough, and careful consideration that the more challenging research problems may require. Understandably, biomedical scientists who are often highly expert in their scientific discipline are not necessarily so in the mathematical issues underlying analysis of their data, and many are also without the luxury of a biostatistician with whom to consult at times of indecision. Although some lucky others may feel that there is no need to master the field when they can simply rely on in-house expertise, it is important to recognize that a working knowledge of the factors affecting data analysis facilitates better communication between the statistician and researcher, which, in turn, generally leads to better research design. Furthermore, the question of how to interpret data wisely is not limited to applying the correct statistical test in the correct manner, but also includes understanding the limits and pitfalls of the technical procedures used to generate the data. In this regard, biostatisticians, too, face challenges, such as being unfamiliar with or intimidated by the rapidly evolving and complex technical methods used in the lab. As so well voiced by Ransohoff [5], "how well do biologists … appreciate the nature or seriousness of problems that can be introduced by methods of handling data … ? How well do epidemiologists or biostatisticians understand technical details of specimen collection, handling, and analysis in a way they can use to anticipate and manage specific sources of bias?" In addition to an understanding of both technical and mathematical issues, there is a third element important to teasing out the elusive meaning of the data, which I like to think of as the HP factor (Hercule Poirot, that is). Born of intellect and experience, modeled by a higher-order view, a wider vision of extraneous factors that may play roles in the outcome, infused with a measure of skepticism and questioning of assumptions, this factor may more commonly be known as judgment.
Now, I sincerely hope that I am not marked, Rushdie-like, by the statisticians' guild for this comment, since this softer science is undoubtedly in the realm of the subjective, and as such, in danger of being classified in the same bin as bias, data selection, and the bubonic plague. To my mind, however, judgment is critical to the success of most endeavors, including data evaluation and statistical analysis. Even though the rules of application of such analyses may be straightforward (in our day, in fact, completed for us by a simple twitch of an index finger on a mouse), understanding potential errors in our data and/or determining which of many analytical possibilities best fit the situation presented by our own experimental paradigm and data set are not, and this is where judgment often comes in. There is certainly a large mass of statistical reference material already published and available to answer questions that most biologists might have; however, most texts and articles focused on statistical analysis issues are written by, well … statisticians. This is a good thing, of course, but sometimes brings with it the negative corollary of a specialized dialect and viewpoint. Joe or Josephine Biologist may find these, at best, difficult to adapt to their own needs, or, at worst, frankly unintelligible. This chapter is addressed to fellow Joes and Josephines. It is written in plain language by a nonstatistician who has often grappled at the interface where experimental results are transduced through an intermediate state as digits, before manifesting themselves, newly metamorphosed, into conclusions that flutter their wings right into our collective scientific knowledge database. Whereas some basic statistics explanations may embellish its text, this is by no means intended to serve as a statistics manual, nor to explore the many intricacies of the field; in fact, statisticians may well be appalled by its gaps and oversimplifications. The publisher, on the other hand, is likely to be pleased at the paucity of equations, since it is common knowledge in the industry that book sales drop with each equation in a text. This chapter addresses data analysis issues pertinent to biomarker research in drug development. More specifically, it is intended to clarify some common misconceptions or confusion concerning which situations are appropriate for alternative choices of statistical analysis and to integrate statistical issues with nonstatistical issues that also affect our data interpretation.
FUNDAMENTALS
Is That Significance Significant? (And Conversely, the Significant Insignificance)
Statistics texts all deal with the issue of type I and type II errors, or, respectively, the false finding of a significant difference where none exists and the false finding of no significant difference where one indeed exists. They seem invariably to omit mention, however, of how a statistically significant
difference that truly exists and is truly shown can yet remain truly irrelevant. This particular possibility gets no press time in statistics manuals because it remains in the realm of that softer science mentioned above, judgment. As trained scientists sensitized to the dangers of making inferences based on anecdotal or potentially chance findings, we seem as a group to have acquired the tendency to overexalt the statistically significant finding when it makes its appearance. A p-value of 0.05 or less has become our holy grail, the golden chalice that shines from the paragraphs of our grant and investigational new drug (IND) applications, presaging, we hope, the sweet wine to spill forth from them. This, of course, is not unnatural when one considers how often we are faced with budgetary or time constraints on the number of subjects that can be tested and therefore on the ease of statistically confirming such differences when they truly exist. The converse situation of which we need to be mindful, however, is that with enough samples being evaluated, even tiny differences can be statistically significant, and furthermore, even differences that are not tiny might still not be relevant. What constitutes a "significant" increment of an analyte, in the physiological or clinical sense, is likely to vary with the situation; it is not necessarily on the same scale as a statistically significant increment. An additional point to consider, addressed below in the section on measurement error, is whether the magnitude of a statistically significant difference surpasses the magnitude of inaccuracy of measurement. A qualitative judgment on the significance of significance, rather than simply adjusting our blinders to keep out the sun while logging p-values, is certainly worth a moment of silence on our parts. Furthermore, the converse case is also relevant; for example, what if one's inner Hercule is suspicious that clinical significance may be lurking behind the harmless façade of p > 0.05? How could we reconcile the lack of statistical significance in that case? One possibility is that too few samples were evaluated to achieve the statistical power to reveal significance (discussed further later in the chapter). Another is that part of the group of samples tested reacts differently from (an)other part(s), resulting in a total effect across the disparate subgroups that does not reach statistical significance, even though it reflects a clinical situation of great significance. The latter concept is central to the development of personalized medicine, an emerging field that seeks to improve disease treatment by targeting molecularly distinct subpopulations within a given group, such that response rates to treatments are increased and the administration of ineffective and potentially toxic treatments to inappropriate individuals is diminished. As an example, selection of breast cancer patients according to ErbB2 tyrosine kinase status has had a significant impact on management of the disease; moderate, but significant, response rates to ErbB2-targeted antibody therapeutics in some studies might have been overlooked had study groups not first been selected for its overexpression [6]. Some basic concepts concerning statistical methods of relevance to pharmacogenomics are discussed later, in addition to some of the common errors in their application.
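A simple way to see the first point, that large samples can make trivially small differences "significant," is the hedged Python sketch below; the 0.5-unit mean difference, the standard deviation of 10, and the use of a two-sample z-test are all assumptions chosen purely for illustration.

```python
# Minimal sketch: the same tiny 0.5-unit difference is "nonsignificant" with
# small samples and highly "significant" with very large ones.
from math import sqrt
from scipy.stats import norm

def two_sample_z_p(mean_diff, sd, n_per_group):
    se = sd * sqrt(2.0 / n_per_group)   # standard error of the difference
    z = mean_diff / se
    return 2 * norm.sf(abs(z))          # two-sided p-value

for n in (25, 250, 2500, 25000):
    print(n, two_sample_z_p(mean_diff=0.5, sd=10.0, n_per_group=n))
# n = 25    -> p ~ 0.86 (not significant)
# n = 25000 -> p ~ 2e-8 (highly "significant", same clinically trivial difference)
```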
Air the Error
Because statistical descriptions and inferences are based on probability, a certain amount of chance (random) error is always inherent in any statistically based conclusion. This error is routinely quantified by upper and lower confidence limits and mentioned in results sections. Unfortunately, such routine error expression seems to induce a false sense of security in readers that all the error inherent in the results has been taken into account. In fact, there are numerous sources of nonrandom error, generally covered by the blanket term bias (i.e., when subjects, specimens, or data in the groups being compared are inherently different or are handled differently in a way that systematically introduces a signal into data for one of the groups compared). Differences such as the types of tubes used, the length of time until spinning, or the time samples are maintained frozen prior to analysis would fall into this category. Bias has been deemed such a serious problem in nonexperimental research that some experts consider such studies guilty of bias and erroneous results until proven innocent [7]. The term nonexperimental research refers to research where the effects of some perturbation to the system are not being tested, but rather, information is simply gathered (i.e., in epidemiological studies) or quantified (i.e., in laboratory analyses, such as most proteomic analysis). Bias is also a serious and often overlooked problem in experimental research [i.e., where effects of deliberate manipulation of (a) variable(s) are tested]. A recent perspective article on how potential flaws in preclinical research may play a major role in late-phase drug failures discusses this in detail, particularly as it relates to animal models [8]. Measurement imprecision, included by some under the term bias, but by others considered separately, constitutes an additional significant source of error. In the case of continuous data quantifications, although measurement error almost always occurs, it is almost never quantified, or at least such quantification is in no way accounted for in the result reported. In the case of noncontinuous data, such as in diagnostic tests that may generate a yes/no dichotomy, measurement error can have a different but equally profound effect, potentially contributing to false positives or negatives.
Continuous Data Quantifications
Every assay result is only the best approximation to the true value that a particular assay is able to produce. This is readily acknowledged by all, and, in fact, it is standard practice in drug development to quantify the inaccuracy of assays using known amounts of a reference standard. How the quantification should best be accomplished is beyond the scope of this chapter, but the total error approach wisely incorporates both the components of random error and bias/measurement error in a single measure and is discussed further elsewhere [9–11]. Despite the depth of consideration given to measurement error, there appears to be an element of lip service to its acknowledgment, since error quantifications are generally used only to determine acceptance or rejection
of an assay's results, rather than to modify or qualify in any way the numerical result produced by the assay. Measurement error is thus consistently brushed under the carpet, with results often reported in ways that dramatically overstate their precision. As an end result, differences from one measure to another in some cases become infused with spurious meaning, when they actually represent nothing more than noise in the signals. One can only speculate on how many drugs that failed during late-phase trials might never have gotten that far had the measurement error in their "promising" preclinical results been accounted for in the data analysis. Sometimes measurement error is compounded dramatically by introducing more than one erroneous measurement into a final calculation of the result, and again the norm is to ignore this inflated error rather than account for it in the results reported. Let us consider the example of a clinical trial examining the effect of a treatment on the activity of a disease biomarker in white blood cells. In this example, two assays are required: one to quantify the disease biomarker in a lysate prepared from the cells, and one to quantify the amount of protein that was culled in each cell preparation. The activity result from the first assay is then normalized against the protein result from the second assay, to give an activity per mass of protein as the final assay result. Looking more closely at the potential error in this example, we should note that standard acceptance criteria for bioassays generally include specification that a certain level of demonstrated measurement error will be considered acceptable, with data still included "as is" in the analysis. Assuming a typical acceptance criterion of up to ±20% inaccuracy, two measurements of a single identical sample of true value X could therefore quantify as differing from each other by 40% of X, and no one would raise an eyebrow. In fact, Example 1 shows how, once you account for the compound error of the two assays in this example, one sample's final result can appear as high as 220% of another's even when the two are actually replicates containing an identical amount of the biomarker.
Example 1. One assay for enzyme activity and one for protein content of sample are performed, with the final activity reported as per mass of protein. Assuming ±20% measurement inaccuracy in each of the two assays, we see that if patient A has a true activity of 10 activity units, acceptably accurate activity assays would reveal values of 8 to 12 units, and each milligram of protein would be quantified as 0.8 to 1.2 mg. Therefore, within each assay, an identical sample could have acceptable measurement error resulting in one value that is 150% of the other (12 units/8 units, or 1.2 mg/0.8 mg). Normalizing enzyme activity units to milligrams of protein inflates this possible discrepancy between two identical sample results from 150% to 220%; that is, the most extreme acceptable values from each assay, normalized against each other, would be 8 units/1.2 mg and 12 units/0.8 mg, for a final resulting range of 6.7 to 15 units/mg as possible results from replicate aliquots of the same sample.
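For readers who prefer to see the arithmetic of Example 1 laid out, the following Python sketch reproduces it; the ±20% acceptance criterion and the true values of 10 units and 1 mg come from the example itself, and nothing else is assumed.

```python
# Minimal sketch reproducing the arithmetic of Example 1.
true_activity = 10.0   # activity units
true_protein = 1.0     # mg
error = 0.20           # ±20% allowable inaccuracy per assay

act_low, act_high = true_activity * (1 - error), true_activity * (1 + error)
prot_low, prot_high = true_protein * (1 - error), true_protein * (1 + error)

# Within one assay, two acceptable readings of the same sample can differ by:
print(act_high / act_low)                         # 1.5, i.e., one value is 150% of the other

# After normalizing activity to protein, the most extreme acceptable results are:
norm_low = act_low / prot_high                    # 8 / 1.2  ~ 6.7 units/mg
norm_high = act_high / prot_low                   # 12 / 0.8 = 15 units/mg
print(norm_low, norm_high, norm_high / norm_low)  # ratio ~ 2.25, i.e., ~220%
```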
This example demonstrates the compounding of only two measurement errors. Imagine the case when more measurements go into a calculation, each having its own error: for example, when using reporter gene induction in a cell-based system as a model of induction of an in vivo response. In this case there are four measurements that need to be done to control properly, each with its own measurement error range: the reporter gene expression in transfected cells challenged with the experimental treatment, its expression in transfected cells challenged with only the vehicle in which the treatment compound is dissolved, and those same two conditions in cells that were transfected with empty vectors (as a control for the effects of the vector backbone on the system). All too often researchers using models of this sort tend to "simplify" by expressing induction of reporters as simply a fold-increase over the vehicle effect, essentially pretending that the complex and compounded error does not exist.
Noncontinuous Data
Measurement error can also have an important effect on noncontinuous data, an effect which, as noted above, is distinct from random error and is rarely taken into account. In this case, misclassifications can occur, such as false negatives or positives. We can all imagine the personal impact of such misclassifications, such as being falsely diagnosed with a deadly disease, or conversely, not receiving appropriate treatment because of a false-negative test result. Misclassification errors also decrease the power of studies to detect significant differences and have the potential to obscure or even falsify study results, leading to equally devastating poststudy impact on a much wider scale. For example, efforts to develop an effective vaccine for malaria, considered to be the deadliest pediatric disease, have been hampered by measurement errors leading to misclassifications. In this case, the fact that vaccine efficacy (VE) is a ratio of the number of infected, vaccinated persons over the number of infected, nonvaccinated persons gives rise to the misconception that false-positive or false-negative diagnoses in the numerator and denominator will cancel out or balance. Actually, since measurement error increases as the lower limit of detection is approached, a partial effect of vaccination causing lower-level signals at the time of measurement (such as a slowing of parasite growth, the biomarker for VE) can lead to systematic error in the measurement of the test group that is not matched in the control group, and an overestimation of VE [12]. The end effects of misclassification error of this sort can vary substantially, depending not only on the goal of the testing and the decisions that are dependent on it, but also on the overall frequency of the disease being studied. For instance, elevated false-positive rates for a test may have a huge impact in the case of rare diseases, and conversely, elevated false-negative rates may have
a bigger impact with very common diseases. Example 2 shows that if a test for rare disease X (occurring in 1 in 1000 people) gives 99% accurate positive results in people who truly have the disease and 95% accurate negative results in people who truly do not have it, any positive test result it gives will be false in 98% of the cases.

Example 2. A diagnostic test has the following known accuracy: It returns a positive result in 99% of patients who truly have the disease and a negative result in 95% of patients who truly do not have the disease. The disease is rare, occurring in the general population at a rate of only 1/1000 people. Although you have no known predisposition or reason to think you have the disease, you decide to get tested and are shocked to learn that the result has come back positive. What are your chances that the diagnosis is actually a false positive?

False positive probability = 1 − true positive probability
True positive probability = probability of true positives / total of true and false positive probabilities
    = (0.99 × 0.001) / [(0.99 × 0.001) + (0.05 × 0.999)]
    = 0.019
False positive probability = 1 − 0.019 = 0.981

Your chances are about 98% that the diagnosis of positive is false. The test would have to be 99.9% accurate for negative results before it would have a zero-rated false-positive detection rate. Whereas, intuitively, one might think of 95% and 99.9% accuracy as both sounding pretty good, you can see that the seemingly small gap between them could have a surprisingly important impact.
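The same calculation can be written out as a few lines of Python; the prevalence, sensitivity, and specificity values are those given in Example 2.

```python
# Minimal sketch of the Bayes calculation in Example 2.
prevalence = 0.001     # 1 in 1000 people truly have the disease
sensitivity = 0.99     # positive result given disease
specificity = 0.95     # negative result given no disease

p_true_pos = sensitivity * prevalence
p_false_pos = (1 - specificity) * (1 - prevalence)
p_disease_given_pos = p_true_pos / (p_true_pos + p_false_pos)

print(round(p_disease_given_pos, 3))        # ~0.019
print(round(1 - p_disease_given_pos, 3))    # ~0.981: most positives are false
```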
What to Do?
Samuel Taylor Coleridge once said that literary reviewers are usually people who would have been poets, historians, and biographers if they could, but having tried their talents at one or the other, and failed, they turn to critics. As in other areas of life, with the issue of measurement error, too, it is much easier to be a critic than to do something well oneself. What should one actually do? This is a very good question indeed, and, feeling a little uncomfortable under Samuel's gaze from the sentence above, I shall attempt to come up with some useful suggestions. First, the appropriate approach is likely to vary according to the situation, and the most crucial issue to consider when choosing an approach to dealing
with measurement error is the potential risk of the erroneous measurement to the end user of the drug, device, or process under study. Experiments and/or data interpretation should then be designed to err on the side of caution, if risk is an issue. Other key determinants may include practicalities such as resources and time available to approach the problem, and questions of clinical or physiological relevance of the potential error margin, thresholds, or measurement ranges of interest. If, for example, a 10- to 100-fold increase in biomarker expression would be required for clinical relevance, and the maximum estimated measurement error is in the range of less than threefold, perhaps there is no issue. The discussion above concerning whether significant differences are clinically relevant also applies to the relevance of measurement errors. If measurement error is of a magnitude that could be clinically relevant, its quantification becomes more important. Perhaps because of the status quo of ignoring measurement error quantification in biomedical methods (for which, after all, error quantification is much more challenging than for analysis of less complex models), this is not a widespread practice in the drug development industry. One way of acknowledging this error while avoiding complex quantifications might be to use the definition of maximum allowable measurement error that is already made routinely for the purposes of acceptance criteria as a threshold level of difference, below which results would either not be considered to reflect anything but error, or would be more carefully scrutinized in some predefined way. In other words, if you have determined that up to a doubling in endpoint values in a pretreatment versus posttreatment sample could be attributed solely to compounded measurement error rather than to treatment per se, one might decide not to consider effects to be a result of treatment unless they surpass this measurement error threshold. Of course, the fact that a method has been validated according to across-the-board standards such as having better than 15% or 20% total error does not mean that it actually produces results that are this far from the nominal values of the reference standards, but merely that it may do so at times, and that results of assays where these limits are exceeded will be rejected from the study. Error quantifications may well indicate much less error than the maximum allowed for assay acceptance, so given the circumstances it may be worth putting the time in to determine what the actual error is rather than simply assuming that the worst possible error has always occurred. Actual error might be revealed by data mining from previously amassed quality control data and/or on a per-batch basis using internal standards, for example. In these cases it is as simple as quantifying the proportional values obtained for accuracy rather than simply noting a pass or fail. In the case where biomarker analysis involves more complex sources of error than a single or a few analytical assays (such as for epidemiological studies or meta-analyses to qualify particular biomarkers for their intended use), more complex uncertainty quantifications would be required. Various such methods are available, including Bayesian [13] and Monte Carlo simulations [14].
Bayesian approaches, which stem from an alternative view of probability compared to the more common frequentist inferential statistics, are touched on further later. Monte Carlo simulations represent the brute-force type of computation, basically executing a huge number of possible permutations using random draws from all the potential inputs, so as to spit out a final probability function (made up of all the output values from the various random inputs) that comprises the compound error from all the unknown inputs. In a simple example, consider the film Groundhog Day, where a man woke every morning to find that it was the same day as the one before. The film consisted of this man going through his interminable repeated days, making different choices when the fixed events of the day occurred, as they did every day; you can think of this man as being stuck inside a Monte Carlo simulation (well, a simple one, since his own behavior choices were the only independent variable with randomly chosen modifications; the range of outcomes were consequences of his varied choices from that one variable). In questions of compound error, of course, there would be many more variables, also all being sampled randomly over and over again. One advantage of choosing a Monte Carlo approach to resolving complex error sources, should your experimental paradigm present them, is that user-friendly software is readily available, due to its common use for risk analysis in financial and engineering applications. Once measurement error has been quantified, by whatever means, error distributions could be used to express uncertainty limits around results, analogous to the random error–related confidence intervals common in statistical analyses. These would help to clarify how much of the apparent differences between sets of results might be an artifact of the measuring tool(s) and not necessarily an effect of the treatment that one wishes to measure. In many cases, such as the reporter gene model example above, a literature search will reveal readily applicable statistical models for specific types of assay that incorporate compound sources of error into the calculation of statistical significance of differences [15]. Phillips and Lapole suggest a much simpler approach to partially quantify method error uncertainty without any complicated analysis [16]. Like Pablo Picasso, who once said it took him his entire life to learn how to paint like a child, these authors advocate a return to earlier ways: in this case, the grade-school lesson of rounding figures. Rounding can adjust the measurement scale to reflect the limitations of the measurement error. In their words: "Rounding does not create imprecision—the imprecision exists even if we do not accurately report it." Thus even though the measurement instrument may spit out values with several decimal places in them, samples with results in the range of 100 for a test demonstrating 2% inaccuracy (i.e., 98 to 102) might better be reported rounding to the nearest 5 rather than even to 1's, let alone the decimal points reported by the instrumentation.
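Returning to the Monte Carlo idea described above, the sketch below propagates assumed measurement errors through the two-assay normalization example from earlier in this chapter; the choice of a uniform ±20% error distribution for each assay is an assumption made purely for illustration, not a recommendation.

```python
# Minimal sketch of Monte Carlo propagation of compound measurement error:
# random error draws for each assay are combined into the normalized result.
import numpy as np

rng = np.random.default_rng(42)
n_draws = 100_000
true_activity, true_protein = 10.0, 1.0        # units and mg, as in Example 1

activity_meas = true_activity * rng.uniform(0.8, 1.2, n_draws)  # assumed error model
protein_meas = true_protein * rng.uniform(0.8, 1.2, n_draws)    # assumed error model
normalized = activity_meas / protein_meas

low, high = np.percentile(normalized, [2.5, 97.5])
print(f"95% uncertainty interval around a true value of 10 units/mg: {low:.1f} to {high:.1f}")
```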
Apart from attempting to quantify the error, contain it within a wider measurement scale, or set thresholds of interest that exceed it, as discussed above, we should also not forget to implement strategies to reduce it when it is possible to do so. For instance, internal controls such as reference standards or paired samples can reduce the impact of the between-assay variance proportion of measurement error. For noncontinuous data, misclassification rather than numerical difference is the end effect of measurement error. In this case, a suggested approach to avoid the negative impact of measurement error might be to assume that error will happen, and design studies to have higher power from the beginning, in order to minimize the impact of error on the power of the study. Statistical tests for noncontinuous data are already inherently less powerful than those for continuous data, and a lower initial power level in the experimental design can further magnify the impact of measurement error. For example, Gordon and Finch illustrate how the same misclassification error rate causes much more power loss in a study that has lower power to start with than in a study that has higher power to start with. In their example, given exactly the same error rate, if the study is designed to have 99% power, it ends up with 95.5% after the error is taken into account, whereas if it is designed to have 80% power, it ends up with only 65.4% [17]. Therefore, specifying a higher power in the experimental design gives an advantage of robustness to the obscuring effects of misclassification error. In real terms, designing an experiment with higher power most often translates into recruiting more subjects; these authors advocate this "pay the cost up front" strategy.
Guilty by Association (and Other Misuses of Correlation Analysis)
"Statistics show that of those who contract the habit of eating, very few ever survive." This William Wallace Irwin quote [18] may bring a small smile to your lips. Should we conclude that eating is a bad habit that causes death? Ridiculous, right? Especially for well-educated biomedical scientists. How about this one: "Greater adiposity is associated with lower everyday physical functioning, such as climbing stairs or other moderate activities, as well as lower feelings of well-being and greater burden of pain" [19]. This quote from a respected medical journal does not seem quite so ridiculous, does it? Whereas the first one is clearly a joke, we might be tempted to nod our heads at the second and experience discomfort in direct proportion to the increases we noted over the past few years in our own level of adiposity. If we take a closer look at both quotes we will notice that neither author actually claimed a cause-and-effect relationship. As we all know, however (especially those employed in the marketing department), just putting them together in a small room can't help but make everyone jump to conclusions. The strength of an association is indeed appropriately quantified by correlation analysis, but this analysis should not be misinterpreted to infer causation. Zou et al. hit the nail on the head when they note that rather than being misused to test hypotheses, correlation analysis should serve to generate them [20].
The fact remains, however, that correlation is very commonly used to suggest that A causes B, or as in the second example above, that being fat causes lower physical ability to function and lower feelings of well-being. Such guilt by association certainly is one possibility, but another possibility is that B causes A, or that having lower ability to function and poor self-esteem induces weight gain. A third equally plausible hypothesis is that another cause or set of causes is responsible for both the poor feelings/abilities and the weight gain. A fourth is that the observer has just observed and documented the effects of coincidence; if an insufficient sample number were examined, for instance, a fluke might be documented. Even with larger sample numbers, statistical conclusions are always subject to chance, with the type I error (or p-value) describing the probability that the "significant difference" is actually just due to chance. Some extra consideration needs to be given to this when choosing an appropriate biomarker for one's particular purposes. Biomarkers are, after all, by definition, a marker for something else, or an associative entity. If a particular biomarker has been qualified properly, its association with the disease of interest will not be due to chance, but that still does not imply causation. Whether a marker is simply associated, is a cause or an effect of disease, or is a cause or effect of something else causing a disease needs to be considered carefully in the experimental design. The issue of what considerations are important to determine causation is discussed in a landmark paper by Hill [21]. The reader is further referred to an interesting perspective commentary on this subject by Phillips and Goodman [22]. A further frequent misuse of correlation is to test agreement between methods. As pointed out by Bland and Altman [23], even poorly agreeing methods can produce quite high correlations, and for methods to be in perfect agreement their results would plot against each other along the line of equality, whereas results plotting on any straight line would give perfect correlation. Furthermore, the test of significance may show that the two methods are related, but this is irrelevant to the question of agreement, and indeed, it would be amazing if two methods designed to measure the same quantity were not related. One can almost hear their teeth gnashing as they bemoan, "Why has a totally inappropriate method, the correlation coefficient, become almost universally used for this purpose?" It seems clear that this is a case of copycat crime … perhaps a case for the remarkable Hercule Poirot?
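Bland and Altman's point is easy to demonstrate numerically: in the hedged Python sketch below, a hypothetical method B is assumed to read roughly twice as high as method A, yet the two correlate almost perfectly while agreeing very poorly.

```python
# Minimal sketch: high correlation despite terrible agreement between methods.
import numpy as np

rng = np.random.default_rng(1)
method_a = rng.normal(100, 15, 50)                 # invented reference readings
method_b = 2.0 * method_a + rng.normal(0, 2, 50)   # large proportional bias, small noise

r = np.corrcoef(method_a, method_b)[0, 1]
differences = method_b - method_a
print(f"Pearson r = {r:.3f}")                         # ~0.999, "excellent" correlation
print(f"Mean difference = {differences.mean():.1f}")  # ~100 units: methods do not agree
```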
Checking the Dessert Menu First

In much the same way that Hercule Poirot might consider the temptations of the dessert menu before settling on his choice of main courses, we all need to consider potential data analysis issues before settling on our experimental design. Some questions need to be asked long before the data are in hand. For example:
1. Is the assay to measure the biomarker as good as it can be, or the best type of assay given the circumstances? Can you make it better? All assays have error, but the more you fine-tune the accuracy, precision, specificity/sensitivity, and robustness via appropriate optimization, the more your result will reflect for you the marker you wish to measure rather than the shortcomings of your technique. Assay optimization is beyond the scope of this chapter, but should definitely be completed before starting the study. Assay-specific parameters such as accuracy, precision, and specificity of the measurements all have a direct impact on the uncertainty of the statistical analysis outcomes. 2. Have you chosen the best biomarker, or should you rethink from the bottom up? To evaluate this, you must be well aware not only of the qualification that a biomarker is relevant, linked appropriately, and specific to the pathology of interest (important issues, beyond the scope of this chapter), but also of quantitative issues of what to expect in the sample and control groups you have chosen. If one wishes to use a biomarker to differentiate between normal healthy persons and those with a disease, for example, one of the many questions to be considered prior to deciding which indeed is the most informative biomarker for the job concerns the potential degrees of measurement between them. In many cases the biomarker of interest is present in normal healthy persons as well as in those with disease, but with different typical levels in the two populations. In this case it is very important to establish a priori the descriptive characteristics of the two groups in order to define what a clinically relevant treatment effect might be and how one might judge that such an effect occurred. Published literature, unpublished previous data, and pilot studies all provide potential resource material for defining these descriptive characteristics and should be evaluated carefully in advance. Since both populations will present with a range of values, relevant descriptive characteristics include not only mean values, but also the variances, ranges, and the difference between the population distributions (or extent of overlap, if that is the case). One should consider that the biomarker that is more consistent in its levels (i.e., less variance) may be a better choice than one that shows bigger mean differences between populations (but higher variability from person to person within a group). In the latter case, the variability can translate into overlap of two populations, which can be quite a complication when one wishes to discriminate between them. Another point for reflection in advance concerns integrating the quantitative ability of the measurement technique you intend to use with the quantitative characteristics of the populations you hope to discriminate between, or the effect size that you have determined you will need to achieve clinical relevance. For example, the smaller the differences are between expression levels of a biomarker that you wish to use to
discriminate between two populations, the more sensitive the quantitation method you choose needs to be. Consider the example of wishing to measure the treatment effect of drug X on a disease by assessing whether a biomarker is increased from disease-typical levels to the level seen in healthy persons. Would one be able to accomplish this with statistical confidence if the two populations showed close levels or overlap between their two distributions? If the difference between the populations is not big but the measurement error margin is, one may be doomed to an incapacity of discerning what one hopes to, regardless of the potential effects of the treatment. In this case it is wiser to rethink the fundamental premises of your study design and methodology before starting. Although this may sound like an obvious point, when one considers the common practice of ignoring measurement error and ignoring the compound error introduced by common study practices such as normalizing one assay result to another, it should perhaps give pause for thought. 3. Is your design sufficiently powered to detect what you wish to detect? Statistical power is the ability of a study to enable detection of a statistically significant difference when there truly is one; power in the statistical sense is analogous to the sensitivity of a diagnostic test [24]. Eng says: “One could argue that it is as wasteful and inappropriate to conduct a study with inadequate power as it is to obtain a diagnostic test of insufficient sensitivity to rule out a disease” [25]. In clinical trials, power often receives attention, perhaps thanks to ethics committees that have pointed out that it is frankly unethical to expose people to the risks and discomforts of research if the study has no potential for scientific gain, such as if a study was not designed to include enough people to adequately test the research hypothesis [26]. In laboratory experiments, on the other hand, statistical power is rarely estimated [1] and seems in general overlooked and poorly understood. To ensure sufficient power, one needs to think ahead and choose an appropriate sample size. Factors playing a role in how one would define appropriate include: a. The smallest meaningful difference between the two means being compared that the investigator would like the study to detect (this is usually based on subjective criteria such as judgment of clinical importance; additionally, sensitivity and error margins of the measurement method may influence its choice). b. The estimated standard deviation within each comparison group (this can be based on a pilot study, prior literature that has been carried out with the same outcome measure, or subjective criteria). c. The desired power level (this is usually 80% or higher; however, although high power is desirable, it is always in a trade-off with a fixed amount of time and resources for the study).
d. The significance criterion (usually set at 0.05). e. Whether one- or two-tailed analysis is planned. There are different formulas for computing power or sample size, depending on the experimental design and whether data are continuous or categorical; Example 3 shows an illustrative calculation using a formula suited to a standard design involving comparing means of two groups.
Example 3. You have qualified your biomarker X as an efficacy marker in your disease model. In your studies qualifying X as a biomarker and developing a method to assay it, you found that its expression was low in rats with your disease of interest (10 ± 3 units; mean ± SD) but higher in healthy rats (100 ± 30 units). You also noted that interventions that increased X were associated with improvements in disease morbidity and that previous groups of animals studied tended to show biological variability of about 30% coefficient of variation between rats in a treatment group. You wish to design an appropriately powered study comparing untreated diseased rats to those treated with a single dose of your new treatment Y, to determine if Y demonstrates efficacy, and have decided that a mean 5-unit change in X is the lowest meaningful change that you wish to be able to detect. You will settle for standard levels of a significance level of 0.05 and a power of 80%. How many rats do you need in each group? n of each group =
(SD1² + SD2²) × (zcrit + zpwr)² / D²
where SD1 is the standard deviation of group 1 and SD2 that of group 2 to be compared to each other (for a minimum difference from 10 to 15 units, this would be 3 and 4.5, respectively); the z-values are standard normal deviates corresponding to selected significance criteria and statistical powers (zcrit for two-tailed p < 0.05 is 1.960, zpwr for 0.80 power is 0.842; these and other values are found in standard statistical tables and software packages); and D is the minimum difference that one wishes to detect (5 units). n=
(3² + 4.5²) × (1.960 + 0.842)² / 5² = (29.25 × 7.851) / 25 = 9.2
Round up to 10 rats in each group.
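The arithmetic above is easy to script. The sketch below is illustrative only: the function name n_per_group and the use of SciPy's normal quantile function to look up the z-values are my own choices, not part of the original text, but the formula is the two-group formula used in Example 3.

```python
import math
from scipy.stats import norm

def n_per_group(sd1, sd2, min_diff, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size per group for comparing two means."""
    z_crit = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    z_pwr = norm.ppf(power)
    n = (sd1**2 + sd2**2) * (z_crit + z_pwr) ** 2 / min_diff**2
    return math.ceil(n)

# Example 3: SDs of 3 and 4.5 units, smallest meaningful difference of 5 units
print(n_per_group(3, 4.5, 5))   # -> 10 rats per group
```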
Numerous Web page calculators and statistical programs are available to perform this type of calculation for you; however, it is difficult to use them properly without a working knowledge of the factors affecting power. This subject is well explained by Eng [25,26]. If your study is already complete and you found a significant difference, you do not have to be concerned as to whether the study was sufficiently powered. This is akin to wondering if you put enough dynamite under the bridge after you have blown it to smithereens. If, however, you found no significant difference and you are feeling uncomfortable about the fact that you did not consider calculating the power before starting the study, you may be tempted to do so after the fact, by plugging the values actually observed in your study for standard deviation and the difference between the populations into the power formula. This practice, dubbed retrospective power analysis, is problematic, because it does not tell you what the power to find the smallest meaningful difference was, only what the power was to find statistically significant the difference that actually existed between the groups in your analysis; furthermore, this “observed” power will be inversely related to the pvalue observed, so it does not add meaningfully to the information already obtained with the p value [26]. Instead, approaches involving confidence intervals or chi-square tests are recommended to guide the interpretation of negative results [27]. Some practical considerations about power and sample size are discussed further on page 264. 4. Have you planned in advance which groups you are going to compare to which, for significant differences? Planned comparisons allow the researcher a greater rejection level α or p-value than do unplanned comparisons. This is because if you first look at your data, then choose your comparisons, you are introducing bias into your analysis, and the rejection level will have to be tightened to compensate for that [28]. Planning ahead also allows you clear-headed time to consider which comparisons you really need when your design includes many different groups. The more comparisons you make, the more the p-value will have to be adjusted against false positives, in effect making it harder for you to see a true positive. Therefore, it is important not to compare groups that are irrelevant to your experimental goals. This is discussed further on page 272 and in Example 6.
Taking Care of Business As touched upon previously, the challenges faced by industry-based scientists often include business-related considerations that are not considered in most statistics manuals. In particular, time and resource constraints are often significant, as well as pressures to proceed at a pace that is intended to satisfy management, investor, or profit-related concerns, but is often incompatible with sufficient depth of study or optimal experimental design. In short, between
the business and science aspects of the industry, a strong pressure front constantly flows from the business side to the science side. At times when shortsightedness and lack of understanding of the impact of financial decisions on scientific practice threaten to nullify the potential uses of our scientific output, we should certainly consider exerting a pressure front in the opposite direction, from science to business. Since we cannot realistically hope to change the pressures of the business world, however, we need to at least consider modifying our experimental design and/or data interpretation approaches to take their effects into account. For example, let us consider the case of a study design for analysis of a biomarker as a pharmacodynamic (PD) endpoint in a clinical trial,where each patient provides a pretreatment control sample to be compared with a time series of their posttreatment samples. Given an endpoint assay with a high between-experiment variance component, the wisest experimental design might be to study all of a given patient’s posttreatment samples within the same assay, together with the matched pretreatment sample. Instead, scientists are often pressured to run assays and provide PD results as soon as the first or first few of several posttreatment time point samples become available in clinical trials. This business-driven decision is often made despite freezer stability of samples, simply because drug development companies are anxious to get the news about their product’s effects in clinical trial announced as quickly as possible; good news has an economic payoff, and similarly, the earlier that bad news is known, the better the chances for diverting precious resources before they are wasted. In the laboratory, however, the corollary of this pressure may be that precious volumes or tissue samples from a pretreatment control are used up in the first analysis, and later analyses on samples not yet collected will have to be calculated in assays that lack this internal per-patient control. As well as introducing bias between the time points that were assayed within a single run and those that incorporate between-assay variability, the added component of variability between assays may add unnecessary noise, obscuring potential treatment effects. Certainly, as drug development scientists we cannot ignore concerns such as continued supply of cash by investors; indeed, such issues are supremely important to whether a scientific endeavor can even be continued. The point is that the effects of making less than ideal experimental design choices, too, should not be ignored or glossed over (which is, in fact, what tends to happen). Often, no consideration is given in the statistical analysis model to sample processing issues such as that described above, with raw values instead being logged into tables as if they had all been generated in a single assay of balanced design, regardless of the circumstances under which the data were generated. Obviously, there is no one correct approach to remedy such a situation, and the researcher must decide what best fits the situation, hopefully with the help of a statistician who understands the sample processing issues. For instance, in some cases the best solution might be to convince decision makers to wait for a controlled comparison of pre- and posttreatment samples, in others it might
be including an internal standard for normalization and changing to a ratiodata design, yet others might be amenable to another analysis strategy. The point is that analysis and/or interpretation of the data should be rethought in the light of particular business decisions that have affected their generation. In another example of rethinking assay design strategies to respond to the challenges of the business world, one might consider various strategies to increase statistical power in the face of time and resource constraints that typically limit the number of patients one can study. Browner et al. [29] and Eng [25] list a number of strategies for minimizing the sample size while maintaining power. 1. Use continuous measurements instead of categories; this is because, for a given sample size, statistical tests for continuous values are mathematically more powerful than those used for proportions. 2. Seek to decrease the variability of the measurement process (i.e., improve the precision of your processes, optimize your assays; the less standard deviation, the more power). 3. If possible, use paired measurements, matching each patient to his or her own control. Paired measurements are more powerful than unpaired measurements. 4. Add control subjects, even if case subjects are too difficult to obtain. Although it takes more complex equations to calculate power in unbalanced designs than if both groups have the same size, this will still add power. 5. Expand the minimum expected difference-perhaps that which you have specified is unnecessarily small and a larger expected difference could be justified, especially if the study is preliminary, or intended as a screening test. The smaller the expected difference you want to pick up, the greater the number of samples needed to give an equivalent power. 6. Reflect on whether your hypothesis could have merit as a one-tailed, rather than a two-tailed one (although in most cases this is not an option). You can obtain equivalent power using less samples for a onetailed test. The sample size necessary for a one-tailed design with a 0.05 significance criterion is the same as that for a two-tailed design with 0.10 significance, if all other factors are equal. Many researchers think of power calculations as a futile exercise, likely to specify an unrealistically high sample size, given the resources on hand for the study. In fact, proper thought to power and the factors that contribute to it could, in many instances, lead to cost and resource savings. Rather than applying an across-the-board animal toxicity experimental design including, for example, 10 animals per group, Example 4 shows a case where five animals per group would be sufficient; as this example is built on a data set similar to the one used in Example 3, which showed 10 animals per group as being
appropriate for the experimental paradigm, it also illustrates that the number of samples that need to be in a test group for appropriate power depends strongly on issues specific to each particular research question. Thus, industry-based scientists grappling with business cases, profitand-loss issues, how best to spread their research budget over the necessary items, and other financial-based decisions with an impact on scientific quality would do well to manage their resources appropriately by incorporating power calculations into the design of their laboratory research. Example 4. Imagine an example similar to that described in Example 3, except that the biomarker is now a toxicity marker, which you have qualified and shown low under normal conditions in rats (10 ± 3 units; mean ± SD) but markedly increased in response to toxic reactions (10- to 100-fold). You wish to test a compound for toxicity using the same experimental design as in Example 3, but with a few different assumptions: 1. Because of being cautious about the risks of toxicity, you want to ensure higher than 80% power this time: You want to ensure 95% power, so you have more certainty that any finding of no significant difference in the toxicity marker would not be attributable to too few animals being studied. 2. Because you are not concerned if the toxicity marker decreases in association with the treatment, but only if it increases, you adopt a one-tailed design. 3. Because toxicity so markedly increases the biomarker, you decide that the smallest meaningful increase you wish to detect can be greater than in Example 3. You decide that 20 units of biomarker would be greater than 3SD over the normal condition and is thus an acceptable threshold, so you set D at 20 − 10 = 10. How many rats do you need in each group? Using the same formula as in Example 3, but substituting a zpwr value of 1.645 (the value for 95% power), a zcrit value of 1.645 (the one-tailed value for a p < 0.05 significance level), a D value of 10, and SD1 and SD2 values of 3 and 6, respectively: n=
(3² + 6²) × (1.645 + 1.645)² / 10² = (45 × 10.824) / 100 = 4.87
Round up to 5 rats in each group.
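The one-tailed, 95%-power scenario of Example 4 can be checked the same way; this is again an illustrative sketch, with the z-values taken from SciPy's normal quantile function rather than from printed tables.

```python
import math
from scipy.stats import norm

# One-tailed design: alpha = 0.05, power = 0.95, SDs of 3 and 6, minimum difference 10
z_crit = norm.ppf(0.95)   # 1.645 for one-tailed p < 0.05
z_pwr = norm.ppf(0.95)    # 1.645 for 95% power
n = (3**2 + 6**2) * (z_crit + z_pwr) ** 2 / 10**2
print(math.ceil(n))       # -> 5 rats per group
```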
The Odd Risk Biomarkers are often used in studies that involve nominal data, such as a positive/negative dichotomy for a test result. There are various ways of quantifying the strength of the relationship between groups when it comes to this type of data, which is sometimes also referred to as proportional data, in reference to the fact that one measurement is often expressed as relative (proportional) to another (i.e., a percentage of patients responding, or the accuracy or sensitivity of a diagnostic test). Proportional data is often presented in a 2 × 2 contingency table. The chi-squared statistic is appropriate to test the null hypothesis of independence between row and column variables. Data are often described, however, in terms of difference of proportions, relative risk, or odds ratio. These are not statistical terms, but rather, quantification terms. Often, these are misunderstood, sometime leading to serious confusion and misinterpretation of published research. When considering the use of diagnostic tests classifying disease states, based on biomarkers, the terminology of odds and risks is actually less useful than true and false positive and negative rates, since it does not capture the aspect of error in disease classification. It is nonetheless important to be able to interpret research correctly where results are described in odds/risks terminology. This subject is well explained in a review by Sistrom and Garvan [30], from which most of the following is paraphrased. Assuming a data table comparing some outcome measure in two groups, one of which was exposed to some variable and the other of which was not, the difference in proportions is the proportion with the outcome in the exposed group minus the proportion with the outcome in the unexposed group. It is always between −1.0 and 1.0, and equals zero when the response Y is completely independent of the explanatory variable X. Relative risk is a measure of association between exposure to a factor and risk of a certain outcome, and is defined as risk in the exposed groups divided by risk in the unexposed groups. Public health impact depends not just on relative risk but also absolute risk, so just looking at relative risk can be misleading. For example, the public health impact of a vaccine that halves the risk of an infection is very different for a rare (say, two cases per million people) or a common disease (say two cases per 10 people); in both cases the relative risk for vaccinated subjects would be 0.5 compared to unvaccinated subjects, but in the former case, for every million people vaccinated, only one would be saved, whereas in the latter case it would be 100,000 people saved [31]. Odds are the probability that the outcome does occur divided by the probability that it does not occur, with the odds ratio being the ratio of the odds of an outcome in one group (“exposed”) versus another (“unexposed”). A key concept about using odds to estimate risk of an event is that the relationship between the odds ratio and the risk ratio depends on the outcome frequency of the event. If the odds of an event are greater in one group than another, it does not mean that the risk ratio is increased by the same amount.
This is a very common misunderstanding, but in fact, odds ratios and risk ratios are similar only when the outcome being studied is rare (i.e., about 10% or less probability in the unexposed group) [32]. Odds ratios are not a good estimate of risk ratio when the outcome is common in the population being studied. In that case, to obtain correct risks one needs to apply the equation relating odds to risks, as shown in Example 5. Example 5. The equation relating risk ratio (RR) to odds ratio (OR) is as follows: RR =
OR / [(1 − Pr0) + (Pr0)(OR)]
where Pr0 is the probability of the outcome in the unexposed group. Schulman et al. [33] compared the frequency of referral for cardiac catheterization for white men (exposed) to that for black men (unexposed). Black men were referred 90% of the time, and the odds ratio for black to white men being referred was 0.6 : 1. The risk ratio for black men relative to white men being referred was therefore

RR = 0.6 / [(1 − 0.9) + (0.9 × 0.6)] = 0.6 / 0.64 = 0.94
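This conversion is simple to automate. The following sketch is my own illustration of the formula above, using the same numbers; the function name is hypothetical.

```python
def risk_ratio_from_odds_ratio(odds_ratio, p_unexposed):
    """Convert an odds ratio to a risk ratio, given the outcome
    probability in the unexposed (reference) group."""
    return odds_ratio / ((1 - p_unexposed) + p_unexposed * odds_ratio)

# Odds ratio of 0.6 for referral, 90% referral frequency in the reference group
print(round(risk_ratio_from_odds_ratio(0.6, 0.9), 2))   # -> 0.94
```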
This example shows how an odds ratio of 0.6 for referral to cardiac catheterization for black men vs. white men (as reported by Schulman et al. [33]) does not mean that blacks were referred 40% less often than whites, as was reported in extensive media coverage following publication of that study. Once the correction for a 90% referral frequency for black men was applied, the risk ratio for black men to be referred was 0.94, or 6% less relative to white men, rather than the 40% less that the media coverage of the original study had interpreted [34].

PRACTI(STI)CAL MAGIC: WHICH WITCH IS WHICH?

During my time as a contract research organization scientist, I was astonished at the frequency of requests from pharmaceutical and biotechnology clients to perform inappropriate statistical treatment of the results; it seems that in the drug development industry at least, there is an appreciable level of confusion over when particular tests are appropriate and what underlying assumptions must be met for them to be valid. Accordingly, some refresher basic statistics background is summarized briefly below.
Parametric vs. Nonparametric Parametric tests are so named because they use parameters, such as mean and standard deviation, in their computations. Student’s t-test and the analysis of variance (ANOVA) are examples of this type of test. Parametric tests assume that the data sets are normally distributed and that the various groups have roughly equivalent variance between members of the group. They are prone to falsely indicating a significant difference between the groups if these assumptions are not met. In that case, nonparametric tests provide an alternative. Nonparametric tests rely on ranking of samples relative to each other, and make fewer numerical assumptions. They are less likely to commit type I error, but usually also have less power to reveal true significant differences. Researchers need to evaluate data with descriptive statistics in order to decide whether the assumptions of the more powerful parametric tests are met or whether nonparametric tests would be the more appropriate choice. Some assumptions apply to both types of test and must also be considered. In that regard, both parametric and nonparametric tests assume that samples included in the analysis are independent of each other. How does one judge whether these assumptions are met, and what should one do about it if they are not? Independence To judge whether samples are independent, one needs to ask if within any one group of data included in the analysis, the value of one data point is somehow related to another. If the answer is yes, these samples are not independent. For example, if biomarker expression is measured in two separate tissue samples three separate times each and you enter n = 6 data items into the analysis, you have violated the independence criterion. The three replicates of each sample are related to one another (from the same tissue sample); there are actually only two independent measures, not six. Since n = 2 is not enough samples to analyze statistically, potential plans to redress this analysis error could include either omitting statistical analysis in favor of descriptive results only, or redesigning the experiment to include additional tissue samples. Often, experiments are designed to measure biomarker expression repeatedly in a single subject, such as in the case of pharmacodynamic measures at different time points following administration of a drug in clinical trial patients. In this case, the independence assumption is not violated, since the related measures will be analyzed as different groups rather than being included within one group. Repeated-measure statistical tests are then applicable, such as the paired t-test (for two measures) or repeated-measure ANOVA (for more than two measures). To reiterate, then, if samples are not independent of one another and they do not constitute repeated measures, the correct statistical approach is to redesign your experiment. Although more complex analysis strategies do exist
for clustering correlated samples within groups, there is no manipulation when using simple tests of significance that will correct for samples in one data set being selected such that the choice of one sample depends on another sample being picked. Normally Distributed Data How do you judge if your data are normally distributed? The first step is to perform descriptive statistics and plot the data out as a histogram. In a normal distribution, about 68% of the data falls within 1 standard deviation (1 SD) of the mean, and 95% within 2 SD of the mean. Many statistics programs will compute normality for you using statistical tests. The Kolmogorov–Smirnoff test, for instance, has nothing to do with martinis, being instead one such test for the normal distribution of data. If you have around n = 25 or more in each group, you can safely apply the simple eyeball test to your data, accepting data as normal if points fall into a roughly bellshaped curve when plotted as a histogram [35]. What if you are not sure whether your data are normally distributed? If you have small data sets, for instance, normality can be difficult to determine visually. Similarly, statistical tests do not have much power to make the determination in this case. At this point, one’s own personal choice comes into play: If you are conservative, you might choose to play safe with a nonparametric test; if not, you might choose a parametric test. As with other life choices, there are risks associated with either: With parametric tests you risk error of type I, whereas with nonparametric ones you risk type II error. The best solution might be to collect reasonably large data sets so as to avoid this uncertainty. Assuming you do have a large enough data set to be certain, and what you are certain of is that the data are not distributed normally, well, what then? One solution might be to transform data mathematically such that the transformed data sets become normally distributed, in which case the more powerful parametric tests can then be used. Some commonly used transformations and appropriate types of data for their use [35–38] are listed in Table 1. If data are still not normally distributed with equal variances following mathematical transformation, a nonparametric test should be chosen for analysis. Some appropriate nonparametric equivalents of parametric tests are shown in Table 2. Equivalent Variances Between Groups As with the question of whether data are normally distributed, equivalence of variance between groups needs to be evaluated prior to choosing appropriate tests of significance. Sometimes, homogeneity of variance from one group of data to another can be amply judged by plotting a frequency histogram of each group and seeing how spread out the data are, or calculating the variance for each group and qualitatively assessing if the numbers are very different or appear close. It is simple enough, however, to apply the appropriate statistical test, the F-test for two groups, or the Fmax-test for more than two.
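For readers who prefer to script these checks, the sketch below is illustrative only: the simulated data, the choice of the Shapiro–Wilk test for normality, and the hand-computed two-group F-test on the variance ratio are assumptions of the example, not prescriptions from the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group1 = rng.normal(50, 10, size=30)   # hypothetical biomarker values
group2 = rng.normal(60, 20, size=30)

# Normality check for each group (small p suggests departure from normality)
for name, g in [("group1", group1), ("group2", group2)]:
    stat, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")

# Two-group F-test on the ratio of sample variances
s1, s2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
f_stat = max(s1, s2) / min(s1, s2)
df1 = df2 = len(group1) - 1
p_f = 2 * stats.f.sf(f_stat, df1, df2)   # two-sided p for the variance ratio
print(f"F = {f_stat:.2f}, p = {p_f:.3f}")
```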
TABLE 1  Common Data Transformations

Type of Data | Example | Transformation
Proportional data | 0–100% | Arcsin of the square-root transformation
Poisson distributed (counts of random events) | Number of cells of a particular type in a given volume of blood | Square-root transformation
Variance proportional to mean squared (i.e., standard deviation proportional to mean) | Serum cholesterol levels in patients | Log transformation
Ratio data (normalized to internal standard on a per-assay basis) | Blots normalized to a housekeeping gene | Log transformation
Highly variable quantities, where variance is proportional to the mean to the fourth power (i.e., standard deviation proportional to mean squared) | Serum creatinine levels in patients | Reciprocal transformation
TABLE 2  Nonparametric Equivalents of Parametric Tests

Parametric | Nonparametric
Student's t-test (unpaired) | Mann–Whitney test (Wilcoxon rank sum)
t-test (paired) | Wilcoxon signed-rank test
t-test (paired), in cases where data are not symmetrical around the median | Sign test
ANOVA (without repeated measures) | Kruskal–Wallis test
ANOVA (with repeated measures) | Friedman's test

Source: Adapted from ref. 35.
If variances differ between groups and are not rendered equivalent by mathematical transformation (see Table 1), you may still be able to use parametric statistics. If, for example, the t-test is appropriate to your experimental design, and variances are unequal between groups but not extremely disparate, you may choose to use the t-test for unequal variances (in the case of extreme differences in variance, however, a nonparametric test is the recommended alternative). If your intended parametric test is ANOVA, on the other hand, the assumption of equivalent variance between test groups is essential [35]. If variances are not equivalent and transformation also does not render them equivalent, you must test statistically using a nonparametric equivalent (see Table 2).
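As a concrete illustration of these choices, the sketch below uses simulated data (not from the original text): SciPy's ttest_ind run with equal_var=False gives the unequal-variance (Welch) form of the t-test, and mannwhitneyu gives the nonparametric alternative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(100, 10, size=20)   # hypothetical biomarker, control group
treated = rng.normal(85, 30, size=20)    # treated group with a much larger spread

# Standard t-test assumes equal variances; the Welch version does not
t_eq, p_eq = stats.ttest_ind(control, treated)
t_w, p_w = stats.ttest_ind(control, treated, equal_var=False)

# Nonparametric alternative for clearly non-normal or very unequal-variance data
u, p_u = stats.mannwhitneyu(control, treated, alternative="two-sided")

print(f"t-test (equal variances): p = {p_eq:.3f}")
print(f"Welch t-test:             p = {p_w:.3f}")
print(f"Mann-Whitney U test:      p = {p_u:.3f}")
```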
Simple Regression and Correlation Analysis

At times there seems to be some confusion about simple linear regression vs. correlation analysis. This issue is well summed up in a review by Zou et al. [20], from which the following information is, for the most part, culled. The two are similar mathematically, but their purposes are different. Both relate to a function that describes the relationship between a given X and Y; however, regression analysis generally focuses on the form of that relationship and correlation generally on its strength. Furthermore, regression focuses on evaluating the relative impact of a predictor variable on a particular (dependent) outcome, whereas correlation's purpose is to examine the strength and direction of the relationship between two random variables. Correlation analysis commonly involves either the Pearson coefficient (r) or the Spearman coefficient (rs), with the former reflecting proportional changes in one variable when the other is changed, and the latter using ranks and reflecting instead a monotonic relationship between two variables (i.e., whether one tends to take either a larger or a smaller value than the other, but not necessarily with a proportional change in one variable when the other one is changed). If data sets are skewed or contain outliers, the Spearman coefficient rather than the Pearson is the appropriate choice. Interpretation of correlation coefficients is often rather qualitative, with the sign indicating the direction of the relationship (positive or negative). Absolute values range from 0.0 (no correlation) to 1.0 (perfect correlation), with 0.5 generally being thought of as moderate, 0.8 as strong, and 0.2 as weak. Statistical significance can also be computed, by formulating a null hypothesis of no correlation together with an alternative hypothesis that the underlying value is greater or less than zero, then computing the z-test statistic and rejecting the null hypothesis based on the p-value. Simple linear regression analysis results in an r² value that is calculated on the basis of the Pearson r coefficient and reflects the fraction of the variability in y that can be explained by the variability in x through their linear relationship (or vice versa). As with correlation analysis, a finding of a strong linear relationship in a regression analysis does not mean that the variable causes the outcome (as discussed in the section on misuses of correlation), and should not be interpreted that way. Student's t-test can be used, for example, to test whether there is a linear relationship (i.e., null hypothesis of slope = 0) or whether the y-intercept is a particular value. As a salient point, the fitted regression should not be extrapolated outside the range of values of the independent variable to make predictions.
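A short sketch of these computations (illustrative only; the data are simulated and SciPy is assumed to be available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.uniform(0, 10, size=40)               # predictor (e.g., dose)
y = 2.0 * x + rng.normal(0, 3, size=40)       # outcome with noise

pearson_r, p_pearson = stats.pearsonr(x, y)
spearman_rs, p_spearman = stats.spearmanr(x, y)

# Simple linear regression: slope, intercept, r, and p for the null hypothesis slope = 0
fit = stats.linregress(x, y)

print(f"Pearson r   = {pearson_r:.2f} (p = {p_pearson:.3g})")
print(f"Spearman rs = {spearman_rs:.2f} (p = {p_spearman:.3g})")
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, "
      f"r^2 = {fit.rvalue**2:.2f}, p = {fit.pvalue:.3g}")
```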
t-Test vs. ANOVA

Using inappropriate tests of significance for the circumstances, such as testing differences between more than two groups by means of repeated t-tests rather than by ANOVA, is exceedingly common [35]. Don't do it! Repeating
pairwise comparisons ignores the experimental effects in all but the two groups being compared and increases the number of false-positive “significant results” expected; multiple means comparisons following ANOVA correct for the increasing chances of false positives by adjusting the acceptable hypothesis rejection level. If you have only two groups to compare to each other and they meet the assumptions of parametric tests, use Student’s t-test. If you are testing more than two groups, however, use ANOVA. Since a standard ANOVA will tell you if there is a statistically significant difference somewhere in your data sets but will not tell you where, you need to apply a multiple means comparison test after the ANOVA, to pinpoint the culprits. There are many tests available, each suited to different types of data, so it is advised that you consult the help menu of your own statistical program to decipher which one best fits your own experimental design. As a rule of thumb, do not choose to make any more comparisons than are appropriate for the goals of the experiment, or you risk to lose the ability to detect significant differences that may truly exist. This is because multiple means comparison tests apply corrections that tighten the rejection level in proportion to an increased group comparison number (i.e., effectively lowering the p-value required to find a difference significant). In general, you can make one less comparison between groups than the total number of groups, without affecting this rejection level; however, if you exceed this, the rejection level is adjusted. Dunnett’s test, for instance, is suited to designs such as a dose–response curve, which include a matched control sample and several different doses; this test compares each group only to the control (not to each other), to answer the question “Which dose(s) give(s) a response” but not “Was the response to one dose different than to another?” Example 6 shows how a significant difference between the control and treatment groups could be missed if more comparisons were made than necessary. Example 6. You wish to test for whether any of several doses of your treatment has a significant effect on your outcome. You have collected samples a, b, c, d, (i.e., four groups), where a is the control and b, c, d represent responses to three different concentrations of your treatment. Assuming that your highest dose, d, shows a p-value of 0.02 with respect to control, this difference would be reported as significant by Dunnett’s test; that is, three comparisons would be made (b vs. a, c vs. a, d vs. a), one less than the total number of groups, so no adjustment to the rejection level would be necessary to compensate for multiple comparisons, and the d vs. a calculated p-value of 0.02 (≤0.05) would be interpreted as statistically significant. If instead of Dunnett’s test, the Bonferroni test were chosen, six comparisons would be made (the same three as above, plus b vs. c, b vs. d, and c vs. d). Since this would be two more comparisons than the number of groups, the significance threshold of 0.05 would be adjusted to compensate for multiple
comparisons (0.05/6 = 0.008), such that a computed p-value of ≤0.008 would be required for significance at the so-called 0.05 rejection level (i.e., p = 0.02 would not be considered significant). One-Tailed vs. Two-Tailed The tails referred to in your statistical package are the nonwagging variety; the term represents the tail edges of the bell-shaped normal distribution. Tests are one- or two-tailed depending on whether they compute the probability of values exceeding the range expected from members of a population in only one direction or potentially in both directions (i.e., greater than or less than). Using a one-tailed test is justified only if there was an a priori determination that only differences in one specified direction would be assessed, not because one decides after seeing that the one-tailed p is significant but the two-tailed is not, that one only really cares about one direction anyway. Furthermore, having an expectation that the difference to be found will go in a particular direction is not considered adequate justification for performing one-tailed analysis; rather, one-tailed analysis could be considered appropriate if having a large difference in one direction vs. having no difference at all would have equivalent consequences [39]. Under most circumstances a two-tailed test is the appropriate choice, and if the researcher is in any doubt about which is appropriate, the two-tailed should be chosen. Paired vs. Unpaired Paired analysis is appropriate in the following circumstances: • When you do a before and after measurement in each subject. • When you match subjects in pairs (i.e., age, etc.), then treat one of the subjects and not the other. Pairs must be made before data are collected. • When you compare relatives (i.e., sibling studies). • When you perform an experiment many times, each time with the experimental and control sample treated in parallel (including log-transformed ratio data, as described in Example 7). Example 7. You have done Northern blots of gene expression in mutant animals vs. wild-type animals. Each of your blots shows wildly different signal magnitudes from other blots, due to technical reasons such as exposure time; having mutant and wild-type samples on every blot allows you to control for this variability by expressing data as a ratio of mutant to wild-type values. Ratios are not normally distributed, but logs of ratios are; therefore, you log-transform your data. Ratios become differences when log-transformed [i.e., log(mutant/wild type) = log(mutant) − log(wild type)], so to analyze this type of data you can take the log of each data point and perform a paired ttest on the transformed data.
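A minimal sketch of the Example 7 analysis follows; the signal values are invented for illustration. Taking the log of each mutant/wild-type ratio and running a paired t-test on the logs is equivalent to a one-sample t-test of the log-ratios against zero.

```python
import numpy as np
from scipy import stats

# Hypothetical blot signals; each pair of values comes from one blot
mutant    = np.array([120.0, 45.0, 300.0, 80.0, 15.0, 210.0])
wild_type = np.array([ 60.0, 20.0, 160.0, 35.0, 10.0,  90.0])

# Paired t-test on the log-transformed values
t, p = stats.ttest_rel(np.log(mutant), np.log(wild_type))

# Equivalently, a one-sample test of log(mutant/wild_type) against zero
t2, p2 = stats.ttest_1samp(np.log(mutant / wild_type), 0.0)

print(f"paired t-test on logs:          t = {t:.2f}, p = {p:.4f}")
print(f"one-sample test of log-ratios:  t = {t2:.2f}, p = {p2:.4f}")
```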
In judging whether distributions are normal in paired testing, it is the differences between the two members of each pair that must be approximately normally distributed rather than the two distributions themselves. If data from two sets to be compared do not meet one of the criteria above, they should be analyzed by unpaired testing [35]. Miscellaneous Data Outliers Within any set of observations it is not unusual to obtain occasional data points that vary substantially from others found in the same group of subjects. These outlier data may be the result of unknown errors or may represent genuine information, and indeed, whether such data should be included or rejected is the subject of much debate. Methods for detecting outlier data and the use of robust methods of statistical analysis to accommodate such data are complex issues, beyond the scope of this chapter; a thorough treatment of the subject is presented by Barnett and Lewis [40], to which the reader is referred. BLQ Data Statisticians sometimes warn against left-censored distributions, a term evocative of political intrigue and vaguely racy data sets that actually refers to the more bland reality of omitting results that are below the levels of quantitation of the assay (BLQ). BLQ results are fairly common in clinical and preclinical biomarker analyses and can cause serious problems with interpretation of the data if they are not taken into account, since unreported values are unusable in any statistical analysis of the data. Data imputation, a process to replace unreported BLQ values with valid estimates, can also affect the data interpretation, since all values in a data set affect the measures of central tendency, error, and statistical power. It seems clear that estimating a theoretical value for BLQ data points would allow for more accurate conclusions to be drawn from the data than simply removing data through leftcensoring, but what value should indeed be assigned to BLQ data points? Some researchers report BLQ values as zero, perhaps due to the influence of the sometimes black-and-white perspective of assay validation-style cutoffs. Assigning a zero value regardless of the sensitivity of a given test could be misleading, however, since zero is an absolute, whereas BLQ represents a scale from zero up to an upper limit that differs from one test to another in an assay-specific manner. Furthermore, assigning zero values creates other mathematical difficulties, such as not being able to compute fold-differences of another value with respect to the zero, or not being able to work with log transforms of data. One common approach to address these issues in the data imputation process has been to assign arbitrary nonzero values to BLQ data, most commonly either the limit of detection (LOD) divided by 2 or divided by the square root of 2, depending on the skewness of the data set in question [41]. Both appreciable bias and loss of power have been demonstrated with this
type of approach, however [42,43]. Succop et al. compared the bias of alternative methods for imputing BLQ values and concluded that imputing values either from median percentiles below the detection limit or from an equation model fit to the noncensored data set produced a good correlation between predicted and reported low values. In their example data set, bias between the predicted and true values was only 2.9% this way, compared to 348% overestimation bias when LOD/2 was used to impute the BLQ values [44]. Standard curves of analytical assays typically provide an equation model fit that can be used to calculate values down to the zero level, but assay validation typically limits the data reported to levels that have met a cutoff deemed to have accuracy and precision within prescribed acceptance limits (i.e., the lower limit of quantitation). The study above suggests that although equation-predicted values below the levels of quantitation of an assay may not meet acceptance criteria per se, they still represent more accurate estimates for BLQ data than either arbitrary imputation methods or left-censoring of data sets. Succop et al. recommended that analytical laboratories provide a numerical result for all samples analyzed, with a flag for those values that are below the detection limit [44].
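The behavior of naive substitution rules is easy to explore by simulation. This sketch is my own illustration (the distribution, sample size, and quantitation limit are arbitrary assumptions): it censors a log-normal data set at an assumed lower limit of quantitation and shows how far the common LOD/2 and LOD/√2 substitutions can sit from the true mean of the censored values for that particular distribution.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical biomarker concentrations, log-normally distributed
true_values = rng.lognormal(mean=1.0, sigma=1.0, size=10_000)
lod = 1.0                                  # assumed lower limit of quantitation

blq = true_values < lod                    # values an assay would report as BLQ
print(f"fraction BLQ:              {blq.mean():.1%}")
print(f"true mean of BLQ values:   {true_values[blq].mean():.3f}")
print(f"LOD/2 substitution:        {lod / 2:.3f}")
print(f"LOD/sqrt(2) substitution:  {lod / np.sqrt(2):.3f}")

# Effect of LOD/2 imputation on the overall mean of the data set
imputed = np.where(blq, lod / 2, true_values)
print(f"overall mean, true vs LOD/2-imputed: "
      f"{true_values.mean():.3f} vs {imputed.mean():.3f}")
```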
ISSUES SPECIFIC TO SPECIALIZED FIELDS

Genetic Association Studies

Genetic association studies evaluate the association between specific genetic polymorphisms and disease. They often assess relatively small effects against a noisy background of biological and social complexity and, consequently, tend to lack statistical power [45]. Power calculations are critical for genetics researchers who wish to map susceptibility genes, with chi-square being the most commonly used test statistic for association studies. The noncentrality parameter for the chi-square asymptotic distribution developed by Mitra [46] is the key to computing power and sample size for this type of study. Gordon and Finch present detailed instructions (including step-by-step use of a Web-based calculator) for this, as well as reviewing other concepts of statistical genetics and the study/analysis designs that best optimize power for this type of study [17]. Synthesizing evidence from multiple studies has been used as a means of increasing power; however, it seems that metaanalyses in this field have been fraught with serious flaws. As well as more general concerns, there are genetic issues particular to molecular association studies, including checking Hardy–Weinberg equilibrium (HWE, discussed below), handling data from more than two groups while avoiding multiple comparisons, and pooling data in a way that is sensitive to genetic models [47]. As regards the latter two points, there are always at least three possible genotypes to compare, but in practice
most studies reduce the number of comparisons by assuming a specific genetic model, such as dominant or recessive, even though there are often no biological justifications for assumption of the model. Using an inappropriate genetic model or inappropriately pooled data produce misleading estimates of odds ratios. Appropriate metaanalysis methods and avoiding assumptions about genetic models are discussed further by Minelli et al. [45] and Thakkinstian et al. [48]. Bias seems to be another serious problem in this type of study. Some methods for detecting and correcting it are discussed by Sterne et al. [49]; publication bias, for example, may be detected by a simple funnel plot of the data. Not just the metaanalyses, but also single studies in this field seem excessively prone to various sources of bias, with some critics going so far as to assert that most documented associations, even those that are replicated, represent nothing other than false positives based on bias [50]. The list of prevalent bias sources is impressive and includes biological plausibility bias, ascertainment bias, publication bias, selective reporting bias, spectrum of disease bias, population stratification, biased selection of controls, lack of blinding in the genotyping process, and genotyping error. Some of these error sources can be addressed simply through researchers being cognizant of them and striving to avoid them. For example, one can choose to blind genotyping rather than failing to do so. In another example, ascertainment bias can be avoided by scrutinizing controls with the same intensity as cases; this type of bias relates to the situation where affected individuals (“cases”) have their DNA resequenced to identify rare variants. Identification of such rare variants in an affected group does not necessarily signify a role in disease (as is often assumed when only cases are ascertained by resequencing) since sequencing DNA from any group tends to turn up a few rare mutations. Instead, a strong and statistically convincing preferential presence of variants in the cases compared to controls (that had been scrutinized in the same manner) would support the involvement of the variants in disease [51]. In cases where being cognizant of the potential for bias does not allow us to avoid it, designing experiments and/or data analysis with an expectation of inevitable bias can dampen its influence. For example, designing studies to have higher than desired power can combat the power loss expected from genotyping misclassification errors, as discussed earlier. As another example, various statistical techniques that are robust to bias can be used, such as genomic control techniques or family-based methods to address problems of population stratification [17]. Population stratification refers to the situation in case–control studies where control groups are not well matched to case groups and in fact have different typical levels of what is being measured simply because they are from different populations. In this situation, association tests like chi-square may falsely indicate associations, even with as high as 100% probability.
In fact, having appropriate control groups is a key determinant of the validity of genetics association studies, and determining whether controls deviate from HWE is a standard way of evaluating this. Theoretically, disease-free control groups from outbred populations should follow HWE, as should combined cases and controls if they both have a particular disease (i.e., such as in studies where different treatments are evaluated). If they do not, it is a signal of some peculiarity, error, or problem with the data sets that could invalidate the key inferences from a genetic association study [52]. For example, if there is a recessive model and the control group has an excess or deficit of one group of homozygotes, this will directly affect calculation of the odds ratio (the control homozygotes divided by the other genotypes is the denominator of the odds ratio). Overall, if there is a significant deviation from HWE it should induce some thinking about the study; deviations could result from genotyping errors but may also arise from other sources. For example, HWE deviation may suggest that allele-based estimates of genetic effects are biased, or may give further insight into the population from which the data are derived. A recent review of studies published in high-quality specialized genetics journals demonstrated that HWE is very commonly tested improperly or inadequately [3]. Of 776 associations tested, only 29% reported on HWE, introducing uncertainty as to if it was tested in the others. Furthermore, where HWE testing was described, the test was applied to the correct control group in only 50% of the associations tested. It is a common error to include the disease cases with the control cases in the HWE test when controls are diseasefree. Combined cases and controls should be tested only when the controls have the same disease as the cases; otherwise, only controls should be tested. These authors recalculated HWE for the studies reviewed and noted that in most of the samples where HWE was actually violated, this was either not mentioned in the original article or HWE conformity was actually claimed. Another common problem in the studies reviewed was the unjustified use of an inappropriate test. A number of different statistical tests could be used to test HWE (including chi-square, exact tests, and Bayesian methods), all based on the conditional probability that there would be the number of homozygotes that actually turned out to be in the sample. The chi-square test was the only test applied to HWE calculation in the studies reviewed, despite the fact that the chi-square asymptotic distribution is inadequate to deal with low genotype frequencies and is therefore not justified in studies involving them; in this case an exact test provides a simple and superior alternative [3,54]. A final serious concern in the studies reviewed was that only 7% of the studies had an acceptable power to detect HWE deviation, with most studies being much too underpowered to make any claim of lack of deviations. Whereas power to detect HWE is of secondary importance compared to the prime consideration of power to detect a genetic association, undetected modest HWE deviations could affect considerably the inferences of many genetic association studies [3].
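As an illustration of the basic check, the sketch below computes the chi-square HWE test for a single biallelic marker directly from control-group genotype counts. It is illustrative only: the counts are invented, and, as noted above, studies with low genotype frequencies should use an exact test rather than the chi-square approximation.

```python
from scipy.stats import chi2

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic marker."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)            # estimated frequency of allele A
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_aa, n_ab, n_bb]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2.sf(stat, df=1)              # 3 classes - 1 - 1 estimated allele frequency
    return stat, p_value

# Hypothetical control-group genotype counts
print(hwe_chi_square(n_aa=280, n_ab=480, n_bb=240))
```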
Microarray or Other “Omics”-Type Data Relating “omics”-style biomarker data (gene/protein arrays, multiple m/z peaks from mass spectrometry, etc.) to clinical outcomes is the focus of a large body of research. Making statistically based decisions about what is expressed or produced differentially, and how this relates to outcomes is not simple for these types of studies. For one, the data are highly dimensional, with the number of variables often exceeding the number of samples by magnitudes (i.e., in contrast to the desirable format for statistical comparisons of a group of sufficient numbers of samples for each variable tested). Furthermore, data tend to be very “noisy,” with widely different magnitudes of signals and the potential for large irrelevant signals to obscure lesser (but relevant) ones. This translates into an inherent tendency for algorithms fit to the data to be more influenced by noise than by salient interest points (a phenomenon known as overfitting the noise). A recent review of analysis of genomics/proteomics data outlines the steps of quality control, clustering, classification, feature ranking, and validation [55]. In Joe/Josephine terms these could be described as checking and normalizing the data, grouping like things together, statistically evaluating their likely involvement with the outcome, getting rid of the dead wood, and checking the conclusions on an independent data set. In the first stage, quality control and normalization of signals reduce the technical variability and ensure that biomarkers discovered are statistically significant. Cui and Churchill reviewed statistical tests that have been specifically adapted to cDNA microarray data analysis [56]; many of the issues discussed are similar for highly dimensional data from other study types. They conclude that fold-change is the simplest method for detecting differential expression of genes, but the arbitrary nature of assigned cutoff values, the lack of statistical confidence measures, and the potential for biased conclusions all detract from its appeal. They also stress the need to normalize data properly, such as by using intensity-specific thresholds, since otherwise an excess of low-intensity genes may be identified as being expressed differentially simply because their fold-changes have larger variance than the fold-change values of high-intensity genes. Expression analysis steps following this generally fall into two categories, supervised or unsupervised, based on whether or not they are led by prior knowledge. Clustering is an unsupervised method that groups genes of similar expression profile together; clustered genes may have diverse biological functions, but this is nonetheless useful for exploring new territory and seeing who is dancing together. Supervised classification methods, on the other hand, train a mathematical model using prior biological knowledge of certain genes’ functional involvement with each other, then use the model for predictions on other genes. This is a useful approach for building on what is already known, but in cases of limited available information it is difficult for a supervised method to achieve accurate predictions.
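As a small illustration of the unsupervised approach, genes can be grouped by the similarity of their expression profiles with SciPy's hierarchical clustering. This is a sketch only: the expression matrix below is random, and the linkage method, distance metric, and number of clusters are arbitrary choices, not recommendations from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)

# Hypothetical expression matrix: 100 genes (rows) x 8 samples (columns)
expression = rng.normal(0, 1, size=(100, 8))

# Cluster genes by correlation distance between their expression profiles
z = linkage(expression, method="average", metric="correlation")
clusters = fcluster(z, t=4, criterion="maxclust")   # cut the tree into 4 clusters

for k in range(1, 5):
    print(f"cluster {k}: {np.sum(clusters == k)} genes")
```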
Flawed statistical analysis is very common in this emerging field, even in studies published in high-impact-factor journals. Dupuy and Simon recently found that half of a survey of 90 microarray studies had at least one of three major analysis flaws [2]. (These authors also present a list of analysis do's and don'ts specific to this study type, to which the reader is referred for more detail.) Analysis flaws like those discussed in this review and elsewhere [53,57] can lead to completely erroneous conclusions about the utility of given biomarkers or "signatures" of multiple biomarkers as prognostic or diagnostic tools, and may underlie some of the problems reproducing data in this field.

One of the most common flaws present in the published studies was to perform multiple comparisons without making adequate adjustment for the expected increased false-positive rate. Although the standard of setting p at less than or equal to 0.05 is accepted as a reasonable compromise between type I and II errors when data sets are relatively small, in relatively large omics data sets a p-value this high is not sufficiently stringent to identify disease screening markers with an acceptably low false-positive rate. For example, a microarray containing 5000 genes examined for meaningful relationships between the genes and some outcome using this p-value could be expected to yield 250 false-positive relationships due simply to chance. Dupuy and Simon suggest that the simplest method to combat this type of error is to use a p-value of less than or equal to 0.001, which would give one false positive for every 1000 genes tested. Several other accepted methods are grouped under the term family-wise error-rate (FWER) control, but they have been criticized for decreasing power substantially [58]; the Bonferroni correction (dividing the nominal significance level by the number of tests) is the simplest of these. The false discovery rate (FDR) approach (the Benjamini–Hochberg procedure) is a favored control statistic for microarray analysis and is included in statistical analysis packages such as SAM (significance analysis of microarrays). The FDR achieves comparable control of false significant results while greatly improving the statistical power compared to FWER methods [59,60]. An FDR of 5% is not the same as a p-value of 0.05: p ≤ 0.05 means that about 5% of the genes tested (most of which are truly unchanged) can be expected to appear positive by chance, whereas an FDR of 5% means that 5% of the genes detected as positive are expected to be false. For example, given a scenario of 5000 genes tested, with 20 of them being found positive, a p-value of 0.05 means that there could have been 250 false positives (due to chance alone), whereas an FDR of 0.05 means that only one of the 20 positives would be expected to be false.
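The following sketch (Python/NumPy, simulated p-values for a hypothetical screen of 5000 genes, of which 20 carry a real effect) contrasts the thresholds discussed above: an unadjusted p < 0.05, a Bonferroni correction, and the Benjamini–Hochberg FDR procedure [59]. It is an illustration only, not a substitute for a dedicated package such as SAM.

```python
import numpy as np

def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask marking the p-values declared positive while
    controlling the expected false discovery rate at the given level."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    below = ranked <= (np.arange(1, m + 1) / m) * fdr
    discoveries = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank satisfying the criterion
        discoveries[order[: k + 1]] = True
    return discoveries

rng = np.random.default_rng(1)
p_null = rng.uniform(size=4980)                # 4980 genes with no real effect
p_true = rng.uniform(high=1e-4, size=20)       # 20 genes with a genuine signal
p_all = np.concatenate([p_null, p_true])
m = p_all.size

print("unadjusted p < 0.05 :", int((p_all < 0.05).sum()), "genes called positive")
print("Bonferroni (0.05/m) :", int((p_all < 0.05 / m).sum()), "genes called positive")
print("BH at 5% FDR        :", int(benjamini_hochberg(p_all, 0.05).sum()), "genes called positive")
```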
A second common flaw was to make spurious claims of a meaningful correlation between clusters of genes and clinical outcome when, in fact, the correlation was detected after clustering a selection of outcome-related differentially expressed genes. The genes being related to the outcome was therefore a consequence of this selection bias, but it was instead reported as independent proof of a correlation between the clinical outcomes and the gene clusters. A number of classification methods decrease noise in the data by requiring selection of the relevant and informative predictor variables before modeling is performed [61], so users need to be aware that if they have used one of these methods there is no point in performing correlation analysis afterward.

The third major flaw concerned the mechanics of classification model validation. Once a model is constructed it is generally validated using either an external test set or cross-validation, with the end goal being to use the validated model to predict outcomes for unknown samples. In the studies reporting supervised prediction, Dupuy and Simon noted the common flaw of biased, overly optimistic estimation of the prediction accuracy through an incorrect cross-validation procedure. Different errors in cross-validation procedure were noted, but generally they were all related to failing to have test sets (data used to test the model's predictive capacity) completely uninvolved in generating the algorithm in the first place.

Bayesian Information Integration for Personalized Medicine

A key challenge for the success of pharmacogenomics lies in defining analytical methods to integrate and interpret the increasingly complex information that it provides, so that decisions can be made about how people can best benefit from certain therapeutic interventions. Bayesian statistical approaches, often favored by decision theorists, have proven useful in classification models that integrate traditional clinical risk factors with complex genomics data. The point of doing this is to try to predict the outcome of future patients who share certain key clinicogenomic attributes with those from whom the model was derived, and to use these predictions for patient-customized diagnostic, prognostic, and treatment purposes.

Most of this chapter is based on the classical frequentist approach to inferential statistics with which most of us are familiar. In this approach, a single hypothesis (the null hypothesis) is formed about the effect (or, rather, the lack thereof) of a variable or several variables, and it is tested using a fairly rigid and well-defined format. The evidence against the hypothesis is all that is considered, and it is in the form of probabilities computed from the distribution of the outcome measure(s). A baseline assumption of randomness is made in defining what the probabilities are, and rejection of a null hypothesis is founded on the experimental result being too different from the random probability function to actually be random (i.e., therefore it isn't). Bayesian statistics, on the other hand, represent a fundamentally different way of thinking about probability, with statistical inferences being made using accumulated evidence of all kinds, both for and against a hypothesis, or several alternative hypotheses. These hypotheses do not even have to be testable, let alone be tested, and certainly not by any particular format. (The title of the first publication of Bayes himself shows an example of such a hypothesis: Divine Benevolence, or an Attempt to Prove that the Principal End of Divine Providence and Government Is the Happiness of His Creatures [62].) Importantly, there is no baseline assumption of randomness: Instead, a "prior probability" is included in the paradigm. Although assigning it a value based
on random chances is one option, another is that prior knowledge is incorporated into this value; in this way, Bayesian methods build on past experience to make statistical inferences. To compute the probability of each particular hypothesis being true, a calculation using Bayes’ theorem would multiply all the conditional probabilities that each is true by their prior probabilities and weight the final values relative to each other. Conditional probabilities are those from each piece of evidence for or against the hypothesis, and prior probability is defined in a very open-ended way; it could be pretty much anything, including subjective feelings, provided that it is decided upon in a consistent manner between all the hypotheses, it sums to 1 for all the hypotheses, and the information used to produce it is not used again later as evidence. As more evidence accumulates, it can be incorporated into updated models [63]. A simple Bayes calculation has already been shown (see Example 2); in that case, known disease frequency in the population constituted the prior probability, and the predictive values of the diagnostic test for true and false positives were the conditional probabilities. Biomarker researchers aiming to define predictive models for personalized medicine have showed how Bayesian methods can strengthen these models by integrating different types of information into them. For instance, Qi et al. noted that both supervised and unsupervised methods have different drawbacks for genomics data analysis, with the latter capturing relevant expression profiles but ignoring important a priori knowledge, and the former being limited where little prior knowledge exists. Accordingly, they used a Bayesian approach to perform accurate predictions even when scant prior knowledge is available; their integrated model achieved higher sensitivity and specificity compared to both unsupervised and supervised methods [64]. Similarly, Pittman et al. [65] used Bayesian tree modeling to integrate both clinical information and genomic biomarker status for a sample set of breast cancer patients. They showed that the integrated clinicogenomic model gave substantially greater log-model likelihood (>7) than the model based on genomic biomarker signatures alone, and blew the clinical predictors alone right out of the picture, with a weight of evidence for the clinicogenomic vs. the clinical predictors only of more than 26 log-likelihood units. Tree models in general involve successive splitting of a given patient sample group with certain known characteristics (i.e., gene signature, clinical risk factors) and outcomes (e.g., cancer status, survival, relapse) into more and more homogeneous subgroups. At each split, the collection of evidence (e.g., clinical or gene factors) is sampled to determine which of them optimally divide the patients according to their outcome, and a split is made if significance exceeds a certain level. Multiple possible splits generate “forests” of possible trees. Some caveats noted for those using these methods include how to choose among alternative potential models that are identified as being of similar or significant probability, and the issue of uncertainty. In that regard, whereas in
some other applications Bayesian approaches are used to choose a single best hypothesis from among several, such an approach is warned against for pharmacogenomic modeling applications [61,65,66]. In this case it is typical to see multiple plausible tree models representing the data adequately, and this is consistent with the physical reality of multiple plausible combinations of genetic and clinical factors that could lead to the same outcome measures. Rather than choosing one of them, it is critical to define overall predictions by averaging across the multiple candidate models using appropriate weights that reflect the relative fits of the trees to the data observed. The impact of averaging is seen in greater accuracy of the model's predictive capacity and, importantly, in accurate estimation of the uncertainty about the resulting prediction (i.e., the prediction uncertainty of such a model is conceptually akin to the measurement uncertainty associated with a laboratory method, as discussed earlier). Nevins et al. underlined the importance of prediction uncertainty when they pointed out: "A further critical aspect of prognosis is the need to provide honest assessments of the uncertainty associated with any prediction. A predicted 70% recurrence probability, for example, should be treated quite differently by clinical decision makers if its associated uncertainty is ±30% than if it were ±2% …" [66].
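The averaging step can be sketched in a few lines (Python/NumPy). The model log-likelihoods and per-model predictions below are invented for illustration and do not reproduce the tree models of Pittman et al.; the point is simply that predictions from several plausible models are combined with weights proportional to each model's relative fit, and that the spread across models gives one crude expression of the prediction uncertainty.

```python
import numpy as np

# Hypothetical scenario: three plausible clinicogenomic models give different
# recurrence probabilities for the same new patient, and each model has a
# log-likelihood reflecting how well it fits the training data.
log_likelihoods = np.array([-120.0, -121.5, -123.0])   # assumed model fits
predictions = np.array([0.72, 0.55, 0.30])             # P(recurrence) from each model

# Weights proportional to model likelihood (normalized after exponentiation)
weights = np.exp(log_likelihoods - log_likelihoods.max())
weights /= weights.sum()

# Model-averaged prediction and the weighted spread across candidate models
p_avg = np.sum(weights * predictions)
spread = np.sqrt(np.sum(weights * (predictions - p_avg) ** 2))

print(f"model weights       : {np.round(weights, 3)}")
print(f"averaged prediction : {p_avg:.2f} +/- {spread:.2f}")
```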
SUMMARY: QUICK DO'S AND DON'TS

The field of biomarkers is a wide one, with a huge diversity of potential applications, all of which may have different and complex statistical analysis issues. This chapter has undoubtedly missed many of these and glossed over others, but summarizing concisely what it has covered is still a challenge. Instead, I present below a distillation in the form of one-liners addressing some of the more critical points ("do") and more common misconceptions ("don't").

Do:
1. Put on Hercule Poirot's hat and use your judgment when considering analytical issues and your experimental situation.
2. Think ahead: Consider data analysis issues before settling on your experimental design (and long before having the data in hand).
3. Take the limitations of your techniques and experimental error into account in your interpretations.
4. Check that your design is powered appropriately to detect what you wish to detect.
5. Use appropriate tests (e.g., ANOVA when comparing means from more than two groups).
6. Use appropriate controls and check HWE in genetic association studies.
7. Make adequate adjustment for the elevated false-positive rates when dealing with omics-style data.
8. Average over multiple plausible candidate models with appropriate weights (rather than choosing a single one), for the best predictive accuracy and uncertainty estimates in pharmacogenomic applications of Bayesian modeling strategies.

Don't:
1. Use parametric tests if data do not meet the assumptions of these tests (such as being normally distributed).
2. Use repeated t-tests when comparing means from more than two groups in one experimental design.
3. Compare more groups than are relevant to the goals of your experiment when applying multiple means testing.
4. Use correlation analysis to infer cause and effect or to test agreement between different methods.
5. Simply equate the odds ratio and risk ratio without considering the outcome frequency.
6. Use data to test correlation with outcomes, or as a test set for validating algorithms, if those data were already used earlier in the process of mathematical modeling of genomics data (i.e., selected on the basis of outcomes in the former case, or involved in algorithm generation in the latter).
REFERENCES 1. Ludbrook J (2001). Statistics in physiology and pharmacology: a slow and erratic learning curve. Clin Exp Pharmacol Physiol, 28(5–6):488–492. 2. Dupuy A, Simon RM (2007). Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Nat Cancer Inst, 99(2):147–157. 3. Salanti G, Amountza G, Ntzani EE, Ioannidis JP (2005). Hardy–Weinberg equilibrium in genetic association studies: an empirical evaluation of reporting, deviations, and power. Eur J Hum Genet, 13:840–848. 4. Attia J, Thakkinstian A, D’Este C (2003). Meta-analyses of molecular association studies: methodological lessons for genetic epidemiology. J Clin Epidemiol, 56:297–303. 5. Ransohoff D (2005). Lessons from controversy: ovarian cancer screening and serum proteomics. J Nat Cancer Inst, 97(4):315–319. 6. Goncalves A, Borg JP, Pouyssegur J (2004). Biomarkers in cancer management: a crucial bridge towards personalized medicine. Drug Discov Today: Ther Strategies, 1(3):305–311. 7. Ransohoff DF (2005). Bias as a threat to validity of cancer molecular-marker research. Nat Rev Cancer, 5:142–149. 8. Gawrylewski A (2007). The trouble with animal models. Scientist, 21(7):45–51.
9. Hubert Ph, Nguyen-Huu JJ, Boulanger B, et al. (2004). Harmonization of strategies for the validation of quantitative analytical procedures: a SFSTP proposal— part 1. J Pharm Biomed Anal, 36(3):579–586. 10. Findlay JWA, Smith WC, Lee JW, et al. (2000). Validation of immunoassays for bioanalysis: a pharmaceutical industry perspective. J Pharm Biomed Anal, 21: 1249–1273. 11. Hubert Ph, Nguyen-Huu JJ, Boulanger B, et al. (2006). Validation des procédures analytiques quantitatives: harmonisation des démarches. Partie II—Statistiques. STP Pharma Prat, 16:30–60. 12. Prudhomme O’Meara W, Fenlon Hall B, Ellis McKenzie F (2007). Malaria vaccine efficacy: the difficulty of detecting and diagnosing malaria. Malaria J, 6:136. 13. Carlin BP, Louis TA (1996). Bayes and Empirical Bayes Methods for Data Analysis. Chapman & Hall, London. 14. Phillips CV, Maldonado G (1999). Using Monte Carlo methods to quantify the multiple sources of error in studies. Am J Epidemiol, 149:S17. 15. Plant N, Ogg M, Crowder M, Gibson G (2000). Control and statistical analysis of in vitro reporter gene assays. Anal Biochem, 278:170–174. 16. Phillips CV, LaPole LM (2003). Quantifying errors without random sampling. BMC Med Res Methodol, 3:9. 17. Gordon G, Finch SJ (2005). Factors affecting statistical power in the detection of genetic association. J Clin Invest, 115(6):1408–1418. 18. Cohen MJ (1998). The Penguin Thesaurus of Quotations. Penguin Books, Harmondsworth, UK. 19. Coakley EH, Kawachi I, Manson JE, Speizer FE, Willet WC, Colditz GA (1998). Lower levels of physical functioning are associated with higher body weight among middle-aged and older women. Int J Obes Relat Metab Disord, 22(10):958–965. 20. Zou KH, Tuncali K, Silverman SG (2003). Correlation and simple linear regression. Radiology, 227:617–628. 21. Hill AB (1965). The environment and disease: association or causation? Proc R Soc Med, 58:295–300. 22. Phillips CV, Goodman KJ (2006). Causal criteria and counterfactuals; nothing more (or less) than scientific common sense. BMC Med Res Methodol, 3:5. 23. Bland JM, Altman DG (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476):307–310. 24. Browner WS, Newman TB (1987). Are all significant p values created equal? The analogy between diagnostic tests and clinical research. J Am Med Assoc, 257:2459–2463. 25. Eng J (2003). Sample size estimation: How many individuals should be studied? Radiology, 227:309–313. 26. Eng J (2004). Sample size estimation: a glimpse beyond simple formulas. Radiology, 230:606–612. 27. Detsky AS, Sackett DL (1985). When was a “negative” clinical trial big enough? How many patients you need depends on what you found. Arch Intern Med, 145:709–712. 28. Sokal RR, Rohlf FJ (1981). Biometry, 2nd ed. W.H. Freeman, New York.
29. Browner WS, Newman TB, Cummings SR, Hulley SB (2001). Estimating sample size and power. In Hulley SB, Cummings SR, Browner WS, Grady D, Hearst N, Newman TB (eds.), Designing Clinical Research: An Epidemiological Approach, 2nd ed. Lippincott Williams & Wilkins, Philadelphia, pp. 65–84. 30. Sistrom CL, Garvan CW (2004). Proportions, odds, and risk. Radiology, 230:12–19. 31. Motulsky H (1995). Intuitive Biostatistics. Oxford University Press, Oxford, UK. 32. Agresti A (2002). Categorical Data Analysis. Wiley, Hoboken, NJ. 33. Schulman KA, Berlin JA, Harless W, et al. (1999). The effect of race and sex on physicians’ recommendations for cardiac catheterization. N Engl J Med, 340: 618–626. 34. Schwartz LM, Woloshin S, Welch HG (1999). Misunderstandings about the effects of race and sex on physicians’ referrals for cardiac catheterization. N Engl J Med, 341:279–283. 35. Ryder EF, Robakiewicz P (1998). Statistics for the molecular biologist: group comparisons. In Ausubel FM, Brent R, Kingston RE, et al. (eds.), Current Protocols in Molecular Biology. Wiley, New York, pp. A.31.1–A.31.22. 36. Bland JM, Altman DG (1996). The use of transformation when comparing two means. BMJ, 312(7039):1153. 37. Bland JM, Altman DG (1996). Transforming data. BMJ, 312(7033):770. 38. Ludbrook J (1995). Issues in biomedical statistics: comparing means under normal distribution theory. Aust N Z J Surg, 65(4):267–272. 39. Bland JM, Altman DG (1994). One and two sided tests of significance. BMJ, 309:248. 40. Barnett V, Lewis T (1994). Outliers in Statistical Data, 3rd ed. Wiley, New York. 41. Hornung RW, Reed DL (1990). Appl Occup Environ Hyg, 5:46–51. 42. Hughes MD (2000). Analysis and design issues for studies using censored biomarker measurements with an example of viral load measurements in HIV clinical trials. Stat Med, 19:3171–3191. 43. Thiebaut R, Guedj J, Jacqmin-Gadda H, et al. (2006). Estimation of dynamic model parameters taking into account undetectable marker values. BMC Med Res Methodol, 6:38. 44. Succop PA, Clark S, Chen M, Galke W (2004). Imputation of data values that are less than a detection limit. J Occup Environ Hyg, 1(7):436–441. 45. Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2005). The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol, 34:1319–1328. 46. Mitra SK (1958). On the limiting power function of the frequency chi-square test. Ann Math Stat, 29:1221–1233. 47. Attia J, Thakkinstian A, D’Este C (2003). Meta-analyses of molecular association studies: methodological lessons for genetic epidemiology. J Clin Epidemiol, 56:297–303. 48. Thakkinstian A, McElduff P, D’Este C, Duffy D, Attia J (2005). A method for meta-analysis of molecular association studies. Stat Med, 24(9):1291–1306. 49. Sterne JAC, Egger M, Smith GD (2001). Investigating and dealing with publication and other biases in meta-analysis. BMJ, 323:101–105.
50. Ntzani EE, Rizos EC, Ioannidis JPA (2007). Genetic effect versus bias for candidate polymorphisms in myocardial infarction: case study and overview of largescale evidence. Am J Epidemiol, 165(9):973–984. 51. Hirschhorn JN, Altshuler D (2002). Once and again: issues surrounding replication in genetic association studies. J Clin Endocrinol Metab, 87(10):4438–4441. 52. Khoury MJ, Beaty TH, Cohen BH (1993). Fundamentals of Genetic Epidemiology. Oxford University Press, New York. 53. Baggerly KA, Morris JS, Edmonson SR, Coombes KR (2005). Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer. J Nat Cancer Inst, 97(4):307–309. 54. Emigh T (1980). A comparison of tests for Hardy–Weinberg equilibrium. Biometrics, 36:627–642. 55. Phan JH, Quo CF, Wang MD (2006). Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics. Prog Brain Res, 158:83–108. 56. Cui X, Churchill GA (2003). Statistical tests for differential expression in cDNA microarray experiments. Genome Biol, 4:210. 57. Liotta LA, Lowenthal M, Mehta A, et al. (2005). Importance of communication between producers and consumers of publicly available experimental data. J Nat Cancer Inst, 97(4):310–314. 58. Dudoit S, Schaffer JP, Boldrick JC (2003). Multiple hypothesis testing in microarray experiments. Stat Sci, 18:71–103. 59. Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B, 57:289–300. 60. Benjamini Y, Drai D, Elmer GL, Kafkafi N, Golani I (2001). Controlling the false discovery rate in behavior genetics research. Behav Brain Res, 125:279–284. 61. Tong W, Xie Q, Hong H, et al. (2004). Using Decision Forest to classify prostate cancer samples on the basis of SELDI-TOF MS data: assessing chance correlation and prediction confidence. Toxicogenomics, 112:1622–1627. 62. Dale AI (2003). Most Honourable Remembrance: The Life and Work of Thomas Bayes. Springer-Verlag, New York. 63. Strachan T, Read AP (2003). Human Molecular Genetics. Garland Science, London. 64. Qi Y, Missiuro PE, Kapoor A, et al. (2006). Semi-supervised analysis of gene expression profiles for lineage-specific development in the Caenorhabditis elegans embryo. Bioinformatics, 22(14):e417–e423. 65. Pittman J, Huang E, Dressman H, et al. (2004). Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proc Nat Acad Sci USA, 101(22):8431–8436. 66. Nevins JR, Huang ES, Dressman H, Pittman J, Huang AT, West M (2003). Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum Mol Genet, 12:R153–R157.
PART IV BIOMARKERS IN DISCOVERY AND PRECLINICAL SAFETY
14 QUALIFICATION OF SAFETY BIOMARKERS FOR APPLICATION TO EARLY DRUG DEVELOPMENT William B. Mattes, Ph.D., DABT The Critical Path Institute, Rockville, Maryland
Frank D. Sistare, Ph.D. Merck Research Laboratories, West Point, Pennsylvania
HISTORICAL BACKGROUND TO PRECLINICAL SAFETY ASSESSMENT

It is often forgotten that the first "blockbuster" drug, sulfanilamide, was discovered and developed in an era devoid of regulatory oversight and guided only by free-market forces. Domagk discovered the antibacterial properties of Prontosil in 1932, and with the subsequent discovery in 1935 that the active moiety was the off-patent and widely available substance sulfanilamide, a number of companies rushed to make preparations for sale to the public. This explosion of therapy options was unfettered by requirements for medicines to be tested for efficacy or safety, although preparations could receive the endorsement of the American Medical Association [1]. Thus, in 1937, when the S.E. Massengill Company of Bristol, Tennessee, sought to prepare a flavored syrup, it simply identified an appropriate excipient to dissolve the drug, prepared 240 gallons of the raspberry-tasting Elixir Sulfanilamide, and marketed it across the nation. Unfortunately, the excipient chosen was diethylene glycol, also used as an antifreeze. We now know this agent to be lethal to a
large number of species, causing acute kidney injury at relatively modest doses. Before Elixir Sulfanilamide was identified as the causative agent and pulled from pharmacies, 34 children and 71 adults died. A year later, Congress passed the 1938 Federal Food, Drug and Cosmetic Act, requiring pharmaceutical manufacturers to show product safety before distribution [1]. Thus, society transitioned from its former free-market approach to pharmaceutical development to one of safety testing (initially in animals), careful clinical trials, and government oversight.

Safety testing as it is practiced today owes most of its form to procedures developed by the U.S. Food and Drug Administration (FDA) for testing food [2]. As early as 1949, the publication of "Procedures for the Appraisal of the Toxicity of Chemicals in Foods" began to formalize the practices the agency expected industry to follow in safety testing [3]. These practices included standard study designs and expectations as to what experimental observations would be recorded. They have evolved into the descriptive toxicity tests well known to modern toxicology [4]. Key to the value of these tests is not only their experimental design in terms of dose, route of administration, and duration, but also the endpoints evaluated, going beyond the observations of overall animal behavior and health. Thus, a battery of clinical pathology tests examining urine, hematological parameters, and serum chemistry is commonly evaluated [5]. Importantly, a number of tissues from the animal are examined both macroscopically and microscopically after sacrifice, and this histopathological examination allows for the identification of unusual and subtle lesions and changes following compound treatment [6]. In the arena of pharmaceutical product development, these studies, carried out in at least two animal species, one rodent and one nonrodent, are used to assure the safety of human subjects exposed to experimental doses of novel compounds [7]. The types and durations of studies required to support safety in various types of clinical studies have been codified by the International Conference on Harmonisation (ICH) and described in their guidelines on nonclinical safety studies [8]. Even so, human subjects need to be monitored for "adverse events and/or laboratory abnormalities identified in the protocol as critical to safety evaluations" [9].
LIMITATIONS FACED IN PRECLINICAL SAFETY ASSESSMENT

A critical problem faced by nonclinical safety assessment groups in pharmaceutical drug development is the disparity of responses sometimes seen between the two nonclinical test species in the tools used to assess these responses. Historically, microscopic histopathology is used as a primary tool for identifying compound-induced damage. When damage is identified only at exposures far exceeding those expected in clinical studies, clinical safety is expected. However, microscopic histopathology is not a tool generally applicable to human studies, where clinical pathology measurements play the critical
role in assessing adverse responses to drugs. Thus, if damage is observed in one nonclinical species at exposures close to those anticipated for human studies, the crucial question is whether the onset and reversibility of such damage could be monitored with clinical pathology or some other relatively noninvasive technology. Unfortunately, as described here, there are several types of drug-induced organ injury where current clinical pathology assays do not detect damage with sufficient certainty at early stages, and where assurances are needed that discontinuation of drug treatment would be followed by a complete and swift return to normal structure and function.
Kidney Injury Kidney injury may be produced by a variety of insults, including those induced by drugs or toxicants [10]. Given the known morbidity and mortality associated with acute kidney injury [11,12], evidence of drug-induced kidney injury in preclinical studies is a matter of serious concern. While on one hand the kidney is capable of recovery from mild damage [13] if the injurious agent is removed, the very real clinical problem is that traditional noninvasive measures of kidney function are insensitive and confounded by many factors [13–16]. Thus, even modest increases in serum creatinine are associated with significant mortality [12]. There is a real need for noninvasive markers that would detect kidney damage or loss of function at a stage before significant and irreversible damage has occurred. Several markers with just such a potential have been described in numerous reviews [13–20]. However, many of these are described in relatively few clinical studies, most have not been examined carefully for their performance in animal models of drug-induced kidney injury, and no consensus understanding between drug development sponsors and regulatory review authorities had been reached as to their utility for regulatory decision-making purposes. Ultimately, if a microscopic histopathological examination shows evidence of even mild drug-induced kidney injury in a preclinical study at exposures close to those anticipated for clinical use, development of that compound may be stopped, even if human relevance may be questioned, yet unproven.
Liver Injury The fact that medicines can cause liver injury and failure has been appreciated for some time [21], and drug-induced liver injury remains a serious public health and drug development concern [22–25]. As with other drug-induced organ damage, it may be detected in preclinical studies through microscopic histopathology and clinical chemistry measurements. Since the late 1950s, serum transaminase measurements, in particular that of alanine aminotransferase (ALT), have served as a sensitive and less-invasive measure of liver damage in both animal and human settings [26]. In conjunction with serum
cholesterol, bilirubin, alkaline phosphatase, and other factors, ALT has served as a translational biomarker for drug-induced liver injury [27–30]. However, ALT elevations are not always associated with clear evidence of liver injury [31–33], and ALT elevations cannot clearly indicate the etiology of damage [26,27,30,34]. Furthermore, ALT measurements either alone or with bilirubin cannot distinguish patients on a trajectory to severe liver disease and inability to heal or recover from injury, from patients with a full capacity to compensate and return ALT levels to normal despite continuation of drug dosing [35]. Combinations of clinical pathology changes have been used, including ALT and bilirubin, to assure safety in clinical trials [36] but there remains a need for diagnostic assays that reliably link and/or predict the histological observation of liver injury in both a preclinical and clinical setting, and discriminate the types and trajectory of apparent injury [30]. Vascular Injury Although injury to the vascular system is known to be caused by a variety of agents [37], many classes of therapeutic agents produce vascular lesions, in preclinical species with or without clinical signs, and with normal routine clinical pathology data [38]. Often, different preclinical species show a different level and type of response, and in many cases (e.g., minoxidil) the vascular injury reported in preclinical species is not observed in a clinical setting [39]. Drug-induced vascular injury in animals may result from altered hemodynamic forces, from a direct chemical-mediated injury to cells of the vasculature, and/or to an indirect immune-mediated injury of the endothelium and/ or medial smooth muscle. The conundrum faced in drug development is that there are no specific and sensitive biomarkers of endothelial and/or vascular smooth muscle injury that are clearly linked to the histological observations in animals and could be used to monitor for injury in clinical settings. Although it is assumed that an inflammatory component may be active at some stage in this process, and biomarkers are proposed for such processes [40], biomarkers that are sufficiently sensitive at early and fully reversible stages of vascular injury have not been fully evaluated [38,41]. Furthermore, specific markers of vascular injury/inflammation are sought that can discriminate from the multitude of other more benign causes for elevations of inflammatory biomarkers. Drug-Induced Skeletal Myopathy With the introduction of hydroxymethylglutaryl-coenzyme A (HMG-CoA) reductase inhibitors (statins), not only was there a successful treatment of hypercholesterolemia and dyslipidemia, but soon also a heightened awareness of the issue of drug-induced myopathy [42]. Statin-induced myotoxicity ranges from mild myopathy to serious and sometimes fatal rhabdomyolysis. Not surprisingly, a variety of drugs have been reported to induce myotoxicities
[43]. While skeletal muscle toxicity may be monitored with elevations in serum creatine kinase (CK), urinary myoglobin, and other markers [42], these biomarkers lack the sensitivity to definitively diagnose early damage or to distinguish the various etiologies of the muscle injury [43]. Thus, there is a need for markers of drug-induced muscle injury with improved sensitivity, specificity, and general utility.
WHY QUALIFY BIOMARKERS? More often than not, new biomarkers are judged on the basis of whether they have been subjected to a process of validation. Strictly speaking, validation is a process to “establish, or illustrate the worthiness or legitimacy of something” [44]. For judgments of biomarker worthiness or legitimacy, an assessment is needed of both (1) the assay or analytical method to measure the biomarker, and (2) the performance against expectations of the biomarker response under a variety of biological or clinical testing conditions. The term validation reasonably applies to the first of these, the process by which the technical characteristics of an assay of a biomarker are defined and determined to be appropriate for the desired measurements [45]. Thus, Wagner has defined validation as the “fit-for-purpose process of assessing the assay and its measurement performance characteristics, determining the range of conditions under which the assay will give reproducible and accurate data” [46]. Even for assay validation, the concept of fit-for-purpose is introduced, which connotes that the process depends on context, and its level of rigor depends on the application of and purpose for the assay. Thus, a biomarker used for an exploratory purpose may not require the more rigorous analytical validation required of a biomarker used for critical decision making. The elements of biomarker assay validation that would be addressed for different categories of biomarker data and for different purposes have been discussed in this book and elsewhere, and they essentially constitute a continuum of bioanalytical method validation [45,47]. Such technical bioanalytical assay method validation is a familiar process and does not generally pose a problem for an organization embarked on assay development [45]. The term validation has also been applied to a process by which a new test method is confirmed to be broadly applicable to interpretation of biological meaning in a wide variety of contexts and uses, such as in the validation of alternatives to animal tests as overseen by the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) [48]. This process involving assessment of biological performance expectations is in contrast to that of qualification, which Wagner defines as “the fit-for-purpose evidentiary process of linking a biomarker with biological processes and clinical endpoints” [46]. As for assay validation, this fit-for-purpose biological qualification concept marries the nature and extent of testing rigor to the intended application. In the case of biomarkers applied to predicting human
outcomes, four general phases have been proposed [49], and as the level of qualification progresses, the utility of a biomarker in clinical use increases [46]. In the case of biomarkers of safety (i.e., those that are used to predict or diagnose adverse responses to drug treatment), this qualification process will necessarily involve certain steps. As with qualification of a clinical disease or outcome marker, these steps link biomarker results with biological processes and endpoints. For example, in qualifying a biomarker for nonclinical use, the levels of a protein biomarker in urine may be correlated with certain chemically induced microscopic histopathology lesions in the kidneys of treated animals and thus serve as a noninvasive diagnostic of the appearance of that lesion. As with a clinical disease or outcome marker, qualification of such a marker could proceed in stages. For the example given, an initial stage may be the correlation described, using a variety of treatments that produce only that lesion measured. Such a correlation could establish the sensitivity of the biomarker. However, establishing the specificity of that biomarker for that particular lesion would require a number of treatments that did not produce the lesion being monitored, but instead produce no lesions in the kidney or anywhere else, produced different lesions in the kidney, and/or produced lesions in different organs. Furthermore, the diversity of chemical treatments (i.e., structural and mechanistic diversity) would also need to be considered in the qualification such that a variety of mechanisms underlying the genesis and progression of the lesion can be evaluated. Clearly, more data could support a higher level of qualification and thus a higher level of utility. For biomarker qualification, the highest phase or level of qualification is the status surrogate endpoint, in which case the biomarker can, in fact, substitute for and serve as the new standard clinical endpoint of how a patient feels, functions, or will survive: for example, in efficacy determinations to support marketing approval decisions. Qualified biomarkers that fall short as surrogate endpoints are nevertheless extremely valuable for both drug development and the general practice of medicine. The key to appreciating the value that will come from opportunities to deploy such biomarkers appropriately is in revealing a thorough understanding of their inherent strengths and limitations. In the early steps of designing studies to test sensitivity and specificity aspects of a biomarker’s performance to reveal that thorough understanding, the strategy may be relatively clear. For safety biomarker qualification, the first and most important attributes to benchmark are knowledge of the biomarker link to biology and outcome, sufficient test sensitivity, and minimizing false test negatives. Evaluating the response of a new proposed biomarker in animals against biomarkers in conventional use using an agreed-upon set of well-recognized toxicants known to induce the desired organ injury is a fairly straightforward strategy. Pivotal, however, to the successful execution of such studies is to provoke a sufficient number of cases where the timing of the samples taken and the choice of dose levels will yield subtle and mild treatment-related effects at the boundary between normal and abnormal. A full spectrum of histologic treatment-related lesions from very
slight, to slight, mild, moderate, marked, and severe will be important in this regard for evaluating biomarker sensitivity. The approaches taken for qualification of a safety biomarker for clinical uses will necessarily be different from those taken for nonclinical uses. Clearly, one cannot expect to have studies with healthy subjects intentionally exposed to a variety of toxicants, nor can one regularly use microscopic histopathology as a benchmark for clinical toxicity. Nonetheless, the goal is to reproducibly link the biomarker to a clinical outcome currently recognized and widely accepted as adverse. For certain types of drug-induced injury, there are standard-of-care treatments that unfortunately are associated with a known incidence of drug-induced injury. As an example, aminoglycoside antibiotic treatment is associated with a significant incidence of nephrotoxicity [50], mirroring the effects seen in animal models. Similarly, isoniazid treatment has a known risk of hepatotoxicity [51]. Thus, one approach to safety biomarker qualification in the clinic is to monitor novel biomarker levels longitudinally over the course of such a treatment and compare these with the current gold standard commonly used clinical biomarkers and outcomes [52]. Of course, one can complement these studies with those examining biomarker levels in patients with organ injury of a disease etiology [53]. The number of known agents appropriate for testing the sensitivity of safety biomarkers for certain target organ toxicities may be limited. It is generally expected that the number of studies conducted to evaluate sensitivity should reasonably represent a high percentage of the known but limited diverse mechanisms available for testing in animal and human studies. If the mechanisms are varied for each test agent and sensitivity performance remains high, the biomarker will probably find strong use potential. Specificity tests then become very important considerations in a qualification strategy. The number of test compounds that could be deployed to assess the false-positive test rate is far more expansive than the number of known compounds for testing sensitivity. Specificity testing therefore becomes a highly individualized dimension to a biomarker biological qualification strategy for biomarkers that pass tests of sensitivity. The two critical questions to address are whether (1) there are alternative tissue sources to account for alterations in test safety biomarker levels, and (2) whether there are benign, nontoxicologic mechanisms to account for alterations in test biomarker levels. To evaluate specificity, therefore, a prioritized experimental approach should be taken using logical reasoning to avoid the testing burdens of an endless number of possible studies. The ultimate goal of these studies is the qualification to the level of what Wagner et al. define as a characterization biomarker [49], a biomarker “associated with adequate preclinical sensitivity and specificity and reproducibly linked clinical outcomes in more than one prospective clinical study in humans.” Such a level of qualification then supports the regulatory use of these biomarkers for safety monitoring in early clinical studies with a new pharmaceutical candidate. A strong case has been made that for safety biomarkers for regulatory decision-making purposes, where the rigor of
supporting evidence would be expected to be high, only fully qualified or "characterization" biomarkers are appropriate, and that there is really no regulatory decision-making role for exploratory and "emerging" or "probable valid" biomarkers. In this regard, measurements of such unqualified biomarkers in animal studies used to support the safe conduct of clinical trials would not be expected to contribute unambiguously and sufficiently to study interpretation and should not require submission to regulatory authorities; therefore, the exploration of their utility in such highly regulated studies should be encouraged [54] in order to accelerate the pace of biomarker evaluations and understanding.
COLLABORATION IN BIOMARKER QUALIFICATION

Clearly, the number of preclinical and clinical studies and the resources required to qualify a biomarker as a characterization biomarker appropriate for regulatory decision making are significant. Thus, it is no surprise that in its Critical Path Opportunities List the FDA called for "collaborations among sponsors to share what is known about existing safety assays" [55]. Collaborations of this type have indeed played key roles in addressing technological problems common to a competitive industry. Sematech, a consortium formed in 1987 and made up of 14 leading U.S. semiconductor producers, addressed common issues in semiconductor manufacture and increased research and development (R&D) efficiency by avoiding duplicative research [56]. Sematech demonstrates that consortia can provide an opportunity for industry scientists to pool their expertise and experience to confront mutual questions collectively. The International Life Sciences Institute has for several years served as a forum for collaborative efforts between industry and academia [57], and for the past six years a Biomarkers Technical Committee has been pursuing assay development and evaluation of biomarkers of nephrotoxicity and cardiotoxicity [58].

Recently, the Critical Path Institute was incorporated as a "neutral, third party" to serve as a consortium organizer [59] and interface between industry members and the FDA. One of its first efforts was the Predictive Safety Testing Consortium (PSTC) [60,61], with a specific focus on qualification of biomarkers for regulatory use. The PSTC legal agreement addresses issues such as intellectual property, antitrust concerns, and confidentiality, and thus assures open collaboration in a manner consistent with applicable legal requirements. The PSTC solicited representatives from the FDA and the European Medicines Agency (EMEA) to serve as advisors. As experts in various areas of toxicity, these advisors bring not only their expertise but also the experience of how problems of a given target-organ toxicity are confronted and could be addressed in a regulatory setting. Thus, the development of qualification data is targeted with a keen eye toward what will ultimately support safety decisions in the regulated drug development and regulatory review process.
REFERENCES 1. Wax PM (1995). Elixirs, diluents, and the passage of the 1938 Federal Food, Drug and Cosmetic Act. Ann Intern Med, 122:456–461. 2. Miller SA (1993). Science, law and society: the pursuit of food safety. J Nutr, 123:279–284. 3. Stirling D, Junod S (2002). Arnold J. Lehman. Toxicol Sci, 70:159–160. 4. Eaton D, Klaassen CD (2001). Principles of toxicology. In Klaassen CD (ed.), Casarett and Doull’s Toxicology, 6th ed. McGraw-Hill, New York, pp. 11–34. 5. Weingand K, Brown G, Hall R, et al. (1996). Harmonization of animal clinical pathology testing in toxicity and safety studies. The Joint Scientific Committee for International Harmonization of Clinical Pathology Testing. Fundam Appl Toxicol, 29:198–201. 6. Bregman CL, Adler RR, Morton DG, Regan KS, Yano BL (2003). Recommended tissue list for histopathologic examination in repeat-dose toxicity and carcinogenicity studies: a proposal of the Society of Toxicologic Pathology (STP). Toxicol Pathol, 31:252–253. 7. FDA (1997). International Conference on Harmonisation; Guidance on General Considerations for Clinical Trials. Federal Register, p. 66113. 8. FDA (2008). International Conference on Harmonisation; Draft Guidance on M3(R2) Nonclinical Safety Studies for the Conduct of Human Clinical Trials and Marketing Authorization for Pharmaceuticals. Federal Register, pp. 51491–51492. 9. FDA (1997). International Conference on Harmonisation; Good Clinical Practice: Consolidated Guideline. Federal Register, pp. 25691–25709. 10. Schnellmann RG (2001). Toxic responses of the kidney. In Klaassen CD (ed.), Casarett and Doull’s Toxicology, 6th ed. McGraw-Hill, New York, pp. 491–514. 11. Hoste EA, Clermont G, Kersten A, et al. (2006). RIFLE criteria for acute kidney injury are associated with hospital mortality in critically ill patients: a cohort analysis. Crit Care, 10:R73. 12. Chertow GM, Burdick E, Honour M, Bonventre JV, Bates DW (2005). Acute kidney injury, mortality, length of stay, and costs in hospitalized patients. J Am Soc Nephrol, 16:3365–3370. 13. Vaidya VS, Ferguson MA, Bonventre JV (2008). Biomarkers of acute kidney injury. Annu Rev Pharmacol Toxicol, 48:463–493. 14. Trof RJ, Di Maggio F, Leemreis J, Groeneveld AB (2006). Biomarkers of acute renal injury and renal failure. Shock, 26:245–253. 15. Molitoris BA, Melnikov VY, Okusa MD, Himmelfarb J (2008). Technology Insight: biomarker development in acute kidney injury—what can we anticipate? Nat Clin Pract Nephrol, 4:154–165. 16. Ferguson MA, Vaidya VS, Bonventre JV (2008). Biomarkers of nephrotoxic acute kidney injury. Toxicology, 245:182–193. 17. Devarajan P (2007). Emerging biomarkers of acute kidney injury. Contrib Nephrol, 156:203–212. 18. Bagshaw SM, Langenberg C, Haase M, Wan L, May CN, Bellomo R (2007). Urinary biomarkers in septic acute kidney injury. Intensive Care Med, 33:1285–1296.
19. Nguyen MT, Devarajan P (2007). Biomarkers for the early detection of acute kidney injury. Pediatr Nephrol, 23:2151–2157. 20. Dieterle F, Marrer E, Suzuki E, Grenet O, Cordier A, Vonderscher J (2008). Monitoring kidney safety in drug development: emerging technologies and their implications. Curr Opin Drug Discov Dev, 11:60–71. 21. Zimmerman HJ (1999). Hepatotoxicity: The Adverse Effects of Drugs and Other Chemicals on the Liver, 2nd ed. Lippincott Williams & Wilkins, Philadelphia. 22. Maddrey WC (2005). Drug-induced hepatotoxicity: 2005. J Clin Gastroenterol, 39: S83–S89. 23. Arundel C, Lewis JH (2007). Drug-induced liver disease in 2006. Curr Opin Gastroenterol, 23:244–254. 24. Watkins PB, Seeff LB (2006). Drug-induced liver injury: summary of a single topic clinical research conference. Hepatology, 43:618–631. 25. Bleibel W, Kim S, D’Silva K, Lemmer ER (2007). Drug-induced liver injury: review article. Dig Dis Sci, 52:2463–2471. 26. Kim WR, Flamm SL, Di Bisceglie AM, Bodenheimer HC (2008). Serum activity of alanine aminotransferase (ALT) as an indicator of health and disease. Hepatology, 47:1363–1370. 27. Reichling JJ, Kaplan MM (1988). Clinical use of serum enzymes in liver disease. Dig Dis Sci, 33:1601–1614. 28. Ozer J, Ratner M, Shaw M, Bailey W, Schomaker S (2008). The current state of serum biomarkers of hepatotoxicity. Toxicology, 245:194–205. 29. Lock EA, Bonventre JV (2008). Biomarkers in translation; past, present and future. Toxicology, 245:163–166. 30. Amacher DE (2002). A toxicologist’s guide to biomarkers of hepatic response. Hum Exp Toxicol, 21:253–262. 31. Pettersson J, Hindorf U, Persson P, et al. (2008). Muscular exercise can cause highly pathological liver function tests in healthy men. Br J Clin Pharmacol, 65:253–259. 32. Giboney PT (2005). Mildly elevated liver transaminase levels in the asymptomatic patient. Am Fam Physician, 71:1105–1110. 33. Gaskill CL, Miller LM, Mattoon JS, et al. (2005). Liver histopathology and liver and serum alanine aminotransferase and alkaline phosphatase activities in epileptic dogs receiving Phenobarbital. Vet Pathol, 42:147–160. 34. Shapiro MA, Lewis JH (2007). Causality assessment of drug-induced hepatotoxicity: promises and pitfalls. Clin Liver Dis, 11:477–505. 35. Andrade RJ, Lucena MI, Fernandez MC, et al. (2005). Drug-induced liver injury: an analysis of 461 incidences submitted to the Spanish registry over a 10-year period. Gastroenterology, 129:512–521. 36. Hunt CM, Papay JI, Edwards RI, et al. (2007). Monitoring liver safety in drug development: the GSK experience. Regul Toxicol Pharmacol, 49:90–100. 37. Ramos KS, Melchert RB, Chacon E, Acosta D Jr (2001). Toxic responses of the heart and vascular systems. In Klaassen CD (ed.), Casarett and Doull’s Toxicology, 6th ed. McGraw-Hill, New York, pp. 597–651.
38. Kerns W, Schwartz L, Blanchard K, et al. (2005). Drug-induced vascular injury: a quest for biomarkers. Toxicol Appl Pharmacol, 203:62–87. 39. Mesfin GM, Higgins MJ, Robinson FG, Zhong WZ (1996). Relationship between serum concentrations, hemodynamic effects, and cardiovascular lesions in dogs treated with minoxidil. Toxicol Appl Pharmacol, 140:337–344. 40. Blake GJ, Ridker PM (2001). Novel clinical markers of vascular wall inflammation. Circ Res, 89:763–771. 41. Louden C, Brott D, Katein A, et al. (2006). Biomarkers and mechanisms of druginduced vascular injury in non-rodents. Toxicol Pathol, 34:19–26. 42. Tiwari A, Bansal V, Chugh A, Mookhtiar K (2006). Statins and myotoxicity: a therapeutic limitation. Expert Opin Drug Saf, 5:651–666. 43. Owczarek J, Jasinska M, Orszulak-Michalak D (2005). Drug-induced myopathies: an overview of the possible mechanisms. Pharmacol Rep, 57:23–34. 44. Merriam-Webster Online Dictionary (2008). 45. FDA (2001). Guidance for Industry on Bioanalytical Method Validation. Federal Register, pp. 28526–28527. 46. Wagner JA (2008). Strategic approach to fit-for-purpose biomarkers in drug development. Annu Rev Pharmacol Toxicol, 48:631–651. 47. Lee JW, Devanarayan V, Barrett YC, et al. (2006). Fit-for-purpose method development and validation for successful biomarker measurement. Pharm Res, 23:312–328. 48. Stokes WS, Schechtman LM, Rispin A, et al. (2006). The use of test method performance standards to streamline the validation process. Altex, 23 (Suppl):342–345. 49. Wagner JA, Williams SA, Webster CJ (2007). Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin Pharmacol Ther, 81:104–107. 50. Wiland P, Szechcinski J (2003). Proximal tubule damage in patients treated with gentamicin or amikacin. Pol J Pharmacol, 55:631–637. 51. Tostmann A, Boeree MJ, Aarnoutse RE, de Lange WC, van der Ven AJ, Dekhuijzen R (2008). Antituberculosis drug-induced hepatotoxicity: concise upto-date review. J Gastroenterol Hepatol, 23:192–202. 52. Mishra J, Dent C, Tarabishi R, et al. (2005). Neutrophil gelatinase-associated lipocalin (NGAL) as a biomarker for acute renal injury after cardiac surgery. Lancet, 365:1231–1238. 53. Han WK, Bailly V, Abichandani R, Thadhani R, Bonventre JV (2002). Kidney injury molecule-1 (KIM-1): a novel biomarker for human renal proximal tubule injury. Kidney Int, 62:237–244. 54. Sistare FD, DeGeorge JJ (2008). Applications of toxicogenomics to nonclinical drug development: regulatory science considerations. Methods Mol Biol, 460:239–261. 55. FDA (2006). Critical Path Opportunities Report and List, p 28. 56. Irwin DA, Klenow PJ (1996). Sematech: purpose and performance. Proc Natl Acad Sci USA, 93:12739–12742. 57. ILSI (2007). ILSI About ILSI.
58. ILSI and HESI (2007). ILSI: Development and Application of Biomarkers of Toxicity. 59. Woosley RL, Cossman J (2007). Drug development and the FDA’s Critical Path Initiative. Clin Pharmacol Ther, 81:129–133. 60. Marrer E, Dieterle F (2007). Promises of biomarkers in drug development: a reality check. Chem Biol Drug Des, 69:381–394. 61. Mattes WB (2008). Public consortium efforts in toxicogenomics. Methods Mol Biol, 460:221–238.
15 DEVELOPMENT OF SERUM CALCIUM AND PHOSPHORUS AS CLINICAL BIOMARKERS FOR DRUG-INDUCED SYSTEMIC MINERALIZATION: CASE STUDY WITH A MEK INHIBITOR Alan P. Brown, Ph.D., DABT Pfizer Global Research and Development, Ann Arbor, Michigan
INTRODUCTION

The mitogen-activated protein kinase (MAPK) signal transduction pathways control key cellular processes such as growth, differentiation, and proliferation, and provide a means for transmission of signals from the cell surface to the nucleus. As a part of the RAS–RAF–MEK–MAPK pathway, MEK (MAP kinase kinase) phosphorylates the MAPK proteins ERK1 and ERK2 (extracellular signal-regulated kinases) as a means for intracellular signaling (Sebolt-Leopold, 2000). Although MEK has not been identified as having oncogenic properties, this kinase serves as a focal point in the signal transduction pathway of known oncogenes (e.g., RAS and RAF) (Mansour et al., 1994). MEK exists downstream of various receptor tyrosine kinases (such as the epidermal growth factor receptor) which have been demonstrated to be important in neoplasia (Jost et al., 2001). RAS activation occurs first, followed by recruitment of RAF (A-RAF, B-RAF, or RAF-1) proteins to the cell membrane through binding to RAS, with subsequent activation of RAF. RAF phosphorylates MEK1 and MEK2 on multiple serine residues in the activation
process. MEK1 and MEK2 phosphorylate tyrosine or threonine residues on ERK proteins in the signal transduction process, with phosphorylated ERK activating various transcription factors (Friday and Adjei, 2008). Aberrant activation of this pathway has been observed in a diverse group of solid tumors, along with leukemia, and is believed to play a key role in tumorigenesis (Hoshino et al., 1999; Milella et al., 2001).

Based on a significant amount of preclinical data, development of small-molecule inhibitors of MEK appears to be a rational approach for treatment of various malignancies (Sebolt-Leopold et al., 1999; Dent and Grant, 2001). The first MEK inhibitor to enter clinical trials was CI-1040 (also known as PD0184352), which was intended for oral administration. However, the level of antitumor activity in a multicenter phase II study in patients with various solid tumors was not sufficient to warrant further development of this drug (Rinehart et al., 2004; Wang et al., 2007). CI-1040 exhibited low oral bioavailability and high metabolism, which were primary factors resulting in insufficient plasma drug levels for antitumor activity. PD0325901 [Figure 1; chemical name of N-((R)-2,3-dihydroxypropoxy)-3,4-difluoro-2-(2-fluoro-4-iodo-phenylamino)benzamide] is a highly potent and specific non-ATP competitive inhibitor of MEK (Ki of 1 nM against activated MEK1 and MEK2 in vitro), and demonstrated anticancer activity against a broad spectrum of human tumors in murine models (Sebolt-Leopold et al., 2004). Preclinical studies indicate that PD0325901 has the potential to impair growth of human tumors that rely on the MEK/MAPK pathway for growth and survival. PD0325901 inhibits the phosphorylation of MAPK proteins (ERK1 and ERK2) as a biochemical mechanism of action, and assays were developed to evaluate inhibition of protein phosphorylation in normal and neoplastic tissues (Brown et al., 2007). This compound has greatly improved pharmacologic and pharmaceutical properties compared with CI-1040 (i.e.,
Figure 1 Chemical structure of PD0325901.
greater potency for MEK inhibition, higher bioavailability, and increased metabolic stability) and has significant promise for determining the therapeutic potential for treating cancer with an orally active MEK inhibitor (Rinehart et al., 2004; Sebolt-Leopold et al., 2004). PD0325901 was selected for development as a clinical candidate due to its superior preclinical profile compared to CI-1040 (Wang et al., 2007). Toxicology studies were subsequently initiated to support the conduct of a phase I or II clinical trial in cancer patients with various solid tumors (advanced breast cancer, colon cancer, melanoma, non-small cell lung cancer) utilizing oral administration of the drug.
TOXICOLOGY STUDIES

The nonclinical safety of PD0325901 was evaluated in Sprague–Dawley rats given single oral or intravenous (IV) doses, in beagle dogs given oral and IV escalating doses, and in cynomolgus monkeys given escalating oral doses to assess acute toxicity and assist in dose selection for subsequent studies. The potential effects of PD0325901 on central nervous system, cardiovascular, and pulmonary function were evaluated in single-dose safety pharmacology studies. Two-week dose-range finder (nonpivotal) oral toxicity studies were conducted in rats, dogs, and monkeys to assist in dose and species selection for the pivotal one-month oral toxicology studies. In addition, an investigative oral toxicity study was conducted in female Balb/c mice. The dog was selected as the nonrodent species for the pivotal toxicology study based on the following data: metabolites of PD0325901 identified in human liver microsomal incubations were also present following incubation with dog liver microsomes; plasma protein binding of PD0325901 is similar in dogs and humans (>99%); and oral bioavailability in dogs is high (>90%). Finally, injury to the mucosa of the gastrointestinal tract occurred at lower doses and exposures in dogs than in monkeys (based on dose-range-finding studies), indicating the dog as the more sensitive nonrodent species. Pivotal one-month oral toxicity studies, including one-month reversal phases, were conducted in beagle dogs and Sprague–Dawley rats to support submission of an investigational new drug (IND) application to the U.S. Food and Drug Administration. A list of toxicology studies of PD0325901 conducted prior to initiation of human testing is presented in Table 1.

Upon completion of the first two-week dose-range-finding study in rats, a significant and unique toxicity was observed that involved mineralization of vasculature (Figure 2) and various soft tissues (i.e., ectopic or systemic mineralization) as determined by routine light-microscopic evaluation. In a follow-up study in rats, dysregulation of serum calcium and phosphorus homeostasis, and systemic mineralization occurred in a time- and dose-dependent manner. This toxicity was not observed in dogs or monkeys, despite systemic exposures to PD0325901 more than 10-fold higher than those associated with mineralization in rats and pharmacologic inhibition of
TABLE 1  Summary of Toxicology Studies Conducted with PD0325901a

Acute and escalating dose
    Single dose in rats
    Single dose in rats, IVb
    Escalating dose in dogs
    Escalating dose in dogs, IV
    Escalating dose in monkeys
Safety pharmacology
    Neurofunctional evaluation in rats
    Neurofunctional evaluation in rats, IV
    Cardiovascular effects in monkeys
    Pulmonary effects in rats
    Purkinje fiber assay
    HERG assay
Nonpivotal repeated-dose studies
    2-Week dose-range finder in rats
    Exploratory 2-week study in rats
    2-Week dose-range finder in dogs
    2-Week dose-range finder in monkeys
Pivotal repeated-dose studies
    One month in rats (plus one-month reversal phase)
    One month in dogs (plus one-month reversal phase)
Pivotal genetic toxicity studies
    Bacterial mutagenicity
    Structural chromosome aberration
    In vivo micronucleus in rats
Special toxicity studies
    Pharmacodynamic and toxicokinetic in rats, oral and IV
    Time course and biomarker development in rats
    Serum chemistry reversibility study in rats
    Investigative study in mice
    Enantiomer (R and S) study in rats
    PD0325901 in combination with pamidronate or Renagel in rats

a All animal studies were conducted by oral gavage unless otherwise indicated.
b IV, intravenous (bolus).
phosphorylated MAPK in canine or monkey tissue (demonstrating biochemical activity of PD0325901 at the target protein, i.e., MEK). Various investigative studies were conducted to examine the time course and potential mechanism of systemic mineralization in rats and to identify biomarkers that could be used to monitor for this effect in clinical trials. Next we describe the key studies conducted to investigate this toxicity, the results obtained, and how the nonclinical data were utilized to evaluate the safety risk of the compound, select a safe starting dose for a phase I trial, and provide measures to ensure patient safety during clinical evaluation of PD0325901 in cancer patients.
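The doses in the studies described below are reported both in mg/kg and as approximate body-surface-area equivalents in mg/m2. For readers who want to reproduce that conversion, a minimal sketch follows; the species conversion factors (km) shown are conventional defaults assumed here rather than values stated by the author, although they are consistent with the paired mg/kg and mg/m2 doses quoted throughout this chapter.

    # Approximate conversion between mg/kg and mg/m2 doses using conventional
    # km factors (km ~ body weight in kg divided by body surface area in m2).
    # The factor values below are assumed standard defaults, not taken from the text.
    KM = {"mouse": 3, "rat": 6, "monkey": 12, "dog": 20, "human": 37}

    def mg_per_kg_to_mg_per_m2(dose_mg_per_kg: float, species: str) -> float:
        """Convert a mg/kg dose to its approximate mg/m2 equivalent."""
        return dose_mg_per_kg * KM[species]

    # Example: rat doses of 3, 10, and 30 mg/kg correspond to roughly
    # 18, 60, and 180 mg/m2, matching the parenthetical values in the text.
    for dose in (3, 10, 30):
        print(dose, "mg/kg in the rat ~", mg_per_kg_to_mg_per_m2(dose, "rat"), "mg/m2")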
Figure 2 Mineralization of the aorta in a male rat administered PD0325901 at 3 mg/kg in a dose-range-finding study. Arrows indicate mineral in the aorta wall. Hematoxylin and eosin–stained tissue section. (See insert for color reproduction of the figure.)
At the beginning of the toxicology program for PD0325901, a two-week oral-dose-range-finding study was conducted in male and female rats in which daily doses of 3, 10, and 30 mg/kg (18, 60, and 180 mg/m2, respectively) were administered. Mortality occurred in males at ≥3 mg/kg and females at ≥10 mg/kg, with toxicity occurring to a greater extent in males at all dose levels. Increased serum levels of phosphorus (13 to 69%), and decreased serum total protein (12 to 33%) and albumin (28 to 58%) were seen at all doses. Light-microscopic evaluation of formalin-fixed and hematoxylin- and eosin-stained tissues was performed. Mineralization occurred in the aorta (Figure 2) and coronary, renal, mesenteric, gastric, and pulmonary vasculature of males at ≥3 mg/kg and in females at ≥10 mg/kg. Parenchymal mineralization with associated degeneration occurred in the gastric mucosa and muscularis, intestines (muscularis, mucosa, submucosa), lung, liver, renal cortical tubules, and/or myocardium at the same doses. Use of the Von Kossa histology stain indicated the presence of calcium in the mineralized lesions. Vascular/parenchymal mineralization and degeneration were generally dose related in incidence and severity. PD0325901 produced increased thickness (hypertrophy) of the femoral growth plate (physis) in both sexes at all doses, and degeneration and necrosis of the femoral metaphysis in males at ≥3 mg/kg and females at 30 mg/kg. In addition, skin ulceration, hepatocellular necrosis, decreased crypt goblet cells, reduced hematopoietic elements, and ulcers of cecum and duodenum were observed. Systemic mineralization of the vasculature and soft tissues was
the most toxicologically significant finding of this study. At this time, it was not known whether the hyperphosphatemia was due to decreased renal clearance (York and Evans, 1996) and/or related to the mineralization. However, hyperphosphatemia and elevated serum calcium–phosphorus (Ca × P) product can result in vascular and/or soft tissue mineralization (Spaulding and Walser, 1970; Block, 2000; Giachelli et al., 2001). In addition, morphologic findings similar to those seen in this study are observed in various animal species (e.g., dogs, horses, pigs, rats) with vitamin D toxicosis and altered calcium homeostasis (Grant et al., 1963; Spangler et al., 1979; Harrington and Page, 1983; Long, 1984). Tissue mineralization is observed in the aorta, various arteries, myocardium, gastric mucosa, and renal tubules, along with other soft tissues in these animals.

An exploratory two-week oral toxicity study was next conducted in male and female rats to further investigate the toxicities observed in the initial two-week dose-range finder. The objectives of this study were to identify a minimal or no-adverse-effect level and to provide toxicity, toxicokinetic, and pharmacodynamic data to aid in dose selection for future studies. In addition, an attempt was made to assess whether alterations in phosphorus and calcium homeostasis occur and whether changes can be monitored as potential biomarkers of toxicity. Doses tested in this study were 0.3, 1, or 3 mg/kg (1.8, 6, or 18 mg/m2, respectively) and animals were dosed for up to 14 days. Cohorts of animals (5/sex/group) were necropsied on days 4 and 15, and hematology, serum biochemistry, plasma intact parathyroid hormone (PTH), and urinary parameters were evaluated. Urinalysis included measurement of calcium, phosphorus, and creatinine levels. Select tissues were examined microscopically, and samples of liver and lung were evaluated for total and phosphorylated MAPK (pMAPK) levels by Western blot analysis to evaluate for pharmacologic activity of PD0325901 (method described in Brown et al., 2007). Satellite animals were included for plasma drug-level analyses on day 8. In this study, systemic mineralization occurred at ≥0.3 mg/kg in a dose-dependent fashion, was first observed on day 4, and was more severe in males. By day 15, mineralization was generally more pronounced and widespread. Skeletal changes included hypertrophy of the physeal zone in males at ≥1 mg/kg and at 3 mg/kg in females, and necrosis of bony trabeculae and marrow elements with fibroplasia, fibro-osseous proliferation, and/or localized hypocellularity at ≥1 mg/kg in males and 3 mg/kg in females. The minimal plasma PD0325901 AUC(0–24) values associated with toxicity were 121 to 399 ng · h/mL, which were well below exposure levels associated with antitumor efficacy in murine models (AUC of 1180 to 1880 ng · h/mL). Pharmacologic inhibition of tissue pMAPK occurred at ≥1 mg/kg and was not observed in the absence of toxicity. The gastric fundic mucosa appeared to be the most sensitive tissue for evaluating systemic mineralization, which probably resulted from alterations in serum calcium and phosphorus homeostasis. This was based on the following observations. On day 4, serum phosphorus levels were increased 12
TABLE 2  Mean Clinical Chemistry Changes in Male Rats Administered PD0325901 for Up to 2 Weeks

                                         PD0325901
                            Day   Control   0.3 mg/kg   1 mg/kg   3 mg/kg
Serum phosphorus (mg/dL)     4     12.90      13.08      14.48     16.24a
                            15     11.30      11.56      12.88     13.62a
Serum calcium (mg/dL)        4     10.58      10.36      10.10     10.16
                            15     10.38      10.36      10.52     10.36
Serum albumin (g/dL)         4      2.74       2.56       2.10a     2.04a
                            15      2.56       2.54       2.36a     1.98a
Plasma PTH (pg/mL)b          4       492        297        114a      155a
                            15      1099        268        457       115a

a p < 0.01 vs. control; n = 5/group.
b Intact parathyroid hormone.
to 26%, and albumin was decreased 17 to 26% at ≥1 mg/kg (Table 2, male data only). In addition, PTH levels were decreased in a dose-dependent fashion (60 to 77%) at ≥1 mg/kg. On day 15, phosphorus levels were increased 21% in males at 3 mg/kg, and albumin was decreased 8 to 32% at ≥0.3 mg/kg. PTH levels were decreased 77 to 89% at 3 mg/kg. Changes in urinary excretion of calcium and phosphorus were observed in both sexes at ≥1 mg/kg and included increased excretion of phosphorus on day 15. Although increases in excretion of calcium were observed on day 4 in females, males exhibited decreases in urinary calcium. In this study, PD0325901 administration resulted in significantly decreased levels of serum albumin without changes in serum (total) calcium levels (Payne et al., 1979; Meuten et al., 1982; Rosol and Capen, 1997). This indicates that free, non-protein-bound calcium levels were increased. Hyperphosphatemia and hypercalcemia result in an increased Ca × P product, which is associated with induction of vascular mineralization (Block, 2000; Giachelli et al., 2001). The changes observed in urinary excretion of calcium and phosphorus probably reflected the alterations in serum levels. After completion of the two studies in rats described above it was concluded that PD0325901 produces significant multiorgan toxicities in rats with no margin between plasma drug levels associated with antitumor efficacy, pharmacologic inhibition of pMAPK (as an index of MEK inhibition), and toxicity in rats. Systemic mineralization was considered the preclinical toxicity of greatest concern, due to the severity of the changes observed and expectation of irreversibility, and the data suggested that it was related to a dysregulation in serum phosphorus and calcium homeostasis. Furthermore, skeletal lesions were seen in the rat studies that were similar to those reported with vitamin D toxicity and may be related to the calcium–phosphorus dysregulation. In concurrent toxicology studies in dogs and monkeys, neither systemic
mineralization nor skeletal changes were observed, despite higher plasma drug exposures, lethal doses, or pharmacologic inhibition of MEK. Therefore, the following questions were posed regarding PD0325901-induced systemic mineralization: (1) What is a potential mechanism? (2) Is this toxicity relevant to humans or rat-specific? and (3) Can this toxicity be monitored clinically? The ability of an anticancer agent that modulates various signal transduction pathways to produce dysregulation in serum calcium homeostasis is not unprecedented. 8-Chloro-cAMP is an experimental compound that has been shown to modulate various protein kinase signal transduction pathways involved in neoplasia. In preclinical models, this compound produced growth inhibition and increased differentiation in cancer cells (Ally et al., 1989). In a clinical trial, 8-chloro-cAMP was administered to patients with advanced cancer via intravenous infusion and resulted in dose-limiting toxicity of reversible hypercalcemia, as serum calcium was increased by up to approximately 40% (Saunders et al., 1997). This drug produced a parathyroid hormone-like effect in these patients, resulting in increased synthesis of 1,25-dihydroxyvitamin D (up to 14 times baseline value) as a mechanism for the hypercalcemia. Intravenous administration of 8-chloro-cAMP to beagle dogs also resulted in hypercalcemia (serum calcium increased 37 to 46%), indicating similar actions across species (Brown et al., 2000). Experience with this compound was important with respect to designing investigative studies with PD0325901 in which the hormonal control of serum calcium and phosphorus were evaluated. An investigative study was designed in rats to examine the time course for tissue mineralization in target organs and to determine whether clinical pathology changes occur prior to, or concurrent with, lesion development (Brown et al., 2005a). These clinical pathology parameters may therefore serve as biomarkers for systemic mineralization. Male rats (15/group) were used due to their increased sensitivity for this toxicity compared with females. Oral doses tested were 1, 3, or 10 mg/kg (6, 18, or 60 mg/m2, respectively). Five animals per group were necropsied on days 2, 3, or 4 following 1, 2, or 3 days of treatment, respectively. Clinical laboratory tests were conducted at necropsy that included serum osteocalcin, urinalysis, and plasma intact PTH, calcitonin, and 1,25-dihydroxyvitamin D. Lung samples were evaluated for inhibition of pMAPK, and microscopic evaluations of the aorta, distal femur with proximal tibia, heart, and stomach were conducted for all animals. Administration of PD0325901 resulted in inhibition of pMAPK in lung at all doses, demonstrating pharmacologic activity of the drug. On day 2, mineralization of gastric fundic mucosa and multifocal areas of necrosis of the ossifying zone of the physis were present only at 10 mg/kg. Necrosis of the metaphysis was present at ≥3 mg/kg. Serum phosphorus levels increased 33 to 43% and 1,25-dihydroxyvitamin D increased two- to sevenfold at all doses (Table 3). Osteocalcin increased 14 to 18%, and serum albumin decreased 8 to 14% at ≥3 mg/kg (Table 4). Osteocalcin is a major noncollagenous protein of bone matrix and synthesized by osteoblasts (Fu and Muller, 1999). Changes in serum osteocalcin can reflect alterations in bone turnover (resorption/
TABLE 3  Mean Serum Phosphorus and Plasma 1,25-Dihydroxyvitamin D in Male Rats Administered PD0325901 for Up to 3 Days of Dosing

                                              PD0325901
                                   Day   Control   1 mg/kg     3 mg/kg     10 mg/kg
Serum phosphorus (mg/dL)            2     12.06    16.10*      17.22*      16.84* M
                                    3     11.48    12.96*      15.62* M    19.02* M
                                    4     11.34    13.18* M    15.40* M    21.70* M
1,25-Dihydroxyvitamin D (pg/mL)     2       309      856*       1328*       2360* M
                                    3       257       396        776* M     1390* M
                                    4       191       236 M      604* M     1190* M

M, systemic mineralization observed. *, p < 0.01 vs. control; n = 5/group.
TABLE 4  Mean Serum Calcium and Albumin in Male Rats Administered PD0325901 for Up to 3 Days of Dosing

                                   PD0325901
                         Day   Control   1 mg/kg     3 mg/kg     10 mg/kg
Serum calcium (mg/dL)     2     10.42     11.04       11.00       10.66 M
                          3      9.60     10.58       10.64 M     10.58 M
                          4     10.44     10.44 M     10.58 M      7.24** M
Serum albumin (g/dL)      2      3.08      2.92        2.82*       2.66** M
                          3      2.88      2.68        2.62 M      2.34** M
                          4      2.90      2.34** M    2.34** M    1.98** M

M, systemic mineralization observed. *, p < 0.05 vs. control. **, p < 0.01 vs. control; n = 5/group.
formation). Serum osteocalcin appears to reflect the excess of synthesized protein not incorporated into bone matrix, or released protein during bone resorption (Ferreira and Drueke, 2000). The increases in osteocalcin seen in this study may have been reflective of bone necrosis. On day 3, mineralization of gastric fundic mucosa, gastric and cardiac arteries, aorta, and heart were present in all rats at 10 mg/kg. Myocardial necrosis was also seen at 10 mg/kg. Mineralization of gastric fundic mucosa was present in all rats at 3 mg/kg, and focal, minimal myocyte necrosis was present in one rat at 3 mg/kg. Thickening of the physeal zone of hypertrophying cartilage, and necrosis within the physeal zone of ossification and in the metaphyseal region in femur and tibia were seen in all animals at 10 mg/kg. Necrosis within the metaphyseal region was also present at 3 mg/kg. Serum phosphorus increased 13 to 66% at all doses and 1,25-dihydroxyvitamin D increased two- to fourfold at ≥3 mg/kg. Osteocalcin increased 12 to 28% at ≥3 mg/kg and serum albumin was decreased (7 to 19%) at all doses. Urine calcium increased
fivefold at 10 mg/kg, resulting in a fivefold increase in urine calcium/creatinine ratio. This increase may have represented an attempt to achieve mineral homeostasis in response to the hypercalcemia. In addition, hypercalciuria can occur with vitamin D intoxication (Knutson et al., 1997). On day 4, mineralization of gastric fundic mucosa, gastric muscularis, gastric and cardiac arteries, aorta, and heart were present in the majority of animals at ≥3 mg/kg. Myocardial necrosis with accompanying neutrophilic inflammation was also seen in all rats at 10 mg/kg and in one animal at 3 mg/kg. Mineralization of gastric fundic mucosa was present at 1 mg/kg. Thickening of the physeal zone of hypertrophying cartilage, and necrosis within the physeal zone of ossification and/or in the metaphyseal region in femur and tibia, were present at ≥3 mg/kg. At 1 mg/kg, thickening of the physeal zone of hypertrophying cartilage and metaphyseal necrosis were observed. Serum phosphorus increased 16 to 91% at all doses and 1,25-dihydroxyvitamin D increased two- to fivefold at ≥3 mg/kg. Osteocalcin increased 14 to 24% at ≥3 mg/kg, and serum albumin decreased 19 to 32% at all doses. At 10 mg/kg, serum calcium was decreased 31% (possibly resulting from the hypercalciuria on day 3) and calcitonin was decreased by 71%. Calcitonin is secreted by the thyroid gland and acts to lower serum calcium levels by inhibiting bone resorption (Rosol and Capen, 1997). The decrease in calcitonin may have resulted from feedback inhibition due to low serum calcium levels at 10 mg/kg on day 4. Urine creatinine, calcium, and phosphorus were increased at 10 mg/kg. This resulted in decreases of 41% and 21% in the calcium/creatinine and phosphorus/creatinine ratios, respectively. This four-day investigative study in rats resulted in several very important conclusions which were critical for supporting continued development of PD0325901. In the study, PD0325901 at ≥1 mg/kg resulted in systemic mineralization and skeletal changes in a dose- and time-dependent fashion. These changes were seen after a single dose at 10 mg/kg and after 3 doses at 1 mg/kg. Elevations in serum phosphorus and plasma 1,25-dihydroxyvitamin D occurred prior to tissue mineralization. Although serum albumin was decreased throughout the study, calcium remained unchanged, consistent with an increase in non-protein-bound calcium. This study set the stage for the proposal of using serum phosphorus and calcium measurements as clinical laboratory tests or biomarkers for PD0325901-induced systemic mineralization. Whereas measurement of plasma 1,25-dihydroxyvitamin D is technically complex and costly, evaluation of serum calcium and phosphorus is rapid and performed routinely in the clinical laboratory with historical reference ranges readily available. Although the data obtained with urinalysis were consistent with dysregulation of calcium and phosphorus homeostasis, concerns existed as to whether specific and reproducible urinalysis parameters could be developed for monitoring the safety of PD0325901. Based on the data obtained thus far, hyperphosphatemia appeared to be the primary factor for eliciting tissue mineralization, and serum phosphorus was proposed as the key analyte for monitoring.
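The interpretation that free calcium rose while total calcium stayed flat rests on adjusting total serum calcium for the fall in albumin. A common rule-of-thumb adjustment, shown below as an illustration only (the formula and its 0.8 mg/dL-per-g/dL slope are a widely used textbook convention in the spirit of Payne et al., 1979, not the calculation performed in these studies), makes the point with values similar to those in Table 4.

    def albumin_adjusted_calcium(total_ca_mg_dl: float, albumin_g_dl: float,
                                 reference_albumin_g_dl: float = 4.0) -> float:
        # Rule of thumb: add ~0.8 mg/dL of calcium for every 1 g/dL that
        # albumin falls below the reference value.
        return total_ca_mg_dl + 0.8 * (reference_albumin_g_dl - albumin_g_dl)

    # Day 4, 3 mg/kg group (Table 4): total calcium ~10.58 mg/dL with albumin
    # ~2.34 g/dL adjusts to roughly 11.9 mg/dL, i.e., an effectively elevated
    # calcium despite an apparently unchanged total value.
    print(round(albumin_adjusted_calcium(10.58, 2.34), 2))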
An investigative study was conducted in male rats to assess the reversibility of serum chemistry changes following a single oral dose of PD0325901 (Brown et al., 2005a). The hypothesis was that serum phosphorus levels would return to control levels in the absence of drug administration. Male rats (10/group) received single oral doses at 1, 3, or 10 mg/kg, with controls receiving vehicle alone. Blood was collected on days 2, 3, 5, and 8 for serum chemistry analysis. Hyperphosphatemia (serum phosphorus increased up to 58%) and minimal increases in calcium occurred at all doses on days 2 and 3. Albumin was decreased at 10 mg/kg. These changes were completely reversible within a week. This study demonstrated that increases in serum phosphorus and calcium induced by PD0325901 are reversible following cessation of dosing. Although a single dose of 10 mg/kg produces systemic mineralization in rats, withdrawal of dosing results in normalization of serum calcium and phosphorus levels, indicating that the homeostatic mechanisms controlling these electrolytes remain intact. The results of this study were not unexpected. Oral administration to dogs of the vitamin D analogs dihydrotachysterol and Hytakerol (dihydroxyvitamin D2-II) results in hypercalcemia that is reversible following termination of dosing (Chen et al., 1962). Reversal of hypercalcemia and hypercalciuria has been demonstrated in humans following cessation of dosing of various forms of vitamin D (calciferol, dihydrotachysterol, 1-α-hydroxycholecalciferol, or 1-α,25-dihydroxycholecalciferol) (Kanis and Russell, 1977). Another investigative study was conducted in male rats to determine whether pamidronate (a bisphosphonate) or Renagel (sevelamer HCl; a phosphorus binder) would inhibit tissue mineralization induced by PD0325901 by inhibiting hyperphosphatemia. Bisphosphonates inhibit bone resorption and in so doing modulate serum calcium and phosphorus levels. Renagel is a nonabsorbable resin that contains polymers of allylamine hydrochloride, which forms ionic and hydrogen bonds with phosphate in the gut. Rats received daily oral doses of PD0325901 at 3 mg/kg for 14 days with or without co-treatment with pamidronate or Renagel. Pamidronate was given twice intravenously at 1.5 mg/kg one day prior to PD0325901 dosing and on day 6. Renagel was given daily as 5% of the diet beginning one day prior to PD0325901 dosing. Treatment groups consisted of oral vehicle alone, PD0325901 alone, pamidronate alone, Renagel alone, PD0325901 + pamidronate, and PD0325901 + Renagel. PD0325901 plasma AUC(0–24) values were 11.6, 9.17, and 4.34 μg · h/mL in the PD0325901 alone, PD0325901 + pamidronate, and PD0325901 + Renagel groups, respectively. Administration of PD0325901 alone resulted in hyperphosphatemia on days 3 and 15, which was inhibited by co-treatment with pamidronate or Renagel on day 3 only. PD0325901 alone resulted in systemic mineralization and skeletal changes consistent with changes seen in previous rat studies. Coadministration with either pamidronate or Renagel protected against systemic mineralization on day 3 only. Bone lesions were decreased with the co-treatments. Inhibition of toxicity with Renagel may have been due in part to decreased systemic drug exposure. However, the inhibition of toxic-
ity with pamidronate supports the role of a calcium–phosphorus dysregulation in PD0325901-induced systemic mineralization, because the inhibition of systemic mineralization observed on day 3 coincided with attenuation in the rise in serum phosphorus in these animals. A two-week oral dose range-finding study was conducted in dogs in which doses tested were 0.2, 0.5, and 1.5 mg/kg (4, 10, and 30 mg/m2, respectively). Also, a two-week oral-dose-range-finding study was conducted in cynomolgus monkeys at doses of 0.5, 3, and 10 mg/kg (6, 36, and 120 mg/m2, respectively). In addition to standard toxicology and toxicokinetic endpoints, determination of inhibition of tissue and peripheral blood mononuclear cell pMAPK was performed to assess pharmacologic activity of PD0325901 in both studies. PTH and 1,25-dihydroxyvitamin D were evaluated in the monkey study. In both studies, mortality occurred at ≥0.5 mg/kg (dogs) and at 10 mg/kg (monkeys) due to injury to the gastrointestinal tract mucosa, inhibition of pMAPK occurred at all doses, and systemic mineralization was not observed in either study. Increases in serum phosphorus were seen in moribund animals and/or associated with renal hypoperfusion (resulting from emesis, diarrhea, and dehydration). These elevations in phosphorus were considered secondary to renal effects and were not associated with changes in serum calcium. Toxicologically significant increases in serum phosphorus or calcium were not evident at nonlethal doses in dogs or monkeys. In the two-week monkey study, a dose-related increase in 1,25-dihydroxyvitamin D was observed on day 2 only (after a single dose) at ≥3 mg/kg. This increase did not occur on days 7 or 15, and was not associated with changes in serum phosphorus or calcium, nor systemic mineralization. Therefore, there did not appear to be toxicologic significance to the day 2 increase in 1,25-dihydroxyvitamin D in monkeys.
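The species comparison developed in the Discussion that follows rests on exposure margins: the ratio of the AUC achieved in dogs or monkeys without mineralization to the AUC associated with mineralization in rats. A minimal sketch of that arithmetic is given below; the specific AUC values are those quoted elsewhere in this chapter and are used here only for illustration.

    # AUC(0-24) values in ng·h/mL quoted in this chapter (illustrative only).
    rat_toxic_auc = 399    # upper end of the minimal rat exposures associated with toxicity
    dog_auc = 10600        # dog exposure without systemic mineralization
    monkey_auc = 15000     # monkey exposure without systemic mineralization

    def exposure_margin(no_effect_auc: float, toxic_auc: float) -> float:
        # Fold-difference between an exposure without the finding and the toxic exposure.
        return no_effect_auc / toxic_auc

    print(f"dog margin:    {exposure_margin(dog_auc, rat_toxic_auc):.0f}-fold")
    print(f"monkey margin: {exposure_margin(monkey_auc, rat_toxic_auc):.0f}-fold")
    # Both margins comfortably exceed 10-fold, consistent with the species
    # specificity described in the Discussion.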
DISCUSSION

Mineralization of vasculature and various soft tissues (systemic mineralization) was observed in toxicology studies in rats in a time- and dose-dependent manner. This change was consistent with the presence of calcium–phosphorus deposition within the vascular wall and parenchyma of tissues such as the stomach, kidney, aorta, and heart. The stomach appeared to be the most sensitive tissue, since mineralization of gastric fundic mucosa occurred prior to the onset of mineralization in other tissues. Male rats were consistently more sensitive to this toxicity than were female rats. In the pivotal one-month toxicity study in rats, the no-effect level for systemic mineralization was 0.1 mg/kg (0.6 mg/m2) in males and 0.3 mg/kg (1.8 mg/m2) in females, which were associated with PD0325901 steady-state plasma AUC(0–24) values of 231 and 805 ng · h/mL, respectively. Systemic mineralization was not observed in dogs or monkeys, despite pharmacologic inhibition of tissue pMAPK levels (>70%), administration of lethal doses, and exposures greater than 10-fold of those that induced mineralization in rats (10,600 ng · h/mL in dogs and
up to 15,000 ng · h/mL in monkeys). Systemic mineralization was not observed in mice despite administration of PD0325901 at doses up to 50 mg/kg (150 mg/m2). Systemic mineralization observed in rats following administration of PD0325901 is consistent with vitamin D toxicity due to dysregulation in serum calcium and phosphorus homeostasis (Grant et al., 1963; Rosenblum et al., 1977; Kamio et al., 1979; Mortensen et al., 1996; and Morrow, 2001). A proposed hypothesis for the mechanism of this toxicity is depicted in Figure 3. Elevated serum phosphorus levels (hyperphosphatemia) and decreased serum albumin were observed consistently in rats administered PD0325901. Although serum albumin levels are decreased in rats treated with PD0325901, calcium values typically remain unchanged or slightly elevated in these animals, indicating that free, non-protein-bound calcium is increased (Rosol and Capen, 1997; Payne et al., 1979; Meuten et al., 1982). Decreased parathyroid hormone levels (PTH) were observed in the rat studies. PTH plays a central role in the hormonal control of serum calcium and phosphorus. PTH is produced by the parathyroid gland and induces conversion of 25-hydroxyvitamin D (which is produced in the liver) to 1,25-dihydroxyvitamin D (calcitriol) in the kidney. 1,25-Dihydroxyvitamin D elicits increased absorption of calcium from the gastrointestinal tract. In addition, PTH mobilizes calcium and phosphorus from bone by increasing bone resorption, increases renal absorption of calcium, and increases renal excretion of phosphorus (in order to regulate
Figure 3 Hypothesis for the mechanism for systemic mineralization in the rat following PD0325901 administration.
serum phosphorus levels). Elevations in serum calcium typically elicit decreased PTH levels as a result of the normal control (negative feedback loop) of this endocrine system (Rosol and Capen, 1997). The decreases in PTH observed in the rats were believed to be due to the elevations in serum calcium (hypercalcemia). Hyperphosphatemia in the presence of normo- or hypercalcemia can result in an increased Ca × P product, which is associated with systemic mineralization (Block, 2000). Hyperphosphatemia was also observed in rats administered PD176067, which is a reversible and selective inhibitor of fibroblast growth factor receptor tyrosine kinase. In these animals, vascular and soft tissue mineralization also occurs (aorta and other arteries, gastric fundic mucosa, myocardium, renal tubules), probably due to increased Ca × P product (Brown et al., 2005b). Administration of PD0325901 to rats resulted in significantly increased levels of plasma 1,25-dihydroxyvitamin D. The mechanism for this action is not known but is not believed to be due to a metabolite of PD0325901. This is the most potent form of vitamin D and the primary metabolite responsible for regulating serum calcium and phosphorus. Vitamin D is converted to 25-hydroxyvitamin D in the liver and then 1-hydroxylated to 1,25-dihydroxyvitamin D in renal tubules. 1,25-Dihydroxyvitamin D acts by increasing absorption of calcium and phosphorus from the gastrointestinal tract, and can increase calcium and phosphorus reabsorption by renal tubules. Hyperphosphatemia and increased plasma 1,25-dihydroxyvitamin D levels in rats occurred 1 to 2 days prior to the detection of tissue mineralization at doses ≤3 mg/kg (18 mg/m2). Administration of PD0325901 to rats resulted in bone lesions that included necrosis of the metaphysis and the ossifying zone of the physis, and thickening of the zone of hypertrophying cartilage of the physis. The expansion of chondrocytes in the physis may be a response to the metaphyseal necrosis and loss of osteoprogenitor cells. These changes are characterized by localized injury to bone that appear to be due to local ischemia and/or necrosis. Skeletal vascular changes may be present in these animals, resulting in disruption of endochondral ossification. Skeletal lesions, including bone necrosis, can result from vitamin D intoxication (Haschek et al., 1978). The skeletal lesions observed in rats administered PD0325901 are similar to those reported with vitamin D toxicity, which provides additional evidence that toxicity occurred via induction of 1,25-dihydroxyvitamin D. Bone lesions similar to those observed in rats were not seen in dogs, monkeys, or mice administered PD0325901. In summary, PD0325901-induced systemic mineralization in the rat results from a dysregulation in serum phosphorus and calcium homeostasis. This dysregulation appears to result from toxicologically significant elevations in plasma 1,25-dihydroxyvitamin D levels following drug administration. Based on the toxicology data, rats are uniquely sensitive to this toxicity. A summary of the primary target organ toxicities observed in the preclinical studies is presented in Table 5. Toxicity to the skin (epidermal lesions) and gastrointes-
TABLE 5  Primary Target Organ Toxicities Observed in Preclinical Studies

                                       Species
Organ System                  Rat      Dog      Monkey
Gastrointestinal tract        ×a       ×        ×
Skin                          ×        ×        ×
Systemic mineralizationb      ×        —        —
Bone                          ×        —        —
Liver                         ×        —        —
Gallbladder                   n/a      —        ×

a Toxicity observed.
b Includes vascular (aorta, arteries) and soft tissue mineralization (e.g., stomach, heart, kidneys).
tinal tract (primarily ulcers/erosions in the mucosa) were observed across species and may have resulted from inhibition of MEK-related signal transduction pathways in these tissues (Brown et al., 2006). Gastrointestinal tract toxicity is dose-limiting in dogs and monkeys and was anticipated to be the dose-limiting toxicity of PD0325901 in the clinic. Therefore, gastrointestinal tract toxicity may preclude the development of other potential adverse events in humans, including potential dysregulation in serum phosphorus or calcium. It is not known whether systemic mineralization is relevant to humans. However, if PD0325901 does induce a dysregulation in serum calcium– phosphorus metabolism in humans, monitoring serum levels would provide an early indication of effects and guide modifications to dosing regimens. To ensure patient safety in the phase I clinical trial with PD0325901, procedures were incorporated into the trial design to monitor for potential dysregulation in serum calcium–phosphorus homeostasis. Measurements of serum calcium, phosphorus, creatinine, albumin, and blood urea nitrogen were performed frequently during the initial treatment cycle (21 days of dosing in a 28-day cycle), with periodic measurement in subsequent cycles, and the serum Ca × P product was calculated. The serum Ca × P product has been determined to be a clinically useful value as a means for evaluating risk for tissue and/or vascular mineralization with the recommendation that the value not exceed 70 based on clinical use of vitamin D analogs such as Rocaltrol and Hectorol (Roche Laboratories, 1998; Bone Care International, Inc., 1999; Block, 2000). Serum calcium and phosphorus are readily measured in a clinical setting with well-established reference ranges available. The trial included a protocolspecific dose-limiting toxicity for a Ca × P product > 70, which required a confirmatory measurement and dose interruption for that patient. In addition, serum vitamin D, PTH, alkaline phosphatase (total and bone), osteocalcin, and urinary C- and N-terminal peptide of collagen 1 (markers of bone resorption) were included for periodic measurement. Criteria considered for exclusion of candidate patients from the clinical trial included a history of
malignancy-associated hypercalcemia, extensive bone metastasis, parathyroid disorder, hyperphosphatemia and renal insufficiency, serum calcium or phosphorus levels >1× the upper limit of normal, and/or concomitant use of calcium supplements and vitamin D in amounts exceeding normal daily allowances.
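Because the protocol treated a serum Ca × P product above 70 as a dose-limiting event requiring a confirmatory measurement and dose interruption, the monitoring rule itself is simple arithmetic. The helper below is only an illustrative sketch of that rule as described in this section; the function names and example values are mine, not part of the trial protocol.

    CA_P_LIMIT = 70.0  # (mg/dL)^2, the protocol limit described in the text

    def ca_p_product(calcium_mg_dl: float, phosphorus_mg_dl: float) -> float:
        # Serum calcium-phosphorus product in (mg/dL)^2.
        return calcium_mg_dl * phosphorus_mg_dl

    def requires_confirmation(calcium_mg_dl: float, phosphorus_mg_dl: float) -> bool:
        # True if the Ca x P product exceeds the protocol limit.
        return ca_p_product(calcium_mg_dl, phosphorus_mg_dl) > CA_P_LIMIT

    # Calcium 10.4 mg/dL with phosphorus 5.0 mg/dL gives a product of 52 (no action);
    # the same calcium with phosphorus 7.5 mg/dL gives 78 and would trigger a
    # confirmatory measurement and dose interruption.
    print(ca_p_product(10.4, 5.0), requires_confirmation(10.4, 5.0))
    print(ca_p_product(10.4, 7.5), requires_confirmation(10.4, 7.5))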
CALCULATION OF CLINICAL STARTING DOSE

The currently accepted algorithm for calculating a starting dose in clinical trials with oncology drugs is to use one-tenth of the dose that causes severe toxicity (or death) in 10% of the rodents (STD10) on a mg/m2 basis, provided that this starting dose (i.e., 1/10 the STD10) does not cause serious, irreversible toxicity in a nonrodent species (in this case, the dog) (DeGeorge et al., 1998). If irreversible toxicities are induced at the proposed starting dose in nonrodents or if the nonrodent (i.e., the dog) is known to be the more appropriate animal model, the starting dose would generally be one-sixth of the highest dose tested in the nonrodent (the dog) that does not cause severe, irreversible toxicity.

Calculation of the initial phase I starting dose of PD0325901 was based on the pivotal one-month toxicology studies in rats and dogs. Doses tested in the one-month rat study were 0.1, 0.3, and 1 mg/kg (0.6, 1.8, and 6 mg/m2, respectively), and doses in the one-month dog study were 0.05, 0.1, and 0.3 mg/kg (1, 2, and 6 mg/m2, respectively). Both studies included animals assigned to a one-month reversal phase, in the absence of dosing, to assess reversibility of any observed toxicities. In addition to standard toxicology and toxicokinetic parameters, these studies included frequent evaluation of serum chemistries, and measurement of vitamin D, osteocalcin, PTH, and inhibition of tissue pMAPK levels. In the one-month rat study, no drug-related deaths occurred and systemic mineralization occurred in multiple tissues in both sexes at 1 mg/kg. Hypocellularity of the metaphyseal region of distal femur and/or proximal tibia occurred in males at 1 mg/kg. Toxicologic findings at lower doses included skin sores (at ≥0.1 mg/kg) and mineralization of gastric mucosa in one male at 0.3 mg/kg. The findings at ≤0.3 mg/kg were not considered to represent serious toxicity. In previous two-week dose-range-finding studies in rats, death occurred at 3 mg/kg (18 mg/m2), indicating this to be the minimal lethal dose in rats. Based on these results, the STD10 in rats was determined to be 1 mg/kg (6 mg/m2). In the one-month dog study, doses up to 0.3 mg/kg (6 mg/m2) were well tolerated with minimal clinical toxicity. Primary drug-related toxicity was limited to skin sores in two animals at 0.3 mg/kg. One-tenth the STD10 in rats is 0.6 mg/m2, which is well below a minimally toxic dose (6 mg/m2) in dogs. These data indicate an acceptable phase I starting dose to be 0.6 mg/m2, which is equivalent to 1 mg in a 60-kg person. The relationships between the primary toxicities in rats and dogs with dose and exposure are presented in Figure 4.
Figure 4 Relationships between dose and exposure with the primary toxicities of PD0325901 in rats and dogs. Exposure is expressed as PD0325901 plasma AUC(0–24) in ng · hr/mL and dose in mg/m2. Results from nonpivotal (dose-range-finding) studies and the pivotal one-month toxicity studies are presented.
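The starting-dose logic above reduces to a few arithmetic steps: take one-tenth of the rat STD10 on a mg/m2 basis, confirm that it sits below a dose causing serious, irreversible toxicity in the dog, and convert to an absolute dose for a reference patient. The sketch below uses the values reported in this section; the 1.62 m2 body surface area assumed for a 60-kg adult is a typical figure and not one stated by the author.

    rat_std10_mg_m2 = 6.0        # STD10 in rats: 1 mg/kg, or 6 mg/m2
    dog_tolerated_mg_m2 = 6.0    # highest dog dose without serious, irreversible toxicity

    proposed_start_mg_m2 = rat_std10_mg_m2 / 10.0   # one-tenth of the rodent STD10
    # If this were not below the dog's tolerated dose, one-sixth of that dog dose
    # would be used instead (DeGeorge et al., 1998).
    assert proposed_start_mg_m2 < dog_tolerated_mg_m2

    # Convert the mg/m2 dose to a flat dose for a 60-kg patient, assuming a
    # typical body surface area of about 1.62 m2 at that weight.
    body_surface_area_m2 = 1.62
    starting_dose_mg = proposed_start_mg_m2 * body_surface_area_m2

    print(f"{proposed_start_mg_m2} mg/m2 ~ {starting_dose_mg:.1f} mg")  # about 1 mg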
In the phase I trial of PD0325901 in cancer patients (melanoma, breast, colon, and non-small cell lung cancer), oral doses were escalated from 1 to 30 mg twice daily (BID). Each treatment cycle consisted of 28 days, and three schedules of administration were evaluated: (1) 3 weeks of dosing with one week off, (2) dosing every day, and (3) 5 days of dosing with 2 days off per week (LoRusso et al., 2005; Menon et al., 2005; Tan et al., 2007; LoRusso et al., 2007). Doses ≥2 mg BID suppressed tumor pMAPK (indicating biochemical activity of the drug) and acneiform rash was dose limiting in one patient at 30 mg BID. There were no notable effects on serum Ca × P product and the most common toxicities included rash, fatigue, diarrhea, nausea, visual disturbances and vomiting. Acute neurotoxicity was frequent in patients receiving ≥15 mg BID (all schedules) and several patients developed optic nerve ischemia, optic neuropathy, or retinal vein occlusion (LoRusso et al., 2007). In the phase I trial, there were three partial responses (melanoma) and stable disease in 24 patients (primarily melanoma) (LoRusso et al., 2007). In a pilot phase II study of PD0325901 in heavily pretreated patients with non-
small cell lung cancer, 15 mg BID was given on various schedules over a 28-day cycle (Haura et al., 2007). The main toxicities were reversible visual disturbances, diarrhea, rash, and fatigue. The mean trough concentration of PD0325901 at 15 mg BID was 108 ng/mL (at steady state) and there were no objective responses (Haura et al., 2007).
CONCLUSIONS

Tissue mineralization produced in rats administered the MEK inhibitor PD0325901 provides a case study of how a unique and serious toxicity observed in preclinical safety testing was effectively managed to allow progression of an experimental drug into human clinical trials. A number of key factors were critical for allowing continued development of this compound to occur, rather than early termination. PD0325901 represented a novel and targeted therapeutic agent for the treatment of various solid tumors, thereby allowing a high risk–benefit ratio to exist due to the significant unmet medical need posed by cancer. Phase I oncology trials typically occur in cancer patients with limited treatment options. Therefore, the barriers to entry for novel anticancer agents in the clinic are generally lower than for phase I trials involving healthy volunteers and therapies for non-life-threatening indications. Early in the toxicology program with PD0325901, lesions observed in rats were similar to those seen with vitamin D toxicity, and serum chemistry data indicated changes in phosphorus and calcium. This information provided the basis for the hypotheses to be proposed regarding the mechanism for vascular and soft tissue mineralization. Because mineralization occurred in rats administered PD0325901 rather than only in dogs or monkeys, an animal model suitable for multiple investigative studies was readily available. Despite the apparent species specificity for this toxicity, it was not appropriate to discount the risks toward humans because of a "rat-specific" finding. Rather, it was important to generate experimental data that characterized the toxicity and provided a plausible mechanism as a basis for risk management. Studies conducted with PD0325901 examined the dose–response and exposure–response relationships for toxicity and pharmacologic inhibition of MEK, the time course for lesion development, whether the changes observed were reversible or not, and whether associations could be made between clinical laboratory changes and anatomic lesions. We were able to identify biomarkers for tissue mineralization that were specifically related to the mechanism, were readily available in the clinical setting, noninvasive, and had acceptable assay variability. It is important that biomarkers proposed for monitoring for drug toxicity be scientifically robust and obtainable, and meet expectations of regulatory agencies. Finally, the data generated during the preclinical safety evaluation of PD0325901 were used to design the phase I–II clinical trial to ensure patient safety. This included selection of a safe starting dose for phase I,
criteria for excluding patients from the trial, and clinical laboratory tests to be included as biomarkers for calcium–phosphorus dysregulation and tissue mineralization. In conclusion, robust data analyses, scientific hypothesis testing, and the ability to conduct investigative work were key factors in developing a biomarker for a serious preclinical toxicity, thereby allowing clinical investigation of a novel drug to occur.

Acknowledgments

Numerous people at the Pfizer Global Research and Development (PGRD), Ann Arbor, Michigan, Laboratories were involved in the studies performed with PD0325901, including the Departments of Cancer Pharmacology and Pharmacokinetics, Dynamics and Metabolism. In particular, the author would like to acknowledge the men and women of Drug Safety Research and Development, PGRD, Ann Arbor, who conducted the toxicology studies with this compound and made significant contributions in the disciplines of anatomic pathology and clinical laboratory testing during evaluation of this compound.

REFERENCES

Ally S, Clair T, Katsaros D, et al. (1989). Inhibition of growth and modulation of gene expression in human lung carcinoma in athymic mice by site-selective 8-Cl-cyclic adenosine monophosphate. Cancer Res, 49:5650–5655. Block GA (2000). Prevalence and clinical consequences of elevated Ca × P product on hemodialysis patients. Clin Nephrol, 54(4):318–324. Bone Care International, Inc. (1999). Package insert, Hectorol™ (doxercalciferol) capsules. June 9. Brown AP, Morrissey RL, Smith AC, Tomaszewski JE, Levine BS (2000). Comparison of 8-chloroadenosine (NSC-354258) and 8-chloro-cyclic-AMP (NSC-614491) toxicity in dogs. Proc Am Assoc Cancer Res, 41:491 (abstract 3132). Brown AP, Courtney C, Carlson T, Graziano M (2005a). Administration of a MEK inhibitor results in tissue mineralization in the rat due to dysregulation of phosphorus and calcium homeostasis. Toxicologist, 84(S-1):108 (abstract 529). Brown AP, Courtney CL, King LM, Groom SL, Graziano MJ (2005b). Cartilage dysplasia and tissue mineralization in the rat following administration of a FGF receptor tyrosine kinase inhibitor. Toxicol Pathol, 33(4):449–455. Brown AP, Reindel JF, Grantham L, et al. (2006). Pharmacologic inhibitors of the MEK-MAP kinase pathway are associated with toxicity to the skin, stomach, intestines, and liver. Proc Am Assoc Cancer Res, 47:308 (abstract 1307). Brown AP, Carlson TCG, Loi CM, Graziano MJ (2007). Pharmacodynamic and toxicokinetic evaluation of the novel MEK inhibitor, PD0325901, in the rat following oral and intravenous administration. Cancer Chemother Pharmacol, 59:671–679.
Chen PS, Terepka AR, Overslaugh C (1962). Hypercalcemic and hyperphosphatemic actions of dihydrotachysterol, vitamin D2 and Hytakerol (AT-10) in rats and dogs. Endocrinology, 70:815–821. DeGeorge JJ, Ahn CH, Andrews PA, et al. (1998). Regulatory considerations for preclinical development of anticancer drugs. Cancer Chemother Pharmacol, 41:173–185. Dent P, Grant S (2001). Pharmacologic interruption of the mitogen-activated extracellular-regulated kinase/mitogen-activated protein kinase signal transduction pathway: potential role in promoting cytotoxic drug action. Clin Cancer Res, 7:775–783. Ferreira A, Drueke TB (2000). Biological markers in the diagnosis of the different forms of renal osteodystrophy. Am J Med Sci, 320(2):85–89. Friday BB, Adjei AA (2008). Advances in targeting the Ras/Raf/MEK/Erk mitogenactivated protein kinase cascade with MEK inhibitors for cancer therapy. Clin Cancer Res, 14(2):342–346. Fu JY, Muller D (1999). Simple, rapid enzyme-linked immunosorbent assay (ELISA) for the determination of rat osteocalcin. Calcif Tissue Int, 64:229–233. Giachelli CM, Jono S, Shioi A, Nishizawa Y, Mori K, Morii H (2001). Vascular calcification and inorganic phosphate. Am J Kidney Dis, 38(4, Suppl 1):S34–S37. Grant RA, Gillman T, Hathorn M (1963). Prolonged chemical and histochemical changes associated with widespread calcification of soft tissues following brief calciferol intoxication. Br J Exp Pathol, 44(2):220–232. Harrington DD, Page EH (1983). Acute vitamin D3 toxicosis in horses: case reports and experimental studies of the comparative toxicity of vitamins D2 and D3. J Am Vet Med Assoc, 182(12):1358–1369. Haschek WM, Krook L, Kallfelz FA, Pond WG (1978). Vitamin D toxicity, initial site and mode of action. Cornell Vet, 68(3):324–364. Haura EB, Larson TG, Stella PJ, et al. (2007). A pilot phase II study of PD-0325901, an oral MEK inhibitor, in previously treated patients with advanced non-small cell lung cancer. Presented at the AACR-NCI-EORTC International Conference on Molecular Targets and Cancer Therapy, abstract B110. Hoshino R, Chatani Y, Yamori T, et al. (1999). Constitutive activation of the 41-/43kDa mitogen-activated protein kinase pathway in human tumors. Oncogene, 18:813–822. Jost M, Huggett TM, Kari C, Boise LH, Rodeck U (2001). Epidermal growth factor receptor–dependent control of keratinocyte survival and Bcl-XL expression through a MEK-dependent pathway. J Biol Chem, 276(9):6320–6326. Kamio A, Taguchi T, Shiraishi M, Shitama K, Fukushima K, Takebayashi S (1979). Vitamin D sclerosis in rats. Acta Pathol Jpn, 29(4):545–562. Kanis JA, Russell RGG (1977). Rate of reversal of hypercalcaemia and hypercalciuria induced by vitamin D and its 1α-hydroxylated derivatives. Br Med J, 1:78–81. Knutson JC, LeVan LW, Valliere CR, Bishop CW (1997). Pharmacokinetics and systemic effect on calcium homeostasis of 1α,25-dihydroxyvitamin D2 in rats. Biochem Pharm, 53:829–837. Long GG (1984). Acute toxicosis in swine associated with excessive dietary intake of vitamin D. J Am Vet Med Assoc, 184(2):164–170.
LoRusso P, Krishnamurthi S, Rinehart JR, et al. (2005). A Phase 1–2 clinical study of a second generation oral MEK inhibitor, PD0325901 in patients with advanced cancer. 2005 ASCO Annual Meeting Proceedings. J Clin Oncol, 23(16S), abstract 3011. LoRusso PA, Krishnamurthi SS, Rinehart JJ, et al. (2007). Clinical aspects of a phase I study of PD-0325901, a selective oral MEK inhibitor, in patients with advanced cancer. Presented at the AACR-NCI-EORTC International Conference on Molecular Targets and Cancer Therapy, abstract B113. Mansour SJ, Matten WT, Hermann AS, et al. (1994). Transformation of mammalian cells by constitutively active MAP kinase kinase. Science, 265:966–970. Menon SS, Whitfield LR, Sadis S, et al. (2005). Pharmacokinetics (PK) and pharmacodynamics (PD) of PD0325901, a second generation MEK inhibitor after multiple oral doses of PD0325901 to advanced cancer patients. 2005 ASCO Annual Meeting Proceedings. J Clin Oncol, 23(16S), abstract 3066. Meuten DJ, Chew DJ, Capen CC, Kociba GJ (1982). Relationship of serum total calcium to albumin and total protein in dogs. J Am Vet Med Assoc, 180:63–67. Milella M, Kornblau SM, Estrov Z, et al. (2001). Therapeutic targeting of the MEK/ MAPK signal transduction module in acute myeloid leukemia. J Clin Invest, 108(6):851–859. Morrow C (2001). Cholecalciferol poisoning. Vet Med, 905–911. Mortensen JT, Lichtenberg J, Binderup L (1996). Toxicity of 1,25-dihydroxyvitamin D3, tacalcitol, and calcipotriol after topical treatment in rats. J Inv Dermatol Symp Proc, 1:60–63. Payne RB, Carver ME, Morgan DB (1979). Interpretation of serum total calcium: effects of adjustment for albumin concentration on frequency of abnormal values and on detection of change in the individual. J Clin Pathol, 32:56–60. Rinehart J, Adjei AA, LoRusso PM et al. (2004). Multicenter phase II study of the oral MEK inhibitor, CI-1040, in patients with advanced non-small-cell lung, breast, colon, and pancreatic cancer. J Clin Oncol, 22(22):4456–4462. Roche Laboratories (1998). Package insert, Rocaltrol® (calcitriol) capsules and oral solution. Nov. 20. Rosenblum IY, Black HE, Ferrell JF (1977). The effects of various diphosphonates on a rat model of cardiac calciphylaxis. Calcif Tissue Res, 23:151–159. Rosol TJ, Capen CC (1997). Calcium-regulating hormones and diseases of abnormal mineral (calcium, phosphorus, magnesium) metabolism. In Clinical Biochemistry of Domestic Animals, 5th ed. Academic Press, San Diego, CA, pp. 619–702. Saunders MP, Salisbury AJ, O’Byrne KJ, et al. (1997). A novel cyclic adenosine monophosphate analog induces hypercalcemia via production of 1,25-dihydroxyvitamin D in patients with solid tumors. J Clin Endocrinol Metab, 82(12):4044–4048. Sebolt-Leopold JS, Dudley DT, Herrera R, et al. (1999). Blockade of the MAP kinase pathway suppresses growth of colon tumors in vivo. Nat Med, 5(7):810–816. Sebolt-Leopold JS (2000). Development of anticancer drugs targeting the MAP kinase pathway. Oncogene, 19:6594–6599. Sebolt-Leopold JS, Merriman R, Omer C, (2004). The biological profile of PD0325901: a second generation analog of CI-1040 with improved pharmaceutical potential. Proc Am Assoc Cancer Res, 45:925 (abstract 4003).
Spangler WL, Gribble DH, Lee TC (1979). Vitamin D intoxication and the pathogenesis of vitamin D nephropathy in the dog. Am J Vet Res, 40:73–83. Spaulding SW, Walser M (1970). Treatment of experimental hypercalcemia with oral phosphate. J Clin Endocrinol, 31:531–538. Tan W, DePrimo S, Krishnamurthi SS, et al. (2007). Pharmacokinetic (PK) and pharmacodynamic (PD) results of a phase I study of PD-0325901, a second generation oral MEK inhibitor, in patients with advanced cancer. Presented at the AACR-NCI-EORTC International Conference on Molecular Targets and Cancer Therapy, abstract B109. Wang D, Boerner SA, Winkler JD, LoRusso PM (2007). Clinical experience of MEK inhibitors in cancer therapy. Biochim Biophys Acta, 1773:1248–1255. York MJ, Evans GO (1996). Electrolyte and fluid balance. In Evans GO (ed.), Animal Clinical Chemistry: A Primer for Toxicologists. Taylor & Francis, New York, pp. 163–176.
16
BIOMARKERS FOR THE IMMUNOGENICITY OF THERAPEUTIC PROTEINS AND ITS CLINICAL CONSEQUENCES
Claire Cornips, B.Sc., and Huub Schellekens, M.D.
Utrecht University, Utrecht, The Netherlands
INTRODUCTION

Therapeutic proteins such as growth factors, hormones, monoclonal antibodies (mAbs), and others have increased in use dramatically over the past two decades, although their first use dates back more than a century, when animal antisera were introduced for the treatment and prevention of infections. Therapeutic proteins have always been associated with immunogenicity, although the incidence differs widely [1]. Granulocyte colony-stimulating factor (G-CSF) is the only protein in clinical use that has not been reported to induce antibodies. The first proteins used in medicine around 1900 were of animal origin. As foreign proteins they induced high levels of antibodies in the majority of patients after a single or a few injections. The type of immunological response induced was identical to that seen with vaccines. Most of the therapeutic proteins introduced in recent decades are homologs of human proteins. However, in contrast to expectations, these proteins also appear to induce antibodies, and in some cases, in the majority of patients. Given that these antibodies are directed against auto-antigens, the immunological reaction involves breaking B-cell tolerance.
MECHANISMS OF ANTIBODY INDUCTION

There are two main mechanisms by which antibodies are induced by therapeutic proteins (Table 1). If the proteins are completely of foreign origin, such as asparaginase or streptokinase and the first-generation mAbs derived from murine cells, or partly foreign, such as chimeric or humanized mAbs, the antibody response is comparable to a vaccination reaction. Often a single injection is sufficient to induce high levels of neutralizing antibodies, which may persist for a considerable length of time. The other mechanism is based on breaking the B-cell tolerance that normally exists to self-antigens, such as human immunoglobulins or products such as epoetins and the interferons. Breaking B-cell tolerance requires prolonged exposure to the protein. In general, it takes months before patients produce antibodies, which are mainly binding antibodies and which disappear when treatment is stopped. To induce a classical immune reaction a degree of nonself is necessary. The trigger for this type of immunogenicity is the divergence from the comparable human proteins. The triggers for breaking tolerance are quite different. The production of these auto-antibodies may occur when the self-antigens are exposed to the immune system in combination with a T-cell stimulus or danger signal such as bacterial endotoxins, microbial DNA rich in CpG motifs, or denatured proteins. This mechanism explains the immunogenicity of biopharmaceuticals containing impurities. When tolerance is broken by this type of mechanism, the response is often weak, with low levels of antibodies of low affinity. To induce high levels of IgG, the self-antigens should be presented to the immune system in a regular array with a spacing of 50 to 100 Å, a supramolecular structure resembling a viral capsid [2]. Apparently the immune system has been selected to react vigorously to these types of structures, which normally are found only in viruses and other microbial agents.
TABLE 1  Main Markers of Immunogenicity of Therapeutic Proteins

(Partly) foreign proteins inducing a classical immune response
  Product: level of nonself; presence of T-cell epitopes; biological activity of the product
  Preclinical: —
  Treatment: —
  Patients: lack of immune tolerance; concomitant therapy

Self-protein breaking tolerance
  Product: presence of aggregates; biological activity of the product
  Preclinical: induction of an immune response in immune-tolerant mice
  Treatment: chronic treatment
  Patients: non-immune-compromised; concomitant therapy
The most important factor in the immunogenicity of therapeutic proteins that are human homologs has been the presence of aggregates [3]. Aggregates may present the self-antigens in the repeating form that is such a potent inducer of auto-antibodies.
FACTORS INFLUENCING IMMUNOGENICITY

So the primary factors inducing an antibody response are aggregates in the case of human proteins and the degree of nonself in (partly) nonhuman proteins. There are also cases in which the immune response cannot be explained. There are, however, a number of other factors that may influence the level of the immune response [4]:

• Product characteristics
  1. Biological function
  2. Impurities and contaminants
  3. Product modification
• Treatment characteristics
  1. Length of treatment
  2. Route of administration
  3. Dosage
  4. Concomitant therapy
• Patient characteristics
  1. Genetic background
  2. Type of disease
  3. Age
  4. Gender
• Unknown factors

Product Characteristics

The biological activities of the product influence the immune response. An immune-stimulating therapeutic protein is more likely to induce antibodies than an immunosuppressive protein. Monoclonal antibodies targeted to cell-bound epitopes are more likely to induce an immune response than monoclonal antibodies with a target in solution. Also, the Fc-bound activities of monoclonal antibodies have an influence. Impurities may influence immunogenicity. The immunogenicity of products such as human growth hormone, insulin, and interferon α-2 has declined over the years due to improved downstream processing and formulation, reducing the level of impurities.
There are studies showing the induction of antibodies by oxidized protein that cross-reacted with the unmodified product [5] and by host cell–derived endotoxin acting as an adjuvant. The probability of an immune response therefore increases with the level of impurities. Product modifications that are intended to enhance half-life potentially also increase the exposure of the protein to the immune system and may increase immunogenicity. In addition, the modification may reduce biological activity, necessitating more protein for the same biological effect. PEGylation (the attachment of polyethylene glycol) is claimed to reduce the immunogenicity of therapeutic proteins by shielding. There is evidence that pegylation reduces the immunogenicity of nonhuman proteins such as bovine adenosine deaminase and asparaginase. Whether pegylation also reduces the capacity of human proteins to break B-cell tolerance is less clear. There are reports of high immunogenicity of pegylated human proteins such as MGDF, but the immunogenicity of unpegylated MGDF products is unknown.

Treatment Characteristics

Foreign proteins such as streptokinase and asparaginase often induce antibodies after a single injection. Breaking B-cell tolerance by a human protein in general takes more than six months of chronic treatment. The route of administration influences the likelihood of an antibody response independent of the mechanism of induction. The probability of an immune response is highest with subcutaneous administration, lower after intramuscular administration, and intravenous administration is the least immunogenic route. There are no studies comparing parenteral and nonparenteral routes of administration. However, intranasal and pulmonary administration of therapeutic proteins may induce an immune response.

Patient Characteristics

Gender, age, and ethnic background have all been reported to influence the incidence of antibody response to specific therapeutic proteins. However, the only patient characteristic that has consistently been identified for a number of different products is the disease from which the patients suffer. Cancer patients are less likely to produce antibodies to a therapeutic protein than other patients. The most widely accepted explanation for this difference is the immune-compromised state of cancer patients, caused both by the disease and by anticancer treatment. Also, the median survival of patients on treatment with therapeutic proteins may be too short to develop an antibody response. In any case, cancer considerably reduces the probability of an antibody response to a protein. As the experience in cancer patients shows, immunosuppressive therapy reduces the probability of developing an immune response to proteins. In addition, immunosuppressive drugs such as methotrexate are used in conjunction with monoclonal antibodies and other protein drugs to reduce these immune reactions.
PREDICTION OF IMMUNOGENICITY IN ANIMALS

In principle, all therapeutic proteins are immunogenic in conventional laboratory animals, and the predictive value of animal studies depends on the type of protein [6]. The immune reaction in animals to biopharmaceuticals of microbial or plant origin is comparable to that in humans, as these products are comparably foreign for all mammalian species. Animal studies in which the reduction of immunogenicity is evaluated therefore have a high degree of predictability for immunogenicity in humans. The development of antibodies has been observed regularly in preclinical animal studies of biopharmaceuticals homologous to human proteins. Because this is considered a normal reaction to a foreign protein, it has led to the generally held assumption that immunogenicity testing, and in some cases even preclinical testing in animals, is irrelevant. However, not all antibodies interfere with the biological activity of a biopharmaceutical. And if there is a biological or clinical effect, these antibodies may help to identify the possible sequelae of immunogenicity, as has been shown with human epoetin in dogs. In the canine model human epoetin is immunogenic, and it induces antibodies that neutralize the native canine epoetin, leading to pure red cell aplasia. This severe complication of antibodies to epoetin was later confirmed in humans. Also, antibody-positive animals may provide sera for the development and validation of antibody assays, and the evaluation of an antibody response in animals is important for interpreting the safety and pharmacokinetic data obtained in conventional laboratory animals. Nonhuman primates have been advocated as better models to predict the immunogenicity of human proteins because of the high sequence homology between the product and the monkey native molecule to which the animal is immune tolerant. Immunogenicity studies in nonhuman primates have, however, also shown mixed results. Products with a high immunogenicity in monkeys sometimes do not induce antibodies in human patients. The opposite has also been observed, although these studies may have been too limited in length of treatment or number of animals to be truly predictive. A good example of the possible use of monkeys was a study to determine the immunogenicity of different human growth hormone (hGH) preparations using a hyper-immunization protocol, including the use of adjuvant, to provide a worst-case scenario of immunogenicity [7]. The monkeys were treated with cadaver-derived hGH, methionyl-hGH, and natural-sequence hGH. The antibody response was 81%, 69%, and 5 to 23%, respectively, which reflects the relative immunogenicity of these preparations in human patients. So in this example, rhesus monkeys predicted relative immunogenicity. Also, the immunogenicity of lys-pro biosynthetic human insulin was compared with the immunogenicity of porcine insulin and native-sequence insulin in rhesus monkeys. None of these proved to induce antibodies. With tissue plasminogen activators the immunogenicity in nonhuman primates was reported by the same group to reflect the immunogenicity in patients.
Rhesus monkeys (Macaca mulatta) and cynomolgus monkeys (M. fascicularis) were used to test the antibody response to EPO/GM-CSF hybrids. Two of the three constructs tested produced high levels of neutralizing antibodies. In the rhesus monkeys these hybrids caused severe anemia. In cynomolgus monkeys no effect on hematological parameters was found, indicating lack of cross-reactivity of the antibodies with the native cynomolgus erythropoietin. There are also reports of products such as IFN α B/D, diaspirin cross-linked hemoglobin, and IL-3 being immunogenic in monkeys although they did not induce antibodies in patients. So monkeys cannot be used as absolute predictors of immunogenicity in humans, and even the response within the macaque family seems to differ. Theoretically, the best predictive model for the immunogenicity of human proteins is the transgenic mouse. These animals are immune tolerant for the human protein they express. The caveats are that the wild-type mouse strain used for transgenesis should be able to produce antibodies to the protein, that the transgenic strain should show immune tolerance for the native molecule, and that there may be differences in the processing of antigens and in epitope recognition between mice and humans. Mice transgenic for human insulin showed that immunogenicity to variant insulin molecules was dependent on the number of substitutions. In mice made transgenic for human tissue plasminogen activator, variants with a single amino acid substitution proved to be immunogenic, showing the discriminatory potential of the model [8]. The transgenic approach also proved useful in finding a reason for the increased immunogenicity caused by a specific formulation. In mice transgenic for interferon α-2, only the batches immunogenic in patients produced antibodies [9]. The presence of aggregates and oxidized proteins proved to be the main cause of immunogenicity in the transgenic animals, as in patients. In mice transgenic for hGH the immunogenicity of hGH encapsulated in microspheres for sustained release was tested, and no enhanced immunogenicity was observed. We have used the transgenic mouse models to study the product characteristics capable of breaking tolerance. In these models, aggregates were shown to be the major factor. These models, however, have been used by us and others mainly in a yes-or-no fashion to study factors such as aggregates and sequence variations. Much more validation is necessary before these models can detect the subtle differences that may exist between products and be used to fully predict the immunogenic potential of human therapeutic proteins. With no models yet available with sufficient predictive power, clinical studies are the only sure way to establish the induction of antibodies by therapeutic proteins.
ASSAYS FOR ANTIBODIES

A cellular immune response to therapeutic proteins has never been established. Also, the biological and clinical consequences of an immune response to these products have always been associated with the presence of antibodies. So, by definition, a positive assay for antibodies is the biomarker for the immunogenicity of proteins.
The lack of standardized assays and international reference sera presents a major problem in assessing immunogenicity. It makes a comparison of antibody data generated by different laboratories impossible; comparisons of products based on published data (e.g., using information on package inserts) are also meaningless. Recently, excellent reviews have been published regarding the development and validation of the various assay formats for antibodies to therapeutic proteins. There are two principles for the testing of antibodies: assays that monitor binding of antibody to the drug (EIA, RIA, BIAcore) and (neutralizing) bioassays. These assays are used in combination: a sensitive binding assay is used first to identify all antibody-containing samples, as bioassays are usually more difficult and time-consuming. If a native molecule is modified (e.g., pegylated or truncated) to obtain a new product with different pharmacological characteristics, both the "parent" and the new molecule should be used as capture antigens for antibody assays. A definition of the negative cutoff must be included in the validation process of the screening assay and is often based on a 5% false-positive rate. Such an analytical cutoff per se is not predictive of a biological or clinical effect but rather indicative of the technical limitations of the assay. And because the cutoff is set to include a relatively high number of false positives, all initially positive sera should be confirmed either by another binding assay or by a displacement assay. The confirmed positive samples should then be tested in a bioassay for neutralizing antibodies, which correlate better with a potential in vivo effect in patients because they interfere with receptor binding. A confirmatory step for neutralizing antibodies is not necessary because these antibodies are a subset of the binding antibodies. In some cases it may be important to show that the neutralization is caused by antibodies if the presence of other inhibitory factors, such as a soluble receptor, has to be excluded. Further characterization of the neutralizing antibody response may follow, such as isotype, affinity, and specificity assays. A positive or negative answer from the assays is not sufficient; development of antibodies is a dynamic process, and therefore the course of antibody development (kinetics) must be plotted quantitatively over time. Usually, persisting titers of neutralizing antibodies correlate with some biological effect. An analysis of the biological effect based on incidence alone may also be misleading. Akin to population kinetics in pharmacology, a method that compares relative immunogenicity in patient groups as "mean population antibody titers," taking into account both the incidence and the titers of neutralizing antibodies, has been found more useful than a percentage of seroconversions for the purpose of comparing two products.
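How such a screening cutoff is typically derived can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from this chapter or from any specific guideline: it assumes an approximately normal distribution of signals from drug-naive (negative-control) sera and places the cutoff at the one-sided 95th percentile, which corresponds to the 5% false-positive rate mentioned above; the assay values and names are hypothetical.

    import statistics

    def screening_cutoff(negative_controls, false_positive_rate=0.05):
        """Parametric screening cutoff from drug-naive control signals.

        Assumes roughly normal control data; 1.645 is the one-sided z-value
        that leaves about 5% of true negatives above the cutoff.
        """
        mean = statistics.mean(negative_controls)
        sd = statistics.stdev(negative_controls)
        z = {0.05: 1.645, 0.01: 2.326}[false_positive_rate]
        return mean + z * sd

    # Hypothetical ELISA optical densities from 20 treatment-naive sera
    naive_od = [0.081, 0.092, 0.075, 0.088, 0.104, 0.079, 0.095, 0.083,
                0.090, 0.086, 0.098, 0.077, 0.101, 0.084, 0.091, 0.087,
                0.093, 0.080, 0.096, 0.089]
    cutoff = screening_cutoff(naive_od)
    print(f"screening cutoff (OD): {cutoff:.3f}")  # samples above this are retested

When the control data are skewed, a log transformation or a nonparametric percentile is often preferred over this parametric form; either way, samples above the cutoff are only screen positive and still go through the confirmatory and neutralization steps described above.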
CONSEQUENCES OF ANTIBODIES TO THERAPEUTIC PROTEINS

In many cases the presence of antibodies is not associated with biological or clinical consequences.
The effects that antibodies may induce depend on their level and affinity and can be the result of the antigen–antibody reaction in general or of the specific interaction. Severe general immune reactions such as the anaphylaxis associated with the use of animal antisera have become rare because the purity of the products has increased substantially. Delayed-type infusion-like reactions resembling serum sickness are more common, especially with monoclonal antibodies and other proteins administered in relatively large amounts, which favors the formation of immune complexes. Patients with a slow but steadily increasing antibody titer are reported to show more infusion-like reactions than patients with a short temporary response. The consequences of the specific interaction with protein drugs depend on the affinity of the antibody, which translates into binding and/or neutralizing capacity. Binding antibodies may influence the pharmacokinetic behavior of the product, and both increases and reductions of half-life have been reported, resulting in enhancement or reduction of activity. Persisting levels of neutralizing antibodies in general result in a loss of activity of the protein drug. In some cases the loss of efficacy can easily be monitored by the increase in disease activity. For example, in interferon alpha treatment of hepatitis C, viral activity can be monitored by transaminase activity. Loss of efficacy correlates directly with increased viral activity and an increase in transaminase levels. In the case of interferon beta treatment of multiple sclerosis, the loss of efficacy is much more difficult to measure because the mode of action of the therapeutic protein is not known and the disease progression is unpredictable and difficult to monitor. The reduction of Mx induction, which is specific for interferon activity, has been used successfully to evaluate the biological effect of antibodies to interferon beta. The adverse effects of therapeutic proteins are in general the result of an exaggerated pharmacodynamic effect. So the loss of side effects may also be the result of the induction of antibodies and may be the first sign of immunogenicity. For example, in patients treated with interferon the loss of flu-like symptoms is associated with the appearance of antibodies. Because by definition neutralizing antibodies interfere with the ligand–receptor interaction, they will inhibit the efficacy of all products in the same class, with serious consequences for patients if there is no alternative treatment. The most dramatic side effects occur if the neutralizing antibodies cross-react with an endogenous factor with an essential biological function. This has been described for antibodies induced by epoetin alpha [10] and megakaryocyte growth and development factor (MGDF), which led, respectively, to life-threatening anemia and thrombocytopenia, sometimes lasting for more than a year.

Skin Reactions

Skin reactions are a common side effect of therapeutic proteins, and some of these reactions are associated with an immunogenic response. But can these skin reactions be used as a marker for the immunogenicity of therapeutic proteins?
The hypersensitivity reactions are classified as type I, II, III, and IV reactions. The type I reaction is IgE mediated. The type II reactions are caused by activated T-killer cells and macrophages and by complement activation. The type III hypersensitivity reaction is caused by the deposition of immune complexes [11]. Type IV is T-cell mediated. Type I hypersensitivity or IgE-mediated allergies are very rare, and most are related to the excipients present in the formulation rather than to the protein drug product. IgE-mediated reactions against human insulin have been reported, although these are less common than with pork or beef insulin [12]. Theoretically, the type II hypersensitivity skin reaction may be a symptom of the immunogenicity of therapeutic proteins. During a type II hypersensitivity reaction, antibodies activate T-killer cells, macrophages, or complement factors to induce an immune response. However, the antibodies produced by most therapeutic proteins are a consequence of breaking B-cell tolerance, and T-cells play only a minor role, if any. TNF inhibitors such as etanercept cause injection-site reactions by a T-cell-mediated delayed-type hypersensitivity reaction. Antibodies against etanercept have been shown not to correlate with adverse events. Skin reactions probably are a class effect of TNF inhibitors. Blockade of TNF can stimulate certain forms of autoimmunity by increasing T-cell reactivity to microbial and self-antigens [13]. The skin reactions that are part of the type III hypersensitivity reaction caused by the local deposition of immune complexes are seen after treatment with monoclonal antibodies, which are used in relatively high and repeated doses. These immune complexes may lead to anaphylactoid reactions and serum sickness–like symptoms. Skin reactions such as urticaria and rashes are common symptoms of this complication [14]. Monoclonal antibodies may also lead to local reactions at the injection site. However, as shown in Table 2, there is no relation between immunogenicity and local skin reactions. So these local reactions cannot be used as an early marker for more serious symptoms of immunogenicity. Some skin reactions seen after treatment with therapeutic proteins are the result of their pharmacodynamics.
TABLE 2  Relation Between Local Skin Reactions and Immunogenicity of Monoclonal Antibodies^a

Product     INN          Incidence of Antibodies (%)    Incidence of Local Skin Reactions (%)
Humira      adalimumab   12                             20
Remicade    infliximab   24                             0
Xolair      omalizumab   0                              45
Mabthera    rituximab    1                              >1

^a Based on SPCs.
Examples are the epidermal growth factor receptor (EGFR) inhibitors and beta interferon. EGFR plays a key role in normal skin function; thus, it is very likely that the rash and other skin reactions during therapy with EGFR inhibitors are caused by the inhibition of EGFR [14,15]. Interferon beta is known to induce high levels of neutralizing antibodies, but it seems that the adverse effects and skin reactions that occur during the first months of treatment have already disappeared by the time patients develop antibodies. Most probably, these adverse effects are a direct pharmacodynamic effect of interferon. Indeed, patients with neutralizing antibodies have a lower risk of adverse events such as injection site reactions than do patients without [16,17]. So both local skin reactions at the site of injection and generalized reactions can be seen after the use of therapeutic proteins. These skin reactions can have different causes and cannot be used as biomarkers for immunogenicity.
BIOMARKERS FOR THE IMMUNOGENICITY OF THERAPEUTIC PROTEINS

Two classes of biomarkers are used to indicate the clinical consequences of immunogenicity of therapeutic proteins:

1. General: persisting levels of neutralizing antibodies
2. Specific: loss of activity of the endogenous homolog; increase in a specific disease marker; decrease in an efficacy marker

Biomarkers are also used to indicate the loss of efficacy of therapeutic proteins caused by antibodies:

1. Monoclonal antibodies: increase in side effects
2. Other therapeutic proteins: loss of side effects

Structural properties are the primary factors in the induction of antibodies [18]. Therapeutic proteins of nonhuman origin will induce antibodies in the majority of patients after a limited number of applications. The degree of nonself, the presence of T-cell epitopes, and the relative lack of immune tolerance are predictors of the antibody response. Human homologs are less likely to be immunogenic. The best structural predictor of breaking tolerance is the presence of aggregates. The only animal model available for this type of immunogenicity is the immune-tolerant transgenic mouse. Induction of antibodies to human proteins usually occurs only after prolonged exposure. Independent of whether the protein is self or nonself, the possible immune-modulating effect of the therapeutic protein, concomitant immunosuppressive therapy, and the immune status of the patients are important predictors of an antibody response.
Antibody formation is by definition the marker for the immunogenicity of therapeutic proteins. The role of cellular immunity is largely unknown, and it may be absent in the case of breaking B-cell tolerance. The occurrence of clinical consequences is in the majority of cases associated with relatively high and persisting levels of neutralizing antibodies. In some cases the occurrence of neutralizing antibodies is preceded by binding antibodies, which may interfere with the pharmacokinetics of the proteins. As discussed extensively, skin reactions cannot be seen as signs of the immunogenicity of proteins. Often, the loss of efficacy caused by neutralizing antibodies is difficult to assess because the diseases involved are chronic diseases with an unpredictable development and the proteins have only a limited effect. In these cases surrogate markers for efficacy may be monitored. Also, the effect of antibodies on side effects may be used as a marker. If the side effects are caused by the pharmacodynamic effect of the protein drug, the loss of side effects is indicative of the development of neutralizing antibodies. If the side effect is the result of immune complexes, their appearance is associated with the induction of antibodies.

REFERENCES

1. Schellekens H (2002). Bioequivalence and the immunogenicity of biopharmaceuticals. Nat Rev Drug Discov, 1(6):457–462.
2. Chackerian B, Lenz P, Lowy DR, Schiller JT (2002). Determinants of autoantibody induction by conjugated papillomavirus virus-like particles. J Immunol, 169:6120–6126.
3. Hermeling S, Schellekens H, Maas C, Gebbink MF, Crommelin DJ, Jiskoot W (2006). Antibody response to aggregated human interferon alpha2b in wild-type and transgenic immune tolerant mice depends on type and level of aggregation. J Pharm Sci, 95(5):1084–1096.
4. Kessler M, Goldsmith D, Schellekens H (2006). Immunogenicity of biopharmaceuticals. Nephrol Dial Transplant, 21(Suppl 5):v9–v12.
5. Hochuli E (1997). Interferon immunogenicity: technical evaluation of interferon-alpha 2a. J Interferon Cytokine Res, 17:S15–S21.
6. Wierda D, Smith HW, Zwickl CM (2001). Immunogenicity of biopharmaceuticals in laboratory animals. Toxicology, 158(1–2):71–74.
7. Zwickl CM, Cocke KS, Tamura RN, et al. (1991). Comparison of the immunogenicity of recombinant and pituitary human growth hormone in rhesus monkeys. Fundam Appl Toxicol, 16(2):275–287.
8. Palleroni AV, et al. (1997). Interferon immunogenicity: preclinical evaluation of interferon-alpha 2a. J Interferon Cytokine Res, 19(Suppl 1):s23–s27.
9. Stewart TA, Hollingshead PG, Pitts SL, et al. (1989). Transgenic mice as a model to test the immunogenicity of proteins altered by site-specific mutagenesis. Mol Biol Med, 6(4):275–281.
10. Casadevall N, Nataf J, Viron B, et al. (2002). Pure red-cell aplasia and antierythropoietin antibodies in patients treated with recombinant erythropoietin. N Engl J Med, 346:469–475.
11. Janeway CA (2005). Immunobiology: The Immune System in Health and Disease, 6th ed. Garland Science, New York, pp. 517–555.
12. Frost N (2005). Antibody-mediated side effects of recombinant proteins. Toxicology, 209(2):155–160.
13. Thielen AM, Kuenzli S, Saurat JH (2005). Cutaneous adverse events of biological therapy for psoriasis: review of the literature. Dermatology, 211(3):209–217.
14. Lacouture ME (2006). Mechanisms of cutaneous toxicities to EGFR inhibitors. Nat Rev Cancer, 6(10):803–812.
15. Robert C, Soria JC, Spatz A, et al. (2005). Cutaneous side-effects of kinase inhibitors and blocking antibodies. Lancet Oncol, 6(7):491–500.
16. Francis GS, Rice GP, Alsop JC (2005). Interferon beta-1a in MS: results following development of neutralizing antibodies in PRISMS. Neurology, 65(1):48–55.
17. Panitch H, Goodin D, Francis G (2005). Benefits of high-dose, high-frequency interferon beta-1a in relapsing–remitting multiple sclerosis are sustained to 16 months: final comparative results of the EVIDENCE trial. J Neurol Sci, 239(1):67–74.
18. Hermeling S, Crommelin DJ, Schellekens H, Jiskoot W (2004). Structure–immunogenicity relationships of therapeutic proteins. Pharm Res, 21(6):897–903.
17
NEW MARKERS OF KIDNEY INJURY
Sven A. Beushausen, Ph.D.
Pfizer Global Research and Development, Chesterfield, Missouri
INTRODUCTION

The current biomarker standards for assessing acute kidney injury (AKI), caused by disease or as a consequence of drug-induced toxicity, include blood urea nitrogen (BUN) and serum creatinine (SC). Retention of either marker in the blood is indicative of impairment of the glomerular filtration rate (GFR), which if left untreated could escalate to serious kidney injury through loss of function and, ultimately, death. Although the colorimetric assays developed for SC and BUN are relatively quick (seconds, as compared to hours for antibody-based analysis platforms like the enzyme-linked immunosorbent assay, ELISA), they are poor predictors of kidney injury because they both suffer from a lack of sensitivity and specificity. For example, SC concentration is greatly influenced by nonrenal factors, including gender, age, muscle mass, race, drugs, and protein intake [1]. Consequently, increases in BUN and SC levels report out injury only after serious kidney damage has occurred. These shortcomings largely limit their clinical utility to patients who are at risk of developing drug-induced AKI or in whom AKI has already been established, requires frequent monitoring, and for whom time to treatment is critical. Renal failure or AKI is often a direct consequence of disease, can result from complications associated with disease or postsurgical trauma such as sepsis, or is produced by drug-induced nephrotoxicity.
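The weight of these nonrenal factors is easiest to see in how a creatinine value is normally interpreted. The sketch below is for orientation only and is not part of this chapter: it uses the classic Cockcroft-Gault estimate of creatinine clearance, in which age, body weight, and sex explicitly rescale the same SC reading; the two patients are hypothetical.

    def cockcroft_gault_crcl(age_years, weight_kg, serum_creatinine_mg_dl, female):
        """Estimated creatinine clearance (mL/min) by the Cockcroft-Gault formula."""
        crcl = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
        return crcl * 0.85 if female else crcl

    # The same serum creatinine of 1.2 mg/dL reads very differently in two patients
    young_male = cockcroft_gault_crcl(age_years=25, weight_kg=85,
                                      serum_creatinine_mg_dl=1.2, female=False)
    elderly_female = cockcroft_gault_crcl(age_years=80, weight_kg=55,
                                          serum_creatinine_mg_dl=1.2, female=True)
    print(f"25-year-old, 85-kg male:   ~{young_male:.0f} mL/min")
    print(f"80-year-old, 55-kg female: ~{elderly_female:.0f} mL/min")

The same laboratory value corresponds to roughly normal clearance in one patient and to substantial impairment in the other, which is one concrete sense in which SC by itself is an insensitive and nonspecific marker.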
TABLE 1  Common Medications Associated with Acute Renal Injury^a

Prerenal injury
  Medication: Diuretics, NSAIDs, ACE inhibitors, ciclosporin, tacrolimus, radiocontrast media, interleukin-2, vasodilators (hydralazine, calcium-channel blockers, minoxidil, diazoxide)
  Clinical findings: Benign urine sediment, FENa < 1%, UOsm > 500
  Treatment: Suspend or discontinue medication, volume replacement as clinically indicated

Intrinsic renal injury
  Vascular effects: thrombotic microangiopathy
    Medication: Ciclosporin, tacrolimus, mitomycin C, conjugated estrogens, quinine, 5-fluorouracil, ticlopidine, clopidogrel, interferon, valaciclovir, gemcitabine, bleomycin
    Clinical findings: Fever, microangiopathic hemolytic anemia, thrombocytopenia
    Treatment: Discontinue medication, supportive care, plasmapheresis if indicated
  Vascular effects: cholesterol emboli
    Medication: Heparin, warfarin, streptokinase
    Clinical findings: Fever, microangiopathic hemolytic anemia, thrombocytopenia
    Treatment: Discontinue medication, supportive care, plasmapheresis if indicated
  Tubular toxicity
    Medication: Aminoglycosides, radiocontrast media, cisplatin, nedaplatin, methoxyflurane, outdated tetracycline, amphotericin B, cephaloridine, streptozocin, tacrolimus, carbamazepine, mithramycin, quinolones, foscarnet, pentamidine, intravenous gammaglobulin, ifosfamide, zoledronate, cidofovir, adefovir, tenofovir, mannitol, dextran, hydroxyethyl starch
    Clinical findings: FENa > 2%, UOsm < 350, urinary sediment with granular casts, tubular epithelial cells
    Treatment: Drug discontinuation, supportive care
  Rhabdomyolysis
    Medication: Lovastatin, ethanol, codeine, barbiturates, diazepam
    Clinical findings: Elevated CPK, ATN urine sediment
    Treatment: Drug discontinuation, supportive care
  Severe hemolysis
    Medication: Quinine, quinidine, sulfonamides, hydralazine, triamterene, nitrofurantoin, mephenytoin
    Clinical findings: High LDH, decreased hemoglobin
    Treatment: Drug discontinuation, supportive care
  Immune-mediated interstitial inflammation
    Medication: Penicillin, methicillin, ampicillin, rifampin, sulfonamides, thiazides, cimetidine, phenytoin, allopurinol, cephalosporins, cytosine arabinoside, furosemide, interferon, NSAIDs, ciprofloxacin, clarithromycin, telithromycin, rofecoxib, pantoprazole, omeprazole, atazanavir
    Clinical findings: Fever, rash, eosinophilia, urine sediment showing pyuria, white cell casts, eosinophiluria
    Treatment: Discontinue medication, supportive care
  Glomerulopathy
    Medication: Gold, penicillamine, captopril, NSAIDs, lithium, mefenamate, fenoprofen, mercury, interferon-α, pamidronate, fenclofenac, tolmetin, foscarnet
    Clinical findings: Edema, moderate to severe proteinuria, red blood cells, red blood cell casts possible
    Treatment: Discontinue medication, supportive care

Obstruction
  Intratubular (crystalluria and/or renal lithiasis)
    Medication: Aciclovir, methotrexate, sulfanilamide, triamterene, indinavir, foscarnet, ganciclovir
    Clinical findings: Sediment can be benign; with severe obstruction, ATN might be observed
    Treatment: Discontinue medication, supportive care
  Ureteral (secondary to retroperitoneal fibrosis)
    Medication: Methysergide, ergotamine, dihydroergotamine, methyldopa, pindolol, hydralazine, atenolol
    Clinical findings: Benign urine sediment, hydronephrosis on ultrasound
    Treatment: Discontinue medication, decompress ureteral obstruction by intrarenal stenting or percutaneous nephrostomy

Source: Adapted from ref. 2.
^a ACE, angiotensin-converting enzyme; ATN, acute tubular necrosis; CPK, creatine phosphokinase; FENa, fractional excretion of sodium; LDH, lactate dehydrogenase; NSAIDs, nonsteroidal anti-inflammatory drugs; UOsm, urine osmolality.
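The FENa thresholds quoted in Table 1 come from a simple calculation on paired spot urine and plasma measurements of sodium and creatinine. The sketch below is for orientation only and is not part of the chapter; the sample values are hypothetical.

    def fractional_excretion_na(urine_na, plasma_na, urine_cr, plasma_cr):
        """FENa (%) from spot urine and plasma sodium and creatinine.

        Units cancel as long as the urine/plasma pairs use the same units
        (e.g., mmol/L for sodium, mg/dL for creatinine).
        """
        return 100.0 * (urine_na * plasma_cr) / (plasma_na * urine_cr)

    # Hypothetical spot samples
    prerenal = fractional_excretion_na(urine_na=20, plasma_na=140, urine_cr=100, plasma_cr=1.5)
    tubular = fractional_excretion_na(urine_na=60, plasma_na=138, urine_cr=40, plasma_cr=2.0)
    print(f"prerenal pattern:         FENa = {prerenal:.2f}%")  # < 1%, as in Table 1
    print(f"tubular toxicity pattern: FENa = {tubular:.2f}%")   # > 2%

Because the result is a unitless ratio, it does not depend on urine flow rate, which is what makes it useful for separating the prerenal pattern (FENa < 1%) from tubular toxicity (FENa > 2%) in the table.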
Drug-induced renal injury is of great concern to physicians. Knowledge of the toxicities associated with U.S. Food and Drug Administration (FDA)-approved compounds helps to guide product selection in an effort to manage risk and maximize patient safety. Drug-induced nephrotoxicity is of even greater concern to the pharmaceutical industry, where patient safety is the principal driver in the need to discover safer and more efficacious drugs. Because BUN and SC are such insensitive predictors of early kidney injury, many instances of subtle renal damage caused by drugs may go unrecognized. Consequently, current estimates of drug-induced nephrotoxicity are likely to be far lower than the true incidence. For example, studies have indicated that the incidence of acute tubular necrosis or acute interstitial nephritis due to medication may be as high as 18.6% [2]. In addition, renal injury attributed to treatment with aminoglycosides has been reported to approach 36% [3,4]. Not surprisingly, many common drugs have been associated with renal injury causing site-specific damage (Table 1). Fortunately, most instances of drug-induced nephrotoxicity are reversible if discovered early and the medication is discontinued. Collectively, the combined shortcomings of BUN and SC as predictors of nephrotoxicity and the propensity for many classes of medicines to cause drug-induced nephrotoxicity underscore the urgent need for the development and qualification of more sensitive and specific biomarkers. The benefits such tools will provide include predictive value and earlier diagnosis of drug-induced kidney injury before changes in renal function or clinical manifestations of AKI are evident. More important, biomarkers of nephrotoxicity with increased sensitivity and specificity will be invaluable to drug development both preclinically and clinically. Preclinically, new biomarkers will aid in the development of safer drugs having fewer liabilities, with the ultimate goal of considerably lowering or even eliminating drug-induced nephrotoxicity. Clinically, the biomarkers will be used to monitor potential nephrotoxic effects of therapeutic intervention or the potential for new drugs to cause renal toxicity in phase I to III clinical trials.
NEW PRECLINICAL BIOMARKERS OF NEPHROTOXICITY

In recent years, two consortia led by the nonprofit organizations ILSI-HESI (International Life Sciences Institute, Health and Environmental Sciences Institute, http://www.hesiglobal.org) and C-Path (Critical Path Institute, http://www.c-path.org) have aligned with leaders in academia, industry, and the FDA with a mission to evaluate the potential utility of newly identified biomarkers of nephrotoxicity for use in preclinical safety studies and to develop a process for the acceptance of the new biomarkers in support of safety data accompanying new regulatory submissions. Several criteria for the evaluation and development of new biomarkers of nephrotoxicity were considered, including:
TABLE 2  Biomarkers of Renal Injury by Region of Specificity, Onset, Platform, and Application

Biomarker          Injury Related to                     Onset   Platforms        Application
β2-Microglobulin   Proximal tubular injury               Early   Luminex, ELISA   Mouse, rat, human, chicken, turkey
Clusterin          Tubular epithelial cells              Early   Luminex, ELISA   Mouse, rat, dog, monkey, human
Cystatin-C         Tubular dysfunction                   Late    Luminex, ELISA   Mouse, rat, human
GSTα               Proximal tubular injury               Early   Luminex, ELISA   Mouse, rat, human
GST Yb1            Distal tubule                         Early   Luminex, ELISA   Rat
KIM-1              General kidney injury and disease     Early   Luminex, ELISA   Zebrafish, mouse, rat, dog, monkey, human
Microalbumin       Proximal tubular injury               Early   Luminex, ELISA   Mouse, rat, dog, monkey, human
Osteopontin        Tubulointerstitial fibrosis           Late    Luminex, ELISA   Mouse, rat, monkey, human
NGAL               Proximal tubular injury               Early   Luminex, ELISA   Mouse, rat, human
RPA-1              Renal papilla and collecting ducts    Early   ELISA            Rat

Source: Adapted from ref. 5.
• A preference for noninvasive sample collection.
• New biomarkers developed for preclinical use should optimally be translatable to the clinic.
• Assays for new biomarkers should be robust, and kits should be readily available for testing.
• Assays should be multiplexed to minimize cost and expedite sample analysis.
• Biomarkers should ideally predict or report out site-specific injury.
• Biomarkers must be more sensitive and specific for kidney injury than the existing standards.
• Biomarkers should be predictive (prodromal) of kidney injury in the absence of histopathology.

The preference for noninvasive sample collection made urine the obvious choice of biofluid. Urine has proven to be a fertile substrate for the discovery of promising new biomarkers for the early detection of nephrotoxicity [5].
A number of these markers have been selected for further development and qualification by the ILSI-HESI and C-Path Nephrotoxicity Working Groups in both preclinical and clinical settings, with the exception of RPA-1 and GST Yb1 (Biotrin), which are markers developed specifically for the analysis of kidney effects in rats (Table 2). The utility and limitations of each marker in the context of early and site-specific detection are discussed below.

β2-Microglobulin

Human β2-microglobulin (β2M) was isolated and characterized in 1968 [6]. β2M was identified as a small 11,815-Da protein found on the surface of human cells expressing the major histocompatibility class I molecule [7]. β2M is shed into the circulation as a monomer, from which it is normally filtered by the glomerulus and subsequently reabsorbed and metabolized within proximal tubular cells [8]. Twenty-five years ago, serum β2M was advocated for use as an index of renal function because of an observed proportional increase in serum β2M levels in response to decreased renal function [9]. It has since been abandoned due to a number of factors complicating the interpretation of the findings. More recently, increased levels of intact urinary β2M have been directly linked to impairment of tubular uptake. Additional work in rats and humans has demonstrated that increased urinary levels of β2M can be used as a marker for proximal tubular function when β2M production and glomerular filtration are normal in a setting of minimal proteinuria [10–13]. Urinary β2M has been shown to be superior to N-acetyl-β-glucosaminidase as a marker in predicting prognosis in idiopathic membranous nephropathy [14]. In this context β2M can be used to monitor and avoid unnecessary immunosuppressive therapy following renal transplantation. β2M is being considered for evaluation as an early predictor of proximal tubular injury in preclinical models of drug-induced nephrotoxicity. Although easily detected in urine, there are several factors that may limit its value as a biomarker. For example, β2M is readily degraded by proteolytic enzymes at room temperature and also degrades rapidly in an acidic environment (pH < 6.0) [15]. Therefore, great care must be taken to collect urine in an ice-cold, adequately buffered environment with the addition of stabilizers to preserve β2M levels during the period of collection and in storage. It is unlikely that β2M will be used as a stand-alone marker to predict or report proximal tubule injury preclinically or in the clinic. Rather, it is likely to be used in conjunction with other proximal tubule markers to support such a finding. A brief survey of commercially available antibodies used to detect β2M indicates that most are species specific (http://www.abcam.com). Instances of cross-reactivity were noted for specific reagents between human and pig, chicken and turkey, and human and other primates. A single monoclonal reagent is reported to have cross-reactivity with bovine, chicken, rabbit, and mouse, and none were listed that specifically recognized dog β2M.
Because rat and dog β2M, the proteins of the two commonly used preclinical species, share only 69.7% and 66.7% amino acid identity with the human protein (http://www.expasy.org), it would be prudent to develop and characterize antibody reagents specific to each species and cross-reacting antisera to specific amino acid sequences shared by all three proteins.

Clusterin

Clusterin is a highly glycosylated and sulfated secreted glycoprotein first isolated from ram rete testis fluid in 1983 [16]. It was named clusterin because of its ability to elicit clustering of Sertoli cells in vitro [17]. Clusterin is found primarily in the epithelial cells of most organs. Tissues with the highest levels of clusterin include testis, epididymis, liver, stomach, and brain. Metabolic and cell-specific functions assigned to clusterin include sperm maturation, cell transformation, complement regulation, lipid transport, secretion, apoptosis, and metastasis [18]. Clusterin is also known by a number of synonyms as a consequence of having been identified simultaneously in many parallel lines of inquiry. Names include glycoprotein III (GPIII), sulfated glycoprotein-2 (SG-2), apolipoprotein J (apo J), testosterone-repressed prostate message-2 (TRPM-2), complement-associated protein SP-40,40, and complement cytolysis inhibitor protein (see Table 1). Clusterin has been cloned from a number of species, including the rat [19]. The human homolog is 449 amino acids in length, coding for a protein with a molecular weight of 52,495 Da [20]. However, due to extensive posttranslational modification, the protein migrates to an apparent molecular weight of 70 to 80 kDa following sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Amino acid identity between species is moderate. Human clusterin shares 70.3%, 76.6%, 71.7%, and 77% identity with the bovine, mouse, pig, and rat homologs, respectively (http://www.expasy.org). Clusterin is a heterodimer comprised of an α and a β subunit, each having an apparent mass of 40 kDa by SDS-PAGE. The subunits result from the proteolytic cleavage of the translated polypeptide at amino acid positions 23 and 277. This eliminates the leader sequence and produces the mature 205-amino acid β subunit and the remaining 221-amino acid α subunit. The α and β subunits are held together by five disulfide bonds afforded by cysteine residues clustered within each of the subunits [21]. In addition, each subunit has three N-linked carbohydrates that are also heavily sulfated, giving rise to the higher apparent molecular weight observed following SDS-PAGE. Considerable evidence has been provided suggesting that clusterin plays an important role in development. For example, clusterin mRNA expression has been observed at 12.5 days postgestation in mice, where it is present in all germ cell layers [22]. Furthermore, stage-specific variations of the transcript have been observed, as have changes in specific localization during development. Similarly, changes in the developmental expression of clusterin in kidney, lung, and nervous system have also been reported [23]. These observations suggest that clusterin might play a role in tissue remodeling. In the developing murine kidney, clusterin is expressed in the tubular epithelium and later in development is diminished as tubular maturation progresses [24].
Interestingly, clusterin is observed in newly formed tubules but appears to be absent in glomeruli. Of interest to many investigators of renal function is the reemergence of clusterin observed following induction of a variety of kidney diseases and drug-induced renal injury. Clusterin induction has been observed following ureteral obstruction [25] and ischemia–reperfusion injury [26]. Elevations in the levels of clusterin have also been observed in the peri-infarct region following subtotal nephrectomy [27] and in animal models of hereditary polycystic kidney disease [28]. Marked increases of clusterin released in urine have also been recorded in animal models of aminoglycoside-induced nephrotoxicity [29–31]. Based on these observations, authors have opined that clusterin either functions in a protective role by scavenging cell debris or plays a role in the process of tissue remodeling following cellular injury. Collectively, the body of work linking elevated levels of urinary clusterin to kidney damage has suggested that measurement of urinary clusterin may be useful as a marker of renal tubular injury. Indeed, an early study comparing urinary levels of clusterin against N-acetyl-β-glucosaminidase (NAG) following chronic administration of gentamicin over a two-month period demonstrated that while the urinary levels of both proteins rose rapidly, peaked, and then declined, clusterin levels remained significantly higher than control values over the duration of the experiment. By contrast, NAG levels dropped to within control values within 10 days of treatment even though evidence of tubulointerstitial disease persisted [30]. More recent work examining the levels of urinary clusterin in the autosomal-dominant polycystic kidney disease (cy/+) rat model compared to the FHH rat model of focal segmental glomerulosclerosis following bilateral renal ischemia demonstrated that clusterin levels correlated with the severity of tubular damage and suggested its use as a marker for differentiating between tubular and glomerular damage [32]. Although the value of clusterin as an early marker of tubular epithelial injury has not yet been established clinically, preclinical findings suggest that it is an ideal candidate for translation to the clinic as an early marker of nephrotoxicity.

Cystatin-C

Cystatin C (Cys-C) is a 13-kDa nonglycosylated protein belonging to the superfamily of cysteine protease inhibitors [33]. Cys-C is produced by all nucleated cells and, unlike SC, is unaffected by muscle mass. Serum Cys-C has been suggested to be closer to the "ideal" biomarker of GFR because, although freely filtered by the glomerulus, it is not secreted. Instead, Cys-C is reabsorbed by tubular epithelial cells, where it is catabolized and is not returned to the bloodstream, thus obviating the need to measure urinary Cys-C to calculate GFR [34]. Several studies have been designed to examine the usefulness of serum Cys-C as a measure or biomarker of GFR [35]. In one such study, serum Cys-C was shown to be a useful biomarker of acute renal failure and could be detected one to two days prior to the elevation in levels of SC, the accepted clinical criterion for diagnosis of AKI [36].
Although earlier in detection than SC, serum Cys-C levels were not predictive of kidney disease and, like SC, reported out kidney injury long after serious damage had occurred. In another study, investigators monitored and compared the levels of serum Cys-C and urinary Cys-C in patients following cardiothoracic surgery with and without complicating AKI [37]. The results clearly demonstrated that while plasma Cys-C was not a useful predictor of AKI, early and persistent increases in urinary Cys-C correlated with the development and severity of AKI. Another interesting but unexplained observation in this study was that women had significantly higher postoperative levels of urinary Cys-C than did men, even though preoperative Cys-C levels were similar. These data have prompted groups like ILSI-HESI and C-Path to examine the utility of urinary Cys-C as a preclinical biomarker of drug-induced renal injury in the hope that elevated levels of Cys-C can be detected in urine prior to the emergence of overt tubular dysfunction.

Glutathione S-Transferases

The glutathione S-transferases (GSTs) form a family of homo- and heterodimeric detoxifying enzymes [38] identified originally as a group of soluble liver proteins that play a major role in the detoxification of electrophilic compounds [39]. They have since been shown to be products of gene superfamilies [40] and are classified into alpha, mu, pi, and theta subfamilies based on sequence identity and other common properties [41]. Tissue distribution and levels of GST isoform expression have been determined by immunohistochemical localization [42], isoform-specific peptide antibody Western blotting, and mass spectrometry [40]. Analysis of GST subunit diversity and tissue distribution using peptide-specific antisera has shown GST μ isoforms to be the most widely distributed class of GSTs, with expression evident in brain, pituitary, heart, lung, adrenal gland, kidney, testis, liver, and pancreas, and with the highest levels of GST μ1 observed in adrenals, testis, and liver. Isoforms of the GSTα subfamily, also known by the synonyms glutathione S-transferase-1, glutathione S-transferase Ya-1, GST Ya1, ligandin, GST 1a-1a, GST B, GST 1-1, and GST A1-1 (http://www.expasy.org/uniprot/P00502), are more limited in distribution, with the highest levels of expression observed in hepatocytes and proximal tubular cells of the kidney [42]. Indeed, proximal tubular GSTα levels have been reported to approximate 2% of total cytosolic protein following exposure to xenobiotics or renal toxins [43]. In the Rowe study [40], GSTα was found to be rather evenly distributed between adrenals, kidney, and pancreas, with the highest levels observed in liver, whereas isoforms of the GSTπ subclass were expressed in brain, pituitary, heart, liver, kidney, and adrenals, with the highest levels of expression observed in kidney. The high levels of expression and differential distribution of GST isoforms made them attractive candidates as biomarkers that could be used to indicate site-specific drug-induced nephrotoxicity.
For example, the development of a radioimmunoassay to quantify leakage of ligandin (GSTα) into the urine as a measure of nephrotoxicity in the preclinical rat model was reported as early as 1979 [44]. Subsequent work described the development of a radioimmunoassay for the quantitation of GSTπ in the urine [45a], later used as an indicator of distal tubular damage in the human kidney [45b]. Additional work described the development of a multiplexed ELISA for the simultaneous quantitation of GSTα and GSTπ to discriminate between proximal and distal tubular injury, respectively [46]. In terms of sensitivity, a study examining the nephrotoxic effects of the sevoflurane degradation product fluoromethyl-2,2-difluoro-1-(trifluoromethyl)vinyl ether in rats showed urinary GSTα to be the most sensitive marker of mild proximal tubular damage compared to the other urinary markers measured, including protein and glucose [47]. A second study, in which four human volunteers were given sevoflurane, demonstrated abnormalities in urinary glucose, albumin, GSTα, and GSTπ, while levels of BUN and SC were unaffected, suggesting that the GSTs were more sensitive markers of site-specific drug-induced nephrotoxicity [48]. Immunohistochemical staining of the rat kidney with antibodies to different GST isoforms has shown that GSTα subunits are expressed selectively in the proximal tubule, whereas GSTμ and π subunits are localized to the thin loop of Henle and proximal tubules, respectively [38]. An examination of the distribution of the rat GSTμ equivalent, GST Yb1, in the kidney indicates that it is localized to the distal tubules. Simultaneous measurement of urinary GSTα and GST Yb1 has been used to discriminate between drug-induced proximal and distal tubular injury (cited by Kilty et al. [49]). The high levels of GSTs in the kidney and the site-specific localization of different GST classes, in addition to their increased sensitivity in detecting drug-induced nephrotoxicity in humans, make them ideal candidates for the development and testing of preclinical markers that predict or report early signs of nephrotoxicity to support preclinical safety studies and subsequent compound development.

Kidney Injury Molecule 1

Rat kidney injury molecule 1 (KIM-1) was discovered as part of an effort to identify genes implicated in kidney injury and repair [50] using the polymerase chain reaction (PCR) subtractive hybridization technique of representational difference analysis, originally developed to look at differences in genomic DNA [51] but adapted to examine differences in mRNA expression [52]. Complementary DNA generated from poly(A+) mRNA purified from normal and 48-hour postischemic rat kidneys was amplified to generate driver and tester amplicons, respectively. The amplicons were used as templates to drive the subtractive hybridization process to generate designated differential products, three of which were ultimately gel purified and subcloned into the pUC18 cloning vector. Two of these constructs were used to screen λZapII cDNA libraries constructed from 48-hour postischemic rat kidneys.
Isolation and purification of positively hybridizing plaques resulted in the recovery of a 2.5-kb clone that contained sequence information on all three designated differential products. A BLAST search of the NCBI database revealed that the rat KIM-1 sequence had limited (59.8%) amino acid homology to HAVcr-1, identified earlier as the monkey gene coding for the hepatitis A virus receptor protein [53]. The human homolog of KIM-1 was isolated by low-stringency screening of a human embryonic liver λgt10 cDNA library using the same probe that yielded the rat clones [50]. One of the two clones purified from this exercise was shown to code for a 334-amino acid protein sharing 43.8% identity and 59.1% similarity with the rat KIM-1 protein. Comparison to the human HAVcr protein [54] revealed 85.3% identity, demonstrating a clear relationship between the two proteins. Subsequent work has demonstrated that KIM-1 and HAVcr are synonyms for the same protein, also known as T-cell immunoglobulin and mucin domain–containing protein 1 (TIMD-1) and TIM-1. The TIMD proteins are all predicted to be type I membrane proteins that share a characteristic immunoglobulin V, mucin, transmembrane, and cytoplasmic domain structure [55]. It is not clear what the function of KIM-1 (TIMD-1) is, but it is believed that TIMD-1 is involved in the preferential stimulation of Th2 cells within the immune system [56]. In the rat, KIM-1 mRNA expression is highest in liver and barely detected in kidney [50]. KIM-1 mRNA and protein expression are dramatically up-regulated following ischemic injury. Immunohistochemical examination of kidney sections using a rat-specific KIM-1 antibody showed that KIM-1 is localized to regenerating proximal tubule epithelial cells. KIM-1 was proposed as a novel biomarker for human renal proximal tubule injury in a study that demonstrated that KIM-1 could be detected in the urine of patients with biopsy-proven acute tubular necrosis [57]. Human KIM-1 occurs as two splice variants that are identical with respect to the extracellular domains but differ at the carboxy termini and are differentially distributed throughout tissues [58]. Splice variant KIM-1b is 25 amino acids longer than the originally identified KIM-1a and is found predominantly in human kidney. Interestingly, cell lines expressing endogenous KIM-1 or recombinant KIM-1b constitutively shed KIM-1 into the culture medium, and shedding of KIM-1 could be inhibited with metalloprotease inhibitors, suggesting a mechanism for KIM-1 release into the urine following the regeneration of proximal tubule epithelial cells as a consequence of renal injury. Evidence supporting KIM-1's potential as a biomarker for general kidney injury and repair was clearly demonstrated in another paper describing the early detection of urinary KIM-1 protein in a rat model of drug-induced renal injury. In this study, increases in KIM-1 were observed before significant increases in SC levels could be detected following injury with folic acid, and prior to measurable levels of SC in the case of cisplatin-treated rats [59]. In later, more comprehensive studies examining the sensitivity and specificity of KIM-1 as an early biomarker of mechanically induced [60] or drug-induced renal injury [61], KIM-1 was detected earlier than any of the routinely used biomarkers of renal injury,
including BUN, SC, urinary NAG, glycosuria, and proteinuria. Certainly, the weight of evidence described above supports the notion that KIM-1 is an excellent biomarker of AKI and drug-induced renal injury. The increasing availability of antibody-based reagents and platforms to rat and human KIM-1 proteins offers convenient and much needed tools for preclinical safety assessment of drug-induced renal toxicity and for aid in diagnosing or monitoring mild to severe renal injury in the clinic. Further work is required to determine whether KIM-1 is a useful marker for long-term injury and whether it can be used in combination with other markers to determine site-specific kidney injury. Microalbumin The examination of proteins excreted into urine provides useful information about renal function (reviewed in [62]). Tamm–Horsfall proteins that originate from renal tubular cells comprise the largest fraction of protein excreted in normal urine. The appearance of low-molecular-weight urinary proteins normally filtered through the basement membrane of the glomerulus, including insulin, parathormone, lysozyme, trypsinogen, and β2-microglobulin, indicates some form of tubular damage [63]. The detection of higher-molecular-weight (40- to 150-kDa) urinary proteins not normally filtered by the glomerulus, including albumin, transferrin, IgG, caeruloplasmin, α1-acid glycoprotein, and HDL, indicates compromised glomerular function [64]. Albumin is by far the most abundant protein constituent of proteinuria. Although gross increases in urinary albumin measured by the traditional dipstick method, with a reference interval of 150 to 300 mg/L, have been used to indicate impairment of renal function, there are many instances of subclinical increases of urinary albumin within the defined reference interval that are predictive of disease [65–67]. The term microalbuminuria was coined to define this phenomenon, where such increases had value in predicting the onset of nephropathy in insulin-dependent diabetes mellitus [68]. The accepted reference interval defined for microalbuminuria is between 30 and 300 mg in 24 hours [69,70]. Because microalbuminuria appears to be a sensitive indicator of renal injury, there is growing interest in the nephrotoxicity biomarker community in evaluating this marker as an early biomarker predictive of drug-induced renal injury. Although microalbuminuria has traditionally been used in preclinical drug development to assess glomerular function, there is growing evidence to suggest that albuminuria is a consequence of impairment of the proximal tubule retrieval pathway [71]. Evidence that microalbuminuria might provide value in diagnosing drug-induced nephrotoxicity was reported in four of 18 patients receiving cisplatin, ifosfamide, and methotrexate to treat osteosarcoma [72]. Because microalbuminuria can be influenced by other factors unrelated to nephrotoxicity, including vigorous exercise, hematuria, urinary tract infection, and dehydration [5], it may have greater predictive value for renal
injury in the context of a panel of markers with increased sensitivity and site specificity. Indeed, further evaluation of microalbumin as an early biomarker of site-specific or general nephrotoxicity is required before qualification for preclinical and clinical use. Osteopontin Osteopontin (OPN) is a 44-kDa, highly phosphorylated, secreted glycoprotein originally isolated from bone [73]. It is an extremely acidic protein with an isoelectric point of 4.37 (http://www.expasy.org/uniprot/P10451), made even more acidic through phosphorylation on a cluster of up to 28 serine residues [74]. Osteopontin is widely distributed among different tissues, including kidney, lung, liver, bladder, pancreas, and breast [75], as well as macrophages [76], activated T cells [77], smooth muscle cells [78], and endothelial cells [79]. Evidence has been provided demonstrating that OPN functions as an inhibitor of calcium oxalate crystal formation in cultured murine kidney cortical cells [80]. Immunohistochemical and in situ hybridization examination of the expression and distribution of OPN protein and mRNA in the rat kidney clearly demonstrated that levels are highest in the descending thin loop of Henle and cells of the papillary surface epithelium [81]. Uropontin, first described as a relative of OPN, was among the first examples of OPN isolated from human urine [82]. Although normally expressed in kidney, OPN expression can be induced under a variety of experimental pathologic conditions [83,84], including tubulointerstitial nephritis [85], cyclosporine-induced nephropathy [86], hydronephrosis as a consequence of unilateral ureteral ligation [87], renal ischemia [88], nephropathy induced by cisplatin, and crescentic glomerulonephritis [89]. Up-regulation of OPN has been reported in a number of animal models of renal injury, including drug-induced nephrotoxicity caused by puromycin, cyclosporine, streptozotocin, phenylephrine, and gentamicin (reviewed in [90a]). In the rat gentamicin-induced acute tubular necrosis model, OPN levels were highest in regenerating proximal and distal tubules, leading the authors to conclude that OPN is related to the proliferation and regeneration of tubular epithelial cells following tubular damage [90b]. Although osteopontin has been proposed as a selective biomarker of breast cancer [91] and a useful clinical biomarker for the diagnosis of colon cancer [92], OPN shows great promise and requires further evaluation as a clinical biomarker for renal injury. Certainly, the high levels of OPN expression following chemically or physically induced renal damage, coupled with the recent availability of antibody-based reagents to examine the levels of mouse, rat, and human urinary OPN, provide ample opportunity to evaluate OPN as an early marker of AKI in the clinic and a predictive marker of drug-induced nephrotoxicity preclinically. Further work planned by the ILSI-HESI and C-Path groups should broaden our understanding of the utility of OPN in either capacity as an early predictor of renal injury.
Neutrophil Gelatinase–Associated Lipocalin Neutrophil gelatinase–associated lipocalin (NGAL) was first identified as the small molecular-weight glycoprotein component of human gelatinase affinity purified from the supernatant of phorbol myristate acetate stimulated human neutrophils. Human gelatinase purifies as a 135-kDa complex comprised of the 92-kDa gelatinase protein and the smaller 25-kDa NGAL [93]. NGAL has subsequently been shown to exist primarily in monomeric or dimeric form free of gelatinase. A BLAST search of the 178-amino acid NGAL protein yielded a high degree of similarity to the rat α2 microglobulin-related protein and mouse protein 24p3, suggesting that NGAL is a member of the lipocalin family. Lipocalins are characterized by the ability to bind small lipophilic substances and are thought to function as modulators of inflammation [94]. More recent work has shown that NGAL, also known as siderocalin, complexes with iron and iron-binding protein to promote or accelerate recovery from proximal tubular damage (reviewed in [95]). RNA dot blot analysis of 50 human tissues revealed that NGAL expression is highest in trachea and bone tissue, moderately expressed in stomach and lung with low levels of transcript expression in the remaining tissues examined, including kidney [94]. Because NGAL is a reasonably stable small-molecular-weight protein, it is readily excreted from the kidney and can be detected in urine. NGAL was first proposed as a novel urinary biomarker for the early prediction of acute renal injury in rat and mouse models of acute renal failure induced by bilateral ischemia [96]. Increases in the levels of urinary NGAL were detected in the first hour of postischemic urine collection and shown to be related to dose and length of exposure to ischemia. In this study the authors reported NGAL to be more sensitive than either NAG or β2M, underscoring its usefulness as an early predictor of acute renal injury. Furthermore, the authors proposed NGAL to be an earlier marker predictive of acute renal injury than KIM-1, since the latter reports injury within 24 hours of renal injury compared to 1 hour for NGAL. Marked up-regulation of NGAL expression was observed in proximal tubule cells within 3 hours of ischemia-induced damage, suggesting that NGAL might be involved in postdamage reepithelialization. Additional work demonstrated that NGAL expression was induced following mild ischemia in cultured human proximal tubule cells. This paper also addressed the utility of NGAL as an early predictor of drug-induced renal injury by detecting increased levels of NGAL in the urine of cisplatin-treated mice. Adaptation of the NGAL assay to address utility and relevance in a clinical setting showed that both urinary and serum levels of NGAL were sensitive, specific, and highly predictive biomarkers of acute renal injury following cardiac surgery in children [97]. In this particular study, multivariate analysis showed urinary NGAL to be the most powerful predictor in children that developed acute renal injury. Measurable increases in urinary NGAL concentrations were recorded within 2 hours of cardiac bypass surgery, whereas
increases in SC levels were not observed until 1 to 3 days postsurgery. Other examples demonstrating the value of NGAL as a predictive biomarker of early renal injury include the association of NGAL with severity of renal disease in proteinuric patients [98] and NGAL as an early predictor of renal disease resulting from contrast-induced nephropathy [99]. NGAL has been one of the most thoroughly studied new biomarkers predictive of AKI as a consequence of disease or surgical intervention and, to a lesser extent, drug-induced renal injury. Sensitive and reliable antibody-based kits have been developed for a number of platforms in both humans and rodents (Table 2), and there is considerable interest in examining both the specificity and sensitivity of NGAL for acceptance as a fit-for-purpose predictive biomarker of drug-induced renal injury to support regulatory submissions. Certainly, because NGAL is such an early marker of renal injury, it will have to be assessed as a stand-alone marker of renal injury as well as in the context of a larger panel of markers that may help define the site and degree of kidney injury. Renal Papillary Antigen 1 Renal papillary antigen 1 (RPA-1) is an uncharacterized antigen that is highly expressed in the collecting ducts of the rat papilla and can be detected at high levels in rat urine following exposure to compounds that induce renal papillary necrosis [100]. RPA-1 was identified by an IgG1 monoclonal antibody, designated Pap X 5C10, that was generated in mice immunized with pooled fractions of a cleared lysate of homogenized rat papillary tissue following crude DEAE anion-exchange chromatography. Immunohistochemical analysis of rat papillae shows that RPA-1 is localized to the epithelial cells lining the luminal side of the collecting ducts and, to a lesser extent, to cortical collecting ducts. A second publication described the adaptation of three rat papilla-specific monoclonal antibodies, including Pap X 5C10 (PapA1), to an ELISA assay to examine antigen excretion in rat urine following drug-induced injury to the papillae using 2-bromoethanamine, propyleneimine, indomethacin, or ipsapirone [101]. Of the three antigens evaluated, the PapA1 antigen was the only one released into the urine of rats following exposure to each of the toxicants. The authors concluded that changes in the rat renal papilla caused by xenobiotics could be detected early by urinary analysis and monitored during follow-up studies. This study also clearly demonstrated that the Pap X 5C10 (PapA1, RPA-1) antigen had the potential for use as a site-specific biomarker predictive of renal papillary necrosis. Indeed, the Pap X 5C10 monoclonal antibody was adapted for commercial use as an RPA-1 ELISA kit marketed specifically to predict or monitor site-specific renal injury in the rat [49]. The specificity and sensitivity of the rat reagent have generated a great deal of interest in developing an equivalent reagent for the detection of human papillary injury. Identification of the RPA-1 antigen remains elusive. Early biochemical characterization of the antigen identified it as a
large-molecular-weight protein (150 to 200 kDa) that could be separated into two species with isoelectric points of 7.2 and 7.3, respectively [100]. However, purification and subsequent protein identification of the antigen have proved extremely challenging. A recent attempt at the biochemical purification and identification of the RPA-1 antigen has been equally frustrating, with investigators providing some evidence that the antigen may be a large glycoprotein and suggesting that the carbohydrate moiety is the specific epitope recognized by the Pap X 5C10 monoclonal antibody [102]. This would be consistent with, and help to explain why, the rat reagent does not cross-react with a human antigen in the collecting ducts, as protein glycosylation of related proteins often differs dramatically between species, making it unlikely that identical epitopes are presented. Nevertheless, continued efforts toward identifying a human RPA-1 antigen would provide investigators with a sorely needed clinical marker for the early detection of drug-induced renal papillary injury.
SUMMARY A considerable amount of effort has gone into identifying, characterizing, and developing new biomarkers of renal toxicity with greater sensitivity and specificity than the traditional markers, BUN and SC. The issue of sensitivity is a critical one, as the ideal biomarker would detect renal injury before damage is clinically evident or irreversible. Such prodromal biomarkers would provide great predictive value to the pharmaceutical industry in preclinical drug development, where compounds could be modified or development terminated early in response to the nephrotoxicity observed. Even greater value could be realized in the clinic, where early signs of kidney injury resulting from surgical or therapeutic intervention could be addressed immediately, before serious damage to the patient has occurred. Several of the candidate markers described above, including β2M, GSTα, microalbumin, KIM-1, and NGAL, have demonstrated great promise as early predictors of nephrotoxicity. Continued investigation should provide ample data from which to make a determination regarding the utility of these markers for preclinical and, ultimately, clinical use. The issue of biomarker specificity is also of great value because it provides information regarding where and to what extent injury is occurring. For example, increases in levels of SC and BUN inform us that serious kidney injury has occurred but do not reveal the precise nature of that injury, whereas the appearance of increased levels of β2M, GSTα, microalbumin, and NGAL indicates some degree of proximal tubule injury. Similarly, RPA-1 reports injury to the papilla, clusterin indicates damage to tubular epithelial cells, and GSTYb1 is specific to distal tubular damage. Low-level increases of early markers warn the investigator or clinician that subclinical damage to the kidney is occurring and provide the necessary time to alter or terminate a course of treatment or development.
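As a purely illustrative aid, and not part of the original text, the site-specificity relationships summarized above can be expressed as a small lookup table. The following minimal Python sketch assumes a hypothetical input format (a dictionary flagging which urinary markers are elevated relative to study controls); the marker-to-site assignments follow the summary above, but the flagging rule itself is a placeholder rather than a validated decision criterion.

# Minimal sketch: map candidate urinary biomarkers to the kidney regions they
# are described as reporting in this chapter, and list the likely injury sites
# implied by a panel result. Input format and the notion of "elevated" are
# hypothetical placeholders.
MARKER_TO_SITE = {
    "beta2-microglobulin": "proximal tubule",
    "GST-alpha": "proximal tubule",
    "microalbumin": "proximal tubule",
    "NGAL": "proximal tubule",
    "KIM-1": "proximal tubule",
    "GST-Yb1": "distal tubule",
    "clusterin": "tubular epithelial cells",
    "RPA-1": "renal papilla (collecting ducts)",
}

def likely_injury_sites(panel):
    # panel: dict mapping marker name -> True if elevated versus controls
    sites = {MARKER_TO_SITE[m] for m, elevated in panel.items()
             if elevated and m in MARKER_TO_SITE}
    return sorted(sites)

# Hypothetical example: proximal tubule markers elevated, papillary marker not.
print(likely_injury_sites({"GST-alpha": True, "KIM-1": True, "RPA-1": False}))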
Monitoring toxicity is an important aspect of achieving a positive clinical outcome and increased safety in drug development. Incorporation of many or all of these markers into a panel tied to an appropriate platform allows for the simultaneous assessment of site-specific kidney injury with some understanding of the degree of damage. Several detection kits are commercially available for many of these new biomarkers of nephrotoxicity. For example, Biotrin International provides ELISA kits for the analysis of urinary GSTs and RPA-1, while Rules Based Medicine and Meso Scale Discovery offer panels of kidney biomarkers multiplexed onto antibody-based fluorescence or chemiluminescent platforms, respectively. As interest in new biomarkers of kidney injury continues to develop, so will the technology that supports them. Presently, all of the commercial reagents and kits supporting kidney biomarker detection are antibody-based. The greatest single limitation of such platforms is how well the reagents perform with respect to target identification, nonspecific protein interaction, and species cross reactivity. Although kits are standardized and come complete with internal controls, kit-to-kit and lab-to-lab variability can be high. Another technology being developed for the purpose of quantifying biomarkers in complex mixtures such as biofluids is mass spectrometry–based multiple reaction monitoring. This technology requires the synthesis and qualification of small peptides specific to a protein biomarker that can be included in a sample as an internal standard to which endogenous peptide can be compared and quantified. This platform is extremely sensitive (femtomolar detection sensitivity), requires very little sample volume, and offers the highest degree of specificity with very short analysis times. Limitations of the platform are related to the selection of an appropriate peptide and the expense of assay development and qualification for use. For example, peptides need to be designed that are isoform specific, being able to discriminate between two similar but not identical proteins. Peptide design is also somewhat empirical with respect to finding peptides that will “fly” in the instrument and produce a robust signal at the detector. The choice of peptides available in a particular target may be limiting given these design restrictions. Consequently, not all proteins may be amenable to this approach. In conclusion, continued improvement in technology platforms combined with the availability of reagents to detect new biomarkers of nephrotoxicity provides both the clinician and the investigator with a variety of tools to predict and monitor early or acute kidney injury. This will be of tremendous value toward saving lives in the clinic and developing safer, more efficacious drugs without nephrotoxic side effects.
REFERENCES 1. Bjornsson TD (1979). Use of serum creatinine concentrations to determine renal function. Clin Pharmacokinet, 4:200–222.
2. Choudhury D, Ziauddin A (2005). Drug-associated renal dysfunction and injury. Nat Clin Pract Nephrol, 2:80–91. 3. Kleinknecht D, Landais P, Goldfarb B (1987). Drug-associated renal failure: a prospective collaborative study of 81 biopsied patients. Adv Exp Med Biol, 212:125–128. 4. Kaloyandes GJ, et al. (2001). Antibiotic and Renal Immunosuppression-Related Renal Failure. Lippincott Williams & Wilkins, Philadephia. 5. Vaidya VS, Ferguson MA, Bonventre JV (2008). Biomarkers of acute kidney injury. Annu Rev Pharmacol Toxicol, 48:463–493. 6. Berggard I, Bearn AG (1968). Isolation and properties of a low molecular weight β2-globulin occurring in human biological fluid. J Biol Chem, 213: 4095–4103. 7. Harris HW, Gill TJ III (1986). Expression of class I transplantation antigens. Transplantation, 42:109–117. 8. Bernier GM, Conrad ME (1969). Catabolism of β2-microglobulin by the rat kidney. Am J Physiol, 217:1350–1362. 9. Shea PH, Mahler JF, Horak E (1981). Prediction of glomerular filtration rate by serum creatinine and β2-microglobulin. Nephron, 29:30–35. 10. Eddy AA, McCullich L, Liu E, Adams J (1991). A relationship between proteinuria and acute tubulointerstitial disease in rats with experimental nephritic syndrome. Am J Pathol, 138:1111–1123. 11. Holm J, Hemmingsen L, Nielsen NV (1993). Low-molecular-mass proteinuria as a marker of proximal renal tubular dysfunction in normo- and microalbuminuric non-insulin-dependent subjects. Clin Chem, 39:517–519. 12. Kabanda A, Jadoul M, Lauwerys R, Bernard A, van Ypersele de Strihou C (1995). Low molecular weight proteinuria in Chinese herbs nephropathy. Kidney Int, 48:1571–1576. 13. Kabanda A, Vandercam B, Bernard A, Lauwerys R, van Ypersele de Strihou C (1996). Low molecular weight proteinuria in human imminodeficiency virus– infected patients. Am J Kidney Dis, 27:803–808. 14. Hofstra JM, Deegans JK, Willems HL, Wetzels FM (2008). Beta-2-microglobulin is superior to N-acetyl-beta-glucosaminindase in predicting prognosis in idiopathic membranous neuropathy. Nephrol Dial Transplant, 23:2546–2551. 15. Davey PG, Gosling P (1982). Beta-2-microglobulin instability in pathological urine. Clin Chem, 28:1330–1333. 16. Blashuck O, Burdzy K, Fritz IB (1983). Purification and characterization of cellaggregating factor (clusterin), the major glycoproteoin in ram rete testis fluid. J Biol Chem, 12:7714–7720. 17. Fritz IB, Burdzy K, Setchell B, Blashuck O (1983). Ram rete testes fluid contains a protein (clusterin) which influences cell–cell interactions in vitro. Biol Reprod, 28:1173–1188. 18. Rosenberg ME, Silkensen J (1995). Clusterin: physiologic and pathophysiologic considerations. Int J Biochem Cell Biol, 27:633–645. 19. Collard MW, Griswold MD (1987). Biosynthesis and molecular cloning of sulfated glycoprotein 2 secreted by rat Sertoli cells. Biochemistry, 26:3297–3303.
20. Kirszbaum L, Sharpe JA, Murphy B, et al. (1989). Molecular cloning and characterization of the novel, human complement-associated protein, SP40,40: a link between the complement and reproductive systems. EMBO J, 8:711–718. 21. Kirszbaum L, Bozas SE, Walker ID (1992). SP-40,40, a protein involved in the control of the complement pathway, possesses a unique array of disulfide bridges. FEBS Lett, 297:70–76. 22. French LE, Chonn A, Ducrest D, et al. (1993). Murine clusterin: molecular cloning and mRNA localization of a gene associated with epithelial differentiation processes during embryogenesis. J Cell Biol, 122:1119–1130. 23. O’Bryan MK, Cheema SS, Bartlett PF, Murphy BF, Pearse MJ (1993). Clusterin levels increase during neuronal development. J Neurobiol I, 24:6617–6623. 24. Harding MA, Chadwick LJ, Gattone VH II, Calvet JP (1991). The SGP-2 gene is developmentally regulated in the mouse kidney and abnormally expressed in collecting duct cysts in polycystic kidney disease. Dev Biol, 146:483–490. 25. Pearse MJ, O’Bryan M, Fisicaro N, Rogers L, Murphy B, d’Apice AJ (1992). Differential expression of clusterin in inducible models of apoptosis. Int Immunol, 4:1225–1231. 26. Witzgall R, Brown D, Schwarz C, Bonventre JV (1994). Localization of proliferating cell nuclear antigen, vimentin, c-Fos, and clusterin in the post-ischemic kidney: evidence for a heterogeneous genetic response among nephron segments, and a large pool of mitotically active and dedifferentiated cells. J Clin Invest, 93:2175–2188. 27. Correa-Rotter R, Hostetter TM, Manivel JC, Eddy AA, Rosenberg ME (1992). Intrarenal distribution of clusterin following reduction of renal mass. Kidney Int, 41:938–950. 28. Cowley BD Jr, Rupp JC (1995). Abnormal expression of epidermal growth factor and sulfated glycoprotein SGP-2 messenger RNA in a rat model of autosomal dominant polycystic kidney disease. J Am Soc Nephrol, 6:1679–1681. 29. Aulitzky WK, Schlegel PN, Wu D, et al. (1992). Measurement of urinary clusterin as an index of nephrotoxicity. Proc Soc Exp Biol Med, 199:93–96. 30. Eti S, Cheng SY, Marshall A, Reidenberg MM (1993). Urinary clusterin in chronic nephrotoxicity in the rat. Proc Soc Exp Biol Med, 202:487–490. 31. Rosenberg ME, Silkensen J (1995). Clusterin and the kidney. Exp Nephrol, 3:9–14. 32. Hidaka S, Kranzlin B, Gretz N, Witzgall R (2002). Urinary clusterin levels in the rat correlate with the severity of tubular damage and may help to differentiate between glomerular and tubular injuries. Cell Tissue Res, 310:289–296. 33. Abrahamson M, Olafsson I, Palsdottir A, et al. (1990). Structure and expression of the human cystatin C gene. Biochem J, 268:287–294. 34. Grubb A (1992). Diagnostic value of cystatin C and protein HC in biological fluids. Clin Nephrol, 38:S20–S27. 35. Laterza O, Price CP, Scott M (2002). Cystatin C: an improved estimator of glomerular function? Clin Chem, 48:699–707. 36. Herget-Rosenthal S, Marggraf G, Husing J, et al. (2004). Early detection of acute renal failure by serum cystatin C. Kidney Int, 66:1115–1122.
37. Koyner JL, Bennett MR, Worcester EM, et al. (2008). Urinary cystatin C as an early biomarker of acute kidney injury following adult cardiothoracic surgery. Kidney Int, 74:1059–1069. 38. Rozell B, Hansson H-A, Guthenberg M, Tahir G, Mannervik B (1993). Glutathione transferases of classes α, μ and π show selective expression in different regions of rat kidney. Xenobiotica, 23:835–849. 39. Smith GJ, Ohl VS, Litwack G (1977). Ligandin, the glutathione S-transferases, and chemically induced hepatocarcinogenesis: a review. Cancer Res, 37:8–14. 40. Rowe JD, Nieves E, Listowsky I (1997). Subunit diversity and tissue distribution of human glutathione S-transferases: interpretations based on electospray ionization-MS and peptide sequence–specific antisera. Biochem J, 325:481–486. 41. Mannervik B, Awasthi YC, Board PG, et al. (1992). Nomenclature for human glutathione transferases. Biochem J, 282:305–306. 42. Sundberg AG, Nilsson R, Appelkvist EL, Dallner G (1993). Immunohistochemical localization of alpha and pi class glutathione transferases in normal human tissues. Pharmacol Toxicol, 72:321–331. 43. Beckett GJ, Hayes JD (1993). Glutathione S-transferases: biomedical applications. Adv Clin Chem, 30:281–380. 44. Bass NM, Kirsch RE,Tuff SA, Campbell JA, Saunders JS (1979). Radioimmunoassay measurement of urinary ligandin excretion in nephrotoxin-treated rats. Clin Sci, 56:419–426. 45a. Sundberg AG, Appelkvist EL, Backman L, Dallner G (1994). Quantitation of glutathione transferase-pi in the urine by radioimmunoassay. Nephron, 66: 162–169. 45b. Sundberg AG, Appelkvist EL, Backman L, Dallner G (1994). Urinary pi-class glutathione transferase as an indicator of tubular damage in the human kidney. Nephron, 67:308–316. 46. Sundberg AG, Nilsson R, Appelkvist EL, Dallner G (1995). ELISA procedures for the quantitation of glutathione transferases in the urine. Kidney Int, 48: 570–575. 47. Kharasch ED, Thorning D, Garton K, Hankins DC, Kilty CG (1997). Role of renal cysteine conjugate b-lyase in the mechanism of compound A nephrotoxicity in rats. Anesthesiology, 86:160–171. 48. Eger EI II, Koblin DD, Bowland T, et al. (1997). Nephrotoxicity of sevofluorane versus desulfrane anesthesia in volunteers. Anesth Analg, 84:160–168. 49. Kilty CG, Keenan J, Shaw M (2007). Histologically defi ned biomarkers in toxicology. Expert Opin Drug Saf, 6:207–215. 50. Ichimura T, Bonventre JV, Bailly V, et al. (1998). Kidney injury molecule-1 (KIM-1), a putative epithelial cell adhesion molecule containing a novel immunoglobulin domain, is up-regulated in renal cells after injury. J Biol Chem, 273:4135–4142. 51. Lisitsyn N, Lisitsyn N, Wigler M (1993). Cloning the differences between two complex genomes. Science, 259:946–951. 52. Hubank M, Schatz DG (1994). Identifying differences in mRNA expression by representational difference analysis of cDNA. Nucleic Acids Res, 22:5640–5648.
53. Kaplan G, Totsuka A, Thompson P, Akatsuka T, Moritsugu Y, Feinstone SM (1996). Identification of a surface glycoprotein on African green monkey kidney cells as a receptor for hepatitis A virus. EMBO J, 15:4282–4296. 54. Feigelstock D, Thompson P, Mattoo P, Zhang Y, Kaplan GG (1998). The human homolog of HAVcr-1 codes for a hepatitis A virus cellular receptor. J Virol, 72:6621–6628. 55. Kuchroo VK, Umetsu DT, DeKruyff RH, Freeman GJ (2003). The TIM gene family: emerging roles in immunity and disease. Nat Rev Immunol, 3:454–462. 56. Meyers JH, Chakravarti S, Schlesinger D, et al. (2005). TIM-4 is the ligand for TIM-1 and the TIM-1-TIM-4 interaction regulates T cell proliferation. Nat Immunol, 6:455–464. 57. Won KH, Bailly V, Aabichandani R, Thadhani R, Bonventre JV (2002). Kidney injury molecule-1(KIM-1): a novel biomarker for human renal proximal tubule injury. Kidney Int, 62:237–244. 58. Bailly V, Zhang Z, Meier W, Cate R, Sanicola M, Bonventre JV (2002). Shedding of kidney injury molecule-1, a putative adhesion protein involved in renal regeneration. J Biol Chem, 277:39739–39748. 59. Ichimura T, Hung CC, Yang SA, Stevens JL, Bonventre JV (2004). Kidney injury molecule-1: a tissue and urinary biomarker for nephrotoxicant-induced renal injury. Am J Renal Physiol, 286:F552–F563. 60. Vaidya VS, Ramirez V, Ichimura T, Bobadilla NA, Bonventre JV (2006). Urinary kidney injury molecule-1: a sensitive quantitative biomarker for early detection of kidney tubular injury. Am J Renal Physiol, 290:F517–F529. 61. Zhou Y, Vaidya VS, Brown RP, et al. (2008). Comparison of kidney injury molecule-1 and other nephrotoxicity biomarkers in urine and kidney following acute exposure to gentamicin, mercury and chromium. Toxicol Sci, 101:159–170. 62. Lydakis C, Lip GYH (1998). Microalbuminuria and cardiovascular risk. Q J Med, 91:381–391. 63. Kaysen JA, Myers BD, Cowser DG, Rabkin R, Felts JM (1985). Mechanisms and consequences of proteinuria. Lab Invest, 54:479–498. 64. Noth R, Krolweski A, Kaysen G, Meyer T, Schambelan M (1989). Diabetic nephropathy: hemodynamic basis and implications for disease management. Ann Intern Med, 110:795–813. 65. Viberti GC (1989). Recent advances in understanding mechanisms and natural history of diabetic disease. Diabetes Care, 11:3–9. 66. Morgensen CE (1987). Micoalbuminuria as a predictor of clinical diabetic nephropathy. Kidney Int, 31:673–689. 67. Parving HH, Hommel E, Mathiesen E, et al. (1988). Prevalence of microalbuminuria, arterial hypertension, retinopathy, neuropathy, in patients with insulindependent diabetes. Br Med J, 296:156–160. 68. Viberti GC, Hill RD, Jarrett RJ, Argyropoulos A, Mahmud U, Keen H (1982). Microalbuminuria as a predictor of clinical nephropathy in insulin-dependent diabetes mellitus. Lancet, 319:1430–1432. 69. Morgensen CK, Schmitz O (1988) The diabetic kidney: from hyperfiltration and microalbuminuria to end-stage renal failure. Med Clin North Am, 72:466–492.
70. Rowe DJF, Dawnay A, Watts GF (1990). Microalbuminuria in diabetes mellitus: review and recommendations for measurement of albumin in urine. Ann Clin Biochem, 27:297–312. 71. Russo LM, Sandoval RM, McKee M, et al. (2007). The normal kidney filters nephritic levels of albumin retrieved by proximal tubule cells: retrieval is disrupted in nephritic states. Kidney Int, 71:504–513. 72. Koch Nogueira PC, Hadj-Assa A, Schell M, Dubourg L, Brunat-Metigny M, Cochat P (1998). Long-term nephrotoxicity of cisplatin, ifosamide, and methotrexate in osteosarcoma. Pediatr Nephrol, 12:572–575. 73. Prince CW, Oosawa T, Butler WT, et al. (1987). J Biol Chem, 262:2900–2907. 74. Sorensen ES, Hojrup P, Petersen TE (1995). Posttranslational modifications of bovine osteopontin: identification of twenty-eight phophorylation and three O-glycosylation sites. Protein Sci, 4:2040–2049. 75. Brown LF, Berse B, Van de Water L, et al. (1982). Expression and distribution of osteopontin in human tissues: widespread association with luminal epithelial surfaces. Mol Cell Biol, 3:1169–1180. 76. Singh PR, Patarca R, Schwartz J, Singh P, Cantor H (1990). Definition of a specific interaction between the early T lymphocyte activation 1 (Eta-1) protein and murine macrophages in vitro and its effect upon macrophages in vivo. J Exp Med, 171:1931–1942. 77. Weber GF, Cantor H. (1996). The immunology of eta-1/osteopontin. Cytokine Growth Factor Rev, 7:241–248. 78. Giachelli C, Bae N, Lombardi D, Majesky M, Schwartz S (1991). The molecular cloning and characterization of 2B7, a rat mRNA which distinguishes smooth muscle cell phenotypes in vitro and is identical to osteopontin (secreted phosphoprotein I, 2a). Biochem Biophys Res Commun, 177:867–873. 79. Liaw L, Lindner V, Schwartz SM, Chambers AF, Giachelli CM (1995). Osteopontin and beta 3 integrin are coordinately expressed in regenerating endothelium in vivo and stimulate ARG-GLY-ASP-dependent endothelial migration in vitro. Circ Res, 77:665–672. 80. Worcester EM, Blumenthal SS, Beshensky AM, Lewand DL (1992). The calcium oxalate crystal growth inhibitor protein produced by mouse kidney cortical cells in culture is osteopontin. J Bone Miner Res, 7:1029–1036. 81. Kleinman JG, Beshenky A, Worcester EM, Brown D (1995). Expression of osteopontin, a urinary inhibitor of stone mineral crystal growth, in rat kidney. Kidney Int, 47:1585–1596. 82. Shiraga H, Min W, Vandusen WJ, et al. (1992). Inhibition of calcium oxalate growth in vitro by uropontin: another member of the aspartic acid-rich protein superfamily. Proc Natl Acad Sci USA, 89:426–430. 83. Wuthrich RP (1998). The complex role of osteopontin in renal disease. Nephrol Dial Transplant, 13:2448–2450. 84. Rittling SR, Denhardt DT (1999). Osteopontin function in pathology: lessons from osteopontin-deficient mice. Exp Nephrol, 7:103–113. 85. Giachelli CM, Pichler R, Lombardi D, et al. (1994). Osteopontin expression in angiotensin II–induced tubulointerstitial nephritis. Kidney Int, 45:515– 524.
86. Pichler RH, Franseschini N, Young BA, et al. (1995). Pathogenesis of cyclosporine nephropathy: roles of angiotensin II and osteopontin. J Am Soc Nephrol, 6:1186–1196. 87. Diamond JR, Kees-Folts D, Ricardo SD, Pruznak A, Eufemio M (1995). Early and persistent up-regulated expression of renal cortical osteopontin in experimental hydronephrosis. Am J Pathol, 146:1455–1466. 88. Kleinman JG, Worcester EM, Beshensky AM, Sheridan AM, Bonventre JV, Brown D (1995). Upregulation of osteopontin expression by ischemia in rat kidney. Ann NY Acad Sci, 760:321–323. 89. Yu XQ, Nikolic-Paterson DJ, Mu W, et al. (1998). A functional role for osteopontin expression in experimental crescentic glomerulonephritis in the rat. Proc Assoc Am Physicians, 110:50–64. 90a. Xie Y, Sakatsume M, Nishi S, Narita I, Arakawa M, Gejyo F (2001). Expression, roles, receptor, and regulation of osteopontin in the kidney. Kidney Int, 60:1645–1657. 90b. Xie Y, Nishi S, Iguchi S, et al. (2001). Expression of osteopontin in gentamicininduced acute tubular necrosis and its recovery process. Kidney Int, 59: 959–974. 91. Mirza M, Shaunessy E, Hurley JK, et al. (2008). Osteopontin-C is a selective marker for breast cancer. Int J Cancer, 122:889–897. 92. Agrawal D, Chen T, Irby R, et al. (2002). Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling. J Natl Cancer Inst, 94:513–521. 93. Kjeldsen L, Johnson AH, Sengelov H, Borregaard N (1993). Isolation and primary structure of NGAL, a novel protein associated with human neutrophil gelatinase. J Biol Chem, 268:10425–10432. 94. Cowland JB, Borregaard N (1997). Molecular characterization and pattern of tissue expression of the gene for neutrophil gelatinase–associated lipocalin from humans. Genomics, 45:17–23. 95. De Broe M (2006). Neutrophil gelatinase–associated lipocalin in acute renal failure. Kidney Int, 69:647–648. 96. Mishra J, Ma Q, Prada A, et al. (2003). Identification of neutrophil gelatinase– associated protein as a novel early urinary biomarker of renal ischemic injury. J Am Soc Nephrol, 14:2534–2543. 97. Mishra J, Dent C, Tarabishi R, et al. (2005). Neutrophil gelatinase–associated lipocalin (NGAL) as a biomarker for acute renal injury after cardiac surgery. Lancet, 365:1231–1238. 98. Bolignano D, Coppolino G, Campo S, et al. (2007). Urinary neutrophil gelatinase–associated lipocalin (NGAL) is associated with severity of renal disease in proteinuric patients. Nephrol Dial Transplant, 23:414–416. 99. Hirsch R, Dent C, Pfriem H, et al. (2007). HNGAL as an early predictive biomarker of contrast-induced nephropathy in children. Pediatr Nephrol, 22: 2089–2095. 100. Falkenberg FW, Hildebrand H, Lutte L, et al. (1996). Urinary antigens as markers of papillary toxicity: I. Identification and characterization of rat kidney papillary antigens with monoclonal antibodies. Arch Toxicol, 71:80–92.
101. Hildebrand H, Rinke M, Schluter G, Bomhard E, Falkenberg FW (1999). Urinary antigens as markers of papillary toxicity: II. Application of monoclonal antibodies for the determination of papillary antigens in rat urine. Arch Toxicol, 73:233–245. 102. Price S, Betton G. Personal communication, unpublished data.
PART V TRANSLATING FROM PRECLINICAL RESULTS TO CLINICAL AND BACK
18 TRANSLATIONAL MEDICINE— A PARADIGM SHIFT IN MODERN DRUG DISCOVERY AND DEVELOPMENT: THE ROLE OF BIOMARKERS Giora Z. Feuerstein, M.D., Salvatore Alesci, M.D., Ph.D., Frank L. Walsh, Ph.D., J. Lynn Rutkowski, Ph.D., and Robert R. Ruffolo, Jr., Ph.D. Wyeth Research, Collegeville, Pennsylvania
DRUG TARGETS: HISTORICAL PERSPECTIVES Drugs are natural or designed substances used deliberately to produce pharmacological effects in humans or animals. Drugs have been part of human civilizations for millennia. However, until the very recent modern era, drugs have been introduced to humans by empiricism and largely by serendipitous events such as encounters with natural products in search of food or by avoiding hazardous plants and animal products. The emergence of the scientific era in drug discovery evolved alongside the emergence of physical and chemical sciences at large, first as knowledge to distill, isolate, and enrich the desired substance from its natural environment, followed by deliberate attempts to modify natural substances to better serve the human needs and desires. Scientific evolution throughout the past two centuries enabled identification of biologically active substances in humans (e.g., hormones) which were
manipulated chemically to improve potency, duration of action, and exposure, or to mitigate or abrogate undesirable actions. The cumulative knowledge of human, animal, and plant biology and chemistry provided the scientific foundation and technical capabilities to alter natural substances purposely in order to improve them. Such evolution marked the era of forward pharmacology, in which drug design emanates from primary knowledge of a biological target with a clear biological action. The exponential progress in molecular biology since the mid-twentieth century, culminating in the deciphering of the complete human genome in the year 2000, brought the dawn of pharmacogenomics and the reverse pharmacology era. The reverse pharmacology era is defined by the need, first, to clarify the biology and medical relevance of the target so as to qualify it as drugable and pharmaceutically exploitable within a drug discovery and development scheme. The pharmacogenomic era provides vast opportunities for selection of new molecular targets from a gamut of approximately 30,000 primary genes, over 100,000 proteins, and multiples of their translational and metabolomic products. Thus, the permutations with respect to opportunities for pharmacological intervention are unprecedented, vast, and most promising for innovative medicines. The pharmacogenomics era as a source of drug targets also poses unprecedented hurdles in selection, validation, and translation into effective and safe drugs. New technologies continue to drive the efficiency and robustness of mining genomic drug discovery opportunities, but physiological and integrated biology knowledge is lagging. In this context, translational medicine and biomarkers research have taken center stage in validation of the molecular target for pharmaceutical exploitation. In this chapter we offer a utilitarian approach to biomarkers and to target selection and validation that is driven by the translational medicine prospects of the target becoming a successful drug target. We offer a classification and analytical process aimed at assessing the risk, innovation, feasibility, and predictability of success in translating novel targets into successful drugs. We also provide clear definitions of the types of biomarkers that are the core of translational medicine and biomarkers research in modern pharmaceutical companies.
BIOMARKERS: UTILITARIAN CLASSIFICATION Biomarkers are the stepping-stones for modern drug discovery and development [1–4]. Biomarkers are defined as biological substances or biophysical parameters that can be monitored objectively and reproducibly and used to predict drug effect or outcome. This broad definition is, however, of little utility to the pharmaceutical process since it carries no qualification for the significance and use of the biomarker. The following classes and definitions of biomarkers are therefore offered:
1. Target validation: biomarkers that assess the relevance and the potential for a given target to become the subject of manipulation that will modify the disease to provide clear therapeutic benefits while securing a sufficient therapeutic index of safety and tolerability. 2. Compound-target interaction biomarkers: biomarkers that define the discrete parameters of the compound (or biological) interaction with the molecular target. Such parameters include binding of the compound to the target, its residency time on the target, the specific site of interaction with the target, and the physical or chemical consequences to the target induced by the compound (or biological). 3. Pharmacodynamic biomarkers: biomarkers that predict the consequence(s) of compound (biological) interaction with the target. The pharmacodynamic biomarkers include events that are desired therapeutically and adverse events based on mechanism of action. Pharmacodynamic biomarkers can report on discrete molecular events that are proximal to the biochemical pathway that is modified by the manipulated target or remote consequences such as in vivo or clinical outcomes (morbidity or mortality). Pharmacodynamic biomarkers are diverse and frequently nonobvious. Advanced and sophisticated bioinformatics tools are required for tracking the divergence and convergence of signaling pathways triggered by compound interaction with the target. A subset of the pharmacodynamic biomarkers are consequences induced by the compound outside its intended mechanism of action. Such pharmacodynamic effects are often termed “off-target” effects, as they are not the direct consequence of the compound interaction with the target. Usually, such pharmacodynamic events are due to unforeseen lack of selectivity or metabolic transformations that yielded metabolites not present (or detected) in the animals used for safety and metabolic studies prior to launch of the compound into human trials or into human use. These issues are not dealt with in this chapter. 4. Disease biomarkers: biomarkers that correlate statistically with the disease phenotype (syndrome) for which therapeutics are developed. Correlation of levels (in the circulation, other fluids or tissue) or expression patterns (gene, protein) in peripheral blood cells or tissues should signify disease initiation, progression, regression, remission, or relapse. In addition, duration of aberrantly expressed biomarkers could also be associated with risk for disease, even if the level of the biomarker does not change over time. Since disease biomarkers are defined by their statistical correlation to features of the disease, it is imperative that the clinical phenotyping is clearly defined. Stratification of all possible phenotypic variables is clearly a prerequisite for accurate assessment of the discrete relationships of the biomarker to the disease. Gender, age, lifestyle, medications, and physiological and biochemical similarities are
often not sufficiently inclusive, resulting in a plethora of disease biomarker claims that are often confusing and futile. 5. Patient selection: biomarkers that are used for selection of patients for clinical studies, specifically proof-of-concept studies or the confirmatory phase III clinical trials that are required for drug registration. These biomarkers are important in helping to select patients likely to respond (or, conversely, not to respond) to a particular treatment or a drug's specific mechanism of action, and potentially to predict those patients who may experience adverse effects. Such biomarkers are frequently genetic (single-nucleotide polymorphisms, haplotypes) or pharmacogenomic biomarkers (gene expression), but could be any of the primary pharmacodynamic biomarkers. Biomarkers for patient selection are now mainstream in exploratory clinical trials in oncology, where genotyping of tumors to establish the key oncogenic "driver(s)" is critical for predicting the potential therapeutic benefits of modern treatments with molecularly targeted drugs. The success of the new era of molecular oncology (as compared to the cytotoxic era) will depend largely on the ability to define these oncogenic signaling pathways via biomarkers such as phosphorylated oncogenes, or via the functional state due to mutations that cause gain or loss of function. 6. Adaptive trial design: The objectives of adaptive design trials are to establish an integrated process to plan, design, and implement clinical programs that leverage innovative designs and enable real-time learning. The method is based on simulation-guided clinical drug development. In a first step, the situation is assessed, the path forward and decision criteria are defined, and assumptions are analyzed. Adaptive trials have become an enabling strategy: they integrate competing positions and utilities into a single aligned approach and force much clearer articulation and quantification of the path forward. Once this framework is established, a formal scenario analysis that compares the fingerprints of alternative designs through simulation is conducted. Designs that appear particularly attractive to the program are subjected to more extensive simulation. Decision criteria steer away from doses that are either unsafe or nonefficacious and aim to home in quickly on the most attractive dose range. Response-adaptive dose-ranging studies deploy dynamic termination rules (i.e., as soon as a no-effective-dose scenario is established, the study is recommended for termination); a minimal sketch of such a rule is given at the end of this section. Bayesian approaches are ideally suited to enable ongoing learning and dynamic decision making [5]. The integrator role of adaptive trials is particularly strong in establishing links between regulatory accepted "confirm"-type endpoints and translational medicine's efforts to develop biomarkers. Biomarkers that may enable early decision making need to be read out early to gain greater confidence in basing decisions on them. A biomarker can be of value even if it only
allows a pruning decision. These considerations highlight the importance of borrowing strength from indirect observations and of using mathematical modeling techniques to enhance learning about the research question. For example, in a dose-ranging study, it is assumed that there should be some relationship between the responses at adjacent doses, and this assumption can be used to model an algorithm. Both safety and efficacy considerations can be built into this model: ideally, all efforts are integrated, from disease modeling in discovery to PK/PD modeling in early clinical development to safety/risk and business case modeling in late development [4–7]. The utility of this system is represented in Figures 1 and 2, which suggest a semiquantitative scoring system that helps assess the overall strength of a program and identify areas of weakness in each of the biomarkers needed along the compound (biological) progression path.
Figure 1 Criteria for biomarker scoring. [Figure: each of the five biomarker types (target validation, disease, target/compound interaction, pharmacodynamic, and patient selection biomarkers) is scored 1 (weak), 2 (moderate), or 3 (strong) according to the strength of the supporting data, ranging from minimal human SNP or target expression data through validated assays and prior art with a reference compound.]
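The three-point criteria in Figure 1 lend themselves to a simple tally. The following minimal Python sketch, which is illustrative only and not from the original text, assumes hypothetical scores for a hypothetical program; it sums the per-category scores and flags categories rated weak, mirroring the idea that low scores on target validation or pharmacodynamic biomarkers should prompt concern.

# Minimal sketch: tally Figure 1-style scores (1 = weak, 2 = moderate,
# 3 = strong) across the five biomarker categories and flag weak areas.
# The example scores below are hypothetical.
CATEGORIES = [
    "target validation",
    "disease",
    "target/compound interaction",
    "pharmacodynamic",
    "patient selection",
]

def summarize_program(scores):
    # scores: dict mapping each category to 1, 2, or 3
    total = sum(scores[c] for c in CATEGORIES)
    weak = [c for c in CATEGORIES if scores[c] == 1]
    return {"total": total, "max_possible": 3 * len(CATEGORIES), "weak_areas": weak}

example_scores = {
    "target validation": 3,
    "disease": 2,
    "target/compound interaction": 2,
    "pharmacodynamic": 1,
    "patient selection": 2,
}
print(summarize_program(example_scores))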
Figure 2 Type I biomarkers: target validation translational medicine perspectives. [Figure: class A, target highly specific to the disease (e.g., CML, Bcr/Abl; DVT, FV Leiden; MG, anti-AchR Ab); class B, target normally present in humans but activated largely in the disease state (e.g., thrombosis, GPIIb/IIIa and P-selectin); class C, target present and functioning in normal physiology but excessively active in the disease state (e.g., stroke, glutamate; breast cancer, GFRK); class D, target present and functioning indiscriminately in normal and disease states (e.g., hypertension, renin, L-type Ca2+ channel, cholesterol). Safety is highest, and the potential for mechanism-of-action adverse effects lowest, for class A targets and decreases toward class D.] CML, chronic myelocytic leukemia; DVT, deep vein thrombosis; FV, factor V; MG, myasthenia gravis; AchR, acetylcholine receptor; GPIIb/IIIa, platelet integrin receptor; GFRK, growth factor receptor kinase; MOA, mechanism of action; AE, adverse effects.
Figure 3 illustrates the continuum of biomarkers research, validation, and implementation along the complete time line of drug discovery and development, including life-cycle management (phase IV) and new indications (phase V) when appropriate, as well as the interfaces of translational medicine with the traditional drug discovery and development process, while Figure 4 represents the new model of "learn and confirm," in which biomarkers figure prominently in driving the "learn" paradigm. A program for which a STRONG score is established across all five biomarker specifications provides confidence in the likelihood of success from the biological and medical perspectives and is likely to result in a more promising development outcome. Similarly, it would be prudent to voice concerns regarding programs that score WEAK, especially if low scores are assigned to target validation, pharmacodynamic, and, in special cases, target-compound interaction biomarkers (e.g., for a central nervous system target). This scoring system is complementary to other definitions of biomarkers based on particular needs.
Figure 3 Building translational medicine via biomarker research. [Figure: the five biomarker types (TV, CTI, PD, DM, and PS/AD) are layered onto the pipeline from lead and candidate selection in the exploratory and predevelopment stages of discovery through the development track and phases 1 to 4.] (See insert for color reproduction of the figure.)
Figure 4 Translational medicine: biomarker implementation along the pipeline. [Figure: an early biomarkers team sets strategy and initiatives in discovery ("learn"), and a validation and implementation biomarkers team carries biomarkers through clinical research and development phases 1 to 4 ("confirm"), spanning the exploratory, predevelopment, development-track, and life-cycle management stages.] Exp, exploratory phase; Pre-Dev, predevelopment track; CR&D, clinical research and development; LCM, life-cycle management.
For example, surrogate biomarkers as defined by the U.S. Food and Drug Administration (FDA) are markers that can be used for drug registration in lieu of more definitive clinical outcome data. Surrogate biomarkers are few and difficult to establish (e.g., blood pressure and cholesterol; Figure 2).
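To make the dynamic termination rule mentioned under adaptive trial design (item 6 above) concrete, the following minimal Python sketch, which is illustrative only and not from the original text, uses a simple Bayesian (beta-binomial) model of responder rates: for each dose it estimates the posterior probability that the response rate exceeds placebo by a clinically relevant margin, and it recommends stopping for futility if no dose clears a prespecified threshold. All numbers (responder counts, margin, threshold) are hypothetical placeholders.

import random

# Posterior for a response rate with a uniform Beta(1, 1) prior is
# Beta(1 + responders, 1 + nonresponders); sample it by Monte Carlo.
def posterior_samples(responders, n, draws=20000):
    return [random.betavariate(1 + responders, 1 + n - responders) for _ in range(draws)]

def prob_better_than_placebo(dose_data, placebo_data, margin=0.10):
    # dose_data, placebo_data: (responders, n) tuples from the interim look
    dose = posterior_samples(*dose_data)
    placebo = posterior_samples(*placebo_data)
    return sum(d - p > margin for d, p in zip(dose, placebo)) / len(dose)

def futility_check(doses, placebo, threshold=0.20):
    # Recommend termination if no dose has a sufficiently high posterior
    # probability of beating placebo by the margin (a no-effective-dose scenario).
    probs = {label: prob_better_than_placebo(data, placebo) for label, data in doses.items()}
    return probs, all(p < threshold for p in probs.values())

# Hypothetical interim data: (responders, patients) per arm
doses = {"low": (4, 20), "mid": (6, 20), "high": (7, 20)}
placebo = (5, 20)
probs, stop_for_futility = futility_check(doses, placebo)
print(probs, "recommend termination:", stop_for_futility)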
PRINCIPLES OF TARGET SELECTION Two key guiding principles are essential in the early selection process of molecular targets: 1. Modulating the target carries the prospect of unequivocal medical benefit (efficacy) to patients beyond the standard of care. 2. Benefits can be garnered while maintaining a sufficient level of safety that can be realized within the attainable compound exposure. Such a mission is frequently unachievable, and hence establishing an acceptable therapeutic index is the practical goal for most drug development schemes. Commonly, a therapeutic index is established by calculating the ratio of the maximum tolerated dose (MTD) to the minimum effective dose (MED) in animal efficacy and safety studies (a simple worked example is sketched at the end of this section). In this light, targets selected for drug development can be classified with respect to risk assessment into the following categories (Figure 2): Class A. The target is present and contributes only in the disease process. Class B. The target is present physiologically in a nonactive form but is activated and contributes to the disease. Class C. The target functions physiologically but in an augmented, uncontrolled fashion that contributes to the disease. Class D. The target functions in normal states and indiscriminately in disease (e.g., no difference in target expression, function, or distribution can be identified in disease as compared to the normal physiological state). Class A: Disease-Specific Target A disease-specific molecular target should be a molecule that operates only in the disease state and does not participate in physiological (normal) functions. Manipulation of such targets by drugs should provide efficacy with the lowest chance of mechanism-based adverse effects. Examples of such targets arise in genetic disorders, which result in either overactivity or loss of activity of the target. Such is the case in chronic myelogenous leukemia (CML), which results from an aberrant recombination between chromosomes 9 and 22 (producing the Philadelphia chromosome), fusing the Bcr and Abl genes into an overactive tyrosine kinase, which drives oncogenic transformation. To cure the disease, potent and selective inhibitors of
this aberrant kinase had to be discovered, a task that took over a decade to accomplish [8]. Such targets have the potential for a high safety profile. It is, however, important to note that this example may not necessarily represent the ultimate approach for this disease, since the activity of the fusion kinase (Bcr/Abl) is driven by the Abl kinase catalytic site, which is preserved in its physiological form. Thus, inhibition of this target by drugs such as Gleevec may still carry the potential for interference in tissues and cells in which the Abl kinase is physiologically active. Another example applicable to this category is a disease such as myasthenia gravis, in which specific antibodies that block the acetylcholine receptors cause progressive muscle weakness. Specific neutralizing agents to these antibodies are likely to provide high efficacy in treating the disease, with the likelihood of fewer adverse effects [9], since such antibodies are not physiologically present in human beings. These examples are typical of type 1 class A target validation. The biomarkers that need to be established for this category should focus on validating the specificity of the target to the disease state. Class B: Target Present Physiologically in a Nonactive Form but Is Activated and Contributes to the Disease This class of targets has little or no discernible physiological activity in normal states, yet in certain pathophysiological situations the target is presented, activated, and plays a role in a pathophysiological event. An example of such a target in the type 1 class B category is the P-selectin adhesion molecule. This adhesion molecule is normally cryptic within platelets and endothelial cells. Upon activation of these cells, P-selectin is presented on the cell surface and mediates adhesion interactions with its ligand, a mechanism believed to play a role in thrombosis and inflammation. Inhibitors of P-selectin binding to its ligand, the P-selectin glycoprotein ligand (PSGL-1), are expected to provide clinical benefit with a lower likelihood of adverse events. To validate this situation, biomarkers that confirm the preferential role of the activated target in a pathophysiological process, while it maintains little physiological function, are essential. However, one must be aware of a potentially serious limitation to this approach: inhibition of a target that is cryptic in the physiological state but activated in pathological conditions may not only provide significant therapeutic benefit but may also expose the patient to some other risk, such as loss of host defense against injury. Such is the case of the platelet adhesion integrin molecule, GPIIb/IIIa, which serves as the final common pathway for platelet aggregation. Interfering with activated GPIIb/IIIa binding to its ligand (e.g., fibrinogen) provides effective and often lifesaving therapy in patients at acute risk for thrombosis; however, chronic treatment with GPIIb/IIIa antagonists has not been particularly effective in providing benefits, due to the
relatively high frequency of significant adverse effects due to bleeding, since platelet adhesion to matrix proteins is essential to seal bleeding sites in trauma and disease conditions. Thus, biomarkers for this class must establish the full physiological significance of the target in order to assess the therapeutic index and tolerability (benefits as well as risks).

Class C: Target Functions Physiologically but in an Augmented, Uncontrolled Fashion That Contributes to the Disease

This class of targets includes molecules that play an active role in normal physiological processes, some of which may be critical to health. Such is the case for the neurotransmitter glutamate in the central nervous system, which is essential to cognition, memory, thought processes, and state of arousal. However, in ischemic stroke or asphyxia, glutamate release is uncontrolled and reaches high levels over prolonged periods, which are believed to be neurotoxic and likely contribute to neuronal death following a stroke. Inhibitors of glutamate release, or antagonists of its action at various receptors, are believed to carry the potential for effective treatment of stroke, provided that inhibition of the excess release of the neurotransmitter can be achieved in a timely manner, only to an extent that preserves the physiological need for this transmitter, and only over the limited period during which excess glutamate is neurotoxic. Such targets may be pharmaceutically exploitable when their manipulation is tuned carefully to the pathophysiological context. Another example of a target in this category is the human growth factor receptor kinase (hGFRK), the target of Herceptin, which in certain cancers (e.g., breast cancer) is constitutively activated and participates in the oncogenic drive. Inhibition of the hGFRK, while clearly of therapeutic value in breast cancer, has also been associated with heart failure, due to the physiological role of the hGFRK in the cardiac myocyte survival signaling pathway [10]. Thus, the biomarker challenge in modulation of class C targets of this nature lies in identifying biomarkers that assess the "titration" needed to inhibit target activity only to an extent that still maintains normal physiological function.

Class D: Target Maintains Physiological Functions in Normal and Disease States

This class of targets encompasses the largest group of molecular targets exploited so far by modern drugs. Many members of this class have yielded highly beneficial therapies. This class consists of molecular targets that are known to have important physiological functions which cannot be differentiated within a disease context; that is, the target is not different in its expression levels (gene, protein) or signaling pathway in normal and disease states. A priori, such targets harbor the greatest risk for mechanism-based adverse effects, as there is no apparent reason to expect that modulation of the target in the disease state will spare the normal physiological function of the target.
Examples of such targets include the coagulation factors (e.g., FIX, FXa, and thrombin), which are critical to maintaining physiological hemostasis; hence, inhibition of these targets carries inherent bleeding liabilities. Similarly, current antiarrhythmic drugs (e.g., amiodarone, lidocaine, dofetilide), while effective in treating life-threatening arrhythmias, all carry significant liability for mechanism-based pro-arrhythmic effects and the potential for sudden death. The biomarker challenge for this class is to define the fine balance needed between efficacy in the disease context and the expected safety limitations. Biomarkers that define the acceptable therapeutic index are key to the successful utility of drugs that modulate such targets. However, targets in this class do not necessarily exhibit a narrow safety margin for clinically meaningful adverse effects. A significant example is the L-type Ca2+ channel and its blockers. The L-type Ca2+ channel is an essential conduit for the "beat by beat" Ca2+ fluxes that secure precise cardiac rhythm and contractility, skeletal muscle function, neuronal excitability, and hormone and neurotransmitter release. Yet L-type Ca2+ channel blockers are important and sufficiently safe drugs that are used to treat hypertension, angina, and cardiac arrhythmias with indisputable medical benefit. Nonetheless, inherent to the L-type Ca2+ channel blockers, as with this class of targets generally, are mechanism-based adverse effects associated with rhythm disturbances, hypotension, edema, and other liabilities. Probably the best example of system specificity among physiological targets that provide major medical benefits with a high safety margin is the renin–angiotensin–aldosterone system (RAAS). The RAAS is an important regulator of blood pressure, blood volume, and blood flow, yet its manipulation by several different pharmacological agents (renin inhibitors, angiotensin-converting enzyme inhibitors, angiotensin II receptor antagonists) has yielded highly beneficial drugs that reduce the risk of morbidity and mortality from hypertension, heart failure, and renal failure, despite the fact that the system does not demonstrate significant operational selectivity between normal and disease states (especially hypertension). However, mechanism-based hypotension and electrolyte disturbances can limit the therapeutic benefit of these drugs and elicit significant adverse effects when the RAAS is excessively inhibited [11]. The biomarker challenge for these targets is to define the relative or preferential role of the target in its various physiological activities, where modest manipulation in one organ might provide sufficient therapeutic benefit with a low likelihood of the adverse effects that would result from more substantial inhibition of the same target in other organs.
SUMMARY

The analysis and classification offered in this chapter regarding biomarkers in drug discovery and development aim to highlight the need for careful study and analysis of the significance of the target selected for therapeutic intervention as the first crossroad for success or failure in the development of effective
and safe drugs [12]. The analysis and utility of biomarkers along the process of drug discovery and development have become an integral part of the "learn and confirm" paradigm of drug discovery and development in leading pharmaceutical organizations such as Wyeth Research. Such analyses are useful to guide the "learn" phase in the search for biomarkers that can better assess the benefits and risks associated with manipulation of the molecular target. The scope of this chapter does not allow for a detailed review of the "learn and confirm" paradigm, for which readers are directed elsewhere [13,14]. Various technological and strategic activities are needed to establish the biomarker strategies for the various targets described. The need to address these issues via biomarker research, validation, and implementation beginning at the very early stages of the drug discovery and development process is emphasized. In the pharmaceutical setting, this means beginning efforts to identify biomarkers for all the target categories listed above. Such efforts could begin even before a tractable compound (or biological) is in hand, a time when target validation is a clear focus of the program. Once a compound becomes available, compound–target interaction, pharmacodynamic (efficacy and safety) biomarkers, and strategies for patient selection and adaptive design must be explored. At the onset of first-in-human studies, all strategies, plans, and biomarker research should be as well worked out as possible. We believe that fundamental changes in the structure, function, and interfaces of pharmaceutical R&D are urgently needed to provide a key role for translational medicine and biomarkers research toward more successful discovery and development of innovative medicines.

REFERENCES

1. Biomarker Definition Working Group (2001). Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther, 69:89–95.
2. Trusheim R, Berndt ER, Douglas FL (2007). Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Discov, 6:287–293.
3. Feuerstein GZ, Rutkowski JL, Walsh FL, Stiles GL, Ruffolo RR Jr (2007). The role of translational medicine and biomarkers research in drug discovery and development. Am Drug Discov, 2:23–28.
4. FDA (2004). Challenge and opportunity on the critical path to new medical products. http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.pdf.
5. Berry DA (2006). Bayesian clinical trials. Nat Rev Drug Discov, 5:27–36.
6. Gallo P, Chuang-Stein C, Dragalin V, Gaydos B, Krams M, Pinheiro J (2006). Adaptive design in clinical drug development: an executive summary of the PhRMA working group. J Biopharm Stat, 16:275–283.
7. Krams M, Lees KR, Hacke W, Grieve AP, Orgogozo J-M, Ford GA (2003). Acute stroke therapy by inhibition of neutrophils (ASTIN): an adaptive dose–response study of UK-279,276 in acute ischemic stroke. Stroke, 34:2543–2548.
8. Kurzrock R (2007). Studies in target-based treatment. Mol Cancer Ther, 6(9):2385.
9. Hampton T (2007). Trials assess myasthenia gravis therapies. JAMA, 298(1):29–30.
10. Chien KR (2006). Herceptin and the heart: a molecular modifier of cardiac failure. N Engl J Med, 354:789–790.
11. Hershey J, Steiner B, Fischli W, Feuerstein GZ (2005). Renin inhibitors: an antihypertensive strategy on the verge of reality. Drug Dev Today, 2:181–185.
12. Simmons D (2006). What makes a good anti-inflammatory drug target? Drug Discov Dev, 5–6:210–219.
13. Gombar C, Loh E (2007). Learn and confirm. Drug Discov Dev, 10:22–27.
14. Sheiner LB (1997). Learning versus confirming in clinical drug development. Clin Pharmacol Ther, 61:275–291.
19

CLINICAL VALIDATION AND BIOMARKER TRANSLATION

David Lin, B.MLSc.
University of British Columbia, Vancouver, British Columbia, Canada
Andreas Scherer, Ph.D. Spheromics, Kontiolahti, Finland
Raymond Ng, Ph.D. University of British Columbia, Vancouver, British Columbia, Canada
Robert Balshaw, Ph.D., and Shawna Flynn, B.Sc. Syreon Corporation, Vancouver, British Columbia, Canada
Paul Keown, M.D., D.Sc., MBA, Robert McMaster, D.Phil., and Bruce McManus, M.D., Ph.D. University of British Columbia, Vancouver, British Columbia, Canada
INTRODUCTION

Throughout history, biological and pathogenic processes have been measured to monitor health and disease. The presence of "sweetness" in urine and blood was recognized thousands of years ago as an indication of the disorder now known as diabetes. As a more recent example, the characterization of infections has been achieved by performing cultures for microorganisms, both for identification and to establish sensitivities to antibiotics. Any such measures, however variously quantitated, were forerunners of what are now popularly referred
to as biomarkers. When clinical laboratories became well established after the middle of the twentieth century, many components of body fluids and tissues were assayed. These are all biomarkers of the physiological state of being. Modern biomarkers being discovered or sought are pursued and measured based on the same principles as the first, most rudimentary of disease indicators. Hundreds of biomarkers are now used in modern medicine for the prediction, diagnosis, and prognostication of disease, as well as for monitoring preventive or therapeutic interventions. A sizable challenge now lies in selecting novel biomarkers that are most appropriate for clinical use in the face of the many new candidates being suggested by various investigators.

Biomarkers are often thought of as strictly molecular in nature, but in fact they include a vast range of objectively measurable features. Any characteristic that reflects a normal biological process, a pathogenic process, or a pharmacologic response to an intervention can potentially become a clinically useful biomarker [1]. Thus, there are many different types of biomarkers, including molecular, physiological, or structural features of biological systems. However, how a specific measure reaches customary status has not been so clearly established.

The focus on using biomarkers in clinical decision making and diagnosis has expanded, and similarly, biomarkers are playing an ever-increasing role in drug development processes and in regulatory decision making. As drug development costs continue to rise, biomarkers have become increasingly important, as they are a potential means to decrease development time, costs, and late-phase attrition rates in the regulatory approval process for new drugs:

• Reduce clinical trial costs:
  • Decrease late-phase attrition rates.
  • Replace time-consuming clinical endpoints.
• Reduce required clinical trial sample size by way of patient stratification:
  • Identify the population most likely to benefit.
  • Identify the population with a high level of risk for events of interest.
• Provide more robust assays than some conventional clinical endpoints.
• Help improve models for calculating return on investment.

For example, cost and time can be reduced when biomarkers help to segment patient groups early in trials, and some new markers may be more timely than many of the traditional clinical endpoints or outcomes currently used in assessing clinical trials, such as patient survival [2,3]. Biomarkers are also beneficial tools for selecting the best candidates for a trial or for increasing safety through more effective drug monitoring. Ultimately, by reducing the required time investment to demonstrate a drug's safety and efficacy,
biomarkers may greatly reduce the costs and risks of performing a clinical trial.
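As a rough, hypothetical illustration of the sample-size point above (the two-arm design, the effect sizes, and all numbers below are assumptions chosen for illustration, not figures from this chapter), a standard normal-approximation power calculation shows how enriching a trial with biomarker-selected patients, in whom the treatment effect is less diluted, can shrink the required enrollment:

```python
from scipy.stats import norm

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Two-arm parallel design, normal approximation:
    n per arm = 2 * sd^2 * (z_{1-alpha/2} + z_{power})^2 / effect^2."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * (sd * (z_a + z_b) / effect) ** 2

# Hypothetical numbers: in an unselected population only part of the patients
# carry the target, so the average treatment effect is diluted to 0.25 SD;
# in a biomarker-enriched population the effect is 0.50 SD.
print(round(n_per_arm(effect=0.25, sd=1.0)))   # ~251 patients per arm
print(round(n_per_arm(effect=0.50, sd=1.0)))   # ~63 patients per arm
```

Doubling the detectable effect through enrichment roughly quarters the required sample size in this toy calculation, which is the mechanism behind the cost and time savings listed above.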
BIOMARKER DISCOVERY, DEVELOPMENT, AND TRANSLATION

Biomarker discovery can be performed using animal models, but it is now commonly carried out in humans from the very beginning stages of biomarker development. Biomarkers analyzed in preclinical animal studies are eventually transferred to human clinical settings. In a controlled laboratory environment the conditions are relatively constant, and the "subjects" (i.e., the animals) are homogeneous and free of complicating co-morbidities. In these laboratory settings there are generally more options regarding the assays available, and the possibility of frequent and repeated testing in individual animals or groups of similar or identical animals allows changes in the measured matrix of analytes to be detected with high precision and sensitivity. Ideally, a candidate biomarker discovered in this fashion would then be transferred into the clinical environment and evaluated further on human samples. The drawback of this approach is that many of the biomarkers discovered in animal models cannot be translated for use in humans. Animal models often do not accurately reflect human biology [4]. Further, a biomarker candidate may not achieve acceptable performance standards in heterogeneous patients with age, gender, and racial differences. This is why it has been suggested that pilot studies of biomarkers be conducted in humans first, in early phase II clinical trials that incorporate the variability of various ambient influences, and that the marker then be validated in preclinical studies and, at a later stage, in clinical trials in parallel. This approach expedites the biomarker development process and minimizes the attrition rate of biomarker candidates, since they are developed from the beginning under human clinical conditions.

In patients, biomarker candidate discovery is performed preliminarily in an internal primary cohort and is confirmed subsequently in an external secondary cohort. These two discovery phases are typically performed in observational or retrospective cohorts, and less and less often in preclinical studies. Some companies are avoiding animal models for discovery altogether, to avoid spending time and money on animal-based biomarkers that can often lead to a dead end.

It is not always practical to pursue validation of biomarker candidates identified in the discovery process. It is important to establish parameters for rational, statistically sound, and evidence-based selection and rejection of candidate biomarkers [5–7]. The decision to continue development of a biomarker candidate is largely based on its potential to contribute cost-effectively to disease management [8]. Biomarkers used in the early phases of clinical development may be useful in providing more timely proof of concept or dose-range information than a real clinical endpoint [9]. Biomarker development should also be driven by clinical need [8]. A clinically useful biomarker
must favorably affect clinical outcomes such as decreasing toxicity or increasing survival [10]. Once cost-benefit ratios are evident and more than one institution has consistently confirmed a biomarker’s ability to perform at the requisite levels of sensitivity and specificity, a biomarker is ready for prospective testing in clinical trials [8]. The U.S. Food and Drug Administration (FDA) pharmacogenomic submission guidelines recommend that transferring genomic biomarker sets from microarrays to other platforms (such as quantitative real-time polymerase chain reaction) be attempted only once it has been demonstrated that the differential expression of such genes is sensitive, specific, and reproducible. Analytical quality assurance is performed continually throughout biomarker and drug development for most biomarkers; however, if a biomarker is to be used as an endpoint in a phase III efficacy study, it should be validated beforehand [11]. Assay and analytical specificity and sensitivity should be established and validated prior to clinical phases such that clinical qualification can be carried out using proven, analytically robust methods [12]. However, as biomarker and drug development are intertwined processes, they may often occur concurrently throughout the different stages of clinical trials.
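One simple way that cross-platform reproducibility of a candidate gene set is often summarized, sketched here with made-up gene names and fold-change values rather than data or methods from the guidance cited above, is the agreement of effect sizes measured on the two platforms:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical log2 fold-changes (disease vs. control) for the same candidate
# genes measured on two platforms: microarray (discovery) and qPCR (confirmation).
genes      = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
microarray = np.array([2.1, -1.4, 0.9, 3.0, -0.6])
qpcr       = np.array([1.8, -1.1, 0.7, 2.6, -0.2])

r, p = pearsonr(microarray, qpcr)
print(f"cross-platform correlation r = {r:.2f} (p = {p:.3f})")

# Genes whose direction of change does not replicate on the second platform
# would be flagged before the panel is transferred.
discordant = [g for g, m, q in zip(genes, microarray, qpcr) if m * q <= 0]
print("discordant genes:", discordant)
```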
BIOMARKER VALIDITY AND VALIDATION: THE REGULATORY PERSPECTIVE

The FDA's Guidance for Industry: Pharmacogenomic Data Submission, published in 2005, has helped introduce and define exploratory and valid biomarkers. The FDA defines a valid pharmacogenomic biomarker as one that is measured in an analytical test system with well-established performance characteristics and for which there is an established scientific framework or body of evidence that elucidates the toxicological, pharmacological, or clinical significance of the test result. The FDA further classifies valid biomarkers as "probable" and "known" in terms of the level of confidence that they attain through the validation process. Probable valid biomarkers may not yet be widely accepted or validated externally but appear to have predictive value for clinical outcomes, whereas known valid status is achieved by those that have been accepted broadly across the scientific community. It is important to realize that the different classes of biomarkers reflect their levels of confidence (Figure 1). This can be thought of in a hierarchical manner, with exploratory biomarkers being potential precursors of clinically useful (probable or known) valid biomarkers [13].

Integrating biomarkers into clinical trials for eventual clinical use by identifying the best or most valid biomarker candidates is not a clear-cut process. The term validity, particularly in the field of biomarker research, is a broad concept that has been used to describe everything from the analytical methods to the characteristics of the biomarkers identified [14]. Validity is also used across multiple industries, not only in medical or health disciplines.
[Figure 1 graphic: phased strategies of validation, tracing exploratory biomarker candidates through "probable valid" to "known valid" status across internal validation, external validation, clinical trials (phases I and II), large clinical trials (phase III), and continued surveillance (phase IV), with sensitivity/specificity, consequence/risk, and fit-for-purpose stringency increasing toward surrogate (dosing, safety, and efficacy) use.]
Figure 1 Biomarker validation. Biomarker development and validation are driven by intended use or fit-for-purpose (FFP). The principle of FFP validation is that biomarkers with false-positive or false-negative indications pertaining to high patient consequences and risks necessitate many phases of validation. There are five phases of validation that can be executed with the stratified use of various bioinformatical and bibliographical tools as well as different designs, statistical approaches, and modeling: (1) internal validation, (2) external validation, (3) clinical trials (phases I and II; checking for safety and efficacy), (4) large clinical trials (phase III), and (5) continued surveillance. Sensitivity and specificity correlate with the intended purpose of the biomarker, and the level of confidence that a biomarker achieves depends on the phase of validation that has been reached. In ideal cases biomarkers reach surrogate endpoint status and can be used to substitute for a clinical endpoint. This designation requires agreement with regulatory authorities, as the consequences of an ambiguous surrogate endpoint are high.
Therefore, when referring to biomarkers, validation is sometimes termed qualification for clarity. Biomarker qualification has been defined as a graded, fit-for-purpose evidentiary process linking a biomarker with biology and clinical endpoints [15,16]. Traditionally, the validity of clinical biomarkers has become established in a typically lengthy process through consensus and the test of time [17]. Now more than ever, clear guidelines for validation are needed, as technological advances have drastically increased biomarker discovery rates. With the recent explosion of "omics" technologies and advancements in the fields of genomics, proteomics, and metabolomics, high-throughput biomarker discovery strategies are now widely used. This has created some unforeseen issues.
[Figure 2 graphic: the process runs from "Submit request to qualify biomarker for specific use" through "Biomarker qualification review team recruited (clinical and nonclinical)," "Biomarker context assessed and available data submitted in voluntary data submission," "Qualification study strategy assessed," and "Qualification study results reviewed" to "Biomarker accepted or rejected for suggested use."]

Figure 2 FDA biomarker qualification pilot process. (Adapted from ref. 17.)
Biomarker candidate discovery now commonly outruns the rate at which candidates are being validated, creating a bottleneck in biomarker (assay) development [18,19]. Great efforts are now being undertaken to accelerate the acceptance of biomarkers from exploratory to valid, as the goal of many research teams and drug companies is to streamline the translation of biomarkers from basic science and discovery to clinical use [12]. Despite the definitions provided by the FDA and the availability of the FDA's Guidance for Industry: Bioanalytical Method Validation in 2001, there is still a lack of sufficient regulatory guidance for biomarker validation. The FDA has designed a qualification process map that sets the foundational framework toward establishing validation guidelines (Figure 2). This pilot structure, intended to start qualification processes for biomarkers in drug development, is designed around various FDA centers whereby the context and qualification of new biomarkers are assessed. Ultimately, biomarkers are rejected or accepted for suggested use relative to current biomarkers. This may not be ideal, as it may be problematic to establish new biomarkers accurately based on current biomarkers, which are themselves often imperfect relative to a specific endpoint [17]. Nonetheless, this pilot framework should eventually enable the development of more detailed biomarker translation models that address some of the remaining issues with the current guidelines.
There remains a lack of specific guidelines on which validation process(es) are recommended or expected in order to transition effectively from exploratory to valid biomarkers, or from probable valid biomarkers to known valid biomarkers [13,20]. Confusion still exists with regard to the analyses or experiments that need to be performed and the data that are both appropriate and sufficient for biomarker (assay) validation [20]. The confusion and inconsistency in the validation process stem in part from the diverse nature of biomarker research [3,20]. Considering the large variety of novel biomarkers, their applications, and the associated analytical methods, it is unlikely that FDA regulations or other available guidelines will easily be able to address validation issues associated with all possible research objectives [16,20]. Thus, it is extremely difficult to establish, let alone use, a specific, detailed, universal validation guideline [20].

FIT-FOR-PURPOSE STRATEGY

Despite the lack of universal guidelines or agreement on the specific requirements for biomarker assay development and validation, there is a general consensus in the biomarker research community that the foundation of validation efforts is to ensure that the biomarker(s), or the assay, is "reliable for its intended use" [20]. This principle is now commonly referred to as the fit-for-purpose validation strategy or guideline. This approach embraces the notion that, depending on the intended purpose of the biomarker, the objectives and processes or types of validation will probably differ [21,22]. Dose selection, early efficacy assessment or candidate selection, and surrogacy development are all examples of biomarker clinical purposes, each of which has a differing level of validation requirement. The risk and consequence may differ depending on the purpose, even when the same biomarker is used [4]. Further, the degree of stringency and the phase of the validation, both of which are discussed later in the chapter, should be commensurate with the intended application of the validation data (Figure 1) [22]. In that sense, the term validity should be thought of as a continuum, an evaluation of the degree of validity, rather than an all-or-none state [14,23]. Therefore, biomarker utility is not measured dichotomously and should not be classified as simply good or bad. The level of a biomarker's worth is gauged on a continuous scale, with some being much more valuable than others, depending on what they indicate and how they can be applied [24]. Validation of biomarkers and establishment of their "worth" or "value" is a continually evolving and often iterative process (Figure 1).

Application of the Fit-for-Purpose Strategy

The fit-for-purpose (FFP) strategy is a fluid concept that can be applied to any clinical trial to validate biomarkers of interest. As described earlier, the classification and validation of a biomarker are context-specific, and the validation
criteria required are dependent on the intended use of the biomarker. The concept of fit-for-purpose validation is sensible, as biomarkers may be used to include or exclude patients from treatment and to determine dosing, toxicity, or safety. The consequence or risk of a false-negative or false-positive biomarker indication must be considered, since even the most sensitive and specific assays are not perfect. A biomarker intended as a population screen must be very sensitive and extremely specific [25]. Any test that is to be used in isolation for decision making requires particularly stringent validation, whereas drug efficacy biomarkers that are typically used in groups or involve subsequent testing may require less stringency, since the consequence of a false indication of efficacy is lower [26].

Biomarker Validation Process

Utilizing the FFP strategy, various questions can be asked at the beginning of any biomarker research and development (R&D) project: For what purpose are the biomarker(s) being identified? Is the clinical validation going to validate the biomarker for specificity, sensitivity, and reproducibility for diagnostic purposes, or to serve as a surrogate endpoint? What business-critical decisions will be made based on the biomarker data? These questions not only help determine the level of confidence required for the biomarker, but also help in strategizing the phases of validation. For example, in the case of developing surrogate biomarkers to replace clinical endpoints, candidate biomarkers would, theoretically, evolve over time toward the point of surrogacy as the research project moves through the different phases of validation. For the purpose of this chapter, the validation process from initial biomarker development to postmarket surveillance has been broken down into five major phased strategies (Figure 1). It is important to note that the overall process is a continuous loop driven by the intended purpose of the biomarker data, but the flow of the overall process may be subject to change depending on the results generated at each phase [22].

In general, biomarker validation is a multifaceted process that includes determining the sensitivity, specificity, and reproducibility of the assay as well as clinical sensitivity and specificity [27]. This applies to both methodologic and clinical validation. Method validation pertains to the process by which the assay and its performance characteristics are assessed. This assessment is performed throughout biomarker translation and is based on several fundamental parameters: accuracy, precision, selectivity, sensitivity, reproducibility, and stability [16,22]. Method validation is not discussed in detail in this chapter.
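Returning to the population-screening point above, a back-of-the-envelope calculation (the prevalence and test characteristics are assumed purely for illustration) shows why extreme specificity matters: when the condition is rare, even a small false-positive rate swamps the true positives.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A seemingly excellent assay (95% sensitive, 95% specific) used as a
# population screen for a condition with 1% prevalence:
print(f"PPV = {ppv(0.95, 0.95, 0.01):.1%}")   # about 16%: most positives are false
# The same assay applied in a high-risk clinic population (30% prevalence):
print(f"PPV = {ppv(0.95, 0.95, 0.30):.1%}")   # about 89%
```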
PRECLINICAL AND CLINICAL VALIDATION PHASES

Clinical validation is the documented process demonstrating the evidentiary link of a biomarker with a biological process and clinical endpoints [15,16].
[Figure 3 graphic: preliminary biomarker discovery (primary cohorts, internal) and biomarker discovery (secondary cohorts, external) are observational/retrospective and yield biomarker candidates; the best (FFP) candidates move into prospective safety and efficacy clinical trials (phases I and II) as initially/preliminarily validated ("probable valid") biomarkers, then into large-scale clinical trials (phase III) as valid biomarkers adopted for clinical use ("known valid"), followed by continued surveillance (phase IV); SOP-driven analytical quality assurance spans all steps.]
Figure 3 Biomarker development and translation. Biomarker translation from discovery to clinical use involves five general steps or phases. Biomarker candidates are initially discovered in a primary (internal) cohort and confirmed in a secondary (external) cohort through clinical observations. Biomarker candidates that have the best fit-for-purpose and also satisfy a clinical need then enter prospective phase I and II clinical trials. Once biomarkers are used in these early clinical trials to demonstrate safety and efficacy, they may be considered "initially validated biomarkers." Following large-scale clinical trials (phase III), and once biomarkers have been used for decision making, they may be considered valid and may be adopted for clinical use. Biomarker assessment continues in postmarket phase IV trials. SOP-driven analytical quality assurance is performed throughout all processes of biomarker translation.
The validation process whereby biomarkers are translated from discovery to clinical use should be customized according to biomarker type, use, variability, and prevalence. However, the general process for validating any biomarker is the same. Before any biomarker can be applied clinically, it is subjected to analytical/method validation and also clinical validation. Biomarker validation can be performed in five general translational steps: preliminary biomarker discovery, biomarker discovery, safety and efficacy clinical trials, large-scale clinical trials, and continued surveillance (Figure 3). Clinical biomarkers are validated in retrospective or prospective analyses and biomarker trials, or drug trials [12]. The validation process should reflect the clinical performance of the biomarker(s), based on existing clinical data, new clinical data, literature review findings, or current clinical knowledge [12]. Moreover, it should be an evidentiary and statistical process that aims to link
a biomarker to specific biological, pathological, pharmacological (i.e., drug effect), or clinical endpoints [22].

Internal Validation

During the biomarker discovery phase, the main focus is to identify biomarkers that distinguish between the treatment and control groups or correlate with the clinical observation of interest. Prior to this process, and depending on the sample size, plans can be made to allocate the subjects into two individual cohorts for the purpose of internal validation. It is important, however, to distinguish this allocation/splitting process from that used for the purpose of internal validation of classifiers:

• Internal validation of candidate biomarkers identified in the discovery cohort
  • Use of different platforms
  • Use of different statistical methods
• Internal validation of classifiers
  • Split-sample method
  • Some form of cross-validation method

There are several different approaches to separating the initial pool of patients or samples for internal validation and creating classifiers; the traditional and alternative approaches are outlined here. Traditionally, a discovery and a validation cohort are created (Figure 4). Genomic biomarker candidates may first be identified in the discovery cohort using microarray analysis, for example. One way of identifying a panel of biomarkers from the candidates is by use of classification methods. The samples from the discovery data set can be split into a training set, used to find a classifier, and a test set, used for internal validation of classifiers. The classifier is an equation combining the expression values of some of the candidate markers that best distinguish the treatment from the control group. Once the classifier has been developed, the panel of biomarkers may be validated again in the validation cohort before the external validation phase is carried out. In this sense, the data obtained in the validation cohort serve purely as an additional test set.

Although the traditional internal validation model is simple and logical, it may not be the most applicable strategy in the real world, given the complexity of most biomarker research today. There are two potential weaknesses to this approach. First, separating available samples into discovery and validation cohorts from the outset might unintentionally restrict the use of the samples or data to their arbitrary labels. In reality, during the developmental phase of biomarker research, different statistical analyses are often carried out to identify potentially useful biomarkers. Depending on the types of comparisons being made, a sample (or portion of data) could be used for discovery
[Figure 4 graphic: in the developmental/discovery phase, the "discovery" cohort is split into a training set and a testing set; the "validation" cohort provides another testing set before external validation.]
Figure 4 Traditional internal validation approach. Traditional approaches to internal validation rely on two cohorts: discovery and validation. The discovery cohort is typically separated into training and test sets for the development of classifiers based on biomarker(s) of interest. The classifiers are then tested in a separate cohort internally before external validation.
purposes in one analysis while it is used for validation in another. Second, in the case where the classifier fails to show acceptable accuracy in the initial discovery cohort testing set, one might consider incorporating additional data and reanalyzing the training set. However, collecting additional patient samples or information may not always be possible; this type of situation may warrant the reallocation of data from the validation to the discovery cohort training set, in order to redevelop the classifier.

An alternative model of internal validation may be particularly useful for smaller sample sizes (Figure 5). Similar to the traditional approach, the subjects are divided into two separate groups. However, neither cohort is marked strictly as discovery or validation, to minimize potential confusion in later analyses and maximize the utility of available data. As an example, a classifier can be generated and internally validated by creating training and testing sets from cohort 1. The same classifier can then be validated again in a separate cohort (cohort 2). Based on the outcome, the R&D team may decide that the classifier is ready for external validation or that a more robust classifier needs to be developed. In the latter case, a new classifier can be created by utilizing cohort 2 as the discovery cohort. Once the new classifier has been validated internally in cohort 2, it can be evaluated again in cohort 1, which is used
[Figure 5 graphic: classifier 1 is developed in cohort 1 and tested in cohort 2; if needed, classifier 2 is developed in cohort 2 and tested in cohort 1; cohorts 1 and 2 may also be combined to build classifier 3 before external validation. External validation options shown: (1) obtain samples from an external collaborator or download publicly available data and test the classifier on-site using the team's own processes and methods (SOPs); (2) send cohort 3 (a separate set of samples) to an external site to be analyzed using the collaborator's processes and methods (SOPs); (3) provide the classifier to the collaborator and test it on a truly independent cohort with the collaborator's SOPs at the external site.]
Figure 5 Alternative internal validation approach, which may be particularly useful for smaller sample sizes. Like the traditional approach, the subjects are divided into two separate cohorts. However, in this model, a classifier can be developed from either cohort, or whenever necessary, from two cohorts combined to improve the robustness of the classifier.
entirely as another testing set. Finally, the same decision process is repeated: Is the classifier robust enough to stand up to the scrutiny of external validation? Are additional data required to create a new classifier? In the latter situation, it may be necessary to combine cohorts 1 and 2 to develop larger training and test sets for internal validation. This may, in turn, help create a new classifier that is potentially more robust and applicable to a larger intended patient population. The main advantage of this model is that it provides many possible internal validation approaches, particularly in a project where the sample size is small. This gives flexibility to the overall validation process and allows decisions to be made based on the results generated at each step.

During the development and testing of a biomarker panel, concurrent literature and technical validations may also take place. Depending on the characteristics of the biomarker, various technologies and analytical techniques may be applied. For example, quantitative polymerase chain reaction (qPCR) may be used to validate the microarray data generated from cohort 1. Once the initial findings have been confirmed, such as the differential expression of particular genomic biomarkers, qPCR can also be used to test the candidate biomarkers in cohort 2.
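For readers who want a concrete picture of the split-sample and cross-validation ideas described above, the following minimal sketch uses simulated expression data and scikit-learn; the data dimensions, the penalized logistic-regression classifier, and all numbers are illustrative assumptions, not a method prescribed by this chapter.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated "cohort 1": 80 subjects x 200 candidate markers; the first five
# markers carry a modest group difference, the remainder are noise.
X = rng.normal(size=(80, 200))
y = np.repeat([0, 1], 40)
X[y == 1, :5] += 1.0

# Split-sample internal validation: the held-out test set never influences
# marker selection or classifier fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Cross-validation on the same cohort gives a complementary internal estimate
# before the classifier is carried into a second (validation) cohort.
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```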
PRECLINICAL AND CLINICAL VALIDATION PHASES
387
There are a number of potential advantages to cross-platform or cross-technology validations. Relative to high-throughput technologies such as microarray chips, which are typically used for identifying genomic biomarker candidates, the use of low-throughput techniques such as qPCR may help reduce the cost of the validation process. Also, by applying different platforms and analytical methods, biomarkers that are found to be statistically significant across the various cohorts are less likely to be related to or influenced by platform-specific bias. More recently, studies have also suggested that using combined data from multiple platforms (i.e., genomics and proteomics) to assess potential biomarkers is far superior to relying on one technical approach alone [21].

The internal validation strategy will be largely dependent on the characteristics of the biomarkers (i.e., genomic, proteomic, metabolomic), the intended use (i.e., prognostic or diagnostic), and the sample size (i.e., traditional or alternative model). Nonetheless, the general recommendation is that the preliminary or developmental studies should be large enough that either split-sample validation or some form of cross-validation can be performed to demonstrate the robustness of the internally validated prediction [28,29].

External Validation

The use of internal discovery and validation cohorts has helped studies to develop biomarker panels with impressive accuracy and precision for the predicted outcome [28]. However, internal validation should not be confused with external validation, which is typically performed on a cohort from a different geographical location and is meant to simulate a broader clinical application [28]. Internal validation, even with the use of "independent" cohorts, does not guarantee "generalizability." In principle, the aim of external validation is to verify the accuracy and reproducibility of a classifier in a truly independent cohort with a similar underlying condition, given a defined clinical context [30]. External validation is an essential step before a classifier is implemented in a larger clinical setting for patient management [30].

Like internal validation, the design of external validation processes will depend on the intended use of the biomarker. For predictive biomarkers, prospective trials may be necessary, as they are considered the gold standard [28,30]. Moreover, it has been argued that a biomarker is more readily tested in a prospective clinical trial once retrospective studies conducted at external institutions have consistently shown the ability of the biomarker to perform at the required levels of sensitivity and specificity [8]. To accelerate the translational process from external validation to phase I and II clinical trials, partnerships and collaborations are often established. In some circumstances, classifiers may need to be tested in the intended patients at a collaborator's site or external institution in a prospective manner. In other cases, retrospective studies using patient samples may suffice. There are
several important factors to consider when conducting an external validation, regardless of the directionality of the study (i.e., prospective or retrospective). Many of these factors are also critical in designing clinical trials.
USE OF STATISTICS AND BIOINFORMATICS

Due to the complexity and multifaceted nature of biomarker research, it is not uncommon, especially in recent years, to see different "omics" techniques, bioinformatics, and statistical methods incorporated into biomarker research and development [21]. Thus, there has been a large increase in the number of possible approaches to analytical and clinical validation. As illustrated in Figure 1, the use of bioinformatics and statistics to validate biomarkers can accompany any preclinical or clinical stage. More specifically, they are applied with a greater degree of freedom during the initial phases of biomarker development. Ideally, by the time a biomarker enters large clinical trials or is put on the market under continued surveillance, its robustness should already have been tested with a variety of statistical and bioinformatical approaches. Regardless, these computational approaches, although not very stringent, are fundamentally useful tools for ensuring the validity of results prior to using some of the more time-consuming and costly validation methods (i.e., external validation or clinical trials).

Statistical Approaches

High-throughput "omics" technologies such as microarray-based expression profiling generate massive amounts of data. As such, new statistical methods are continually being developed to deal with this challenging issue. The availability of a plethora of statistical techniques, when used with proper precautions, has provided a relatively quick and inexpensive way to validate biomarker candidates. The trial-and-error process with different statistical methods is especially common during the early stages of biomarker development. In the exploratory phase of a biomarker project, various computational and mathematical techniques, such as multivariate analysis or machine learning, are often utilized to detect differences between treatment and control groups or between patients with different clinical presentations [18]. Statistically distinctive genomic biomarkers identified during the exploratory phase by one method may subsequently be subjected to a different technique. Similarly, given the same set of samples and expression measurements, permutation can be carried out on the data set prior to repeating an analysis. Congruency between the results generated using different methods may ultimately translate to an increase in biomarker confidence. This is especially useful during the internal and external phases of validation, when greater statistical freedom is exercised for the purpose of identifying or ranking biomarkers. Ideally, by the time a panel of biomarkers is selected for use in
a clinical trial, specific algorithms and statistical approaches should be established. It has been suggested in a recent FDA presentation that the expression patterns or algorithms should be developed and confirmed in at least two independent data sets (a training set and a test set, respectively) [12].

Bioinformatical and Bibliographical Approaches

Potential biomarker candidates should be checked continuously using bioinformatical and bibliographical tools to provide biological and clinical context. This step should be performed in parallel with the stratified statistical approaches. Although bioinformatics originated in the field of genomics, it now plays an important role in connecting and integrating biological and clinical data from a variety of sources and platforms [21]. Biomarkers that fit accepted theory are more likely to be accepted readily by the research community and the public [14]. This is especially important when selecting and transitioning biomarkers from the internal and external validation phases into clinical trials. Numerous bioinformatical tools, many of which are open-source, are available to accelerate this evidentiary process. Gene and protein biomarker candidates can be processed first and then grouped through gene ontology using tools such as FatiGO and AmiGO [31,32]. More sophisticated pathway-oriented programs such as Ingenuity and MetaCore are also potentially useful in helping the R&D team home in on biomarkers belonging to pathways and processes relevant to the clinical endpoints of interest [33,34]. Another advantage of the bioinformatical approach is the ability to link clinical measurements across platforms and/or identify a unique set of molecular signatures. For example, during external validation on a separate cohort of patients, results might indicate that the biomarkers identified initially as statistically significant were individually unable to strongly predict the clinical endpoint. However, linking expression data across platforms (i.e., genomic and proteomic biomarkers) may help provide a more comprehensive understanding of the biology and establish a stronger correlation between the biomarker and the clinical presentation of the patient [21].
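As a small illustration of the kind of calculation that underlies such ontology- and pathway-based checks (the gene counts and the ontology term below are hypothetical, and real tools such as FatiGO or AmiGO add curated annotations and multiple-testing corrections), over-representation of a term among biomarker candidates can be screened with Fisher's exact test:

```python
from scipy.stats import fisher_exact

# Hypothetical counts: of 12,000 genes measured, 300 belong to an
# "inflammatory response" ontology term; 40 of the 150 biomarker candidates
# fall within that term.
candidates_in_term     = 40
candidates_not_in_term = 150 - 40
others_in_term         = 300 - 40
others_not_in_term     = 12_000 - 150 - others_in_term

odds, p = fisher_exact(
    [[candidates_in_term, candidates_not_in_term],
     [others_in_term, others_not_in_term]],
    alternative="greater")
print(f"odds ratio = {odds:.1f}, enrichment p = {p:.2e}")
```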
FROM EXTERNAL VALIDATION TO CLINICAL TRIALS: THE IMPORTANCE OF COHORT, TECHNICAL, AND COMPUTATIONAL FACTORS

As biomarker candidates continue to pour out of research laboratories, it has become increasingly evident that validation is much more difficult and complex than discovery. There are a multitude of general and specific considerations and obstacles to address in order to validate biomarker candidates clinically, some of which were discussed earlier and some of which we discuss now.
Translation of a candidate biomarker from the discovery phase into the validation phase of clinical trials imposes huge organizational, monetary, and technical hurdles that need to be considered at the onset. The team needs to meet requirements in sample number and feature distribution (cohort factors), in sample quality and in the accuracy and precision of sample handling and processing (technical factors), and in the analysis (computational factors). As we will see, each of these steps is extremely important and very challenging at the same time.

Cohort Factors

During any phase of external or clinical validation of biomarkers, bias can be introduced unintentionally. This is a major concern in the design, conduct, and interpretation of any biomarker study [35,36]. Starting with the population selection process, variation at a biological level can lead to discernible differences in body fluid and tissue compositions and in biomarker measurements [35]. As such, basic criteria such as gender, age, hormonal status, diet, race, disease history, and severity of the underlying condition are all potential sources of variability [35]. Moreover, the patient cohort characteristics in the validation phase of a candidate biomarker must be representative of all patients for whom the biomarker is developed. Reducing bias requires the inclusion of hundreds of patients per treatment or disease arm. Not every patient is willing to give biosamples for a biomarker study, however, and only 30 to 50% of the patients in a clinical trial may have signed the informed consent for the biomarker analysis, presenting an important concern for statisticians. There is a risk of population bias, since a specific subset of patients may donate biosamples, hence skewing the feature distribution. Another risk factor that needs to be dealt with in some instances is a lack of motivation of clinical centers to continue a study and collect samples. Although it is sometimes possible to compensate for a smaller sample size through the use of different statistical methods, such as the use of local pooled error for microarray analysis, the analysis team needs to ensure that there is no patient selection bias introduced by center selection: affluent centers and their patients may have characteristics different from those with other social backgrounds. The lack of available samples or patient enrollment may ultimately translate to a decrease in biomarker confidence or in the generalizability of the intended biomarker. In reality, collaboration will probably make biomarker validation more robust and economically feasible than working independently [37]. Since the issue of intellectual property (IP) agreement is minimal for the biomarker validation process, as it is not patentable, open interactions among the steering committees of large trials or cohort studies should be encouraged [37,38].

Technical Factors

Sample Collection, Preparation, and Processing. Even with the establishment of a relatively homogeneous cohort for external validation or
clinical trials, results from the intended biomarker assays are valid only if sample integrity is maintained and reproducible from sample collection through analysis [22]. Major components of assay and technical validation are:

• Reference materials
• Quality controls
• Quality assurance
• Accuracy
• Precision
• Sensitivity
• Specificity
Prior to the start of a validation phase, the team needs to decide on the sampling procedure, sample quality parameters, and sample processing. Reproducibility is the key to successful biomarker validation. It is important that standard operating procedures (SOPs) for sample collection, processing, and storage be established to provide guidance for the centers and the laboratory teams. Nurses and technicians should be trained to minimize variability in sample collection and handling. Most biomarkers are endogenous macromolecules that can be measured in human biological fluids or tissues [39]. The collection process for these specimens seems straightforward. However, depending on the type of biomarker (genomic, metabolomic, or proteomic) and the collection methods, various factors may need to be taken into account when designing the SOPs [22]. Several examples are given in Table 1. Processing the samples in a core facility reduces the risk of handling bias, since the same personnel would handle all samples in the most reproducible way possible. Once the samples are collected, systematic monitoring of their quality over time should also be established. Random tests can be conducted to ensure the short-term, benchtop, or long-term stability of the samples to uphold the integrity of the biolibrary [38].
TABLE 1 Sample Considerations

Collection (biological fluids or tissues): Type of needle; Type of collection tube or fixation; Location of collection; Time of collection; Status of patient
Preparation and processing: Dilution; Plasma or serum isolation; Temperature of processing; Reagents used
Storage: Type of storage containers; Temperature of storage; Duration of storage
Unfortunately, during external and clinical validations, it is likely that independent collaborators will utilize a completely different set of SOPs. Even though this could ultimately contribute to the robustness of the validation process, it may be useful to establish specific guidelines that would help minimize site-to-site variability. There are several ways to achieve this. Providing the laboratories and centers with kits that contain all chemicals needed for processing the sample can potentially reduce the risk of batch effects. Similarly, as pointed out in Figure 5, it may be feasible to collect a separate set of samples internally (to minimize sample collection variability) but send them to a collaborator's site for external validation using independent processing and analytical SOPs. It is of utmost importance that any deviation from the SOPs be noted. These features can then later be accommodated by statisticians for better modeling. Any available information on bias or deviation from protocols or batch processing is useful in the computational process; excluding it may influence the decision as to whether a biomarker was or was not validated.

Sustained Quality Assurance, Quality Control, and Validation of the Biomarker Tests or Assays

The aforementioned cohort factors (i.e., patient selection) and technical factors (i.e., sample collection, processing, and storage) can all have a significant impact on any of the phased strategies of biomarker validation shown in Figure 1. However, in reality, validation of the biomarker test is just as important as validation of the biomarker itself. To improve the chance of successful translation from external and clinical validation results to patient care, the analytical validity of the test (does the test measure the biomarker of interest correctly and reliably?) should be closely monitored along with the clinical validity of the biomarker (does the biomarker correlate with the clinical presentation?) [8,12]. Furthermore, to sustain quality assurance and quality control between the different stages of biomarker development, it may be necessary to carry out multiple analytical or technical validations when more than one platform is used.

Computational Factors

In addition to the statistical and bioinformatical factors in the validation process described earlier, the team must also be aware that a vast number of samples imposes a huge computational burden on software and hardware. This is especially true when high-throughput and high-performance technologies or high-density arrays need to be used for the validation process. Not considering this potential issue may ultimately be costly in terms of money and time. To deal with the massive amount of data generated from these technologies, new statistical techniques are continuously being developed. Statistical methods should include those that were used successfully in the discovery phase. If performance is not as good as in prior analyses, refinement of the algorithms will be necessary ("adaptive statistics").
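One common way that recorded deviations, sites, or batches can be accommodated in the modeling, sketched here on simulated data with hypothetical variable names rather than any protocol from this chapter, is simply to carry them into the analysis as covariates:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 120

# Simulated data: a biomarker measured at three collection sites, one of
# which introduces a systematic shift (a batch/handling effect).
site = rng.choice(["site_A", "site_B", "site_C"], size=n)
outcome = rng.normal(size=n)
biomarker = 0.5 * outcome + rng.normal(scale=0.8, size=n)
biomarker = biomarker + np.where(site == "site_C", 1.2, 0.0)

df = pd.DataFrame({"biomarker": biomarker, "outcome": outcome, "site": site})

# Including site as a covariate estimates the biomarker-outcome association
# from within-site variation, so the site-specific shift does not dilute it.
fit = smf.ols("outcome ~ biomarker + C(site)", data=df).fit()
print(fit.params)
```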
Other Challenges of Clinical Biomarker Validation There are many additional barriers and concerns for clinical validation of biomarkers:
• Choice of matrix (readily accessible, effect on biomarker concentrations)
• Variability (interindividual and intraindividual)
• Preparing calibration standards
• Implementation of quality control to assure reproducibility
• Limited availability of clinical specimens
• Heterogeneity of biomarkers (isoforms, bound states)
• IP protection (lack of collaboration)
• Lack of clear regulatory guidance
As discussed, the discovery and validation processes involve multicenter studies with large patient cohorts and technical equipment, steered by a vast number of staff members. The enormous costs for these studies can in most cases be covered only by consortia, consisting potentially of academic centers and pharmaceutical companies. The establishment of such consortia often has to overcome legal hurdles and IP issues, which may be a time-consuming process. As mentioned above, establishment of a biomarker may involve the recruitment of hundreds of patients per treatment arm. Both recruitment time and success rate are unpredictable at the beginning of the study phase. It is during the initial phase that the team needs to determine what really constitutes a good biomarker. Key questions in this decision include: What is the threshold for decision making that needs to be established to call a biomarker "useful"? How do we achieve a "robust" biomarker: a marker that is not easily influenced by such factors as location and personnel? How economical does the biomarker test need to be for it to be used? Which biomarker matrix should be selected, since this may affect future validation possibilities? Accessible matrices such as urine or blood with limited or known concentration variability are ideal.
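One of the key questions above, how robust a biomarker is to location, personnel, and within-subject fluctuation, is often approached by partitioning variance into between-subject (interindividual) and within-subject (intraindividual) components. The sketch below estimates both from simulated repeated measurements using a standard one-way analysis of variance; the sample sizes and variance values are arbitrary assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Simulated repeated biomarker measurements: n_subjects, each sampled k times.
n_subjects, k = 20, 4
subject_means = rng.normal(loc=50.0, scale=8.0, size=n_subjects)                 # interindividual spread
data = subject_means[:, None] + rng.normal(scale=3.0, size=(n_subjects, k))      # intraindividual noise

grand_mean = data.mean()
ms_between = k * ((data.mean(axis=1) - grand_mean) ** 2).sum() / (n_subjects - 1)
ms_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n_subjects * (k - 1))

var_within = ms_within                                    # intraindividual variance
var_between = max((ms_between - ms_within) / k, 0.0)      # interindividual variance
print("within-subject SD: ", round(var_within ** 0.5, 2))
print("between-subject SD:", round(var_between ** 0.5, 2))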
BIAS AND VARIABILITY: KEY POINTS REVISITED To summarize the sections above, variability is a major obstacle in biomarker validation, regardless of the matrix, the type of biomarker, and its use. As noted above, there are two types of variability: intraindividual variability, which is usually related to lab techniques, sample timing, drug effects, and within-subject biological processes, and interindividual variability, resulting from different individual responses involving multiple genetic factors [40]. Biological variability may be difficult to assess, but it is important to control
TABLE 2 Fundamental Concerns in Biomarker Validation

Overfitting. Concern: false positives and negatives; high sensitivity and specificity found but fail on independent validation sets. Solutions: increase sample size; statistical approaches; receiver operator characteristic curves.
Bias. Concern: misidentification of differences between samples. Solutions: control for confounding factors.
Generalizability. Concern: can results be applied to appropriate clinical populations? Solutions: representative validation cohorts.
for this factor. A statistical correlation to a clinical endpoint for a candidate biomarker cannot be determined without assessing biological variability, as the overall noise of a sample is a sum of both analytical and biological variability [6]. A biomarker with wide biological variability or time fluctuations that are difficult to control may be rejected [6]. Diurnal variability may require sample pooling or collection at the same time of day [6]. Biomarkers have diverse molecular structures, including possible bound states, which also need to be considered as influences on variability. There are specific considerations for any biomarker validation study. Overfitting, bias, and generalizability are three of the most fundamental concerns pertaining to clinical biomarker validation (Table 2) [41]. With the introduction of high-throughput discovery strategies, overfitting has become a particular fear. These discovery platforms are designed to measure countless analytes, and there is therefore a high risk of false discovery. When a large number of variables are measured on a small number of observations to produce high sensitivity and specificity, the results may not be reproducible on independent validation sets. Some biomarker candidates may be derived simply due to random sample variations, particularly with inadequate sample sizes [26]. A false positive may be thought of as critical part of a disease process, when in fact it is either associated only loosely or coincided randomly with disease diagnosis or progression [3]. As mentioned earlier in the chapter, a biomarker may correlate with a disease statistically but not prove to be useful clinically [8,42]. Increasing sample size and use of receiver operator characteristic curves may help overcome this concern of overfitting. Bias is another major concern during biomarker validation, as there is often potential for misidentifying the cause of the differences in biomarkers between samples. Confounding variables such as age, race, and gender should be controlled for either through statistical modeling or validation study design to limit the effects of bias. Since validation cohorts require suitable diversity for widespread utility, bias may be difficult to avoid entirely through study design.
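The overfitting risk described above can be demonstrated directly: when many candidate analytes are screened in a small cohort, the best-looking marker can show an impressively high apparent AUC even though no analyte is truly associated with disease, and the effect disappears on an independent validation set. The sketch below simulates this with purely random data; the cohort sizes and the number of analytes are arbitrary assumptions.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# 40 subjects, 500 candidate analytes, and no true association with disease status.
n_train, n_valid, p = 40, 400, 500
x_train, y_train = rng.normal(size=(n_train, p)), rng.binomial(1, 0.5, n_train)
x_valid, y_valid = rng.normal(size=(n_valid, p)), rng.binomial(1, 0.5, n_valid)

# "Discovery": pick the analyte most positively correlated with outcome in the small training set.
corr = [np.corrcoef(x_train[:, j], y_train)[0, 1] for j in range(p)]
best = int(np.argmax(corr))

print("apparent AUC (training):   ", round(roc_auc_score(y_train, x_train[:, best]), 2))
print("independent validation AUC:", round(roc_auc_score(y_valid, x_valid[:, best]), 2))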
Similar to bias, most issues pertaining to the generalizability of a biomarker across clinical populations can be addressed through careful consideration of cohort selection. Cohort factors were discussed briefly above. To increase generalizability, the later phases of validation should include more rigorous testing of potential interfering endogenous components by including more diverse populations with less control of these confounding variables [22]. For example, in later-stage clinical trials there should be less control of diet and sample collection and more concomitant medications and co-morbidities [22]. This will allow a biomarker to be used in more clinically diverse situations.
KEY MESSAGES Many of the biomarkers in current clinical use have become accepted via debate, consensus, or merely the passage of time [13]. This rather unofficial establishment of biomarkers in the past has been very inefficient. Importantly, biomarkers can no longer become accepted in this way, as they would fail to meet the current regulatory standards of modern medicine. Contemporary biomarkers must be tested in highly regulated human clinical trials [43]. To date, the clinical trial process has not been very efficient, and a typical biomarker life cycle from discovery to clinical use may take decades. For example, the evolution of prostate-specific antigen (PSA) as a biomarker for prostate disease diagnosis and monitoring took 30 years for regulatory approval by the FDA [44]. In order to expand on the biomarker repertoire used currently in clinical practice, the acceptance process for new biomarkers needs to become much more efficient and cost-effective. The general problem of lack of regulatory guidance is very likely to be addressed formally by regulatory bodies in the near future. Problems such as the availability of samples will also probably improve as collaborations are established and the ethical issues surrounding biobanks are clarified. Even if a biomarker candidate fails during validation, much may be learned regarding the pathophysiology of the disease and the corresponding drug effects during the process [9]. However, without successful validation and integration of biomarkers into clinical use, much of the research effort, particularly in terms of biomarkers for drug development, can be futile.
REFERENCES
1. Biomarkers Definition Working Group (2001). Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther, 69(3):89–95. 2. Colburn WA (1997). Selecting and validating biologic markers for drug development. J Clin Pharmacol, 37(5):355–362.
3. Colburn WA (2003). Biomarkers in drug discovery and development: from target identification through drug marketing. J Clin Pharmacol, 43(4):329–341. 4. Boguslavsky J (2004). Biomarkers as checkpoints. Drug Discov Dev, Sept. 5. Hunt SM, Thomas MR, Sebastian LT, et al. (2005). Optimal replication and the importance of experimental design for gel-based quantitative proteomics. J Proteome Res, 4(3):809–819. 6. Lee JW, Figeys D, Vasilescu J (2007). Biomarker assay translation from discovery to clinical studies in cancer drug development: quantification of emerging protein biomarkers. Adv Cancer Res, 96:269–298. 7. Listgarten J, Emili A (2005). Statistical and computational methods for comparative proteomic profiling using liquid chromatography–tandem mass spectrometry. Mol Cell Proteom, 4(4):419–434. 8. Bast RC Jr, Lilja H, Urban N, et al. (2005). Translational crossroads for biomarkers. Clin Cancer Res, 11(17):6103–6108. 9. Kuhlmann J (2007). The applications of biomarkers in early clinical drug development to improve decision-making processes. Ernst Schering Res Found Workshop, 59:29–45. 10. Mandrekar SJ (2005). Clinical trial designs for prospective validation of biomarkers. Am J Pharmacogenom, 5(5):317–325. 11. Lachenbruch PA, Rosenberg AS, Bonvini E, Cavaille-Coll MW, Colvin RB (2004). Biomarkers and surrogate endpoints in renal transplantation: present status and considerations for clinical trial design. Am J Transplant, 4(4):451–457. 12. Harper CC (2007). FDA Perspectives on Development and Qualification of Biomarkers, in Rediscovering Biomarkers: Detection, Development and Validation, GTCbio; San Diego, CA. 13. Goodsaid F, Frueh F (2006). Process map proposal for the validation of genomic biomarkers. Pharmacogenomics, 7(5):773–782. 14. Bonassi S, Neri M, Puntoni R (2001). Validation of biomarkers as early predictors of disease. Mutat Res, 480–481:349–358. 15. Wagner JA (2002). Overview of biomarkers and surrogate endpoints in drug development. Dis Markers, 18(2):41–46. 16. Wagner JA, Williams SA, Webster CJ (2007). Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin Pharmacol Ther, 81(1):104–107. 17. Goodsaid F, Frueh F (2007). Biomarker qualification pilot process at the US Food and Drug Administration. AAPS J, 9(1):E105–E108. 18. Baker M (2005). In biomarkers we trust? Nat Biotechnol, 23(3):297–304. 19. Benowitz S (2004). Biomarker boom slowed by validation concerns. J Natl Cancer Inst, 96(18):1356–1357. 20. Lee JW, Weiner RS, Sailstad JM, et al. (2005). Method validation and measurement of biomarkers in nonclinical and clinical samples in drug development: a conference report. Pharm Res, 22(4):499–511. 21. Ilyin SE, Belkowski SM, Plata-Salaman CR (2004). Biomarker discovery and validation: technologies and integrative approaches. Trends Biotechnol, 22(8): 411–416.
22. Lee JW, Devanarayan V, Barrett YC, et al. (2006). Fit-for-purpose method development and validation for successful biomarker measurement. Pharm Res, 23(2):312–328. 23. Peck RW (2007). Driving earlier clinical attrition: If you want to find the needle, burn down the haystack. Considerations for biomarker development. Drug Discov Today, 12(7–8):289–294. 24. Groopman JD (2005). Validation strategies for biomarkers old and new. AACR Educ Book, 1:81–84. 25. Normolle D, Ruffin MT IV, Brenner D (2005). Design of early validation trials of biomarkers. Cancer Inf, 1(1):25–31. 26. Jarnagin K (2006). ID and Validation of biomarkers: a seven-fold path for defining quality and acceptable performance. Genet Eng Biotech News, 26(12). 27. O’Connell CD, Atha DH, Jakupciak JP (2005). Standards for validation of cancer biomarkers. Cancer Biomarkers, 1(4–5):233–239. 28. Simon R (2005). Roadmap for developing and validating therapeutically relevant genomic classifiers. J Clin Oncol, 23(29):7332–7341. 29. Simon R (2005). Development and validation of therapeutically relevant multigene biomarker classifiers. J Natl Cancer Inst, 97(12):866–867. 30. Bleeker SE, Moll HA, Steyerberg EW, et al. (2003). External validation is necessary in prediction research: a clinical example. J Clin Epidemiol, 56(9):826–832. 31. Al-Shahrour F, Diaz-Uriarte R, Dopazo J (2004), FatiGO: A Web tool for finding significant associations of gene ontology terms with groups of genes. Bioinformatics, 20(4):578–580. 32. AmiGO. http://amigo.geneontology.org/cgi-bin/amigo/go.cgi. 33. Ingenuity Pathways Analysis. http://www.ingenuity.com/products/pathways_ analysis.html. 34. MetaCore Gene Expression and Pathway Analysis. http://www.genego.com/ metacore.php. 35. Moore RE, Kirwan J, Doherty MK, Whitfield PD (2007). Biomarker discovery in animal health and disease: the application of post-genomic technologies. Biomarker Insights, 2:185–196. 36. Ransohoff DF (2005). Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer, 5(2):142–149. 37. McCormick T, Martin K, Hehenberger M (2007). The evolving role of biomarkers: focusing on patients from research to clinical practice. Presented at the IBM (Imaging) Biomarker Summit III, IBM Corporation; Nice, France. 38. Maruvada P, Srivastava S (2006). Joint National Cancer Institute–Food and Drug Administration workshop on research strategies, study designs, and statistical approaches to biomarker validation for cancer diagnosis and detection. Cancer Epidemiol Biomarkers Prev, 15(6):1078–1082. 39. Colburn WA, Lee JW (2003). Biomarkers, validation and pharmacokinetic– pharmacodynamic modelling. Clin Pharmacokinet, 42(12):997–1022. 40. Mayeux R (2004). Biomarkers: potential uses and limitations. NeuroRx, 1(2): 182–188.
41. Early Detection Research Network. Request for Biomarkers. Attachment 2: Concepts and Approach to Clinical Validation of Biomarkers: A Brief Guide. http://edrn.nci.nih.gov/colops/request-for-biomarkers. 42. Katton M (2003). Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst, 95:634–635. 43. NCI (2006). Nanotechnology-Based Assays for Validating Protein Biomarkers. NCI Alliance for Nanotechnology in Cancer, Bethesda, MD, Nov.–Dec. 44. Bartsch G, Frauscher F, Horninger W (January 2007). New efforts in the diagnosis of prostate cancer. Presented at the IBM (Imaging) Biomarker Summit III, IBM Corporation; Nice, France, Jan.
20 PREDICTING AND ASSESSING AN INFLAMMATORY DISEASE AND ITS COMPLICATIONS: EXAMPLE FROM RHEUMATOID ARTHRITIS Christina Trollmo, Ph.D., and Lars Klareskog, M.D., Ph.D. Karolinska Institute, Stockholm, Sweden
INTRODUCTION Chronic inflammatory diseases include a number of rheumatic, neurological, dermatological, and gastrointestinal diseases, which develop as a result of immune and inflammatory reactions. These perpetuating reactions ultimately cause the clinical symptoms, which are subsequently used to classify the symptoms as a “disease.” Analyzing the disease course longitudinally reveals several distinct steps during disease progression, for which the presence of biomarkers is of importance, both for identification of disease status and for prediction of disease course and treatment options. However, biomarkers per se are not always available today. We have chosen here to discuss disease characteristics and potential biomarkers in one common chronic inflammatory disease, rheumatoid arthritis (RA), which affects approximately 0.5 to 1% of the population worldwide. In this chapter we focus on the following factors: 1. Onset of disease in order to discuss the questions in whom and why the disease occurs, and whether and how onset can be predicted
from biomarkers that are present before the occurrence of clinical symptoms
2. Progression of disease with respect to development of joint destruction, but also with respect to other complications such as extra-articular manifestations, cardiovascular events, and lymphoma development in this patient group
3. Treatment of the individual patient and of specific symptoms and disease manifestations
4. Selection of patients in clinical trials of new drugs
RHEUMATOID ARTHRITIS DISEASE PROCESS Rheumatoid arthritis (RA) is a disease defined by seven criteria, with four that should be fulfilled to make the diagnosis (Table 1). These criteria have been useful in harmonizing clinical trials and clinical practice. However, they are not based on what is now known about etiology or pathogenesis, and they are not too helpful in selecting treatment for the single patient. Hence, there are needs to redefine the diagnosis for RA and related diseases, first to define entities more related to distinct etiologies and pathogenetic mechanisms, then to use such new entities for stratification and selection of patients in clinical trials and clinical practice. Basic features of the immune and inflammatory process in RA are, on the one hand, processes that can be identified in the peripheral circulation, initially
TABLE 1 Classification Criteria for Rheumatoid Arthritis (a)

1. Morning stiffness in and around joints lasting at least 1 hour before maximal improvement
2. Soft tissue swelling (arthritis) of three or more joint areas observed by a physician
3. Swelling (arthritis) of the proximal interphalangeal, metacarpophalangeal, or wrist joints
4. Symmetric swelling (arthritis)
5. Rheumatoid nodules
6. Presence of rheumatoid factor
7. Radiographic erosions and/or periarticular osteopenia in hand and/or wrist joints

Source: Arnett FC, Edworthy SM, Bloch DA, et al. (American Rheumatism Association) (1988). The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum, 31:315–324.
(a) Criteria 1 through 4 must have been present for at least six weeks. Rheumatoid arthritis is defined by the presence of four or more criteria, and no further qualifications (classic, definite, or probable) or list of exclusions are required. These criteria demonstrated 91 to 94% sensitivity and 89% specificity for RA compared with non-RA rheumatic disease control subjects.
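Read literally, the classification rule in Table 1 is simple to encode, which can be convenient when screening trial databases. The sketch below applies a simplified reading of the rule (four or more criteria, with the six-week duration requirement collapsed into a single field); the patient record and field names are hypothetical and are not part of the criteria themselves.

# Hypothetical patient record; keys mirror the seven classification criteria listed above.
patient = {
    "morning_stiffness_1h": True,
    "arthritis_three_or_more_areas": True,
    "arthritis_hand_joints": True,
    "symmetric_arthritis": True,
    "rheumatoid_nodules": False,
    "rheumatoid_factor_positive": False,
    "radiographic_changes": False,
    "weeks_criteria_1_to_4_present": 8,
}

def meets_1987_criteria(p):
    """Return True if four or more criteria are met and criteria 1-4 have lasted at least 6 weeks (simplified)."""
    duration_ok = p["weeks_criteria_1_to_4_present"] >= 6
    count = sum(
        p[key]
        for key in (
            "morning_stiffness_1h",
            "arthritis_three_or_more_areas",
            "arthritis_hand_joints",
            "symmetric_arthritis",
            "rheumatoid_nodules",
            "rheumatoid_factor_positive",
            "radiographic_changes",
        )
    )
    return count >= 4 and duration_ok

print(meets_1987_criteria(patient))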
Figure 1 Inflamed RA joint. In healthy joints a thin synovial membrane lines the joint capsule and the synovial fluid. In the RA joint both the synovial membrane and the synovial fluid are infiltrated by inflammatory cells, leading to tender and swollen joints. The synovial membrane also "grows" over the cartilage, aiding in the process of cartilage and bone destruction.
rheumatoid factors (RFs), and on the other hand, processes in the inflamed tissue, mainly the joints (Figure 1). Rheumatoid factors, identified almost 70 years ago, are part of the diagnostic criteria for RA. Being present in some 50 to 60% on incident RA cases and increasing over time with active disease, these auto-antibodies are also seen in many non-RA conditions and are thus not very specific for the disease. Rheumatoid factors have never been shown to be pathogenic by themselves, neither in patients nor in experimental models. They are thus seen mainly as biomarkers of importance for diagnosis and prognosis of a more severe disease course, but not necessarily directly involved in disease pathogenesis. Joint inflammation in RA is focused on synovial inflammation, in many cases associated with cartilage destruction and concomitant erosions in bone. This inflammation has been studied in large detail over the years, demonstrating that its features are common to many other types of chronic inflammation in other tissues. No real RA pathognomonic features have yet been identified, the most unique feature identified so far being the way the inflammatory cells and molecules attack and destroy bone and cartilage. Having the major features of RA, the synovial joint inflammation and presence of RF being typical but by no means unique for RA, there is an obvious need to define more specific features of the disease. The identification of such features would enable us to search for a better understanding of the pathogenesis of RA, a more accurate diagnosis based on biomarkers, and more specific treatments.
STUDIES ON ETIOLOGY AND PATHOGENESIS AS A BASIS FOR DEVELOPMENT OF BIOMARKERS FOR DIAGNOSIS AND PROGNOSIS IN RA Any understanding of a complex, partly genetic disease is based on an understanding of how genes and environment interact in giving risk to immune reactions that contribute to the joint destruction and other inflammatory reactions in RA. In healthy subjects, the major role for the immune system and subsequent inflammatory reactions is to defend us against pathogens, but in RA the immune system has partly changed focus to attack our own tissues, primarily the joints, and is thus denoted as an autoimmune disease. Genes There is strong evidence to support a significant genetic component to the susceptibility of RA. Twin studies clearly demonstrate an overrepresentation of disease concordance in monozygotic twins (12 to 20%, depending on study) compared to dizygotic twins (4 to 5%) and the general population (0.5 to 1%). Analysis on the gene level demonstrates the strongest genetic association with genes within the HLA region, specifically the HLA-DRB1 gene. Its gene products, the MHC class II molecules, were described in the 1970s to be present on cells in the inflamed joint, allowing antigens (part of proteins, generally pathogens, but in autoimmune diseases self-proteins) to be presented to the immune system and subsequently to trigger inflammatory reactions (Figure 2). Serologic and later genetic typing of the various MHC molecules revealed that a few allotypes of HLA-DRB1 were overrepresented in RA patients. A closer analysis demonstrated even identical amino acid
Figure 2 Inflammatory cells in the RA joint. A number of immune cells have infiltrated the joint and local production of inflammatory mediators, including cytokines and antibodies, occurs. The synovial fluid, which functions as a cushion during joint movements, is in a healthy joint acellular. Illustrated to the left is the presentation of an antigen by the dendritic cell to a T-cell; the yellow connector is a MHC class II molecule. Cytokines, released from the immune cells, function as signaling molecules between cells. (See insert for color reproduction of the figure.)
sequences in those regions of the MHC class II DRβ1 chain, which are in contact with the antigen and the T-cell receptor on the T-cell mediating the specific immune responses. This amino acid sequence is termed shared epitope. The nature of the specific immune reactions mediated by the MHC class II molecules has, however, been surprisingly difficult to define. Identifying the specific antigens would help to identify the autoimmune trigger and make it possible to interfere specifically to break the autoimmune reactions. In total, the HLA region contributes 30 to 50% of the genetic component for RA in Caucasians. It appears that other MHC class II genes and allotypes may contribute to RA in other ethnic groups, and much is still not known about the contributions of different genes and allotypes within the MHC concerning susceptibility and disease course in RA. Only much more recently, a second genetic risk allele has been identified in populations of European descent. This minor allele of a nonsynonomous single-nucleotide polymorphism (SNP) in the protein tyrosine phosphatase nonreceptor 22 (PTPN22) gene confers the second-largest genetic risk to the development of RA, with an odds ratio of about 1.8. PTPN22 encodes the intracellular protein lymphoid tyrosine phosphatase, which plays a central role in immune responses by playing an integral part in signal transduction and T-cell receptor signaling pathway and inhibiting T-cell activation. This variant was first demonstrated for type 1 diabetes, but was soon confirmed in a number of autoimmune diseases, including RA, systemic lupus erythematosus (SLE), and Grave disease. However, other autoimmune diseases show no association with this SNP, suggesting subsets of autoimmune diseases to be defined accordingly. The recent introduction of whole genome-wide association studies allowed, for the first time, good coverage of common variations in the human genome. Hundreds of thousands of SNPs in thousands of samples were genotyped and compared. Interestingly, studies focusing on RA again demonstrated the strongest effects for the two well-documented RA susceptibility genes HLADRB1 and PTPN22. Other genes (Table 2) make a more modest contribution to susceptibility. Future studies have to identify remaining, probably smaller, genetic effects and how all the genetic effects interact with each other as well as with environmental factors in inducing and perpetuating the disease.
TABLE 2 Contribution of Genetic Risk Factors in Rheumatoid Arthritis (gene, odds ratio)
HLA, 6.4
PTPN22, 1.8
6q23, 1.2
STAT4, 1.2–1.4
TRAF1/C5, 1.1–1.4
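The association measures in Table 2 are odds ratios, which can be computed from a simple 2 x 2 table of risk-allele carriage in cases and controls. The sketch below shows the calculation with a Wald confidence interval; the counts are entirely hypothetical and are not taken from any study cited here.

from math import exp, log, sqrt

def odds_ratio_ci(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio with a Wald 95% confidence interval from a 2 x 2 table."""
    or_ = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    se = sqrt(1 / exposed_cases + 1 / exposed_controls + 1 / unexposed_cases + 1 / unexposed_controls)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# Entirely hypothetical counts for carriage of a risk allele in RA cases versus controls.
print(odds_ratio_ci(exposed_cases=420, exposed_controls=180, unexposed_cases=380, unexposed_controls=620))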
Environment Information on environmental factors important for the development, perpetuation, or course of RA is surprisingly scarce. Smoking is the only conventional environmental factor that has been linked reproducibly to an increased risk of developing RA. Other exposures, such as silica dust and mineral oils, have been reported in a few studies. It has not yet been possible to verify frequently hypothesized stimuli such as microbial infections with the methods used to date. Smoking was initially considered as an unspecific risk factor, of interest mainly from a public health perspective. However, newer studies indicate that smoking is a specific trigger of RA, as discussed in more detail below.
Immunity Studies on specific immune reactions in RA have been confined almost entirely to those involving autoantibodies. As described above, they were initially restricted to the measure of rheumatoid factors, but more recently, antibodies specific for citrullinated proteins have been shown to be of great importance. This is discussed in detail below, since their presence covariates with genetic and environmental factors, providing an important tool in subgrouping patients into different entities of RA. In the process of joint inflammation and cartilage and bone destruction, many different cells and molecules of the immune system participate. Some of them are illustrated in Figure 2. However, even if some of these inflammatory processes are revealed in detail, a specific trigger has still not been identified. In recent years, advances within the field of cytokine regulation and cytokine-directed therapy have largely dominated the research field of RA, illustrating how therapeutic progress is possible even though the role of adaptive immunity in the disease is not fully understood. Cytokines are soluble molecules that mediate the communication between cells of the immune system but also with other cells of the body, such as the endothelium. Interestingly, the first cytokine that was targeted, tumor necrosis factor (TNF), belongs to the innate immune system. Blocking IL-1, another cytokine belonging to the innate immune system, has not proven to be as effective as TNF blockade for the majority of the RA patients. Recent clinical trails blocking a third cytokine in this family, IL-6, show promising results. This cytokine exerts effects within both the innate and the adaptive immune systems. Temporarily eliminating B-cells, which are the producers of antibodies, has also proven a successful therapy. A third alternative, blocking the interaction between cells presenting antigens and T-cells, has resulted in an approved therapy. Together, these treatment-based data also demonstrate the significant role of the various parts of the innate as well as adaptive immune system for disease progression.
Citrulline Immunity The presence of antibodies specifically identifying citrullinated antigens, specifically termed antibodies to citrullinated protein antigens (ACPAs) are a strong predictor of developing RA. Approximately 60% of all RA patients carry such antibodies. These antibodies are highly specific for the disease; that is, they are rare in the normal population (<2%) and are also quite rare in other inflammatory conditions. Almost all patients who carry ACPAs developed them before disease onset. ACPAs are directed against proteins, which as the name implicates, contain the amino acid citrulline. Citrulline occurs only upon posttranslational modification of the amino acid arginine by enzymatic conversion. Even if these auto-antibodies are more specific for RA than for RF, it is important to ask the question if their presence is an epiphenomenon or is causally related to the disease. This helps in further dissection of the pathophysiology of the disease and in the development of new treatments. Here, animal models provide a tool to test the hypothesis of citrullination to be causative of the disease. One example is arthritis induced in the rat by immunization with collagen type II, a protein expressed almost exclusively in the joint. Immunization with the citrullinated form of collagen type II resulted in a more severe arthritis than immunization with the same noncitrullinated protein. In a mouse model, where transfer of antibodies directed toward collagen induces a mild arthritis, the addition of antibodies directed against citrullinated fibrinogen enhanced the arthritis. Thus, in experimental arthritis models citrullination can change the immunogenicity of self-antigens, and some citrullinated proteins may contribute to arthritis development. From the data above we can conclude that immunity to citrullinated proteins may play a pathogenic role in a subset of RA patients; however, so far we have not identified the specific proteins that are targeted by the immune system. How do we then measure ACPAs? Test kits have been developed which contain cyclic citrullinated peptides (CCPs), where the amino acid citrulline is flanked by a variety of amino acids to mimic many different peptide sequences. Thus, the test measures anti-CCP antibodies, and this is what is generally reported in patient files. In summary, the presence of ACPAs in the circulation is more predictive for RA than high blood pressure is for cardiovascular disease and thus can be considered as a biomarker for a subset of RA patients.
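Because ACPAs are present in roughly 60% of RA patients but in fewer than 2% of the normal population, their predictive value depends strongly on the pretest probability of disease. The sketch below applies Bayes' rule using those two figures; the pretest probabilities are arbitrary illustrations (for example, general-population screening versus an early-arthritis clinic) and are not values given in the text.

def positive_predictive_value(sensitivity, specificity, pretest_probability):
    """Bayes' rule: probability of disease given a positive test."""
    true_pos = sensitivity * pretest_probability
    false_pos = (1 - specificity) * (1 - pretest_probability)
    return true_pos / (true_pos + false_pos)

# Sensitivity ~0.60 and specificity ~0.98 reflect the figures quoted above for ACPA/anti-CCP testing.
for pretest in (0.01, 0.30):
    print(pretest, round(positive_predictive_value(0.60, 0.98, pretest), 2))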
BETTER DIAGNOSIS BY MEANS OF BIOMARKERS AND GENETICS By the late 1970s, the MHC class II genes were already identified as a major genetic risk factor for RA. However, only the presence of specific MHC class II alleles does not result in RA, since the disease-associated HLA-DRB1 alleles are common also in healthy subjects. But combining genetic studies
with ACPA analysis, the genetic association of RA to HLA-DRB1 SE has recently been shown to be confined entirely to the ACPA-positive subset of RA. In contrast, ACPA-negative RA may be associated with a unrelated HLA-DR allele, HLA-DR3. Also a second major genetic risk factor for RA, the polymorphism in the PTPN22 gene was also shown to be associated only with the ACPA-positive disease. In contrast, other genetic risk factors, in particular variations in the interferon regulating factor 5 (IRF-5), but also polymorphisms in a newly identified risk gene in the C-type lectin complex, were associated exclusively with ACPA-negative disease. Taken together, the descriptive studies of disease course and genetic linkages strongly indicate that ACPA reactivity splits RA into two major and clinically relevant subsets of disease and thus becomes an important biomarker for one subgroup of RA. Practically, ACPA-positive and ACPA-negative RA should be treated as separate entities when studying the molecular pathophysiology of RA and probably also when selecting treatment for some treatments for the single patient (Figure 3). Including smoking habits in these combined analyses provides an even clearer risk assessment for developing RA. The relative risk of developing ACPA-positive RA was shown to be over 20 times higher for smokers carrying two copies of the HLA-DRB1 SE alleles than for nonsmokers with no SE alleles. On the other hand, no increased risk was discerned from smoking concerning development of ACPA-negative RA. This leads directly to the question of whether smoking can trigger an ACPA immunity. Recent studies suggest that this could be the case since cells from bronchoalveolar lavage from smokers, but not from nonsmokers, contain large amounts of citrullinated proteins, and smoking can activate macrophages and thus enhance their antigen-presenting capacity. The disease onset of RA can be gradual, and thus patients can have swollen and tender joints without being given the diagnosis RA, which is based on
[Figure 3 flow diagram: predisease factors (genetic and environmental factors: MHC class II, PTPN22, smoking, ACPA) precede the diagnosis of RA or of unspecific arthritis; disease progression is then followed as ACPA-positive RA, ACPA-negative RA, or unspecific arthritis.]
Figure 3 Rheumatoid arthritis is a multifactorial disease. A number of predisease factors have been identified, which are of importance for the later disease course but not enough to initiate clinical disease. However, the trigger for clinical disease onset has still not been identified. The presence or absence of anticitrulline protein antibodies (ACPAs) distinguishes two forms of RA. Some patients will have swollen and tender joints but not fulfill four of the seven criteria for RA and thus carry an unspecific arthritis.
achieving four of seven specific criteria. These patients are classified as undifferentiated arthritis (UA), having an inflammatory arthritis for which no specific diagnosis can be made. An important question is if these patients should be treated in order to ameliorate disease progression to RA and retard radiographic joint damage or even prevent the development of RA in these patients. A recent double-blind, placebo-controlled, randomized clinical trial indicated that methotrexate (MTX) treatment as used for RA patients can postpone progression to RA and retard radiographic progression but not prevent the development of RA from UA. Subgroup analysis revealed that the beneficial outcome was most pronounced in patients with anti-CCP antibodies. In contrast, in the anti-CCP-negative subgroup the effect of MTX on the development of RA, the radiographic progression, and even the signs and symptoms was not demonstrable. Thus, this study provides evidence of distinguishing between anti CCP-positive and CCP-negative arthritis, even before the diagnosis RA is met, and therefore also becomes an important biomarker for undifferentiated arthritis. The identification of predisease biomarkers might in the future lead to treatments which can be given to healthy subjects before the clinical diagnosis of RA and so prevent disease onset. An example already applied in medicine is the treatment of high blood pressure to prevent cardiovascular disease.
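The gene-environment interaction described above (a more than 20-fold relative risk of ACPA-positive RA for smokers carrying two shared-epitope alleles, compared with nonsmokers carrying none) amounts to comparing incidence across exposure strata. The sketch below computes stratified risk ratios from hypothetical cohort counts chosen only so that the top stratum reproduces a greater than 20-fold contrast; none of the counts come from the studies discussed here.

def relative_risk(cases_exposed, n_exposed, cases_reference, n_reference):
    """Risk ratio of one exposure stratum versus the reference stratum."""
    return (cases_exposed / n_exposed) / (cases_reference / n_reference)

# Hypothetical counts of new ACPA-positive RA cases per person count, by smoking status and
# number of shared-epitope (SE) alleles.
strata = {
    ("nonsmoker", 0): (10, 10000),   # reference group
    ("smoker", 0): (18, 10000),
    ("nonsmoker", 2): (40, 10000),
    ("smoker", 2): (220, 10000),
}
ref_cases, ref_n = strata[("nonsmoker", 0)]
for key, (cases, n) in strata.items():
    print(key, round(relative_risk(cases, n, ref_cases, ref_n), 1))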
BETTER PROGNOSTIC TOOLS WITH THE HELP OF BIOMARKERS AND GENETICS Rheumatoid arthritis affects not only joints but also possibly, inflammation in other organs. Manifestations include rheumatoid nodules, secondary Sjögrens syndrome, rheumatoid lung disease, pleuritis, pericarditis, vasculitis, neuropathy, Felty syndrome, and severe eye disease, and are commonly summarized as extraarticular manifestations. Different co-morbidities also occur during the disease course. Cardiovascular morbidity and mortality occur at rates greater than would be expected from the profile of established risk factors. The average risk of lymphoma is increased in RA; however, a more detailed analysis indicates that the risk is increased primarily in patients with the most severe disease (Figure 4). Extraarticular Disease Manifestations Extraarticular disease manifestations are more likely to occur in patients with more severe joint disease. Rheumatoid nodules often precede the onset of severe RA and thus can be considered a biomarker for severe disease. Also, their clustering with all different extraarticular manifestations indicate that vascular pathogenetic mechanisms are important in all types of extraarticular RA. This is of particular interest given the association of RA with cardiovascular co-morbidity. The different manifestations tend to cluster in specific
[Figure 4 diagram: an axis of increasing inflammatory burden spanning severe joint destruction, cardiovascular disease, and lymphoma.]
Figure 4 Inflammatory burden. With increased accumulated inflammatory burden the risk for additional disease states and complications increases for patients with RA. It is thus of importance to control the inflammation at all time points, a difficult task when not all patients respond to the antirheumatic therapies available today.
patterns and suggest shared disease mechanisms in these systemic manifestations. There are very likely several genetic and environmental factors affecting the development of these manifestations, of which some are described below. An association of HLA-C3 with vasculitis has been described. HLA-C3 is the product of a specific MHC class I gene and thus with CD8+ cytotoxic T-cells and NK-cells, in contrast to MHC class II proteins, which interact with CD4+ T-helper cells. Smoking is an independent predictor of vasculitis and is probably involved in vascular damage by antigen modification as described above. The increased number of patients with specific autoantibodies such as RF and ANAs (antinuclear antibodies) supports a role for immune complexes in the pathogenesis of vasculitis, which in turn indicates B-cell abnormalities in the development of extraarticular RA. In summary, extraarticular manifestation indicate a more severe disease; however, this risk group is today not routinely analyzed early in the disease course. Here, specific biomarkers could possibly aid an early aggressive treatment of this patient group in order to limit the progression of extraarticular manifestations. Cardiovascular Disease in RA Cardiovascular (CV) morbidity and mortality in RA occur at rates greater than would be expected from the profile of established CV risk factors. Diabetes mellitus, hypercholesterolemia, hypertension, cigarette smoking, and obesity are generally powerful classifiers of CV risk and when comparing the profiles of these risk factors in people with and without RA, they are similar. Thus, there must be RA-specific factors explaining the increased risk. Dividing the patients in younger and older patients demonstrates that such factors are more common in the younger age group. In older patients, the
proportion of atherosclerosis can be explained by established CV risk factors as described above. In contrast, in the younger age group, systemic inflammation seems to exert its atherosclerotic effects early. High-resolution carotid ultrasound is used to measure carotid intima media thickness (IMT). These measures may thus be used as a biomarker of subclinical atherosclerosis. Interaction calculations suggest that the ESR’s effect on IMT varies according to the number of CV risk factors; thus, higher ESR values, a measure for inflammation, are associated with greater IMT only in the presence of CV risk factors. In patients without CV risk factors, the ESR is not associated significantly with IMT. Taking genetic and environmental components into consideration of CV disease in RA patients demonstrates that the presence of two SE alleles predicts death from cardiovascular disease in RA patients, and this is independent from autoantibody status. However, the greatest risk of death from CV disease is associated with an interaction among smoking, SE alleles, and anti-CCP antibodies. Interestingly, no association of the PTPN22 gene with mortality was detected. In summary, both established CV risk factors and RA manifestations account for a significant proportion of atherosclerosis is RA. Factors related to RA may have a greater influence on the extent of atherosclerosis in young patients. The presence of established risk factors may be necessary for systemic inflammation to promote atherosclerosis; however, more studies are needed to understand these interactions. Lymphoma in RA Among RA patients there is in general a doubled risk for lymphoma. However, a closer analysis of disease severity, as measured by degree of joint destruction, ESR, number of affected joints, and accumulated inflammatory load, revealed a more pronounced relative risk in patients with the most severe disease and little or no increase in those with mild to moderate disease. This raises the question of whether lymphomas arise as a consequence of treatment or because of a lack thereof. Larger cohort studies have thus far not confirmed any treatment-related excess of lymphoma in general. However, it is premature to make firm conclusions about the true lymphoma risk in patients treated with TNF antagonists. In summary, the data so far indicate that the inflammatory burden should be kept as low as possible to possibly reduce the risk for lymphoma development. Cartilage and Bone Destruction During the inflammation in the joint, cartilage and underlying bone are broken down. Matrix metalloproteinases (MMPs) are enzymes active in this process, which results in the release of a number of collagenous and noncollageneous cartilage and bone-derived molecules. Serum levels of MMP-3 and urinary
levels of C-telopeptide of collagen type II (CTX-II) have been identified as two independent baseline factors that predict radiographic progression. Upon successful MTX treatment, MMP-3 levels decrease. Also, serum cartilage oligomeric matrix protein (COMP) levels are predictive of radiographic damage. Proof-of-concept studies demonstrate that targeting RANK-mediated osteoclastogenesis prevents inflammatory bone loss, and clinical application has only just begun by blocking RANKL. Replacing radiographic measures with measures of biochemical molecules in serum and/or urinary samples might in the future become the preferred way to measure joint destruction specifically. Traditionally, high inflammatory burden and measures of CRP have been included in the prediction of joint destruction, but the concomitant measures of inflammation and joint destruction indicate that the inflammatory and destructive processes are uncoupled and therefore need to be identified and treated separately.
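If biochemical markers such as serum MMP-3 and urinary CTX-II are to replace radiographic measures, they will typically be combined in a predictive model of progression. The sketch below fits a logistic model to simulated baseline values; the distributions, coefficients, and units are invented for illustration and do not reflect published effect sizes.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Simulated baseline serum MMP-3 and urinary CTX-II values (arbitrary units) for 150 patients;
# progression status is generated so that both markers carry independent information.
n = 150
mmp3 = rng.lognormal(mean=3.0, sigma=0.4, size=n)
ctx2 = rng.lognormal(mean=1.5, sigma=0.5, size=n)
risk = 1 / (1 + np.exp(-(0.04 * (mmp3 - 20) + 0.8 * (ctx2 - 4.5))))
progressed = rng.binomial(1, risk)

x = np.column_stack([np.log(mmp3), np.log(ctx2)])
model = LogisticRegression().fit(x, progressed)
print("log-scale coefficients:", model.coef_.round(2))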
BIOMARKERS PROVIDING LEADS CONCERNING EFFECTS OF TREATMENT OF RA The drugs used in the treatment of RA are generally divided into three groups: nonsteroidal anti-inflammatory drugs (NSAIDs) to relieve pain, traditional disease-modifying antirheumatic drugs (DMARDs), and biologics. There are today some general guidelines for the treatment of RA; however, biomarkers guiding the most adequate selection for the single patient are missing. In the majority of RA patients, treatment with methotrexate or other DMARDs only partly reduces clinical and radiological progression of the disease. For many patients, therefore, the introduction of the biological drugs provided a major breakthrough. By blocking the proinflammatory cytokine TNF, joint progression could be halted much more efficiently than before in a large group of patients. More recent drugs eliminating circulating B-cells or blocking co-stimulation of T-cells also indicate a slowdown in joint destruction. Measuring levels of TNF in the inflamed tissue has been suggested as a biomarker for treatment response to TNF-blocking agents. Unfortunately, it has not yet been possible to describe such a marker for serum samples. With the identification of the association of smoking, SE alleles, and anti-CCP antibodies with the greatest risk of death from CV disease, it will be possible to treat these high-risk patients more actively to prevent cardiovascular disease. In the treatment of lymphoma, which in the majority of cases is B-cell derived, a specific treatment option exists for RA patients. Rituximab was established in 1998 as a drug to treat patients with B-cell lymphomas, and the indication was extended in 2006 to treat patients with moderate to severe RA, independent of lymphoma associations. Thus, in these cases both the lymphoma and the RA are treated with the same drug.
CONCLUDING REMARKS In summary, RA is a heterogeneous disease with regard to severity of the joint inflammation and other inflammatory manifestations. To prevent disease onset or progression of disease in the single patient, distinct biomarkers are needed, and only a few have yet been identified. The fact that the presence or absence of ACPAs divides the disease into two distinct entities will very likely influence future treatments and clinical trials. Clinical trials, if not already focusing on a subgroup of patients, need also to be adequately powered to allow subgrouping patients according to genetic setup, environmental exposures, and different nonsynovial inflammations in order to find biomarkers for individualized therapies. Today’s and tomorrow’s biological therapies targeting different parts of the immune and inflammatory pathways will also increase our understanding of disease etiology and progression and help identify new drugs.
RECOMMENDED READING Bowes J, Barton A (2008). Recent advances in the genetics of RA susceptibility. Rheumatology, doi10:1093. Ekström-Smedeby K, Baecklund E, Askling J (2006). Malignant lymphomas in autoimmunity and inflammation: a review of risks, risk factors, and lymphoma characteristics. Cancer Epidemiol Biomarkers Prev, 15:2069–2077. Farragher TM, Goodson NJ, Naseem H, et al. (2008). Association of the HLA-DRB1 gene with premature death, particularly from cardiovascular disease, in patients with rheumatoid arthritis and inflammatory polyarthritis. Arthritis Rheum, 58:359–369. Klareskog L, Rönnelid J, Lundberg K, Padyukov L, Alfredsson L (2008). Immunity to citrullinated proteins in rheumatoid arthritis. Annu Rev Immunol, 26:651–675. Romas E, Gillespie T (2006). Inflammation-induced bone loss: Can it be prevented? Rheum Dis Clin N Am, 32:759–773. Turesson C, Schaid DJ, Weyand CM, et al. (2006). Association of HLA-C3 and smoking with vasculitis in patients with rheumatoid arthritis. Arthritis Rheum, 54:2776–2783. Van der Helm-van Mil AHM, Huizinga TWJ, De Vries RRP, Toes REM (2007). Emerging patterns of risk factor make-up enable subclassification of rheumatoid arthritis. Arthritis Rheum, 56:1728–1735. Van Dongen H, van Aken J, Lard LR, et al. (2007). Efficacy of methotrexate treatment in patients with probable rheumatoid arthritis: a double-blind, randomized, placebo-controlled trial. Arthritis Rheum, 56:1424–1432. Young-Min S, Cawston T, Marshall N, et al. (2007). Biomarkers predict radiographic progression in early rheumatoid arthritis and perform well compared with traditional markers. Arthritis Rheum, 56:3236–3247.
21 PHARMACOKINETIC AND PHARMACODYNAMIC BIOMARKER CORRELATIONS J.F. Marier, Ph.D., FCP Pharsight, A Certara Company, Montreal, Quebec, Canada
Keith Gallicano, Ph.D. Watson Laboratories, Corona, California
INTRODUCTION The importance of a biomarker in drug development is based on the weight of evidence that changes in dose and/or drug concentration levels correlate strongly with the biomarker and the clinical outcome desired. The linkage between pharmacokinetics of a new chemical entity (NCE), the dynamics of a biomarker, and the ultimate clinical outcome should be based on robust theoretical considerations, prior therapeutic experience, well-understood pathophysiology, and knowledge of the drug’s mechanism of action. The validation of biomarkers is therefore very important, particularly if one considers their relevance for decision-making and regulatory purposes. The elements described below are crucial for the integration of a biomarker in a drug development program. 1. Identification, quantitation, and validation of biomarker assays and drug concentrations. Reliable and selective assays should be validated under
a good laboratory practices (GLP)–like environment for quantitative methods for measuring both biomarker responses and drug concentrations. Most biomarkers are endogenous macromolecules that can be measured in biological fluids. As for drug assays, biomarker assays should provide acceptable sensitivity, specificity, precision, and accuracy. For example, intra- and intersubject variability of the biomarker should be minimal and well understood, biomarker stability and sample storage conditions should be controlled, and quantitative assay should be standardized and validated with adequate power to distinguish between treatment and control groups [1–3]. Assays should be validated to meet study objectives at various drug development stages and possess adequate performance to quantify biochemical responses specific to the target disease progression and drug intervention. In addition, biomarker measurement should be technically practical, so that measurement can be obtained by minimally invasive techniques and that the sampling of the biomarker is minimal or is taken at the same time as other physiological, safety, or efficacy measurements [4]. 2. PK/PD correlation and integration of model-based drug development. Biomarkers typically have different time courses from clinical endpoints and often are more directly related to the time course of drug concentrations. To evaluate potential correlations, adequate information should be available on the pharmacokinetics (PK) of the new chemical entity (NCE). Similarly, adequate information should be available on the pharmacodynamics (PD) (including PK) of the biomarker, with the expected effect on clinical outcome. Following the identification of correlations between the NCE and specific biomarkers, PK/PD modeling may be performed in early stages of drug development to gain greater insight into the effect of drug concentrations on the biomarker as a function of time. PK/PD modeling involves a set of techniques using mathematical and mechanistic (as opposed to empirical) models derived from quantitative pharmacology. Biomarkers, together with PK/PD modeling and simulation, provide a continuous process to link what has been learned from today’s drug development cycle to the next generation of biomarkers, assays, and models [5–7]. 3. Correlation between biomarkers and clinical endpoint. The availability of a validated biomarker for a specific clinical endpoint may greatly facilitate the development of compounds that act via well-understood mechanisms of action. For example, biomarkers for HIV and AIDS include viral load (the number of free virus particles in the blood) and the count of CD4+ immune system cells. Fasting blood sugar and hemoglobin A1c, a protein that indicates a patient’s blood sugar history, are established biomarkers for diabetes treatments. The relationship of the biomarker to the therapeutic endpoint should
be validated in early phase II trials; and, when possible, as early as phase I [8,9]. Many surrogate markers can be assessed in healthy volunteers and in the intended patient population. The ability to assess surrogate markers in healthy volunteers can bring the initial decision-making process into phase I rather than phase II or phase III of the NCE development process. This can be a significant competitive advantage to those who use the opportunity. Table 1 presents widely used biomarkers according to disease and indications. Biomarkers can be elevated to the status of a surrogate endpoint based on the
TABLE 1 Widely Used Biomarkers for Specific Disease or Indications Biomarkers Pulmonary function tests, forced expiratory volume (FEV) PT (including INR), APTT, anti-Xa activity, fragment 1 + 2, plasminogen, plasmin inhibitor, D-dimer Leukotrienes, cytokines, and chemokines Epidermal growth factor (EGF), fibroblast growth factor (FGF), human growth hormone (HGH), neopterin Viral load, CD4 count Cortisol, estradiol, estrone, follicle-stimulating hormone, luteinizing hormone, progesterone, testosterone Angiotensin-I, angiotensin-II, plasma renin, aldosterone, and angiotensin-converting enzyme (ACE) activity Eicosanoids (prostaglandins and leukotrienes) Interferons, interleukins, tumor necrosis factor (TNF), rheumatoid factor Cholesterol, fatty acids, HDL, LDL, phospholipids Cytidine deaminase activity, inosine, xanthine, hypoxanthine, uric acid 8-Hydroxy-dG, protein carbonyls, I-κB, NF-κB
Glucose, glucagon, fructosamine, glycosylated albumin, and hemoglobin A1c, insulin, C-peptide Stomach fullness, cholecystokinin, glucagon-like peptide 1, bombesin, somatostatin, ghrelin, leptin, glucose, insulin, diet-induced thermogenesis, temperature, ventilatory parameters ProMMP-1, ProMMP-3, tissue inhibitory of MMP-1
Disease or Indication Asthma Coagulation and fibrinolysis Chronic obstructive pulmonary disease Growth modulation
HIV Hormonal control
Hypertension
Inflammation Immune modulation, rheumatoid arthritis Lipid metabolism Nucleotide metabolism Oxidative stress (diabetes, heart disease, arthritis, obesity, and cancer) Type 1 and 2 diabetes mellitus Satiety and satiation
Tissue remodeling
weight of evidence that demonstrates that changes in the biomarker correlate strongly with the desired clinical outcome. The best known physiologic surrogate marker is blood pressure reduction as an indicator of reduced incidence of stroke and myocardial infarction. This surrogate marker is now used as a clinical endpoint, because a strong relationship between reduction in blood pressure and a reduction in the incidence of myocardial infarctions and strokes has previously been established. For innovative products, limited information may be available on the relationship between the PK of a drug, the effect on a candidate biomarker, and the ultimate clinical response. Although this may limit the use of predetermined biomarkers in the early development phases of a drug, the incorporation of recent advances in genomics, pharmacogenetics, and proteomics may help to qualify various candidate biomarkers. Pilot exploratory studies should therefore be conducted to qualify and validate candidate biomarkers for predictive clinical assessment of disease progression and the effect of drug intervention. Mechanistic approaches using new technological advances in target validation, functional biology, proteomics, genomics, and the monitoring of gene expression using DNA arrays, imaging, and the correlation of these effects to the disease state may hold promise for an earlier understanding of disease and toxicological processes, improve the predictiveness of preclinical pharmacology and toxicology data to humans, and find suitable biomarkers for use in the early stage of drug development of innovative products [2,10]. The appropriate combination of biomarker identification and selection, bioanalytical methods development and validation for drugs and biomarkers, and mechanism-based PK/PD models for fitting data and predicting future clinical endpoints and outcomes may provide powerful insights and guidance for effective drug development, toward safe and efficacious medicine for individual patients [6–8,11,12]. The U.S. Food and Drug Administration (FDA) is developing these concepts as part of its critical path initiative, where the agency recognizes that the major contributor to inefficiency in development was the absence of innovative new methods for preclinical and clinical testing of drugs. The FDA recommended new tools and various opportunities to improve drug development, including the use of validated biomarkers and the application of model-based drug development (MBDD). In the following sections we focus on the PK/PD correlation assessment of a drug and a biomarker, and the use of PK/PD modeling and simulation to optimize drug development and decision making.
PK/PD CORRELATION OF BIOMARKERS Plans to incorporate biomarkers for potential correlations in drug development programs should occur when a new candidate molecule is being identified and the discovery process is starting. To evaluate potential correlations between the drug and candidate biomarkers, adequate information should be
available on the PK of the drug. Preliminary PK/PD correlation analyses may be performed using noncompartmental (i.e., model-independent) approaches by deriving parameters such as the area under the concentration–time curve (AUC), maximum concentration (Cmax), time to maximum plasma concentration (tmax), and terminal elimination half-life (t1/2). Exploratory data analyses (EDAs) between drug exposure data and candidate biomarkers are typically performed by plotting individual concentration data and/or PK parameters of the drug versus the candidate biomarker, using appropriate data transformations and appropriate subgroups (e.g., by gender, dose, or route of administration), in order to identify the impact of drug exposure on the source and magnitude of variability in biomarker response. During exploratory data analyses, the questions in Table 2 should be answered to support PK/PD correlations of biomarkers. In the simplest case, biomarkers will be related directly to the drug concentrations in the biological fluid sampled; that is, the biomarker will respond in a graded fashion so that increasing drug concentrations result in a gradual increase or decrease in the biomarker of interest over time. A temporal relationship between the time course of drug concentrations and the biomarker is observed. In some cases the temporal delay may be only a few minutes to an hour, whereas in other cases it may be as long as several hours to days. Exploratory analyses may also provide valuable information on the pharmacodynamics of biomarkers as well as on PK/PD correlations, as presented in Table 3. Preclinical and clinical study designs should provide as much information as possible on the nature of the PD response of candidate biomarkers. Depending on the PK of the drug and the dynamics of the biomarker, exploratory PK/PD correlations may be limited and are likely to require a more mechanistic approach for an optimal assessment of PK/PD correlations.
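As a concrete illustration of such an exploratory analysis, the sketch below derives the basic noncompartmental exposure metrics (Cmax, tmax, AUC by the linear trapezoidal rule, and terminal half-life from a log-linear fit of the last sampling points) from a concentration–time profile. The data and the number of terminal points used are hypothetical and purely illustrative.

```python
import numpy as np

def nca_parameters(t, c, n_terminal=3):
    """Noncompartmental PK parameters from a concentration-time profile."""
    t, c = np.asarray(t, float), np.asarray(c, float)
    cmax = c.max()                          # maximum observed concentration
    tmax = t[c.argmax()]                    # time of maximum concentration
    auc = np.trapz(c, t)                    # AUC(0-tlast), linear trapezoidal rule
    # Terminal elimination: log-linear regression on the last n points
    slope, _ = np.polyfit(t[-n_terminal:], np.log(c[-n_terminal:]), 1)
    lambda_z = -slope                       # terminal rate constant (1/h)
    t_half = np.log(2) / lambda_z           # terminal half-life (h)
    return {"Cmax": cmax, "tmax": tmax, "AUC_last": auc, "t_half": t_half}

# Hypothetical profile after an oral dose (time in h, concentration in ng/mL)
time = [0.5, 1, 2, 4, 6, 8, 12, 24]
conc = [12, 35, 48, 40, 28, 19, 9, 1.6]
print(nca_parameters(time, conc))
```

These individual exposure metrics can then be plotted against the candidate biomarker response, by subgroup, exactly as described above.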
TABLE 2   Pharmacokinetic Properties and Impact on PK/PD Assessment
Absorption: Will the rate and/or extent of drug exposure correlate with biomarkers? Is a minimum threshold likely to correlate with biomarker response?
Distribution: Will total (unbound plus bound) or unbound drug levels contribute to the overall effect on biomarkers? Are concentration levels in peripheral tissues likely to correlate with biomarkers?
Metabolism: Will active metabolites contribute to the overall effect on biomarkers? Is there a temporal delay between concentrations and biomarkers (proteresis or hysteresis)?
Elimination: Will the rate of elimination of the drug correlate with biomarker response? Does drug elimination parallel the biomarker response?
TABLE 3   Pharmacodynamic Properties and Impact on PK/PD Assessment
Mechanism of action: Will the drug stimulate or inhibit the formation or degradation of the biomarker? Will the drug create a reversible or irreversible effect on the biomarker?
Competitive ligands: Will the drug compete with endogenous, environmental, or dietary ligands for receptor binding?
Diurnal variation: Are placebo data available to determine a diurnal/circadian cycle of biomarkers?
Disease state: Will the disease state affect the biomarker response? Does drug elimination parallel the biomarker response?
Mechanism-based PK/PD models differ from empirical descriptive models in that they contain specific expressions to characterize processes on the causal path between drug administration and effect [13–15]. In the next section we focus on important issues and concepts that apply to PK/PD modeling and the use of biomarkers to improve drug development processes. PK/PD Modeling of Biomarkers PK/PD modeling and simulation tools have developed to a high level of sophistication over the past 20 years. Mechanism-based PK/PD modeling involves a set of techniques using mathematical and mechanistic models derived from quantitative pharmacology. The simplest compartmental PK model is a one-compartment model following intravenous administration of a drug. A schematic representation of this compartmental model is presented in Figure 1.
Figure 1   Pharmacokinetic modeling and equations. [Figure: schematic of a one-compartment PK model after an intravenous bolus dose, with central compartment volume Vc and first-order elimination rate constant K10; the corresponding equation, C(time) = (dose/Vc) · exp(−K10 · time), is shown next to a semilogarithmic plot of drug concentration versus time.]
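A minimal simulation of this one-compartment model is sketched below; the dose, volume, and elimination rate constant are hypothetical values chosen only to reproduce the shape of the profile in Figure 1.

```python
import numpy as np

def one_compartment_iv(dose, vc, k10, t):
    """Concentration-time profile after an IV bolus into a one-compartment model.

    C(t) = (dose / Vc) * exp(-K10 * t)
    """
    t = np.asarray(t, float)
    return (dose / vc) * np.exp(-k10 * t)

# Hypothetical parameters: 100 mg bolus, Vc = 10 L, K10 = 0.3 per hour
times = np.linspace(0, 12, 25)                        # hours
conc = one_compartment_iv(100.0, 10.0, 0.3, times)    # mg/L
for t, c in zip(times[::6], conc[::6]):
    print(f"t = {t:5.1f} h   C = {c:6.2f} mg/L")
```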
In this example, the removal of drug from the central compartment (i.e., systemic circulation) was described using a first-order elimination rate constant. Compartmental analyses can be performed routinely by fitting the data to a model using a PK/PD software package such as WinNonlin (Pharsight, a Certara Company, St. Louis, Missouri), Kinetica (ThermoFisher, Waltham, Massachusetts), S-ADAPT (University of Southern California), or NONMEM (University of California at San Francisco). The PK of drugs can be described using more advanced compartmental models, which can be customized to describe any type of PK behavior. Case studies in which different types of compartmental models were used in preclinical and clinical drug development are presented later in the chapter. Modeling the concentration–effect relationship on the biomarker is very dependent on the mechanism of action of the drug. In vitro studies that may elucidate this mechanism are very important in the early stage of drug development. For example, drug-induced changes in biomarkers of interest are often modeled using classical drug receptor theory following the law of mass action. This theory predicts that as receptors in the target organ interact with the drug, the biomarker levels will increase with increasing interaction until all receptors have been occupied, at which time no further change in the biomarker will occur. In its simplest form, the effect on the biomarker will be driven by the measured drug concentration over time, Ctime, assuming that drug concentrations in the systemic circulation are at equilibrium with the effect site. The classic and most commonly used PD model under these conditions is the Emax model, which is an empirical function for describing nonlinear concentration–effect relationships. It has the general form illustrated in Figure 2, where Emax represents the maximum effect on the biomarker, Ctime the concentrations of a drug predicted over time with a PK model, and EC50 the concentration of the drug that produces half of Emax.
Figure 2   Pharmacodynamic modeling and equations. [Figure: the sigmoid Emax model, Effect = Emax · Ctime^n / (Ctime^n + EC50^n), shown next to a plot of effect (biomarker) versus drug concentration.]
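The Emax relationship in Figure 2 can be written directly in code; the parameter values below are hypothetical and serve only to illustrate the saturating shape of the concentration–effect curve.

```python
import numpy as np

def emax_effect(c, emax, ec50, n=1.0):
    """Sigmoid Emax model: effect = Emax * C^n / (C^n + EC50^n)."""
    c = np.asarray(c, float)
    return emax * c**n / (c**n + ec50**n)

# Hypothetical parameters: Emax = 3 biomarker units, EC50 = 5 mg/L, n = 2
concentrations = np.array([0.5, 1, 2, 5, 10, 20, 50])   # mg/L
effects = emax_effect(concentrations, emax=3.0, ec50=5.0, n=2.0)
for c, e in zip(concentrations, effects):
    print(f"C = {c:5.1f} mg/L   effect = {e:4.2f}")
```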
Figure 3   Pharmacokinetic/pharmacodynamic modeling. [Figure: the pharmacokinetic model (semilogarithmic plot of drug concentration versus time, C(time) = (dose/Vc) · exp(−K10 · time)) and the pharmacodynamic model (effect on the biomarker versus drug concentration, Effect = Emax · Ctime^n / (Ctime^n + EC50^n)) are combined into a PK/PD model that predicts the effect on the biomarker as a function of time.]
The PD model can be related to receptor theory, where EC50 is the parameter characterizing the potency of the drug on the biomarker and n is a sigmoidicity factor describing the steepness of the drug concentration versus biomarker effect relationship, or the number of molecules interacting at the biomarker receptor. Although the Emax model is highly versatile for different situations, more sophisticated PD models have been developed over recent decades. PK/PD modeling builds the bridge between these two classical disciplines of pharmacology, as depicted in Figure 3. PK/PD modeling is performed to gain greater insight into how drug concentrations may affect the biomarker, as well as into how the models can be used to perform simulations in order to predict changes in biomarker levels under a variety of experimental conditions. Models can improve the prediction and assessment of patient response with the use of simulations, and therefore increase the success of a drug development program [13–15]. For example, the models can be used to refine a dosing regimen in order to predict the drug exposure (PK modeling) and the resulting effect on the biomarker, to guide starting doses and regimens (as well as adjustments of dosage or dosing regimens in special populations), and to provide a better understanding of outcomes from clinical efficacy studies. Case studies will be presented to demonstrate how biomarkers, together with PK/PD modeling and simulation, provide a continuous process to link what has been learned from today's drug development cycle to the next generation of biomarkers, assays, and models.
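Linking the two pieces gives the kind of simulation shown schematically in Figure 3: the PK model predicts the concentration at each time point, and the PD model converts that concentration into a biomarker effect. The sketch below chains hypothetical one-compartment and Emax functions and compares two illustrative dosing levels; all parameter values are invented for illustration only.

```python
import numpy as np

# Hypothetical one-compartment PK and Emax PD pieces (same forms as above)
def conc(dose, t, vc=10.0, k10=0.3):
    return (dose / vc) * np.exp(-k10 * np.asarray(t, float))

def effect(c, emax=3.0, ec50=5.0, n=2.0):
    return emax * c**n / (c**n + ec50**n)

def pkpd_profile(dose, t_end=24.0):
    """PK/PD chain: dose -> concentration-time -> biomarker effect-time."""
    t = np.linspace(0.0, t_end, 49)
    c = conc(dose, t)
    return t, c, effect(c)

# Compare two illustrative IV bolus doses
for dose in (50.0, 200.0):
    t, c, e = pkpd_profile(dose)
    hours_above_half = np.trapz((e > 1.5).astype(float), t)  # effect > 50% of Emax
    print(f"dose {dose:5.0f} mg: peak effect {e.max():.2f} units, "
          f"~{hours_above_half:.1f} h above half-maximal effect")
```

The same chain can then be rerun under alternative doses or regimens, which is exactly the kind of "what-if" use of the model described above.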
Critical Path Initiative In the current health care environment, dominated by increasing expenses associated with drug development programs, rapidly changing technologies, generic competition, and therapeutic substitution, there is increased pressure to develop new therapies using effective drug development programs. The escalating costs and low productivity of drug development have been well documented over the past several years. In March 2004, the FDA recognized that drug development has become increasingly challenging, inefficient, and costly [16]. The agency concluded that the major contributor to the inefficiency in development was the absence of innovative new methods for preclinical and clinical testing of drugs. In its critical path initiative, the agency called for new, publicly available scientific and technical tools to improve drug development programs. These include the use of biomarkers and clinical trial endpoints as well as PK/PD modeling to improve decision making and to make the development process more effective and more likely to result in safe products that benefit patients [17–20]. The agency recommended reviewing the body of collective biomarker information to build a new framework for biomarker and surrogate endpoint use in drug development [16]. Part of this initiative is to complete an inventory of surrogate endpoints that have been used for past drug approvals and to evaluate how the decision was made to accept the respective surrogates. As part of this process there will be an attempt to identify widely used, disease-specific biomarkers and to determine the gap that exists to elevate these biomarkers to surrogate endpoints. CD4 cell counts and HIV plasma RNA, accepted as mechanism-based surrogate endpoints, are probably the best examples. The elevation of a biomarker to a surrogate endpoint is particularly attractive when biomarkers can be measured more quickly, more easily, and at lower cost than true clinical endpoints, when clinical endpoints are excessively invasive, or when their use would be considered unethical.
PK/PD Modeling of Biomarkers in the Early Phase of Drug Development (Colburn) There are numerous stages along the development cycle where the development of biomarkers and PK/PD modeling can add value. Incorporation of PK/PD studies throughout preclinical and clinical development may lead to earlier identification of optimal dosing regimens in clinical development and may reduce the overall time of drug development. Beyond the traditional paradigm of using fractions of the nonclinical toxic doses in animals to design first-in-human dosing schedules, the use of PK/PD modeling in early preclinical development allows one to define more precisely the dose–concentration–pharmacological effect and dose–concentration–toxicity relationships with the use of key efficacy and safety biomarkers. The extrapolation of these results to humans using a combination of in vitro and in vivo data can be
particularly helpful in determining the appropriate dosing regimen for phase I studies and in guiding dose escalation in order to achieve the systemic exposure in humans that is expected to be associated with the desired effect on the biomarker and ultimately, the clinical outcome. Overall, the use of validated biomarkers and the application of PK/PD modeling and simulation principles represent an opportunity to identify optimal drug candidates and possibly to develop drugs in the shortest time frame possible [7–9]. PK/PD modeling output can be no better than the biomarkers or surrogate endpoints used for the modeling (i.e., the input). As we increase our understanding of biomarkers, surrogate markers, and disease and drug mechanisms, PK/PD modeling inputs and outputs will improve and, consequently, predictive power will improve. PK/PD Modeling of Biomarkers in Late Phases of Drug Development The usefulness and validity of PK/PD models for the evaluation of dose– concentration–effect relationships are not limited to well-designed clinical studies with relatively small groups of patients and frequent measurements of concentration and effect. It has also been shown for observational data obtained from large clinical trials with sparse and imbalanced sampling schedules by applying population modeling techniques. Population PK/PD modeling is based primarily on the nonlinear mixed-effects regression models introduced by Sheiner and co-workers and makes possible the characterization of dose–concentration effect relationships in populations rather than individuals, thereby providing the opportunity to identify and account for sources of interindividual PK and PD variability [21,22]. In addition, the increased understanding of drug action derived from biomarker response and PK/PD-based drug development may lead to a definition of strategies for individualization of drug dosage regimens to ensure optimal therapeutic outcome in subpopulations of patients. The building of a PK/PD database during drug development can provide an essential framework for continued refinement and improvement during postmarketing drug use, allowing one to answer specific questions without the need to perform additional studies. Clinical Trial Simulation Population PK/PD modeling has been used successfully for numerous drugs. These models may be descriptive and/or predictive of the time course of PD effects. When properly developed, predictive models may be used to guide starting doses and regimens (as well as adjustments of dosage and dosing regimens in special populations), add evidence to the certainty of the decision to allow market access, and provide a better understanding of outcomes from clinical studies [23,24]. Clinical trial simulations are usually motivated as follows:
• To improve the clinical trial process by maximizing the probability that trial objectives are attained while reducing unnecessary costs
• To improve decision making in drug development by providing an objective framework for safety, efficacy, and commercial risk assessments
• To improve the overall efficiency of drug development in terms of informativeness, economy, and speed
Modeling and simulation may help predict the results of future drug trials and can be used at decision points in the drug development process to assess quantitatively the risk of moving forward with a candidate drug.
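A toy Monte Carlo illustration of this idea is sketched below: interindividual variability is placed on clearance and EC50, a response criterion is defined, and the simulated probability of meeting the trial objective is compared across doses. Everything here (model, parameter values, and success criterion) is hypothetical and intended only to show the mechanics of a clinical trial simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_trial(dose, n_subjects=200, n_trials=500):
    """Fraction of simulated trials in which at least 60% of subjects respond."""
    successes = 0
    for _ in range(n_trials):
        # Log-normal interindividual variability on clearance and EC50 (hypothetical)
        cl = 5.0 * np.exp(rng.normal(0.0, 0.3, n_subjects))      # L/h
        ec50 = 4.0 * np.exp(rng.normal(0.0, 0.4, n_subjects))    # mg/L
        c_avg = dose / (cl * 24.0)               # average concentration over one day
        biomarker = 3.0 * c_avg / (c_avg + ec50) # Emax model, Emax = 3 units
        responders = np.mean(biomarker > 1.5)    # responder: effect > 50% of Emax
        successes += responders >= 0.60          # trial succeeds if >= 60% respond
    return successes / n_trials

for dose in (200.0, 400.0, 800.0):               # total daily dose, mg (hypothetical)
    print(f"dose {dose:5.0f} mg/day: P(trial success) ~ {simulate_trial(dose):.2f}")
```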
CASE STUDIES In recent years, PK/PD modeling has developed from an empirical descriptive discipline into a mechanistic science that can be applied at all stages of drug development. Preclinical PK/PD studies may prompt a series of important mechanistic studies to explore the relationship between plasma concentrations and the resulting effect on a biomarker in later stages of clinical drug development. The application of PK/PD modeling to preclinical pharmacology studies may help to provide information on drug effects and actions that would be difficult to obtain in human subjects. As such, preclinical and clinical PK/PD studies constitute a scientific basis for rational drug discovery and development. The following preclinical and clinical case studies are presented to demonstrate how modeling techniques have improved the overall understanding of the PK/PD relationships of biomarkers. Case Study 1 (Preclinical): Pharmacokinetic–Pharmacodynamic Modeling of the Respiratory Depressant Effect of Norbuprenorphine in Rats The objective of this investigation was to characterize the PK/PD correlation of buprenorphine's active metabolite norbuprenorphine for the effect on respiration in rats [25]. Following intravenous administration, the time course of plasma concentrations of buprenorphine and norbuprenorphine was determined in conjunction with the effect on ventilation, as determined using plethysmography, a test used to measure changes in air volume. The PK of norbuprenorphine was best described by a three-compartment PK model with nonlinear elimination. A saturable biophase distribution model with a power PD model best described the PK/PD relationship, as presented in Figure 4. No saturation of the effect at high concentrations was observed, indicating that norbuprenorphine acted as a full agonist with regard to respiratory depression. By simulation, it was shown that following intravenous administration of buprenorphine, the concentrations of norbuprenorphine reached values that were well below the values causing an effect on respiration.
Figure 4   PK/PD model for the respiratory depressant effect of buprenorphine and norbuprenorphine. (From ref. 25, by permission of the American Society for Pharmacology and Experimental Therapeutics.) [Figure: compartmental schematic in which buprenorphine (central compartment V1 with peripheral compartments V2 and V3) is converted (kconv) to norbuprenorphine (central compartment V4 with peripheral compartments V5 and V6, and saturable elimination), and norbuprenorphine distributes into a biophase compartment via a saturable process that drives the pharmacodynamic model of respiratory depression.]
In conclusion, the PD of the metabolite norbuprenorphine was determined to be markedly different from the PD of the parent compound buprenorphine with regard to the receptor association–dissociation kinetics, the in vivo potency, and the intrinsic efficacy for the respiratory depressant effect. Because these norbuprenorphine concentrations are well below the values causing an effect on respiration, it was concluded that norbuprenorphine does not contribute to the overall respiratory depressant effect of buprenorphine. These results were consistent with the experience from clinical use of buprenorphine in patients. Case Study 2 (Preclinical): Pharmacokinetics and Pharmacodynamics of PEGylated IFN-β1a Following Subcutaneous Administration in Monkeys The purpose of this study was to characterize the PK/PD properties of a new polyethylene glycol (PEG) conjugate formulation of interferon-β1a (IFN-β1a) following subcutaneous (SC) administration in monkeys [26].
Figure 5   PK/PD model for the effect of IFN-β1a on neopterin. (From ref. 26, by permission of Springer Science and Business Media.) [Figure: schematic of a two-compartment PK model with first-order absorption from the subcutaneous dosing site, linked through a delayed plasma concentration, Cp(t − τ), to an indirect response model in which neopterin (N) is produced at a zero-order rate kin (stimulated by drug) and eliminated at a first-order rate kout.]
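The indirect stimulatory response structure sketched in Figure 5 can be reproduced in a few lines of code. The sketch below integrates dN/dt = kin · [1 + S(C(t − τ))] − kout · N with a simple Emax-type stimulation function; the parameter values and the exact stimulation form are hypothetical illustrations, not the estimates reported in ref. 26.

```python
import numpy as np

def neopterin_profile(times, kin=1.0, kout=0.2, smax=4.0, sc50=2.0,
                      dose=10.0, ka=0.5, ke=0.15, v=5.0, lag=2.0, dt=0.05):
    """Indirect stimulatory response driven by a lagged one-compartment SC profile."""
    def conc(t):
        t_eff = max(t - lag, 0.0)                         # time-lag parameter tau
        return (dose * ka / (v * (ka - ke))) * (np.exp(-ke * t_eff)
                                                - np.exp(-ka * t_eff))
    n = kin / kout                                        # baseline neopterin
    out, grid = [], np.arange(0.0, times[-1] + dt, dt)
    for t in grid:                                        # simple Euler integration
        stim = smax * conc(t) / (conc(t) + sc50)          # Emax-type stimulation of kin
        n += (kin * (1.0 + stim) - kout * n) * dt
        out.append(n)
    return np.interp(times, grid, out)

obs_times = [0, 12, 24, 48, 96, 168]                      # hours
for t, n in zip(obs_times, neopterin_profile(obs_times)):
    print(f"t = {t:4d} h   neopterin ~ {n:5.2f} (arbitrary units)")
```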
Single SC injections of 0.3, 1, and 3 MIU/kg of PEG-IFN-β1a were administered to three groups of cynomolgus monkeys. Plasma concentrations of drug and neopterin, a classic biomarker for IFN-β1a, were measured at various time points after dosing using an ELISA assay. PK/PD profiles were first described by noncompartmental methods, and then a pooled analysis was performed using an integrated mathematical model, in which fixed and delayed concentration–time profiles were used as the driving function in an indirect stimulatory response model, as presented in Figure 5. The PK component of IFN-β1a was assessed using a standard linear two-compartment model with first-order rate constants for absorption to and elimination from the central compartment of IFN-β1a. The PD model was a modified stimulatory indirect response model, with a zero-order rate constant of neopterin production (kin) and a first-order rate constant of neopterin elimination (kout), in which the driving function (i.e., concentrations of IFN-β1a) was delayed by a time-lag parameter (τ). Neopterin concentrations followed a typical dose-dependent biphasic pattern. Pooled PD profiles were well described by the PK/PD model, and the neopterin elimination rate was consistent with previous estimates. The PEG modification of IFN-β1a provided enhanced drug exposure and pharmacodynamics comparable to those of unpegylated IFN-β1a. Case Study 3 (Clinical): Modeling of Brain D2 Receptor Occupancy–Plasma Concentration Relationships with a Novel Antipsychotic PK/PD modeling using functional imaging methodology such as positron-emission tomography (PET) may be a very useful tool for early clinical development of antipsychotic compounds, especially in suggesting initial doses for further clinical studies. The purpose of this study was to assess PK/PD relationships of YKP1358 to guide decisions for further clinical study designs. YKP1358 is a novel serotonin (5-HT2A) and dopamine (D2) antagonist that
Figure 6   PK/PD model used to assess the relationship between YKP1358 and D2 receptor occupancy. (From ref. 27, by permission of Macmillan Publishers Ltd.) [Figure: schematic of a three-compartment PK model with first-order absorption (ka), rapidly and slowly equilibrating peripheral compartments, and a link to an effect compartment, dCe/dt = k1e · C · V1/Ve − ke0 · Ce, with receptor occupancy described by a sigmoid Emax model, E = Emax · Ce^H / (EC50^H + Ce^H).]
fits the general profile of an atypical antipsychotic drug [27]. A D2 receptor occupancy study was conducted in healthy volunteers using PET to measure the D2 receptor occupancy of YKP1358 and to characterize its relationship to plasma drug concentrations. A single oral dose, parallel group, dose-escalation (100, 200, and 250 mg) study was performed in 10 healthy male volunteers with the PET radiotracer [11C]raclopride. The relationship between plasma concentration and D2 receptor occupancy was analyzed with an indirect link model that included an effect compartment, as well as an equilibrium rate constant (ke0), as presented in Figure 6. Results of the study demonstrated that YKP1358 possesses potential antipsychotic effects based on the D2 receptor occupancy data. Considering the D2 receptor occupancy data, effective doses in patients were predicted to be greater than 250 mg twice a day. This is the first study in which the relationship between plasma concentration and the biomarker of D2 receptor occupancy was modeled using nonlinear mixed-effects modeling. It is anticipated that these results will be useful in estimating the initial doses of YKP1358 required to achieve a therapeutically effective range of D2 receptor occupancy in subsequent studies. Case Study 4 (Clinical): Semimechanistic and Mechanistic Population PK/PD Model for Biomarker Response to Ibandronate Biomarkers of bone turnover provide a rapid and accessible means of evaluating physiological responses to antiresorptive therapies. Recent evidence also suggests an association between the magnitude of biomarker suppression and the likely response in terms of bone mineral density (BMD) change and fracture risk reduction. This association suggests that these
biomarkers are on the causal pathway for prevention of fracture following administration of bisphosphonates. As such, biochemical markers of bone turnover can offer a timely indication of the likely efficacy associated with nominal dosing regimens. The aim of this project was to develop and validate a pharmacological model for ibandronate, a new bisphosphonate for the treatment of osteoporosis, capable of describing its PK in serum and urine, and the urinary excretion of the C-telopeptide of the α chain of type I collagen (uCTX), a sensitive biomarker of PD response to ibandronate [28]. A classical PK/PD model was developed that described accurately the PK of intravenously administered ibandronate and its effect on the biomarker, as presented in Figure 7. A four-compartment PK model was used, with an indirect PD response model with a uCTX formation rate (KS) and uCTX degradation rate constants (KD). Ibandronate in the “bone compartment” was assumed to inhibit osteoclast activity and hence the rate of synthesis (KS) and the urinary excretion of CTX. The complex physiological PK/PD model adequately described the PK of intravenously administered ibandronate in serum and urine as well as the time course of uCTX. To reduce processing times, the classical PK/PD model was simplified using a kinetics of drug action or kinetic (K)-PD model (i.e., a dose–response model, as opposed to a dose–concentration–response model). The model was subsequently extended to consider the influence of supplemental therapy on the PD response and subjected to external validation by retrospectively simulating the time course of uCTX change reported in a phase III study and in a phase II/III study of intravenous ibandronate. In summary, a pharmacostatistical model was developed and validated that adequately described the time course of uCTX change after oral and intravenous ibandronate therapy for osteoporosis. This model is currently being used to aid in the development of novel intermittent oral and intravenous regimens for ibandronate.
Figure 7   PK/PD model used to assess the relationship between ibandronate in serum and urine and the urinary excretion of uCTX. (From ref. 28, by permission of Wiley-Blackwell.) [Figure: schematic of a four-compartment PK model (including a bone compartment and urinary excretion, Ae) linked to an indirect response model in which uCTX is formed at rate KS and degraded at rate KD, with ibandronate in bone inhibiting KS.]
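An inhibitory indirect response of this kind — drug in a bone compartment suppressing the formation rate of the turnover marker — can be sketched in a few lines. The structure below mirrors the KS/KD idea in Figure 7, but every value is hypothetical and purely illustrative.

```python
import numpy as np

def uctx_time_course(c_bone, ks=10.0, kd=0.5, ic50=3.0, dt=0.1):
    """Biomarker suppression when drug in bone inhibits its formation rate KS.

    dCTX/dt = KS * (1 - C_bone / (C_bone + IC50)) - KD * CTX
    """
    ctx = ks / kd                                   # pretreatment baseline
    profile = []
    for c in c_bone:                                # c_bone sampled every dt
        inhibition = c / (c + ic50)                 # fractional inhibition of KS
        ctx += (ks * (1.0 - inhibition) - kd * ctx) * dt
        profile.append(ctx)
    return np.array(profile)

# Hypothetical mono-exponential decline of drug in the bone compartment
t = np.arange(0.0, 60.0, 0.1)                       # days
c_bone = 20.0 * np.exp(-0.05 * t)
ctx = uctx_time_course(c_bone)
print(f"baseline uCTX ~ {ctx[0]:.1f}, nadir ~ {ctx.min():.1f}, "
      f"day-60 value ~ {ctx[-1]:.1f} (arbitrary units)")
```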
Case Study 5 (Clinical): Receptor Theory–Based Semimechanistic PD Model for the CCR5 Noncompetitive Antagonist Maraviroc Maraviroc (UK-427 857), a selective and reversible CCR5 co-receptor antagonist, has been shown to be active in vitro against a wide range of clinical isolates [29]. In human immunodeficiency virus type 1 (HIV-1)–infected patients, maraviroc given as monotherapy for 10 days reduced HIV-1 viral load dose dependently by up to 1.6-log10 copies, consistent with currently available agents that comprise the cornerstone of highly active antiretroviral therapy. The objective of this study was to develop a novel combined viral dynamics/ operational model of (ant-)agonism that describes the PD effects of maraviroc, a noncompetitive CCR5 inhibitor, on viral load. A common theoretical framework based on receptor theory and the operational model of (ant-) agonism has been developed to describe the binding of maraviroc to the CCR5 receptor and the subsequent decrease in viral load. The effect of maraviroc was modeled using an inhibitory Emax model in the PD model acting on the infection rate constant of the virus and target cells. With this parameterization, it was assumed that the antagonistic effect of maraviroc is noncompetitive, which is consistent with recent in vitro findings. In terms of receptor theory and an operational model of drug action, the viral replication process was modeled as a binding–stimulus–response cascade in which the virus acts as an agonist, as shown in the five steps in Figure 8. In the binding–stimulus–response cascade shown, Tmax is the maximum concentration of activated target cells that can be infected, V is the virus con-
Figure 8   Viral dynamics–operational model (binding–stimulus–response). (From ref. 29, by permission of Wiley-Blackwell.) [Figure: the viral replication cycle represented as a five-step binding–stimulus–response cascade — binding/fusion/entry, reverse transcription, integration/replication, protein synthesis, and protein cleavage/virus assembly — with [S1] = [Tmax]·[V]/(KV + [V]) and each subsequent stimulus of the form [SX+1] = [SX]/(KEX + [SX]).]
centration, Kv is the theoretical concentration of viruses that elicits half the maximum binding to the activated receptors on the target cells, and Kex is the theoretical stimulus (SX) at step X that elicits half the maximum response (SX+1) at step X + 1 (X = 1 to 5). This new model provided an explanation for the apparent discrepancy between the in vivo binding of maraviroc to the CCR5 receptor (KD = 0.089 ng/mL) and the estimated in vivo inhibition (IC50 = 8 ng/mL) of the infection rate. The estimated KE value of the operational model indicates that only 1.2% of free activated receptors are utilized to elicit 50% of the maximum infection rate. The model developed suggests that the target cells, when activated, express more receptors (spare receptors) than needed. In the presence of maraviroc, these spare receptors first require blocking before any decrease in the infection rate, and consequently in the viral load at equilibrium, can be detected. The model allowed the simultaneous simulation of the binding of maraviroc to the CCR5 receptor and the change in viral load after both short- and long-term treatment. The current model will be used to guide the selection of the optimal dosage regimen for maraviroc in HIV-1–infected patients.
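The cascade in Figure 8 is straightforward to evaluate numerically. The helper below propagates a viral "agonist" signal through the five steps using the general form [S1] = Tmax·V/(KV + V) and [SX+1] = [SX]/(KEX + [SX]); the constants are hypothetical placeholders rather than the estimates reported for maraviroc in ref. 29.

```python
def stimulus_cascade(v, t_max=1.0, kv=0.5, ke=(0.2, 0.1, 0.05, 0.02)):
    """Propagate the binding-stimulus-response cascade for a given viral load V."""
    s = t_max * v / (kv + v)          # step 1: binding/fusion/entry
    steps = [s]
    for ke_x in ke:                   # steps 2-5: reverse transcription ... assembly
        s = s / (ke_x + s)
        steps.append(s)
    return steps                      # [S1, S2, S3, S4, S5]

# Compare a high viral load with one whose effective entry is reduced 10-fold
for label, v in (("untreated", 10.0), ("entry reduced 10-fold", 1.0)):
    s1_to_s5 = ", ".join(f"{s:.3f}" for s in stimulus_cascade(v))
    print(f"{label:>22}: S1..S5 = {s1_to_s5}")
```

Because each step saturates, a large reduction in effective binding produces only a small change in the final response — the same amplification, or "spare receptor," behavior described in the text above.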
SUMMARY Considering the complexity, risk, and cost involved in drug discovery and development processes, it is imperative that companies adopt strategies and technologies that will facilitate the identification and development of new therapies. The development of biomarkers and clinical trial endpoints as well as PK/PD modeling were recognized as key elements that can improve decision making and render development processes more effective and more likely to result in safe and beneficial products for patients. A new framework for biomarker and surrogate endpoint use in drug development was recommended by the FDA. As part of this process, diseasespecific biomarkers will be evaluated to determine whether some biomarkers may be elevated to surrogate endpoints. This will be particularly attractive because biomarkers can be measured more quickly and easily and at lower cost than can true clinical endpoints. Appropriate PK/PD modeling of biomarkers and surrogate endpoints will facilitate proof-of-concept demonstrations for target modulation; enhance the rational selection of an optimal drug dose and schedule; aid decision making, such as whether to continue or close a drug development project, accelerate drug approval, minimize uncertainty associated with predicting drug safety and efficacy, and decrease the overall costs of drug development. The future of PK/PD modeling of biomarkers holds great challenges and great promises. Pharmaceutical companies that implement PK/PD modeling of biomarkers in their early drug development strategy should have a distinct competitive advantage, with powerful insights and guidance for effective and efficient rational drug development, and by providing safe and efficacious
medicines for individual patients. Although the science of PK/PD modeling and biomarkers is constantly evolving, these disciplines provide a continuous process to link what has been learned from today’s drug development cycle to the next generation of biomarkers, assays, and mechanism-based models. REFERENCES 1. Gao J, Garulacan LA, Storm SM, et al. (2005). Biomarker discovery in biological fluids. Methods, 35:291–302. 2. Goodsaid F, Frueh F (2007). Biomarker qualification pilot process at the US Food and Drug Administration. AAPS J, 9:E105–E108. 3. Colburn WA (1997). Selecting and validating biologic markers for drug development. J Clin Pharmacol, 37:355–362. 4. Lee JW, Hulse JD, Colburn WA (1995). Surrogate biochemical markers: precise measurement for strategic drug and biologics development. J Clin Pharmacol, 35:464–470. 5. Aarons L, Karlsson MO, Mentré F, Rombout F, Steimer JL, van Peer A (2001). COST B15 experts: role of modelling and simulation in phase I drug development. Eur J Pharm Sci, 13:115–122. 6. Wagner JA, Williams SA, Webster CJ (2007). Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin Pharmacol Ther, 81:104–107. 7. Colburn WA, Lee JW (2003). Biomarkers, validation and pharmacokinetic– pharmacodynamic modelling. Clin Pharmacokinet, 42:997–1022. 8. Colburn WA (2000). Optimizing the use of biomarkers, surrogate endpoints, and clinical endpoints for more efficient drug development. J Clin Pharmacol, 40(12 Pt 2):1419–1427. 9. Colburn WA (2003). Biomarkers in drug discovery and development: from target identification through drug marketing. J Clin Pharmacol, 43:329–341. 10. Williams SA, Slavin DE, Wagner JA, Webster CJ (2006). A cost-effectiveness approach to the qualification and acceptance of biomarkers. Nat Rev Drug Discov, 5:897–902. 11. National Institutes of Health (1999). Biomarkers and Surrogate Endpoints: Advancing Clinical Research and Applications. NIH, Bethesda, MD. 12. Rolan P, Atkinson AJ Jr, Lesko LJ (2003). Use of biomarkers from drug discovery through clinical practice: report of the Ninth European Federation of Pharmaceutical Sciences Conference on Optimizing Drug Development. Clin Pharmacol Ther, 73:284–291. 13. Meibohm B, Derendorf H (2002). Pharmacokinetic/pharmacodynamic studies in drug product development. J Pharm Sci, 91:18–31. 14. Derendorf H, Meibohm B (1999). Modeling of pharmacokinetic/pharmacodynamic (PK/PD) relationships: concepts and perspectives. Pharm Res, 16:176–185. 15. Derendorf H, Lesko LJ, Chaikin P, et al. (2000). Pharmacokinetic/ pharmacodynamic modeling in drug research and development. J Clin Pharmacol, 40:1399–1418.
16. U.S. Department of Health and Human Services Food and Drug Administration (2004). Challenge and Opportunity on the Critical Path New Medical Products. FDA, Washington, DC, pp. 1–31. 17. Lalonde RL, Kowalski KG, Hutmacher MM, et al. (2007). Model-based drug development. Clin Pharmacol Ther, 82:21–32. 18. Miller R, Ewy W, Corrigan BW, et al. (2005). How modeling and simulation have enhanced decision making in new drug development. J Pharmacokinet Pharmacodyn, 32:185–197. 19. Gieschke R, Steimer JL (2000). Pharmacometrics: modelling and simulation tools to improve decision making in clinical drug development. Eur J Drug Metab Pharmacokinet, 25:49–58. 20. Zhang L, Sinha V, Forgue ST, et al. (2006). Model-based drug development: the road to quantitative pharmacology. J Pharmacokinet Pharmacodyn, 33:369–393. 21. Sheiner LB, Steimer JL (2000). Pharmacokinetic/pharmacodynamic modeling in drug development. Annu Rev Pharmacol Toxicol, 40:67–95. 22. Peck CC, Barr WH, Benet LZ, et al. (1992). Opportunities for integration of pharmacokinetics, pharmacodynamics, and toxicokinetics in rational drug development. Clin Pharmacol Ther, 51:465–473. 23. Holford NHG, Kimko HC, Monteleone JPR, Peck CC (2000). Simulation of clinical trials. Annu Rev Pharmacol Toxicol, 40:209–234. 24. Lockwood P, Ewy W, Hermann D, Holford N (2006). Application of clinical trial simulation to compare proof-of-concept study designs for drugs with slow onset of effect; an example in Alzheimer’s disease. Pharm Res, 23:2050–2059. 25. Yassen A, Kan J, Olofsen E, Suidgeest E, Dahan A, Danhof M (2007). Pharmacokinetic–pharmacodynamic modeling of the respiratory depressant effect of norbuprenorphine in rats. J Pharmacol Exp Ther, 321:598–607. 26. Mager DE, Neuteboom B, Jusko WJ (2005). Pharmacokinetics and pharmacodynamics of PEGylated IFN-beta 1a following subcutaneous administration in monkeys. Pharm Res, 22:58–61. 27. Lim KS, Kwon JS, Jang IJ, et al. (2007). Modeling of brain D2 receptor occupancyplasma concentration relationships with a novel antipsychotic, YKP1358, using serial PET scans in healthy volunteers. Clin Pharmacol Ther, 81:252–258. 28. Pillai G, Gieschke R, Goggin T, Jacqmin P, Schimmer RC, Steimer JL (2004). A semimechanistic and mechanistic population PK–PD model for biomarker response to ibandronate, a new bisphosphonate for the treatment of osteoporosis. Br J Clin Pharmacol, 58:618–631. 29. Jacqmin P, McFadyen L, Wade JR (2008). A receptor theory–based semimechanistic PD model for the CCR5 noncompetitive antagonist maraviroc. Br J Clin Pharmacol, 65(Suppl 1):95–106.
22 VALIDATING IN VITRO TOXICITY BIOMARKERS AGAINST CLINICAL ENDPOINTS Calvert Louden, Ph.D. Johnson & Johnson Pharmaceutical, Raritan, New Jersey
Ruth A. Roberts, Ph.D. AstraZeneca Research and Development, Macclesfield, UK
INTRODUCTION Biomarkers have been used for many years to predict and track biological changes in tissue and organ systems. The conceptual framework for their use was captured in the output of the Biomarkers Definitions Working Group [1], which provided a valuable working definition and some challenges to the scientific community. Here, a biomarker was defined as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention” [1]. The working group noted that biomarkers may have the greatest value in early efficacy studies and safety evaluations with applications that could include prediction and monitoring of clinical response to an intervention [1]. Biomarkers in Drug Development Biomarkers have been used for many years as indicators of biological change. In the course of drug development, biomarkers enjoy utility both as predictors
Figure 1   Biomarkers in clinical medicine. Biomarkers of safety and efficacy can be used together as part of the overall data set to assist in decision making. [Figure: a timeline from preclinical development through clinical development to approval, flanked by two panels — Safety: “What is the drug doing to the patient?”, markers of tissue damage, adverse hematology, etc., which predict or report an unwanted toxicity endpoint; and Efficacy: biomarkers of pharmacodynamic (PD) endpoints, such as receptor modification or altered glucose levels/blood lipids, commensurate with the desired efficacy endpoint.]
of possible efficacy and also as warning signs of potential toxicity (Figure 1). For efficacy endpoints, biomarkers are used to address the pharmacodynamic question “What is the drug doing to the patient?” commensurate with desired outcomes such as receptor modification or blood glucose changes. For safety endpoints, biomarkers may be used as indicators of unwanted endpoints such as tissue damage or adverse hematology. Although these two types of biomarker serve diverse purposes, the parameters dictating their identification, development, and utility are very similar. However, there is one key aspect in which safety biomarkers differ; safety biomarkers are most useful when they predict rather than report damage. For example, an efficacy biomarker can provide rapid confirmation that the drug has hit its target, perhaps by confirmation of growth factor receptor inhibition in a skin biopsy. However, a safety biomarker of liver damage would ideally precede the damage (either in dose or time) to be of maximum use to the clinician. Despite this, biomarkers that report rather than predict can still be useful if they prevent further damage or permit cessation of treatment and subsequent recovery. Toxicity Biomarkers In Vitro The driving force for better clinical biomarkers of potential adverse events is to monitor patient safety, and many such biomarkers will be discovered and developed within the context of human medicine. However, from a nonclinical
perspective, the discovery and development of toxicological biomarkers facilitates the translation to the clinic of knowledge that will avoid or minimize the risk of occurrence of a potential unwanted outcome identified in animal studies. Thus, it is key to identify preclinical biomarkers of the adverse response of animals to potential new drugs and to evaluate fully their relevance and applicability to humans. Potential biomarkers can be identified in vitro, and their translation to in vivo, either in animals or humans, evaluated at a later date. This concept is developed later in this review with specific examples and case histories. One argument against this approach is that findings in animals in vivo or in animal-derived tissue in vitro may not be relevant for human safety; however, the job of the clinician is to manage the best possible therapy while minimizing the risk of adverse reactions. Thus, such data would form part of an integrated assessment of safety and efficacy that can inform preclinical and clinical decisions. Clinically useful biomarkers of toxicity can be derived from appropriately designed preclinical experiments, and such biomarkers can significantly assist safe progression into the clinic with an approach designed to minimize patient risk. These biomarkers can be derived from in vitro or in vivo mode-of-action studies in preclinical species and then translated to potential clinical use or, as we present here, could already be established in clinical practice. This translational science approach requires close collaboration between the clinical and the preclinical scientists. Characteristics of Biomarkers An important consideration in the use of biomarkers is the practicality of subsequent analysis, which is likely to be far more successful and robust if based on an uncomplicated physiological or biochemical assay that can be carried out with minimal training and preexisting equipment (Figure 2). Nonetheless, the need for specialist techniques or expertise should not exclude a potential biomarker, especially where it contributes to advancing therapies for a serious or unmet medical need. Other practical considerations include the need for assays to be transferrable across international boundaries, supported by low background and low variability of the measured parameter, coupled with a robust increase or decrease signifying the biological endpoint of interest. Indeed, lower and upper limits for normal values would be better if minimally influenced by age, gender, or race within the human population. Similarly, a biomarker of drug-induced toxicity would be better if it were largely unaffected by disease states expected in the patient population of interest or, indeed, any co-medication that might be present or recently administered. However, this is often not the case, and the best that can be expected is some understanding of these relationships to mitigate against difficult or erroneous interpretation. In addition, an ideal biomarker should exhibit
Figure 2   Technical aspects of biomarker utility. Biomarkers must be sufficiently robust biologically and technically to transfer across patient groups and across international boundaries. This is illustrated by the different variables that need to be overcome or accommodated in the assay. [Figure: a bar chart of parameter y (units) for control versus test samples, annotated with the desirable assay properties — a simple physiological or biochemical assay; minimal training, existing equipment, transferrable; a low and consistent background; and a robust increase with tight error bars.]
sufficient sensitivity (not too many false negatives) and specificity (not too many false positives) to underpin clinical decisions. As well as the technical aspects of biomarker assays, there are additional practical considerations to enhance appropriateness and usability. First, a biomarker needs to be accessible and ideally could be noninvasive. If this is not possible, a biomarker could be measurable in blood and other body fluids obtained with minimal additional impact on the subject. If this cannot be achieved, biomarkers may be detectable with increasing impact on the patient via a peripheral or central biopsy. Finally, biomarker research for the advancement of medical knowledge can be conducted after autopsy. Examples of noninvasive methods would include standard physiological measures of cardiovascular parameters such as blood pressure, heart rate, and electrocardiogram, or could be ophthalmological. Fluids could include feces, saliva, and urine, in addition to standard blood sampling. Skin and hair follicles would be examples of peripheral biopsy and muscle and liver needle samples provide the most common route of more invasive biopsy. Biomarkers of Safety One key aspect where biomarkers of safety differ from those of efficacy is that they preferably predict rather than report the biological effect they are flagging. Thus the biomarker should ideally precede that biological effect either in dose (i.e., be detectable at doses below where the damage occurs) or in time
(be detectable perhaps after one dose when the damage requires repeated dosing). These biomarker characteristics would provide support and confidence to the clinician during dose escalation in early clinical trials or during the transition from single to multiple doses. During later clinical development, a predictive biomarker could monitor for onset in a patient who is beginning to develop toxicity, perhaps after several months of drug use. In the absence of evidence that a biomarker can predict onset, an ideal biomarker of drug-induced toxicity would characteristically show detectable changes prior to any serious or irreversible tissue injury and would also show reversibility of the biomarker signal upon removal of the toxicological insult.
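The assay-robustness ideas sketched in Figure 2 — a low, consistent background and a robust, well-separated signal — are often summarized quantitatively, for example with the Z′-factor used to judge screening assays, alongside the sensitivity and specificity mentioned above. The snippet below computes both from made-up control and test data; the numbers are hypothetical and only illustrate the calculations.

```python
import numpy as np

def z_prime(positive, negative):
    """Z'-factor: 1 - 3*(sd_pos + sd_neg)/|mean_pos - mean_neg|; > 0.5 indicates a robust assay."""
    p, n = np.asarray(positive, float), np.asarray(negative, float)
    return 1.0 - 3.0 * (p.std(ddof=1) + n.std(ddof=1)) / abs(p.mean() - n.mean())

def sensitivity_specificity(values, truth, cutoff):
    """Classify samples as positive when the biomarker exceeds a cutoff."""
    values, truth = np.asarray(values, float), np.asarray(truth, bool)
    called = values > cutoff
    sens = (called & truth).sum() / truth.sum()          # true-positive rate
    spec = (~called & ~truth).sum() / (~truth).sum()     # true-negative rate
    return sens, spec

# Hypothetical assay readouts
control = [10, 12, 11, 9, 10, 11]           # vehicle-treated (background)
treated = [62, 70, 66, 59, 73, 68]          # toxicant-treated (signal)
print(f"Z'-factor = {z_prime(treated, control):.2f}")

values = control + treated
truth = [False] * len(control) + [True] * len(treated)
sens, spec = sensitivity_specificity(values, truth, cutoff=30.0)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```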
IN VITRO TOXICITY BIOMARKERS Having considered some general points on the development and validation of toxicity biomarkers, we now present several working examples that explore the concept of in vitro toxicity biomarkers and their translation to the clinic. Opportunities and caveats are explored with an emphasis on assisting clinical progression and decision making. Toxicity biomarkers in the in vitro context are defined as quantitative measurable characteristics that serve as indicators of a pathologic process or related biochemical or molecular events. Endocrine and Exocrine Pancreatic Toxicity The potential for development of clinical drug-induced diabetes is well recognized, and molecules such as cyproheptadine and structurally related compounds such as cyclizine and chlorcyclizine have been shown to cause pancreatic islet toxicity. However, this is not unique to this class of structurally related compounds because a similar finding has been observed following oral administration of some quinolone antimicrobial agents. In vivo, this effect is characterized by cytoplasmic islet vacuolation, decreased insulin secretory granules, decreased circulating insulin levels, intolerance to glucose challenge, and increased glycosylated albumin. The working hypothesis is that these compounds were associated with functional impairment of insulin synthesis and/or secretion, and an in vitro screen would be invaluable in identifying this liability early in the drug development process. The clonal insulin-producing cell line RINm5F is widely used as a model to detect compound-induced loss and/or reduction of insulin in vitro that is comparable to in vivo findings. Using this in vitro model, it was determined that cyproheptadine and selected analogs caused inhibition of insulin biosynthesis without any effects on pre-pro-insulin mRNA. These data suggested that inhibition of transcriptional regulation of insulin was an unlikely mode of action responsible for cyproheptadine-induced pancreatic islet cell toxicity. The limitation of the RINm5F cells is that they are not glucose-sensitive and as such, cannot be used to study glucose-dependent insulin secretion. When
this effect is suspected, the use of Min 6 (mouse) and HT15 (hamster) cells is recommended. Isolated mouse pancreatic acini provide an ex vivo tool used to assess potential exocrine pancreatic toxicity. This is achieved by evaluating digestive enzyme secretion through an assessment of basal amylase release and amylase release in response to carbachol and cholecystokinin octapeptide (CCK8). In vitro, cyproheptadine at high doses causes marked release of amylase without any breaks in structural integrity, thus ruling out cytotoxicity as the likely cause of the marked amylase release. Furthermore, cyproheptadine can abolish carbachol-stimulated amylase release, and this may be related to its antimuscarinic pharmacological property. This approach can serve as an in vitro toxicity screen to eliminate and/or reduce undesirable effects on the exocrine as well as the endocrine pancreas and, additionally, the pathophysiology of these potential effects can be investigated. Bone Marrow Toxicity Within the last decade, clinical hematopoietic toxicity has accounted for 10 to 20% of approved drug withdrawals and is a major cause of early candidate drug attrition. Identification of potential hematopoietic toxicity as a hazard is done only when in vivo studies are initiated, even though there is strong evidence suggesting that in vitro assays could be highly predictive. A three-tiered approach that enables a large number of compounds to be evaluated at the lead optimization phase was developed to improve the assessment of potential hematotoxicity. A key question in this approach is whether there is clinical translation and predictability of this combined in vitro, in vivo preclinical evaluation. This novel approach uses multiple assays, cell lines representative of broad bone marrow lineages (myeloid, erythroid, lymphoid, and stromal), and a nonhematopoietic cell line (tier 1). In general, there is a good correlation between the mouse myeloid cell line and the mouse colony-forming unit/GM (CFU-GM) assays, and compounds with lineage-specific in vivo bone marrow toxicity showed similar lineage specificity in the tier 1 assays. An erythrotoxicant such as chloramphenicol has an IC50 value of 8 μM in the erythroid cell line assay and >100 μM for the other cell lines. CFU assays (tier 2) are then run to qualify and confirm the findings of tier 1. Tier 2 consists of mouse, rat, dog, and human CFU-GM assays to predict species sensitivity, but these assays are tedious and time consuming, enabling only a limited number of compounds to be evaluated. Increased throughput was achieved when a 96-well plate CFU-GM assay was developed that yielded results that correlated well with the traditional CFU-GM assay. The data generated from tiers 1 and 2 can trigger frontloading of in vivo studies to evaluate a more comprehensive hematology profile and include the use of additional technologies such as flow cytometry (tier 3). In summary, a three-tiered approach using in vitro, ex vivo, and in vivo studies allows rapid, high-throughput assessment of compounds
with potential chemistry-related bone marrow toxicity to be eliminated in the early preclinical stages of drug development. Liver Toxicity The overall goal of developing an in vitro toxicity testing system is for inclusion in a toolbox as an element of investigative toxicology. This should contribute to hypothesis-based research aimed at identification of mode-of-action-based toxicity biomarkers, which can extend to the preclinical and clinical phases of drug development. The current preclinical and clinical diagnostic panel used to monitor hepatocellular toxicity has been quite reliable except in cases of human drug-induced liver injury, which occurs with a low incidence and prevalence, ranging from 1 in 10,000 to 1 in 100,000. This is also referred to as idiosyncratic hepatotoxicity, meaning that the adverse events are a consequence of individual and unpredictable patient responses. It is well recognized that there is an urgent need to identify and develop a more successful prediction model coupled with diagnostic and prognostic biomarkers for drug-induced idiosyncratic liver injury in humans. This type of human liver toxicity is multifactorial, including individual-specific responses, and because of this, it is highly unlikely that any preclinical and/or clinical studies can be powered appropriately to unmask this risk. Furthermore, it is difficult to assess the value of our preclinical toxicology studies and their relevance to predicting drug-induced idiosyncratic liver injury in humans, because we lack understanding of the mode of action and pathophysiology of this unique type of toxicity. However, there is extensive literature related to the predictability of hepatotoxicity, and there is good concordance between in vitro and in vivo assessments. Several different strategies have been employed to detect various forms of hepatotoxicity, including necrosis, apoptosis, and phospholipidosis. The in vitro hepatotoxicity predictive strategy uses hepatic microsomes; immortalized cell lines; primary hepatocyte cell cultures from humans, rodents, and nonrodents; and liver slices. Liver slices, in particular, enable a better approximation and evaluation of metabolites that may be formed, allowing their hepatotoxic potential to be assessed and in vitro–in vivo comparisons to be made. For example, using this approach, studies were done to characterize the metabolic profile of troglitazone, an approved drug that has now been withdrawn from the market because of hepatotoxicity and hepatic failure requiring liver transplants. Data from a series of studies using rat and human hepatic microsomes as well as in vivo rat studies suggested that troglitazone hepatotoxicity was caused by several reactive intermediates covalently binding to hepatic proteins that may undergo redox cycling and induce oxidative stress, causing cell damage. This hypothesis was strongly supported by data suggesting that troglitazone is not only an inducer of P450 3A but that this enzyme is also responsible for metabolism of the thiazolidinedione ring, a key structural element of many of the peroxisome proliferator–activated receptor (PPAR) agonists.
Taken together, troglitazone acts as an inducer of enzymes that catalyze its biotransformation to chemically reactive intermediates; this autoinduction of its own metabolism is ultimately detrimental to the cells. Additionally, this combined approach was able to show that these intermediates could be formed in humans, thus identifying a potential etiology for troglitazone-induced hepatotoxicity in humans. Utilization of liver slice technology, co-cultures, and whole liver perfusion can be quite useful, but these are low-throughput assays that are not always suitable for screening. The advantage is that toxicity to the biliary and other hepatic cellular constituents can be assessed. Thus, predicting the hepatotoxicity potential using in vitro cell culture systems can also employ the well-recognized preclinical and clinical biomarkers of hepatocellular damage, such as ALT (alanine aminotransferase), AST (aspartate aminotransferase), GLDH (glutamate dehydrogenase), MDH (malate dehydrogenase), PNP (purine nucleoside phosphorylase), and PON-1 (paraoxonase-1). Other enzymes, such as ALP (alkaline phosphatase), GGT (γ-glutamyl transferase), and 5′-nucleotidase (5′NT), can also be used when toxicity to the biliary tree is suspected. Cytotoxicity (necrosis and apoptosis) in vitro is the primary endpoint, and as such, biomarkers for clinical monitoring of potential hepatotoxicity should include necrosis as well as apoptosis. Recent evidence suggests that this may be possible by measuring the soluble pool of cytokeratin 18 (CK18) and the caspase-cleaved CK18 fragments in conjunction with the traditional markers. The in vitro surrogate system can also provide valuable information on the suitability of the appropriate biomarker and the potential mechanism of action related to toxicity, such as mitochondrial dysfunction, metabolic pathways, and/or CYP450 induction. In summary, in vitro hepatocyte cell culture systems and tissues derived from in vivo studies will identify the appropriate biomarker for monitoring preclinical and clinical hepatotoxicity, but these data can also contribute to hypothesis-driven mode-of-action investigative toxicity studies. This will include molecular profiling, metabonomics, transcriptomics, and proteomics, all of which are useful tools that could aid in identification of novel biomarkers of hepatotoxicity. Phospholipidosis Phospholipidosis (PLD) is characterized by concentric-layered multilamellar intracellular lysosomal inclusion bodies that are often composed of complex phospholipids, parent drug, and/or metabolites. Accumulation will often occur in several different cell types of the hepatobiliary, immune, and nervous systems. Typically, hepatocytes, biliary epithelial cells, macrophages of lymph nodes and pulmonary alveoli, and ganglia and nonganglia neuronal cell bodies of the central nervous system can be affected. The evidence to date suggests that phospholipidosis is a structure-related toxicity of cationic amphiphilic compounds irrespective of pharmacologic action.
The finding of phospholipidosis may also be associated with inflammation, severe organ damage, and possibly impairment of immune function. Although drugs are marketed that cause phospholipidosis preclinically and clinically, this is an undesirable profile for potentially new candidate drugs, and as such, this liability should be identified and avoided early in the drug discovery process. Therefore, high-throughput in vitro predictive screens can add value, particularly if phospholipidosis potency can be ranked and supported by in vivo data. Several methodologies have been evaluated to assess neutral lipid and phospholipid content as an index of phospholipidosis in cells growing in culture. However, accumulation of NBD-PE (a fluorescently labeled phospholipid probe) as a result of cytotoxicity induces false-positive results, particularly at high concentrations. Recently, a high-throughput, validated, predictive, sensitive, and selective multichannel fluorescence-based in vitro PLD assay was developed to reduce the false-positive limitation of cytotoxicity. This assay uses I-13.35 adherent mouse spleen macrophages cultured in 96-well plates with fluorescent-tagged phospholipids. Cells with an intact nucleus were differentiated from dead cells using ethidium staining and cell gating that rejects dead cells. Using this improved technique, 26 of 28 positive phospholipidogenic compounds were identified. These findings aided application of this methodology to other techniques, such as flow cytometry, which may be used in preclinical toxicology studies and clinical trials. For example, flow cytometric analysis coupled with Nile Red staining was used to detect neutral lipids and phospholipids in a monocyte cell line, U397. This methodology was also applied in in vivo toxicology studies, and this raised the possibility that preclinical toxicology and clinical assessment of phospholipidosis could be done using peripheral blood cells and flow cytometry.
CONCLUSIONS

Overall, there is a compelling need for in vitro toxicity biomarkers for clinical endpoints. Toxicity biomarkers in the in vitro context are defined as quantitative, measurable characteristics that serve as indicators of a pathologic process or of related biochemical or molecular events. Conceptually, bridging biomarkers of toxicity should include not only the traditional parameters of biofluid and physiological measurements, but also measurable endpoints that could serve as indicators of potential adverse events preclinically and clinically even though they have been derived from in vitro, ex vivo, and in vivo studies. The use and application of this combination will undoubtedly improve the toxicologist's ability to identify human hazards so that the appropriate risk assessment and management strategy can be developed. In developing in vitro assays to identify biomarkers with potential clinical application and utility, what that biomarker will assess must be clearly understood and defined. For example, exaggerated pharmacologic action of a molecular target may be associated with an undesirable effect resulting in toxicity. In
such cases, single or multiple markers and/or assays may be required for utilization in screens during the early drug discovery phase, with continued assessment in preclinical and clinical development. If successful, these data can be invaluable in deriving the therapeutic index of a drug. In the case of a specific target organ toxicity, the ultimate goal is to identify biomarkers of toxicity that can be used as a general, high-throughput screen that reflects cellular damage regardless of mode of action. Therefore, it is imperative that in vitro assays and the appropriate platforms be developed to identify relevant toxicity biomarkers that will be useful during preclinical and clinical development.
PART VI BIOMARKERS IN CLINICAL TRIALS
23
OPPORTUNITIES AND PITFALLS ASSOCIATED WITH EARLY UTILIZATION OF BIOMARKERS: CASE STUDY IN ANTICOAGULANT DEVELOPMENT
Kay A. Criswell, Ph.D.
Pfizer Global Research and Development, Groton, Connecticut
INTRODUCTION

The cost of developing new drugs continues to rise and demands an ever-increasing percentage of total research and development (R&D) expenditures [1]. Recent studies have shown that there is less than a 10% success rate between the first human trials and the launch of a new product [1]. When coupled with an attrition rate of over 70% in phase II and the loss of nearly a third of all new compounds in phase III trials [1–3], the gravity of the need for reliable, early decision-making capability is evident. All pharmaceutical companies are focused on ways to reduce the cost of discovering and developing new drugs. In general, three areas have received the greatest attention: (1) improving target validation, (2) selecting candidates with a greater chance of success, and (3) identifying those compounds that will fail earlier in their development. Inadequate efficacy or poor safety margins are the two main reasons for compound attrition. Therefore, developing a strategy for safety
and efficacy biomarkers is key to early prediction of compound success or failure. Utilization of biomarkers in preclinical testing and ex vivo human testing may provide valuable information for compound progression and success in the clinic, but it is not without problems.

Using markers to study disease processes is not new. It has probably existed for centuries, as evidenced by such obsolete practices as tasting urine to determine the diabetic status of a patient. Utilization of biomarkers for disease diagnosis and prognosis has escalated dramatically during this century with the improved diagnostic power, reliability, and speed of routine clinical pathology, biochemical, and genomic data. However, when studying the specific effects of a drug on marker activity, utilization of biomarkers has a more recent history. One of the earliest documented cases may be the early trials of lamotrigine, which effectively utilized the electroencephalograph (EEG) as a biomarker and demonstrated decreased epileptiform activity [4]. Twenty years after the lamotrigine study, the urgency for relevant and early biomarkers of safety and efficacy during drug development has been recognized, and it is not uncommon that each new drug development program is accompanied by a biomarker request. Not only has the frequency of biomarker development exploded, but the timing of biomarker utilization has been pushed earlier in development. There is an ever-increasing demand to test and screen potential candidates before good laboratory practices (GLP) studies and commitment to human trials.

Two problems inherently complicate the use of early biomarkers to predict clinical outcomes: (1) truly understanding the disease process and the therapeutic intervention process, and (2) species-to-species differences that may alter the translation and application of the biomarker. Understanding the disease process appears obvious, but as already noted, understanding the target is still a key area for decreasing attrition. Regardless of the expanding body of knowledge surrounding disease processes, infectious diseases may be the lone area where the disease process is fully understood. Successful therapeutic intervention and drug sensitivity can be predicted accurately with well-characterized biomarkers of causal organism growth and survival [5]. For biomarkers to be truly effective in drug development, they need to reflect a highly specific biochemical or molecular change that occurs in the disease process, is altered by the therapeutic intervention, and occurs prior to any downstream effects on clinical endpoints. This is a lofty goal, as most disease processes affect multiple pathways, and feedback mechanisms further complicate the overall biology. The further impact of nontranslatable, species-specific characteristics makes the task of providing biomarkers early in development daunting.

There are, however, diseases and biological pathways that have a fairly broad acceptance regarding the essential components. Additionally, certain areas of drug development target activation or inhibition of a highly specific molecule or receptor within those pathways. These conditions provide a unique opportunity for biomarker success when conducted early in the drug discovery process.
The coagulation pathway is an example of a well-characterized model, and specific inhibitors of this pathway are being explored actively as new candidates for anticoagulant therapy. One well-documented area is the use, and the challenge, of early implementation of biomarkers to predict the clinical outcomes of factor Xa (FXa) inhibitor compounds. Venous and arterial thromboembolic disorders have a substantial impact on human health and morbidity [6]. Although these conditions have been treated for many years with the vitamin K antagonist warfarin (coumadin), or with intravenous or subcutaneous heparin administration, the opportunity to provide therapeutic intervention with a safer profile, fewer side effects, less intra- and intersubject variability, a better route of administration, and less need for continuous coagulation monitoring is attractive. Novel coagulation therapeutics have generated tremendous interest due to the medical need and the opportunity to improve on current treatment strategies. FXa occupies a pivotal position within the coagulation cascade and is a highly attractive candidate as a target for novel coagulation intervention. This enzyme links the intrinsic and extrinsic coagulation pathways and is the rate-limiting step in thrombin formation [7]. Changes within the intrinsic coagulation pathway are routinely monitored by a clotting assay called the activated partial thromboplastin time (aPTT), whereas changes in the extrinsic pathway are assessed by increases or decreases in another clotting assay called the prothrombin time (PT) [7] (Figure 1). Both assays are assessed spectrophotometrically or mechanically and are expressed as the time to clot formation after addition of an exogenous clot-activating agent.
Figure 1  Intrinsic and extrinsic coagulation pathways. [Diagram: the intrinsic pathway (monitored by aPTT; factor XII → XIIa, XI → XIa, IX → IXa) and the extrinsic pathway (monitored by PT; trauma/tissue factor, factor VII → VIIa) converge on activation of factor X to factor Xa, leading into the final common pathway of prothrombin → thrombin and fibrinogen → fibrin.]
The aPTT is used routinely to monitor the safety and level of anticoagulation during heparin therapy, and coumadin is monitored via the PT assay. Safety profiles for these compounds have been established following years of use, so that fold increases over the predose aPTT or PT can be used to predict therapeutically effective doses of these drugs versus levels associated with inadequate or excessive coagulation. Additionally, aPTT and PT are routine and highly standardized clinical assays. Testing reagents are standardized, and instrumentation has undergone rigorous scrutiny to pass U.S. Food and Drug Administration (FDA) requirements because of their routine use in clinical diagnostics. The reliability of aPTT and PT as biomarkers of anticoagulant therapy safety is also fairly well established. Since activation of factor X is required for completion of both the intrinsic and extrinsic pathways, it appears logical that inhibition of factor X activation should prolong both PT and aPTT. It would be anticipated that novel anticoagulants such as FXa inhibitors could readily be assessed early with the aPTT and PT assays. Furthermore, if the new compound does not require metabolic activation, ex vivo incubation of human plasma with the compound of interest followed by aPTT and PT assessment may provide a reliable method to rapidly assess efficacy.

Despite their reliability and acceptability, PT and aPTT do not always reflect the anticoagulant activity of novel compounds. Rivaroxaban is an oral direct inhibitor of activated factor X [8]. In a rat venous model of the inferior vena cava, rivaroxaban produced dose-dependent inhibition of FXa and an increase in PT [7]. An inhibition of 32% was associated with a 1.8-fold increase in PT, and nearly 100% inhibition produced a 3.2-fold increase in PT. However, in a rabbit model 92% inhibition of FX was associated with only a modest 1.2-fold increase in PT, demonstrating that species-specific sensitivity is one of the problems associated with monitoring the newer anticoagulants with standard coagulation parameters [8]. Even coumadin provides a good example of species-specific effects that allow a useful therapeutic for humans but a lethal pest control for rodents [9]. Despite this challenge, utilization of coagulation testing for in vivo and ex vivo screening is well documented in the development of FXa inhibitors [8,10–12].

Rarely is a single biomarker considered definitive for evaluation of safety or efficacy. The drug development approach for Otamixaban and DU-176b incorporated a series of clotting parameter assays and assays to measure the effects on thrombus formation [10–12]. Although clotting parameters would be the preferred biomarker based on the ability to monitor in plasma, use of fairly simple but reproducible assays, cost, and the ability to monitor a clinical population easily, evidence is not established that clotting assays and thrombus formation assays are interchangeable. For Otamixaban, in vitro coagulation parameters were assessed for their ability to produce a doubling of PT and aPTT. This testing allowed a rank ordering of anticoagulant effects per species of rabbit > human > monkey > rat > dog [10].
Additionally, aPTT appeared to be the more sensitive biomarker in all species, with aPTT doubling occurring at drug concentrations that were less than half the concentration required for PT doubling. Multiple pharmacological models of thrombosis in rats, dogs, and pigs were also conducted with Otamixaban. In rats, thrombus mass was markedly reduced by nearly 95%, with a corresponding increase in aPTT of 2.5-fold and in PT of 1.6-fold [10]. In contrast, intravenous administration of 1, 5, or 15 μg/mL Otamixaban in the pig model effectively eliminated coronary flow reserves related to this stenosis model at the middle and high dose. PT was also prolonged at the middle and high dose, but aPTT was prolonged only at the high dose. Although pigs were not listed as assessed in the species-specificity model, this suggests that the clotting parameter of choice may vary per species and may not correlate well with thrombosis assays. Furthermore, clinical trial outcomes showed that at anticipated antithrombotic and therapeutic concentrations of 100 ng/mL Otamixaban, neither PT nor aPTT changed appreciably. In contrast, alternative clotting parameters such as the HepTest clotting time and the Russell viper venom clotting time showed substantial prolongation, again suggesting that alternatives to standard PT and aPTT may be preferable [10].

Further work with the oral FXa inhibitor DU-176b provides additional evidence that selection of the right biomarker and appropriate correlation to functional assays is critical. This study was conducted in 12 healthy male volunteers [12]. The antithrombotic effect of DU-176b was assessed by measuring the difference in size of acutely formed, platelet-rich thrombus, pre- and postdrug administration, using a Badimon perfusion chamber model under low and high shear force. Subjects received a single 60-mg dose of DU-176b, and pharmacokinetic and pharmacodynamic assessments were conducted at 1.5, 5, and 12 hours postdosing. Pharmacodynamic assessments included PT, the international normalized ratio (INR), aPTT, thrombin generation, and anti-factor Xa activity; drug levels were also assessed. Badimon chamber results demonstrated a strong antithrombotic effect at 1.5 hours with a progressive return toward baseline by 12 hours. All of the pharmacodynamic endpoints showed significant change from pretreatment, suggesting that any of the parameters might be an effective biomarker of DU-176b safety and/or efficacy. However, a close statistical look at these data raises some questions. A comparison of drug concentration to anti-factor Xa activity and clotting parameters showed the strongest correlation with anti-factor Xa activity (r² = 0.85), similar correlation with PT and INR (r² = 0.795 and 0.78, respectively), but a fairly weak correlation with aPTT (r² = 0.40). This suggests that although Otamixaban and DU-176b are both FXa inhibitors, arbitrary selection of PT or aPTT as a better predictor of drug concentration is problematic. Furthermore, when the antithrombotic effects of DU-176b assessed by Badimon chamber were compared to those obtained by clotting parameters, the correlations were even weaker. Prothrombin time showed a correlation of r² = 0.51 at both high and low shear, and the correlation with aPTT was only r² = 0.39 and 0.24 [12]. This suggests that although aPTT is used for monitoring of heparin therapy and PT
is utilized for the clinical safety monitoring of coumadin, neither assay by itself is sufficient for the routine monitoring of factor Xa inhibitors.
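The comparisons above rank candidate biomarkers by how closely each tracks drug concentration, using the squared Pearson correlation (r²). A minimal sketch of that ranking step is shown below; the concentration and assay values are placeholders for illustration only, not data from the cited studies.

    # Illustrative only: rank candidate coagulation biomarkers by how well they
    # track drug concentration, using r^2 as in the DU-176b comparisons above.
    # All numeric values below are hypothetical placeholders.
    import numpy as np

    drug_conc = np.array([0.0, 50.0, 100.0, 200.0, 400.0])   # ng/mL, hypothetical
    candidates = {
        "anti-FXa activity": np.array([0.0, 0.21, 0.44, 0.83, 1.65]),
        "PT (s)":            np.array([11.5, 13.0, 15.1, 19.0, 26.5]),
        "aPTT (s)":          np.array([30.0, 33.0, 34.5, 35.5, 36.0]),
    }

    def r_squared(x, y):
        r = np.corrcoef(x, y)[0, 1]
        return r * r

    ranking = sorted(((r_squared(drug_conc, y), name) for name, y in candidates.items()),
                     reverse=True)
    for r2, name in ranking:
        print(f"{name:20s} r^2 = {r2:.2f}")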
CASE STUDY DATA WITH A DEVELOPMENTAL FXa INHIBITOR

Beyond the published literature that is available, data and personal observations collected during the development of another FXa inhibitor at Pfizer Global Research & Development are now provided to complete this case study approach to biomarker utilization during anticoagulant development. Development of this particular FXa inhibitor ultimately was discontinued, as this compound required intravenous administration and as such lacked marketability compared to oral FXa inhibitors. However, the lessons learned provide further documentation of the species-specific and interpretational complications that arise with the utilization of accepted coagulation biomarkers to monitor anticoagulant efficacy and safety for FXa inhibitors.

Dose Selection for Ex Vivo Experiments

In designing ex vivo experiments to evaluate potential biomarkers, selection of the appropriate drug concentration is critical. Furthermore, when experiments are conducted in multiple species, selection of the same drug concentration for all species is typically not ideal, due to species-specific drug sensitivity. Factor X concentrations vary by species, and the level of FXa inhibition is also variable. Therefore, the concentrations of FXa inhibitor utilized in this particular ex vivo evaluation were selected to achieve a range of FXa inhibition that was modest to nearly complete in all species examined. Pharmacology studies predicted that this FXa inhibitor would result in species-specific factor Xa sensitivity in the order human > dog > rat. Interestingly, this species specificity was not identical to that observed with Otamixaban [10], demonstrating that extrapolating biomarker data even between compounds in the same class may be misleading. For this developmental FXa inhibitor, human plasma was spiked to obtain final drug concentrations of 0, 0.2, 0.6, 1.2, 1.8, and 6.0 μg/mL. Drug concentrations of 0, 0.4, 2.0, 8.0, and 15.0 μg/mL were selected for dog assessments, and 0, 1.0, 4.0, 12.0, and 24.0 μg/mL were used for ex vivo assessments in rats to achieve a range of FXa inhibition comparable to that observed in human samples.

Thromboplastin is the reagent that induces clot formation in the PT assay. There is ample documentation that the type and sensitivity of thromboplastin is a critical factor in the effective and safe monitoring of coumadin administration [13–17]. To minimize this variability in PT assays, a calibration system was adopted by the World Health Organization (WHO) in 1982. This system converts the PT ratio observed with any thromboplastin into an international normalized ratio (INR). This value was calculated as follows: INR = (observed PT ratio)^c, where the PT ratio is subject PT/control PT and c is the power value
representing the International Sensitivity Index (ISI) of the particular thromboplastin [18]. This system has proven to be an effective means of monitoring human oral anticoagulant therapy with coumadin and has been implemented almost universally. It allows individuals to be monitored at multiple clinics using varying reagents and instrumentation, while still achieving an accurate assessment of true anticoagulation. However, there is little or no information regarding selection of thromboplastin reagents or use of the INR for monitoring of FXa inhibitors. Typically, the higher the ISI value, the less sensitive the reagent, and the longer the PT produced. The most commonly used thromboplastin reagents for PT evaluation are either rabbit brain thromboplastin (of variable ISI values, depending on manufacturer and product) or human recombinant thromboplastin, typically with an ISI of approximately 1.0. Use of the INR is accepted as a more relevant biomarker of anticoagulant efficacy than are absolute increases in PT alone, at least for coumadin therapy [13]. To more fully evaluate the effect of this FXa inhibitor on INR, PT was evaluated using rabbit brain thromboplastins with ISI values of 1.24, 1.55, and 2.21 and a human recombinant thromboplastin (0.98 ISI). Although either human recombinant thromboplastin or rabbit thromboplastin is considered an acceptable reagent for the conduct of PT testing, it was unclear whether these reagents would produce similar results in the presence of an FXa inhibitor or whether the sensitivity of the thromboplastin itself would affect results.

Effect on Absolute Prothrombin Time

Prothrombin time data obtained using rabbit brain thromboplastin with the three increasing ISI values during these ex vivo studies are presented in Table 1. The source and sensitivity of thromboplastin used in the assay affected the absolute PT value in all species, clearly demonstrating the need to standardize this reagent in preclinical assessment and to be cognizant of this impact in clinical trials or postmarketing, when reagents are less likely to be standardized. As anticipated, addition of the FXa inhibitor to plasma under ex vivo conditions increased the PT in a dose-dependent manner. This increase in PT was observed regardless of the ISI value (sensitivity) of the thromboplastin used and occurred in all species (Table 1). Although the absolute time for clot formation generally increased with increasing ISI, this was not true for all assessments. Table 2 summarizes the maximum change in PT and the range of variability when rabbit brain thromboplastin of varying ISI values was compared to human recombinant thromboplastin. Again, in general, the higher the PT value, the larger the deviation between reagent types. For example, although there was a 2.1-second difference between human and rabbit thromboplastin in untreated human plasma, the difference increased to 5.5, 9.2, 12.0, 13.6, and 38.2 seconds in samples containing 0.2, 0.6, 1.2, 1.8, or 6.0 μg/mL FXa inhibitor, respectively.
TABLE 1  In Vitro Effect of an Experimental Factor Xa Inhibitor on Absolute Prothrombin Time Using a Rabbit Brain Thromboplastin

Concentration of FXa             International Sensitivity Index
Inhibitor (μg/mL)          0.98            1.24            1.55            2.21

PT (s)—Human Plasma
  0                   11.4 ± 0.11     13.4 ± 0.17*    12.9 ± 0.13*    10.9 ± 0.14
  0.2                 18.3 ± 0.37     23.8 ± 0.50*    23.4 ± 0.44*    16.8 ± 0.43*
  0.6                 30.8 ± 0.76     37.3 ± 0.86*    39.9 ± 0.70*    27.0 ± 0.93*
  1.2                 47.0 ± 1.98     51.4 ± 1.51*    58.9 ± 1.08*    38.2 ± 1.39*
  1.8                 61.2 ± 2.04     61.8 ± 1.58     74.8 ± 1.54*    48.3 ± 2.08*
  6.0                133.0 ± 3.82    111.4 ± 3.17*   152.3 ± 3.08*    94.8 ± 4.12*

PT (s)—Dog Plasma
  0                    7.8 ± 0.14      8.4 ± 0.07*     7.1 ± 0.07      6.7 ± 0.06*
  0.4                 11.0 ± 0.26     11.0 ± 0.13     10.5 ± 0.19      9.2 ± 0.15*
  2.0                 17.6 ± 0.42     16.6 ± 0.23*    17.9 ± 0.41     15.0 ± 0.31*
  8.0                 32.0 ± 0.98     27.6 ± 0.49*    33.6 ± 0.92     27.1 ± 0.63*
  15.0                45.4 ± 1.52     36.5 ± 0.73*    46.7 ± 1.35     36.8 ± 0.88*

PT (s)—Rat Plasma
  0                    9.1 ± 0.05     15.1 ± 0.07*    17.1 ± 0.15*    13.1 ± 0.07*
  1.0                 13.3 ± 0.06     20.7 ± 0.13*    31.3 ± 0.27*    23.2 ± 0.19*
  4.0                 20.1 ± 0.29     30.8 ± 0.23*    51.8 ± 0.52*    36.9 ± 0.52*
  12.0                31.1 ± 0.72     46.2 ± 0.49*    83.6 ± 0.84*    56.2 ± 0.84*
  24.0                42.7 ± 1.09     60.6 ± 0.73*   109.4 ± 0.79*    75.4 ± 1.60*

Values are mean ± S.E.M. for 10 individual subjects. *Significantly different from the 0.98 ISI thromboplastin mean at the 5% level by t-test, separately by increasing ISI value for individual rabbit thromboplastins.
Dogs were much less sensitive to the type of thromboplastin used and showed smaller maximum changes in PT values. In contrast, rat PT values were highly dependent on the source of thromboplastin, and samples tested with rabbit brain thromboplastin were markedly longer than with human recombinant thromboplastin. Rats showed this high level of thromboplastin dependence even in untreated control samples. Variability of PT in FXa inhibitor-treated human and dog plasma was similar to that observed in controls and did not change appreciably with increasing concentration of drug (Table 2). FXa inhibitor-treated rat plasma showed an approximately twofold increase in variability compared to control.

Effect on PT/Control Ratio and INR

Generating a PT/control ratio by dividing the number of absolute seconds in the treated sample by the number in the control (untreated) sample provides a second method of assessing PT. If the ISI of the thromboplastin used is close to 1.0, the INR should be similar to the PT/control ratio (Table 3).
TABLE 2  Comparison of Human Recombinant Thromboplastin and Rabbit Brain Thromboplastin on Prothrombin Time in Plasma Samples Containing Increasing Concentrations of Factor Xa Inhibitor

Species   Intended Drug            Maximum Change     Range of
          Concentration (μg/mL)    in PT (s)          Variability (%)
Human     0                         2.1               −4 to +18
          0.20                      5.5               −8 to +30
          0.60                      9.2               −12 to +30
          1.20                     12.0               −19 to +25
          1.80                     13.6               −21 to +22
          6.00                     38.2               −29 to +14
Dog       0                         1.1               −15 to +7
          0.40                      1.8               −7 to −5
          2.00                      2.5               −14 to +2
          8.00                      4.9               −15 to +5
          15.00                     8.9               −20 to +3
Rat       0                         8.0               +43 to +88
          1.00                     18.0               +56 to +136
          4.00                     31.7               +53 to +158
          12.00                    52.5               +48 to +168
          24.00                    66.7               +42 to +156

Samples were spiked with a factor Xa inhibitor in vitro. Maximum change in prothrombin time is relative to the 0.98 ISI human recombinant thromboplastin; range of variability is that of the three increasing ISI levels of rabbit brain thromboplastin compared to the human recombinant reagent.
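The two derived columns of Table 2 follow directly from the per-reagent PT values in Table 1. A minimal sketch of that derivation is shown below for one row (human plasma at 0.6 μg/mL); small differences from the published entries reflect rounding of the tabulated inputs.

    # Derivation of the Table 2 summary columns from per-reagent PT values (Table 1).
    # Example: human plasma at 0.6 ug/mL.
    reference_pt = 30.8                    # 0.98 ISI human recombinant thromboplastin
    rabbit_pt = [37.3, 39.9, 27.0]         # 1.24, 1.55, 2.21 ISI rabbit brain thromboplastins

    max_change = max(abs(pt - reference_pt) for pt in rabbit_pt)
    pct_dev = [100.0 * (pt - reference_pt) / reference_pt for pt in rabbit_pt]

    print(f"maximum change in PT: {max_change:.1f} s")                            # ~9.1 s
    print(f"range of variability: {min(pct_dev):+.0f}% to {max(pct_dev):+.0f}%")  # ~-12% to +30%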
The PT/control ratio could be used effectively to normalize thromboplastin differences in untreated human, dog, or rat samples. At predicted efficacious concentrations of FXa inhibitor, the PT/control ratio effectively normalized reagent differences. However, at high concentrations of FXa inhibitor, particularly in the rat, this method lacked the ability to normalize results effectively. Table 4 shows the corresponding INR values obtained in human, dog, and rat plasma when assessed with rabbit brain thromboplastins of increasing ISI. As anticipated, the PT/control ratio and INR were similar when the ISI was approximately 1. In contrast to the modest differences in PT when expressed as either absolute seconds or as a ratio compared to the control value, the INR showed dramatic increases (Table 4). The magnitude of the INR value rose consistently with increasing ISI value and was marked. At the highest dose tested, the INR ranged from 11.1 with the 0.98 ISI reagent to 121.9 with the 2.21 ISI reagent in human samples, 5.6 to 43.4 in dogs, and 4.6 to 48.1 in rats.
TABLE 3  In Vitro Effect of an Experimental Factor Xa Inhibitor on Prothrombin Time/Control Ratio Using a Rabbit Brain Thromboplastin

Concentration of FXa             International Sensitivity Index
Inhibitor (μg/mL)          0.98           1.24           1.55           2.21

PT/Control Ratio (:1)—Human Plasma
  0                    1.0 ± 0.00     1.0 ± 0.00     1.0 ± 0.00     1.0 ± 0.00
  0.2                  1.6 ± 0.02     1.8 ± 0.12     1.8 ± 0.21     1.5 ± 0.02
  0.6                  2.7 ± 0.05     2.8 ± 0.04     2.9 ± 0.03     2.5 ± 0.06
  1.2                  4.1 ± 0.15     3.8 ± 0.08     4.6 ± 0.06*    3.5 ± 0.09
  1.8                  5.4 ± 0.15     5.6 ± 0.07     5.8 ± 0.08     5.4 ± 0.15
  6.0                 11.7 ± 0.26     8.3 ± 0.17*   11.9 ± 0.16     8.7 ± 0.29*

PT/Control Ratio (:1)—Dog Plasma
  0                    1.0 ± 0.00     1.0 ± 0.00     1.0 ± 0.00     1.0 ± 0.00
  0.4                  1.4 ± 0.01     1.3 ± 0.01     1.5 ± 0.02     1.4 ± 0.02
  2.0                  2.2 ± 0.02     2.0 ± 0.02     2.3 ± 0.04     2.3 ± 0.04
  8.0                  4.1 ± 0.07     3.3 ± 0.05*    4.3 ± 0.10     4.1 ± 0.08
  15.0                 5.8 ± 0.14     4.4 ± 0.08*    6.5 ± 0.16*    5.5 ± 0.11

PT/Control Ratio (:1)—Rat Plasma
  0                    1.0 ± 0.00     1.0 ± 0.00     1.0 ± 0.00     1.0 ± 0.00
  1.0                  1.5 ± 0.02     1.4 ± 0.02     1.6 ± 0.02     1.6 ± 0.01
  4.0                  2.2 ± 0.03     2.0 ± 0.04     3.0 ± 0.02*    2.8 ± 0.03*
  12.0                 3.4 ± 0.08     3.1 ± 0.07     4.9 ± 0.06*    4.3 ± 0.05*
  24.0                 5.5 ± 0.17     7.3 ± 0.24*   15.3 ± 0.21*   11.3 ± 0.28*

Values are mean ± S.E.M. for 10 individual subjects. *Significantly different from the 0.98 ISI thromboplastin mean at the 5% level by t-test, separately by increasing ISI value for individual rabbit thromboplastins.
Assessment of PT in human, dog, or rat plasma containing this developmental FXa inhibitor was affected by the ISI of the thromboplastin selected for the assay. However, it was not affected to the same degree as was coumadin. Consequently, using the correction calculation designed for coumadin fluctuations to obtain an INR with CI-1031 grossly exaggerated the INR value. Although INR has been used clinically to monitor anticoagulant status during coumadin therapy, it probably should not be used with FXa inhibitor administration. Coumadin therapy typically produces INR values of 2, 4, and 6 as therapeutic, above-therapeutic, and critical levels, respectively. INR values of 10 to 15 may be observed in acute coumadin poisoning, but INR values higher than 15 rarely occur [19]. Clearly, the magnitude of the INR obtained in this experiment (>120 in humans), combined with the incremental increase that occurred with increasing ISI value, shows that INR values in these FXa inhibitor-treated samples were an artifact of the calculation and not associated with the true anticoagulant effects of the FXa inhibitor itself. This suggests that when INR is used in clinical trials, it is important to select a thromboplastin with an ISI value close to 1.0.
TABLE 4  In Vitro Effect of an Experimental Factor Xa Inhibitor on International Normalization Ratio Using a Rabbit Brain Thromboplastin

Concentration of FXa             International Sensitivity Index
Inhibitor (μg/mL)          0.98           1.24           1.55           2.21

International Normalized Ratio (:1)—Human Plasma
  0                    1.0 ± 0.10     1.0 ± 0.02     1.0 ± 0.02      1.0 ± 0.03
  0.2                  1.6 ± 0.04     2.0 ± 0.06*    2.6 ± 0.07*     2.6 ± 0.16*
  0.6                  2.7 ± 0.07     3.6 ± 0.10*    5.7 ± 0.16*     7.5 ± 0.57*
  1.2                  4.0 ± 0.16     5.3 ± 0.19*   10.5 ± 0.29*    16.2 ± 1.31*
  1.8                  5.2 ± 0.17     6.7 ± 0.21*   15.3 ± 0.47*    27.4 ± 2.61*
  6.0                 11.1 ± 0.31    13.8 ± 0.49*   46.0 ± 1.41*   121.9 ± 11.54*

International Normalized Ratio (:1)—Dog Plasma
  0                    1.0 ± 0.03     1.0 ± 0.01     0.9 ± 0.02      1.0 ± 0.03
  0.4                  1.4 ± 0.04     1.4 ± 0.01     1.7 ± 0.05*     2.0 ± 0.08*
  2.0                  2.2 ± 0.05     2.3 ± 0.04     3.8 ± 0.14*     6.0 ± 0.28*
  8.0                  4.0 ± 0.12     4.4 ± 0.10*   10.1 ± 0.43*    22.1 ± 1.19*
  15.0                 5.6 ± 0.18     6.2 ± 0.16*   16.7 ± 0.74*    43.4 ± 2.31*

International Normalized Ratio (:1)—Rat Plasma
  0                    1.0 ± 0.03     1.0 ± 0.00     1.0 ± 0.02      1.0 ± 0.01
  1.0                  1.5 ± 0.02     1.5 ± 0.02     2.6 ± 0.03*     3.5 ± 0.07*
  4.0                  2.2 ± 0.03     2.4 ± 0.06*    5.6 ± 0.11*     9.9 ± 0.31*
  12.0                 3.3 ± 0.07     4.0 ± 0.13*   11.7 ± 0.34*    25.1 ± 0.82*
  24.0                 4.6 ± 0.12     5.6 ± 0.22*   17.8 ± 0.20*    48.1 ± 2.27*

Values are mean ± S.E.M. for 10 individual subjects. *Significantly different from the 0.98 ISI thromboplastin mean at the 5% level by t-test, separately by increasing ISI value for individual rabbit thromboplastins.
In this manner, the INR will closely approximate the PT/control ratio and give a true estimate of the anticoagulated state. Table 5 indicates the maximum change in PT/control ratio and INR using thromboplastins with increasing ISI values (1.24 to 2.21). Changes in the PT/control ratio were modest at drug concentrations that produced increases of fourfold or less, the maximum targeted therapeutic PT value for clinical trials. The mean PT/control ratio in human samples increased maximally from 2.7 to 3.1 at twice the therapeutic dose (0.6 μg/mL). Absolute PT and PT ratios compared to baseline values were only modestly different using thromboplastin from various manufacturers, sources (human recombinant versus rabbit), and ISI. This finding indicates that absolute PT or PT/control ratio were more effective biomarkers of FXa inhibitor concentration than was INR.
TABLE 5  Comparison of PT/Control Ratio and International Normalization Ratio in Plasma Samples Containing Increasing Concentrations of Factor Xa Inhibitor

Species   Intended Drug           PT/Control Ratio   INR          Maximum Change in    Maximum Change
          Concentration (μg/mL)   (0.98 ISI)         (0.98 ISI)   PT/Control Ratio     in INR
Human     0                        1.00               0.99         0                    0.02
          0.2                      1.61               1.58         0.22                 1.03
          0.6                      2.70               2.65         0.38                 4.88
          1.2                      4.13               4.00         0.65                12.24
          1.8                      5.37               5.19         0.97                22.21
          6.0                     11.67              11.10         3.39               110.84
Dog       0                        1.00               1.00         0                    0.08
          0.4                      1.42               1.41         0.10                 0.61
          2.0                      2.24               2.22         0.27                 3.77
          8.0                      4.11               4.00         0.81                18.06
          15.0                     5.82               5.60         1.46                37.76
Rat       0                        1.00               1.00         0                    0.01
          1.0                      1.45               1.45         0.38                 2.08
          4.0                      2.20               2.19         0.82                 7.7
          12.0                     3.42               3.34         1.48                21.74
          24.0                     5.51               4.55         9.82                43.54

Samples were spiked with a factor Xa inhibitor in vitro. Maximum changes were obtained by selecting the maximum result obtained with rabbit brain thromboplastin and subtracting the result obtained with human recombinant thromboplastin.
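The exaggeration of the INR with high-ISI reagents follows directly from the WHO calibration formula quoted earlier, INR = (PT ratio)^ISI. A minimal sketch applying that formula to the PT/control ratios from Table 3 (human plasma, 6.0 μg/mL) reproduces the Table 4 values to within rounding of the inputs.

    # INR calibration as used for coumadin monitoring: INR = (PT_subject / PT_control) ** ISI.
    # Applying it to the FXa inhibitor data shows how a high-ISI reagent inflates the INR
    # even though the underlying PT ratio is lower (ratios from Table 3, human plasma, 6.0 ug/mL).
    def inr(pt_ratio, isi):
        return pt_ratio ** isi

    print(inr(11.7, 0.98))   # ~11.1  (0.98 ISI reagent)
    print(inr(8.7, 2.21))    # ~119   (2.21 ISI reagent; an artifact of the exponent, not of greater anticoagulation)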
PURSUING BIOMARKERS BEYOND PT, INR, AND aPTT

Values obtained with aPTT under ex vivo conditions were less sensitive than PT to FXa inhibitor-induced elevations and often underestimated drug concentration (data not shown). Beyond PT, INR, and aPTT, the most commonly used assay to evaluate FXa inhibitors is probably the anti-factor Xa assay (anti-FXa). It seems logical that a parameter named the anti-factor Xa assay should be the ideal biomarker for an FXa inhibitor. Additionally, this assay is used routinely in clinical settings to monitor the safety of heparin, a substance that also inhibits FXa production [20]. However, this assay is little more than a surrogate marker for drug concentration. A standard curve is prepared using the administered heparin (or other FXa inhibitor), and the chromogenic assay allows determination of the drug concentration in the plasma samples via production of FXa [21]. For heparin, the anti-FXa assay appears relevant. Years of use have allowed the development of a strong correlation between the number of international units of heparin determined via the assay and clinical safety. Reference ranges have been defined for the assay and provide a rapid
estimation of under, over, or therapeutic levels of heparin administration [22]. Still, variability in the anti-FXa assay has been reported and is attributable to a number of factors, including instrumentation, assay technique, specificity of the commercially available kits, heparin preparations used in generating the standard curve, and approaches to data fitting [21]. In contrast, this experience does not exist for anti-Xa values obtained during FXa inhibitor administration. Just as the PT and INR may not be as beneficial for predicting FXa inhibitor effects as they are for coumadin, it should not be assumed that the anti-FXa assays have equivalent predictivity for heparin and other FXa inhibitors. For the Pfizer developmental FXa inhibitor, the anti-FXa assay offered little more than the PT as a monitor of drug concentration.

An additional assay called the factor X clotting (FX:C) assay was also evaluated. This assay is conducted using genetically engineered factor-deficient plasma spiked with serial dilutions of purified human factor X [23,24]. Concentrations of factor X in plasma are then determined by extrapolation from the standard curve. Since factor X must be converted to factor Xa for clot formation to occur, a functional clotting assay for factor X can also be used to assess the effects of factor Xa inhibitors. The FX:C assay provides several unique features that may make it a valuable biomarker for monitoring factor Xa inhibitor therapy: (1) the assay provides a rapid, reliable assessment of drug concentration and the percent inhibition of FXa achieved during inhibitor administration; (2) the assay can be performed on a high-throughput automated platform that is available in most hospital-based coagulation laboratories; and (3) individual factor X concentrations range from 60 to 150% between subjects [25]. This fairly high level of baseline intersubject variability suggests that a standard dose of drug may have a substantially different impact on total factor X inhibition. The FX:C assay defines baseline factor X activity and thereby allows continued dosing to achieve a targeted factor X concentration [4]. Literature is available concerning factor X concentrations and bleeding history in patients with either inherited or acquired factor X deficiency, so there is at least some understanding correlating reductions in FX:C values with bleeding potential [26–28]. By determining the actual concentration of functional factor X remaining, physicians may have increased confidence in the administration of factor Xa inhibitors.

As with all the other coagulation biomarkers used for monitoring FXa inhibition, it was not immediately clear whether the FX:C assay was applicable in multiple species. Ex vivo experiments allowed this evaluation. To provide effective anticoagulant activity, a 30% reduction in FX:C activity was predicted to be the minimal requirement for this compound. The FXa inhibitor concentrations in the ex vivo experiments were selected to bracket a range of factor X inhibition predicted to span approximately 30% to 100%. Table 6 shows the intended concentrations of this FXa inhibitor in each species, the resulting FX:C activity, and the percent inhibition achieved.
TABLE 6  Factor X Activity and Percent Inhibition in Plasma Samples Containing Increasing Concentrations of a Factor Xa Inhibitor

Species   Intended Drug            FX:C Activity (%)   Percent Inhibition
          Concentration (μg/mL)
Human     0                        106.1 ± 1.9          NA
          0.2                       64.3 ± 1.6          39.4
          0.6                       32.2 ± 1.0          69.7
          1.2                       16.5 ± 0.7          84.4
          1.8                       10.0 ± 0.5          90.6
          6.0                        2.3 ± 0.2          97.8
Dog       0                        143.0 ± 4.5          NA
          0.4                      112.9 ± 8.6          21.0
          2.0                       42.4 ± 4.3          70.3
          8.0                       11.1 ± 1.4          92.2
          15.0                       5.4 ± 0.8          96.2
Rat       0                         84.8 ± 2.8          NA
          1.0                       52.6 ± 1.8          38.0
          4.0                       26.2 ± 0.9          69.1
          12.0                      11.8 ± 0.6          86.0
          24.0                       6.7 ± 0.3          92.1

Samples were spiked with a factor Xa inhibitor in vitro. FX:C activity is mean ± SD of 10 samples per concentration; percent inhibition is calculated from the species-specific control value. NA, not applicable.
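The percent inhibition column of Table 6 is a simple transformation of the FX:C activity against the species-specific control, as the table footnote states. A minimal sketch of that calculation, using the human 0.2 μg/mL row, is shown below.

    # Percent inhibition derived from the species-specific control (0 ug/mL) value:
    # percent inhibition = (1 - treated FX:C activity / control FX:C activity) x 100.
    # Example: human plasma at 0.2 ug/mL (values from Table 6).
    control_activity = 106.1     # % FX:C activity, untreated human plasma
    treated_activity = 64.3      # % FX:C activity at 0.2 ug/mL

    percent_inhibition = (1.0 - treated_activity / control_activity) * 100.0
    print(f"{percent_inhibition:.1f}% inhibition")   # ~39.4%, as tabulated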
These drug concentrations induced factor Xa inhibition of approximately 20% to >90%, showing that the targeted range could be predicted and achieved in all species. These ex vivo experiments demonstrated that the predicted efficacious dose of 0.2 to 0.3 μg/mL achieved the required 30 to 40% inhibition of FXa, providing early confidence in the dose selection process for phase I human trials. Additionally, these early ex vivo studies confirmed species-specific differences. The drug concentrations required to produce similar levels of FXa inhibition across species were markedly different.

The FX:C assay was used effectively in preclinical rat and dog studies with this developmental FXa inhibitor. Knowledge of the species-specific concentration of drug required to induce the required 30% inhibition of FXa drove the selection of the low dose, whereas nearly complete inhibition of FXa drove the selection of the high dose. The FX:C assay helped determine the drug concentration required for complete inhibition of factor Xa in these species and the relative bleeding risk associated with a range of factor X concentrations. Prior knowledge of the impact of this drug on FXa inhibition through fairly simple clotting assessments helped eliminate undue risks of over-anticoagulation in preclinical studies, and there was no loss of animals due to excessive hemorrhage. It also addressed questions of whether dosing had been pushed to high enough levels when only minimal bleeding was observed at the highest dose. Since nearly 100% inhibition was achieved during the study, using higher doses was
not indicated and the lack of bleeding under conditions of complete FXa inhibition in rats and dogs suggested a strong safety profile. Inclusion of these biomarkers in preclinical studies provided greater confidence for selection of target stopping criteria for the first-in-human trial. The FX:C assay was translated and used as part of the first-in-human clinical trial with this compound. The FX:C assay provided data consistent with in vitro modeling, suggesting that it is predictive of drug concentration.
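The dose-selection logic described above (a low dose targeting roughly 30% FXa inhibition and a high dose targeting near-complete inhibition) can be approximated from the ex vivo concentration-inhibition data. The sketch below interpolates the Table 6 rat values; linear interpolation between tested concentrations is an assumption made here for illustration, not part of the original study design.

    # Interpolating the concentration expected to give a target level of FXa inhibition
    # from the ex vivo data in Table 6 (rat values shown).
    import numpy as np

    conc = np.array([0.0, 1.0, 4.0, 12.0, 24.0])          # ug/mL, rat ex vivo
    inhibition = np.array([0.0, 38.0, 69.1, 86.0, 92.1])  # % FXa inhibition (Table 6)

    low_dose_conc = np.interp(30.0, inhibition, conc)     # concentration for ~30% inhibition
    print(f"~{low_dose_conc:.1f} ug/mL for 30% inhibition")   # ~0.8 ug/mL with these data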
CONCLUSIONS

One of the goals for new anticoagulant therapies is a superior safety profile compared to marketed anticoagulants, thereby minimizing or eliminating the need for clinical monitoring. Although clinical monitoring with standardized coagulation assays may appear to be a simple solution to monitoring the safety and efficacy of anticoagulants, there are inherent issues that make the elimination of clinical monitoring highly desirable. The obvious factors of cost and labor are minor in comparison to the problems associated with lack of patient compliance, delayed time to achieve therapeutic benefit, and the high degree of variability in the assay itself due to instrumentation, reagents, technique, and the inherent variability among subjects. Phase I studies using biomarkers are generally cheaper than phase II clinical endpoint studies. Additionally, new anticoagulants pose a relatively undetermined safety risk, due to the possibility of excessive bleeding. Therefore, biomarkers will continue to be essential until safety profiles can be established for this newer generation of anticoagulants.

Although PT and INR are an effective reflection of drug safety and efficacy for coumadin, they are less than ideal as biomarkers of new FXa inhibitor drugs. Assessing anti-FXa activity has been similar to drug concentration analysis for some inhibitors but variable for others [12; current case study]. FX:C clotting activity may be another alternative but remains largely unexplored. Regardless of the hope for the development of safer anticoagulants that are monitoring-free, the reality is that development of these drugs requires extensive patient monitoring to ensure safety. Compared to heparin and coumadin, which are monitored fairly effectively with aPTT and PT, respectively, development of the new FXa inhibitors is typically accompanied by a laundry list of probable biomarkers. This process is likely to continue until safety is firmly established through prolonged use and clinical experience with these agents. It seems likely that most of these coagulation assays could just as likely be a bioassay of drug concentration as an indicator of pharmacologic response.

In Stern's evaluation of biomarkers of an antithrombin agent, he concluded that "not all biomarkers are created equal" [29]. He suggested that "If a proposed biomarker measurement requires a drug and its molecular target to be combined in the same assay, it may be more a pharmacokinetic than a
pharmacodynamic assessment. Also, such assays should not be assumed to demonstrate an in vivo effect” [29]. As such, these biomarkers face a lofty hurdle to replace such pharmacodynamic endpoints as the Badimon chamber. So what does this mean for the early use of biomarkers in the development of new anticoagulants? It suggests that the greatest benefit for early utilization of coagulation biomarkers remains in allowing optimal selection of compounds, attrition of the right compounds, and the opportunity to provide an early ex vivo assessment against marketed competitors. It also demonstrates that efforts expended in understanding species-specific and reagent differences are critical in performing those early experiments.
REFERENCES

1. CMR International (2006). 2006/7 Pharmaceutical R&D Factbook. CMR International, pp. 22–35.
2. Kola I, Landis J (2004). Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov, 3:711–715.
3. DiMasi JA (2001). Risks in new drug development: approval success rates for investigational drugs. Clin Pharmacol Ther, 69:297–307.
4. Jawad S, Oxley J, Yuen WC, Richens A (1986). The effect of lamotrigine, a novel anticonvulsant, on interictal spikes in patients with epilepsy. Br J Clin Pharmacol, 22:191–193.
5. Smith MB, Woods GL (2001). In vitro testing of antimicrobial agents. In Davey FR, Herman CJ, McPherson RA, Pincus MR, Threatte G, Woods GL (eds.), Henry's Clinical Diagnosis and Management by Laboratory Methods, 20th ed. W.B. Saunders, Philadelphia, pp. 1119–1143.
6. Anderson FA, Wheeler HB, Goldberg RJ, et al. (1991). A population-based perspective of the hospital incidence and case-fatality rates of deep vein thrombosis and pulmonary embolism. Arch Intern Med, 151:933–938.
7. Colman RW, Clowes AW, George JN, Hirsh J, Marder VJ (2001). Overview of hemostasis. In Colman RW, Hirsh J, Marder VJ, Clowes AW, George JN (eds.), Hemostasis and Thrombosis: Basic Principles and Clinical Practice. Lippincott Williams & Wilkins, Philadelphia, pp. 3–16.
8. Kakar P, Watson T, Lip GYH (2007). Drug evaluation: rivaroxaban, an oral, direct inhibitor of activated factor X. Curr Opin Invest Drugs, 8(3):256–265.
9. Guertin KR, Choi YM (2007). The discovery of the factor Xa inhibitor Otamixaban: from lead identification to clinical development. Curr Med Chem, 14:2471–2481.
10. Zafar MU, Vorchheimer DA, Gaztanaga J, et al. (2007). Antithrombotic effect of factor Xa inhibition with DU-176b: phase-1 study of an oral, direct factor Xa inhibitor using an ex-vivo flow chamber. Thromb Haemost, 98:883–888.
11. Hylek EM (2007). Drug evaluation: DU-176b, an oral, direct factor Xa antagonist. Curr Opin Invest Drugs, 8(9):778–783.
12. Crowther MA, Ginsberg JS, Hirsh J (2001). Practical aspects of anticoagulant therapy. In Colman RW, Hirsh J, Marder VJ, Clowes AW, George JN (eds.), Hemostasis and Thrombosis: Basic Principles and Clinical Practice. Lippincott Williams & Wilkins, Philadelphia, pp. 1497–1516.
13. Zucker S, Cathey MH, Sox PJ, Hall EC (1970). Standardization of laboratory tests for controlling anticoagulant therapy. Am J Clin Pathol, 52:348–354.
14. Poller L (1987). Progress in standardization in anticoagulant control. Hematol Rev, 1:225–228.
15. Bailey EL, Harper TA, Pinterton PH (1971). The "therapeutic range" of the one-stage prothrombin time in the control of anticoagulant therapy: the effect of different thromboplastin preparations. CMAJ, 105:307–318.
16. Kirkwood TB (1983). Calibration of reference thromboplastin and standardization of the prothrombin time ratio. Thromb Haemost, 49:238–244.
17. Jeske W, Messmore HL, Fareed J (1998). Pharmacology of heparin and oral anticoagulants. In Loscalzo J, Schafer AI (eds.), Thrombosis and Hemorrhage, 2nd ed. Williams & Wilkins, Baltimore, pp. 257–283.
18. Crowther MA, Ginsberg JS, Hirsh J (2001). Practical aspects of anticoagulant therapy. In Colman RW, Hirsh J, Marder VJ, Clowes AW, George JN (eds.), Hemostasis and Thrombosis: Basic Principles and Clinical Practice. Lippincott Williams & Wilkins, Philadelphia, pp. 1497–1516.
19. Levine MN, Hirsh J, Gent M (1994). A randomized trial comparing activated thromboplastin time with heparin assay in patients with acute venous thromboembolism requiring large daily doses of heparin. Arch Intern Med, 154:49–56.
20. Kitchen S, Theaker J, Preston FE (2000). Monitoring unfractionated heparin therapy: relationship between eight anti-Xa assays and a protamine titration assay. Blood Coagul Fibrinolysis, 11:55–60.
21. Fifth ACCP Consensus Conference on Antithrombotic Therapy (1998). Chest, 119(Suppl):1S–769S.
22. Bauer KA, Kass BL, ten Cate H (1989). Detection of factor X activation in humans. Blood, 74:2007–2015.
23. Bauer KA, Weitz JI (2001). Laboratory markers of coagulation and fibrinolysis. In Colman RW, Hirsh J, Marder VJ, Clowes AW, George JN (eds.), Hemostasis and Thrombosis: Basic Principles and Clinical Practice. Lippincott Williams & Wilkins, Philadelphia, pp. 1113–1129.
24. Fair DS, Edgington TS (1985). Heterogeneity of hereditary and acquired factor X deficiency by combined immunochemical and functional analyses. Br J Haematol, 59:235–242.
25. Herrmann FH, Auerswald G, Ruiz-Saez A, et al. (2006). Factor X deficiency: clinical manifestation of 102 subjects from Europe and Latin America with mutations in factor 10 gene. Haemophilia, 12:479–489.
26. Choufani EB, Sanchorawala V, Ernst T, et al. (2001). Acquired factor X deficiency in patients with amyloid light-chain amyloidosis: incidence, bleeding manifestations, and response to high-dose chemotherapy. Blood, 97:1885–1887.
27. Mumford AD, O'Donnell J, Gillmore JD, Manning RA, Hawkins PN, Laffan M (2000). Bleeding symptoms and coagulation abnormalities in 337 patients with AL-amyloidosis. Br J Haematol, 110:454–460.
28. Stern R, Chanoine F, Criswell K (2003). Are coagulation times biomarkers? Data from a phase I study of the oral thrombin inhibitor LB-30057 (CI-1028). J Clin Pharmacol, 43:118–121.
29. Stirling Y (1995). Warfarin-induced changes in procoagulant and anticoagulant proteins. Blood Coagul Fibrinolysis, 6:361–373.
24
INTEGRATING MOLECULAR TESTING INTO CLINICAL APPLICATIONS
Anthony A. Killeen, M.D., Ph.D.
University of Minnesota, Minneapolis, Minnesota
INTRODUCTION

The clinical laboratory plays a critical role in modern health care. It is commonly estimated that approximately 70% of all diagnoses are to some extent dependent on a laboratory finding. The clinical laboratory has various roles in the diagnosis and treatment of disease, including determining disease risks, screening for disease, establishing a diagnosis, monitoring of disease progression, and monitoring of response to therapy. Not surprisingly, the size of the market is large. A 2004 report by S.G. Cowen on the global in vitro diagnostics (IVD) industry estimated it to be $26 billion in size, and molecular diagnostics was identified as being among the more rapidly growing areas. Today, molecular testing is used in many areas of the clinical laboratory, including microbiology and virology, analysis of solid and hematologic tumors, inherited disorders, tissue typing, and identity testing (e.g., paternity testing and forensic testing). The growth has occurred rapidly over the last 20 years. In this chapter we examine the principal issues surrounding the integration of molecular testing into the clinical laboratory environment.
CLINICAL LABORATORY REGULATION

The clinical laboratory environment in the United States is one of the most extensively regulated areas of medical practice and comes under the federal Clinical Laboratory Improvement Amendments (CLIA) of 1988 (http://www.cms.hhs.gov/clia/). Any implementation of molecular diagnostics is therefore governed by the provisions of the CLIA. The history of the CLIA dates back to the 1980s, when public and congressional concern was raised by reports of serious errors being made in clinical laboratories. In response to these concerns, legislation was introduced with the intention of improving laboratory testing. These regulations cover most aspects of laboratory practice. Any laboratory testing that is performed in the United States for clinical purposes such as diagnosis, monitoring, deciding appropriate treatment, and establishing prognosis must be performed in a CLIA-certified laboratory. These regulations, however, do not apply to purely research studies or to early research and development work for molecular or other testing in a non-CLIA-certified environment, but as soon as testing that has genuine clinical utility is made available, it must be performed in a certified environment.

The initial application for a CLIA certificate is usually made to the state office of the Centers for Medicare and Medicaid Services (CMS). A successful application will result in a certificate of registration, which allows a laboratory to perform clinical testing pending its first formal inspection. Depending on whether the laboratory is certified by CMS or by an accrediting organization, a successful inspection will result in a grant of either a certificate of compliance or a certificate of accreditation (Figure 1). These are essentially equivalent for the purposes of offering clinical testing.
Compliance, 19,695
PPM,39,014
Waiver, 122,992
Figure 1 Distribution of CLIA certificates by type in non-CLIA-exempt states in 2007. (Data from the CLIA database, http://www.cms.hhs.gov/CLIA/.)
Accrediting organizations function as surrogates for CMS in the laboratory accreditation process and must be approved by CMS to accredit clinical laboratories. The accrediting organizations are the College of American Pathologists (CAP), the Council on Laboratory Accreditation (COLA), the Joint Commission, the American Association of Blood Banks (AABB), the American Society for Histocompatibility and Immunogenetics (ASHI), and the American Association of Bioanalysts (AAB). Some of these, such as the ASHI, accredit laboratories that perform only limited types of testing. Others, such as the CAP, accredit laboratories for all types of clinical testing, including molecular diagnostic testing.

Clinical tests are categorized for the purposes of the CLIA into several levels of complexity. This categorization is the function of the U.S. Food and Drug Administration (FDA). The type of CLIA certificate that a laboratory requires parallels the complexity of its test menu. The lowest level of test complexity is the waived category. Tests in this category are typically simple methods with little likelihood of error or of serious adverse consequences for patients if performed incorrectly. Commonly, such tests are performed in physician office laboratories. It should be noted that the term waived applies to a test, not to the need for the laboratory to have a CLIA certificate to perform any clinical testing. The next-highest level is the moderate-complexity test, including a category known as provider-performed microscopy. The highest level is the high-complexity test, which is applicable to most molecular tests. Laboratories that perform high-complexity testing must have a certificate to perform this type of testing.

When the CLIA was written 20 years ago, there was relatively little molecular testing, and as a result, molecular diagnostics does not have specific requirements in the regulations as do most areas of clinical laboratory practice, such as clinical chemistry, microbiology, and hematology. Nevertheless, the general requirements of CLIA can be adapted to molecular testing. Accrediting organizations such as the CAP do have specific requirements for laboratories that perform molecular diagnostic testing. These are available in their laboratory inspection checklists [1].

Whereas the FDA is responsible for categorizing tests, the Centers for Medicare and Medicaid Services (CMS) are responsible for the oversight of the CLIA program, including granting certificates, approving accrediting organizations, approving proficiency testing (PT) programs, inspections, and enforcement actions. The CLIA is a federal law and applies to all clinical testing performed in the United States and in foreign laboratories that are certified under the CLIA. There are provisions in the CLIA under which individual states can substitute their own laboratory oversight programs if it is determined that such programs are at least as stringent as the federal program. Currently, such programs exist only in New York and Washington. These are known as "CLIA-exempt" states, although CMS reserves the authority to inspect any aspect of laboratory performance in these states. The CLIA includes the
following areas of laboratory testing: proficiency testing, preanalytic testing, analytic testing, and personnel requirements.

Proficiency Testing

Proficiency testing (PT) is one external measure by which a laboratory's performance can be judged. In a PT program, laboratories are sent samples for analysis and return their results to the PT program organizers. The correct result (or range of results) for these programs is determined by the organizers based on a comparison of participant results with results obtained by reference laboratories (accuracy-based grading), or by comparison with other laboratories that use the same analytical methods (peer-group grading). Ideally, all PT programs would use accuracy-based grading, but there are significant practical limitations to this approach. One of the major limitations is the PT material itself. For many analytes it is not possible to obtain the necessary range of concentrations to test low, normal, and high concentrations using real human samples. This necessitates the use of artificial samples that have been spiked with the analyte or from which the analyte has been removed (or at least its concentration has been lowered). Such artificial samples may behave unexpectedly when tested using some analytical equipment and give higher or lower values than would be obtained in a native specimen containing the same concentration of the analyte. This is known as the matrix effect. Other limitations may require peer-group grading; for example, recombinant proteins may not be detected equally in different manufacturers' immunoassays, making accuracy-based grading impossible. Enzyme concentrations may be determined by different manufacturers using different concentrations of cofactors, different temperatures, and different substrates, thus giving rise to such intermethod disagreement that accuracy-based grading is impossible.

Molecular testing poses certain challenges to PT programs. It may not be possible to obtain real human specimens such as blood from subjects known to carry mutations of interest because of the quantities required for a large PT program. This necessitates the use of cell lines or even DNA aliquots for PT programs in genetics. Such samples cannot test all phases of the analytical process, including extraction of DNA from whole blood (the normal procedure for genetic testing). The same concern applies to molecular testing for infectious diseases such as HIV-1. For these reasons, it is not uncommon that PT samples do not fully mimic patient samples.

Under the CLIA, laboratories are required to enroll in PT programs for a group of analytes specified in Subpart I of the regulations. These analytes were chosen based on clinical laboratory testing patterns that existed in 1988, and the list has not been updated since then. As a result, many newer tests, including molecular tests, are not on this list. For tests not on this list of "regulated" analytes, laboratories must verify the accuracy of their methods by some other method at least twice a year. This could include comparison of results with
those obtained by a different method: sample exchange with another laboratory, or even correlation of results with patients’ clinical status. If formal PT programs exist, laboratories should consider enrolling in these. Several of the accrediting organizations do have requirements for participation in PT programs where these exist, including PT programs for molecular testing. Preanalytic Testing The CLIA has requirements that cover the preanalytic phase of testing. These include the use of requisition forms with correct identification of the patient, the patient’s age and gender, the test to be performed, the date and time of sample collection, the name of the ordering provider or the person to whom results should be reported, the type of specimen (e.g., blood), and any other additional information needed to produce a result. All of these are critical pieces of information that should be provided to the laboratory. Many so-called “laboratory errors” actually arise at the time of sample collection, and specimen misidentification is one of the most common types of error in the testing process. In addition to the patient’s age and gender, orders for molecular genetic testing should include relevant information about suspected diagnosis, clinical findings, and especially the family history. Many experienced clinical geneticists and genetic counselors will include a pedigree diagram on a requisition form for tests for inherited disorders. This practice is highly desirable and provides much useful information to the laboratory. As an example of the importance of this information, current practice guidelines in obstetrics and gynecology in the United States encourage the offering of prenatal testing to expectant Caucasian mothers to determine if they are carriers of mutations for cystic fibrosis. A recommended panel of mutations to be tested by clinical laboratories covers approximately 80 to 85% of all mutations in this population. In general, a negative screening test for these mutations reduces the risk of being a cystic fibrosis carrier from 1 in 30 to 1 in 141, and the laboratory would report these figures, or, if a mutation were identified, would report the specific mutation. However, these figures are based on the assumption that there is no family history of the disorder in the patient’s family. If there is such a history, the risk of being a carrier (both before and after testing) is substantially higher. It is therefore essential that the ordering physician inform the laboratory if there is a family history. Analytic Testing The CLIA has detailed requirements for the analytic phase of the testing process. These include the procedure manual, which is a step-by-step set of instructions on how the test should be performed, the process for method calibration, the procedures for preparation of reagents, the use of controls,
establishment of the reference range, reporting procedures, and analytical parameters such as sensitivity and specificity. There are no specific CLIA requirements that are unique to molecular testing, and therefore the molecular diagnostics laboratory has to adapt requirements from related areas such as clinical chemistry and microbiology to molecular testing. Some of the accrediting organizations have checklists that include specific requirements for molecular testing. These can provide useful guidance on procedures even for a laboratory that is not accredited by one of these organizations. Postanalytic Testing Postanalytic testing refers to steps involved in reporting results to the ordering physician in a timely manner. The patient’s name and identification information must be on the report, as should the name and address of the performing laboratory. In addition to the result, the report should include the reference interval and any relevant interpretive comments. The laboratory should be able to provide information on test validation and known interferences on the request of an ordering physician. Results must be released only to authorized persons. Although certain elements of the postanalytic phase of testing can be controlled by the laboratory, there are also critical elements that are beyond its control, notably the correct interpretation of the result by the ordering physician. Molecular diagnostics (and genetics in general) is an area in which many physicians and other providers never had formal training in medical school. Concern has been expressed about the need to improve genetics education for health care professionals. Where there is a gap in provider knowledge, the laboratory should be able to offer expert consultation on the interpretation of its results to primary care providers [2]. This requires time, patience, and good communication skills on the part of the laboratory director and senior staff. Although such activity may be reimbursable under some health plans, the primary incentives for providing this kind of consultation are good patient care and customer satisfaction. Personnel Qualifications Under the CLIA, requirements exist for laboratory personnel qualifications and/or experience. Perhaps the most important qualification requirements apply to the laboratory director. The director of a high-complexity laboratory such as a clinical molecular testing laboratory must hold a license in the state in which he or she works (if the state issues such licenses) and be a physician or osteopathic physician with board certification in pathology. Alternatively, the laboratory director can be a physician with at least one year of training in laboratory practice during residency, or a physician with at least two years of experience supervising or directing a clinical laboratory. A doctoral scientist holding a degree in a chemical, physical, biological, or clinical laboratory
science field with board certification from an approved board may also serve as the laboratory director. There are also provisions that allow for grandfathering of persons who were serving as laboratory directors at the time of implementation of the CLIA. Currently, there are no specific CLIA-required qualifications for the director of a molecular diagnostics laboratory. There are, however, board examinations in this field or similar fields that are offered by the American Board of Pathology, the American Board of Medical Genetics, the American Board for Clinical Chemistry, the American Board of Medical Microbiology, and the American Board of Bioanalysts. It is possible that individual states may begin to require specific qualifications in molecular diagnostics in the future or even that changes to the CLIA may require such qualifications. Other personnel and their qualifications described in the CLIA for high-complexity laboratories are technical supervisor, clinical consultant, general supervisor, cytology supervisor, cytotechnologist, and testing personnel.
GENETIC TESTING AND PRIVACY For many years there has been concern about the use of genetic information to discriminate against people with genetic diseases or those who are at risk of manifesting genetic disease at some time in the future. Although there are very few reported examples of such discrimination, the possibility of such misuse of genetic information by employers or insurance companies has received considerable attention by both the public and by legislative bodies [3]. A comprehensive analysis of applicable laws is beyond the scope of this chapter, but certain principles that apply to the clinical laboratory are worth mentioning. It is generally assumed, of course, that all clinical laboratory testing is performed with the consent of the patient. However, written consent is a legal requirement for genetic testing in some jurisdictions. The laboratory is generally not in a position to collect informed consent from patients, so it is usually obtained by some other health care worker, such as the ordering physician or genetics counselor. The laboratory director should be aware of applicable laws in this matter and determine, with legal advice if necessary, what testing is covered in his or her jurisdiction and ensure that appropriate consent is obtained. Genetic testing in its broadest meaning can cover more than just nucleic acid testing. For example, some laboratory methods for measuring glycohemoglobin, a test used for following diabetes control, can indicate the presence of genetic variants of hemoglobin, such as sickle-cell hemoglobin. Histopathologic examination of certain tumors can be strongly suggestive of an inherited disorder. Serum protein electrophoresis can reveal α-1 antitrypsin deficiency, an inherited disorder. The laboratory should consider how it reports such findings, which may contain genetic information that is unanticipated by both the ordering physician and the patient.
The most significant federal legislation in this area is the Genetic Information Nondiscrimination Act of 2008. This act offers protection against the use of genetic information as a basis for discrimination in employment and health insurance decisions. Under the provisions of this law, people who are healthy may not be discriminated against on the basis of any genetic predisposition to developing disease in the future. Health care insurers (but not life insurers or long-term care insurers) and employers may not require prospective clients or employees to undergo genetic testing or take any adverse action based on knowledge of a genetic trait. A benefit of this legislation is that people may feel less trepidation about undergoing genetic testing, because they need no longer fear that such information could be used by an employer or insurance company to discriminate against them.
TESTING IN RESEARCH LABORATORIES As research laboratories report new molecular findings in inherited and acquired diseases, it is not uncommon for clinical laboratories to receive requests to send patient samples to research laboratories for testing. This is an area in which the clinical laboratory must be careful to avoid noncompliance with CLIA regulations. One of the requirements of the CLIA is that certified laboratories must not send samples for patient testing to a non-CLIA-certified laboratory. This rule applies even if the research laboratory is the only one in the world to offer a particular test. Such samples should not be handled by a CLIA-certified laboratory, and the ordering physician should find some other means of arranging for testing if it is considered necessary. For example, it may be possible for testing to be performed under a research protocol. In this case the local institutional review board may be able to offer useful guidance on the creation and implementation of an appropriate protocol. There are good reasons to be cautious about performing clinical testing in a research setting. The standards that one expects in a CLIA-certified laboratory are designed to promote quality and improve the accuracy of patient testing. Laboratories that do not follow these extensive requirements may not have all of the necessary protocols and procedures in place to offer the same quality of test result. Research laboratories are often part of academic institutions that may or may not carry malpractice insurance coverage in the event that a reported test result is erroneous.
MOLECULAR TESTING FROM RESEARCH TO CLINICAL APPLICATION The usual progression of molecular testing begins with gene and mutation discovery, typically in a research laboratory setting. Publication of these early
findings in peer-reviewed literature is the normal means of disseminating new information about a gene of clinical interest and the variations that can cause disease. It is important to document at least the most common disease-causing mutations and benign polymorphisms. It is usual to file patent applications to establish intellectual property claims, particularly if the test may have wide clinical applicability. After a disease-causing mutation has been discovered, diagnostic testing on patients (as opposed to research subjects who have consented) requires performance in a laboratory that holds a CLIA certificate, and for molecular testing almost certainly means a certificate that allows for high-complexity testing. Because research laboratories are usually not set up to perform clinical testing or to meet the stringent criteria for clinical laboratory operations, it is usual that the rights to perform clinical testing be sold or licensed to a clinical laboratory that has the capability of offering such testing. The question of how many laboratories should be licensed is an important one. In general, it is often problematic for clinical providers when only one laboratory has a license to perform a clinical test. There is no way to verify a result in an independent laboratory, there is no competition that might lead to better test pricing for patients, there is little that can be done if the laboratory performance in areas such as turnaround time is suboptimal, and research may be inhibited [4]. For these reasons, licensing a test to multiple laboratories is generally preferable to an exclusive license to one laboratory. It has also been argued that patenting tests may inhibit their availability in clinical laboratories [5]. What should a molecular diagnostics laboratory be able to offer to meet clinical needs for molecular testing? First, the quality of the test result must be of a very high standard; that is, the results are reliable. Of course, all laboratories strive for this goal, which is implicit in the numerous regulations that govern laboratory testing. This is achieved by careful attention to the preanalytic, analytic, and postanalytic factors mentioned above and to the hiring of qualified and skilled personnel. The laboratory should offer turnaround times that are appropriate to the clinical needs of a specific test and will vary from one test to another. For example, testing for some infectious diseases is likely to require a faster turnaround time than is testing for a genetic predisposition to a chronic disease. Information should be readily available on the requirements for specimen type and the needs for special handling. The laboratory should be able to offer interpretations and consultations to ordering physicians regarding results of patient testing. If the genetic test result is a risk factor for future development of disease or for carrier status (e.g., cystic fibrosis carrier screening in pregnancy), the laboratory should be able to recalculate such risks if additional family history is provided at a later time. Many laboratories have a formal relationship with a genetic counselor who can interact with both patients and other health care workers and provide a variety of very useful services. As clinical testing becomes more widespread, there can be significant changes to the knowledge and thinking about the relationship between disease
and underlying genetic mutation. An example of this is illustrated by the hereditary hemochromatosis gene, HFE. Discovery of this gene and the common mutations, C282Y and H63D, led to the view that the homozygous states, especially homozygosity for C282Y, would lead to chronic iron overload and hemochromatosis [6]. That view is no longer correct in light of more recent population studies of the penetrance of these mutations. Approximately one-third of patients who are homozygous for C282Y do not have elevated ferritin levels and appear not to be at risk of iron overload [7]. The reason for the variability of penetrance is probably related to dietary iron, blood loss, and other genetic factors that have yet to be determined. It is important for the laboratory director to be aware of such changing perspectives in thinking about diseases and to be an educator to others, making them aware of important developments, so that rational ordering patterns are encouraged.
REIMBURSEMENT FOR MOLECULAR TESTING In common with all areas of medical practice, reimbursement for molecular testing at the federal level (Medicare) is based on the Current Procedural Terminology (CPT) coding system. State providers such as Medicaid and private insurance companies generally follow the same process. Under CPT coding, a charge and its payment are based on the number of individual items of service provided. Each step in a typical molecular assay, ranging from extraction of DNA to performance of a polymerase chain reaction to gel electrophoresis and final result interpretation, has a unique CPT code and an associated reimbursement based (in the case of Medicare) on the published fee schedule. Therefore, the Medicare reimbursement rate is calculable and is based on the individual steps in an assay. Private insurance companies may reimburse at a higher rate than federal payers. The CPT codes are updated annually by the American Medical Association, which retains copyright on the codes. Because of the rapid advances in molecular testing, it is not uncommon for laboratories to use methods that are not listed in the CPT guide. In this case, it may be necessary to consult billing experts on choosing the appropriate fee codes. Not uncommonly, genetic test prices from commercial laboratories are well above those that can be justified from published fee schedules. Although this may be perfectly legal, it can lead to significant problems for patients whose insurance companies (including Medicare) may not cover the full cost of the testing. In this situation the patient may have to pay out of pocket for part or all of the cost of the test if it is decided that the testing is essential. This situation can pose a financial risk for hospitals and clinics if they refer a sample for testing to a reference laboratory and thereby possibly incur the charges for a test. One possible option is to notify the patient and ordering physician that such tests are unlikely to be covered by insurance and determine how they propose to pay for testing. For Medicare patients, an advance beneficiary
notice (ABN) may be used to formally notify a patient that the test is considered to be a noncovered service [8]. These types of situations should be discussed with hospital management.
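To make the fee-schedule arithmetic described above concrete, the sketch below sums per-step payments for a hypothetical molecular assay. The CPT-style codes, step descriptions, and dollar amounts are placeholders invented for illustration; they are not taken from the CPT manual or from any actual Medicare fee schedule, and actual coding and payment must be confirmed against current published sources.

```python
# Illustrative only: hypothetical CPT-style line items for a molecular assay.
# Codes and fee-schedule amounts are placeholders, not real Medicare values.
assay_steps = [
    {"cpt": "8XXX1", "description": "DNA extraction",            "units": 1, "fee": 20.00},
    {"cpt": "8XXX2", "description": "PCR amplification",         "units": 2, "fee": 24.00},
    {"cpt": "8XXX3", "description": "Gel electrophoresis",       "units": 1, "fee": 15.00},
    {"cpt": "8XXX4", "description": "Interpretation and report", "units": 1, "fee": 40.00},
]

def total_reimbursement(steps):
    """Sum the fee-schedule payment across all coded steps (units x per-unit fee)."""
    return sum(step["units"] * step["fee"] for step in steps)

if __name__ == "__main__":
    for step in assay_steps:
        print(f'{step["cpt"]}  {step["description"]:<26} {step["units"]} x ${step["fee"]:.2f}')
    print(f"Total (hypothetical) payment: ${total_reimbursement(assay_steps):.2f}")
```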
SUMMARY Molecular testing is firmly established in clinical laboratories for a wide variety of disorders. According to published reports, molecular diagnostics is, and will continue to be, one of the fastest-growing areas of clinical testing. In the United States, the clinical laboratory operates under the regulations of the Clinical Laboratory Improvement Amendments of 1988, which provide the framework for producing high-quality results. The clinical laboratory differs significantly from the research laboratory both in practice and from a regulatory point of view. Careful attention should be paid to issues such as patient privacy and reimbursement for molecular testing.
REFERENCES 1. College of American Pathologists (2008). http://www.cap.org (accessed Sept. 18, 2008). 2. Harvey EK, Fogel CE, Peyrot M, Christensen KD, Terry SF, McInerney JD (2007). Providers’ knowledge of genetics: a survey of 5915 individuals and families with genetic conditions. Genet Med, 9:259–267. 3. Harmon A (2008). Insurance fears lead many to shun DNA tests. New York Times, Feb. 24. 4. Cho MK, Illangasekare S, Weaver MA, Leonard DG, Merz JF (2003). Effects of patents and licenses on the provision of clinical genetic testing services. J Mol Diagn, 5:3–8. 5. Merz JF, Kriss AG, Leonard DG, Cho MK (2002). Diagnostic testing fails the test. Nature, 415:577–579. 6. Feder JN, Gnirke A, Thomas W, et al. (1996). A novel MHC class I–like gene is mutated in patients with hereditary haemochromatosis. Nat Genet, 13(4):399–408. 7. Olynyk JK, Trinder D, Ramm GA, Britton RS, Bacon BR (2008). Hereditary hemochromatosis in the post-HFE era. Hepatology, 48:991–1001. 8. Carter D (2003). Obtaining advance beneficiary notices for Medicare physician providers. J Med Pract Manage, 19:10–18.
25 BIOMARKERS FOR LYSOSOMAL STORAGE DISORDERS Ari Zimran, M.D. Gaucher Clinic, Shaare Zedek Medical Center, Jerusalem, Israel
Candida Fratazzi, M.D. Altus Pharmaceuticals, Inc., Waltham, Massachusetts
Deborah Elstein, Ph.D. Gaucher Clinic, Shaare Zedek Medical Center, Jerusalem, Israel
INTRODUCTION During the past few decades, because of the advances in molecular technology and improved understanding of lysosomal diseases, efforts have been made to identify appropriate prognostic and predictive factors in many of these diseases, despite the fact that each disease is a rare disorder. Indeed, what may be seen as a commonality among these diseases in terms of biochemistry or molecular underpinnings further defines the biochemical and molecular elements of each disease that make each disease unique. Thus, the situation today reflects the partial knowledge we have about this conglomerate of diseases, so that in the main, very few biomarkers are available in most of these diseases to assist the clinician to know which patients a priori will suffer from more severe manifestations or will benefit the most from specific therapies. In this context one must mention the infantile (neurological) forms that are rapidly
progressive, and for these it might not be ethically acceptable to offer therapies that have not yet withstood the test of time. A biomarker should be technically feasible in many hands, easy to measure (using readily available technology, so that results are universal and standardized); useful, with a consistent relative magnitude between experimentals and controls, or treated and untreated; reliable, precise and accurate clinically, not just statistically; and classifiable as strongly predictive or prognostic. In recruiting patients with lysosomal disorders for clinical trials, the use of biomarkers is a double-edged sword: Whereas biomarkers may meet all the criteria above, they must also be clearly related to disease burden in the vast majority of patients and be capable of detection at both ends of the spectrum, from very mild to severe, and equally reactive to specific therapy within the same range. If all these prerequisites cannot be met, the use of the biomarker may be unjustified clinically. The purpose of this chapter is to review the literature and practice of biomarkers in lysosomal storage diseases and use current practices to discuss guidelines for the use of biomarkers in upcoming clinical trials.
IDENTIFICATION OF SPECIFIC LYSOSOMAL STORAGE DISEASES The rarer a disease, the more likely it is that biomarkers are unavailable. In terms of some of the lysosomal diseases, of which there are more than 50, there are actually no universally recognized biomarkers other than specific protein (enzyme) or substrate markers. Thus, for four mucopolysaccharidosis disorders (MPS I, MPS II, MPS III, and MPS VI), and Pompe disease, Gaucher disease, and Fabry disease there are protein markers, either enzymes or macrophage biomarkers; and for seven diseases (MPS I, MPS II, MPS IIIA, MPS IIIB, MPS IVA, MPS VII, and Fabry disease) there are substrate markers [1]. Urinary heparan sulfates can be used to differentiate MPS IIIC (Sanfilippo C syndrome) and MPS II (Hunter disease) [2], keratan sulfate to identify MPS IV and possibly other MPS disorders [3], and antibodies against gangliosides (i.e., anti-GM2 and anti-GM3) based on animal modeling. Antibodies, monoclonal and/or polyclonal, have been generated against some of the diseases' enzymes or substrates as well. Immunohistochemical techniques, which can also be used to identify proteins [4], although not quantitative, may prove to have considerable potential as predictive markers if teamed with other techniques, such as mass spectrometry or various forms of chromatography. In the first decade of the 2000s, before his untimely death, Nestor Chamoles produced filter papers for dried blood spots to identify various enzymes whose deficiencies implicated a lysosomal disorder: α-l-iduronidase (MPS I), α-galactosidase (Fabry disease), β-d-galactosidase (GM1 gangliosidosis), and others [5–13]. Eventually, the technology was streamlined into a multiplex assay for the identification of enzymes for Fabry, Gaucher, Hurler, Krabbe, Niemann–
Pick A/B, and Pompe diseases [14] because of the recognition that simple and reliable diagnostic markers for patient identification were needed with the imminent and/or probable availability of specific therapies in these diseases. This methodological advance highlights the importance of using an easily obtainable patient sample (and in this case in small quantities) that can be transported without damage and without special handling and which is highly reproducible. For the majority of the enzymes above, the filter paper system is reliable; however, in Pompe disease, for example, measured α-glucosidase activity may be a composite of other activities, making this specific assay less reliable. In its stead, and in similar cases where nonspecific (substrate) activities cannot be totally suppressed by inhibitors, other assays, such as immunocapture, can be used [15]. There is also proof of principle for the use of diagnostic biomarkers from amniotic fluid in lysosomal disorders, with the express purpose of distinguishing normal from affected and even of correlating findings with specific storage material in some of the disorders [16]. Within the past few years the list of diseases that can be profiled on the basis of various proteins, oligosaccharides, and glycolipids includes six MPS disorders and at least eight other diseases. Of note is a urinary measure of oligosaccharides (glycosaminoglycan derivatives) which has met the criteria of sensitivity and specificity in identifying persons with MPS disorders and, based on unique profiles, can differentiate among (all but MPS IIIB and MPS IIIC) subtypes [17]. Generally, to be useful as true biomarkers, some correlation with clinical disease expression (i.e., predictive or prognostic value) must be proven. A urinary diagnostic test that may be an appropriate marker of both disease progression and response to therapy is urinary globotriaosylceramide (Gb3) in Fabry disease [18], although residual enzyme activity in the blood is a poorer marker of disease status. Similarly, in Gaucher disease, the most common lysosomal storage disorder, which has a range of clinical expression from virtually asymptomatic octogenarians to lethal neonatal forms, residual activity is a poor predictor of disease severity [19]. Therefore, in other diseases, a combination of analyses of enzyme activity with genotype or molecular phenotypes has been recommended [20] [e.g., in MPS II (Hunter disease)] to improve predictability. In summary, within the past few decades various specific assays have been developed that identify patients with enzyme deficiencies and can even quantify residual enzyme activity (relative to normal controls) based on the kinetics model of Conzelmann and Sandhoff [21] of a correlation between lipid accumulation and deficient enzyme activity [22], but residual activity is not always correlated with clinical status. Alternatively, improper or "derailed" processing of the enzyme may be indicative of disease severity, since these enzymes undergo trafficking from the endoplasmic reticulum through the Golgi and endosomal compartments to the lysosome. This thinking has been applied in estimates of lysosomal-associated membrane proteins (LAMP-1 and LAMP-2) with the expectation of uncovering a processing defect common to all lysosomal disorders that would also be predictive of disease severity [23], but this was not proven [24]. It has also
been suggested that mutant enzyme variants may be retained in the endoplasmic reticulum and that this may be one of the factors that determine disease severity [25]. Accumulation of lipid in the endosomes or lysosomes was shown to characterize variants of type C Niemann–Pick disease because of the presence of cholesterol [26], thereby making this a good marker, but again, this was true only for a highly specific variant of this rare disorder. It should be noted that one ramification from these and similar findings is that response to therapy, even if the modality is identical for more than one lysosomal disease, may not be uniquely or sensitively monitored using non-disease-specific markers.
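The kinetic threshold idea behind the Conzelmann and Sandhoff model [21] referred to above can be illustrated with a deliberately simplified steady-state sketch; the particular rate law below is an illustrative assumption, not the authors' exact formulation. If substrate enters the lysosome at a rate $k_{\mathrm{in}}$ and residual enzyme degrades it with Michaelis–Menten kinetics,
\[
\frac{d[S]}{dt} = k_{\mathrm{in}} - \frac{V_{\max}^{\mathrm{res}}\,[S]}{K_m + [S]},
\]
then a finite steady-state substrate level,
\[
[S]^{*} = \frac{K_m\,k_{\mathrm{in}}}{V_{\max}^{\mathrm{res}} - k_{\mathrm{in}}},
\]
exists only while residual capacity exceeds influx ($V_{\max}^{\mathrm{res}} > k_{\mathrm{in}}$). Once residual activity falls below this critical threshold, substrate accumulates without bound, which is one way to rationalize why storage and clinical disease appear only below a threshold of residual activity while, above it, residual activity correlates poorly with clinical status.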
IDENTIFICATION OF CLINICAL MARKERS Clinical markers with predictive or prognostic value would be attractive if quantifiable and if assessment is non-invasive. It was hoped that the use of animal models would be illustrative of human conditions. In a recent study of murine MPS I, although it was clearly shown that thickened aortic valves and abnormal cardiac function can be monitored from the preclinical stage, the authors note that "murine MPS I is not identical to human MPS I" [27]; each has unique clinical features. Nonetheless, a more recent study in murine MPS I employing proteomic analysis of the heparin cofactor II-thrombin complex (HCII-T), formed by the serine protease inhibitor heparin cofactor II, showed highly elevated serum levels in mice and in human patients that were correlated with disease severity and responsive to therapy [28]. HCII-T, therefore, may indeed meet the requirements of a good biomarker for MPS I, especially since it implicates a specific pathophysiology. A second good example in the MPSs is the use of accumulation of a disaccharide (HNS-UA), a marker of heparan sulfate storage in disease-specific sites of MPS IIIA [29], because the rate of accumulation is commensurate with disease severity at these sites and is appropriately reactive to disease-specific therapy. Along these lines, therefore, it is commendable to find disease-specific parameters that lend themselves to quantification and test correlation with clinical severity and responsiveness to therapy. Biopsies and bone marrow aspirations or repeat radiological workups to stage severity should not be condoned if there is a better option, even, some might say, if that option does not exactly meet our criteria of a "good" biomarker. In Gaucher disease, because Gaucher-related skeletal involvement is associated with considerable morbidity, one study showed a reduction in osteoblast and osteoclast bone markers [30], but there was no correlation with incidence of bone pathology [31]. Biomarkers in this sense therefore might be misleading. Another example in Fabry disease showed no correlation of plasma concentrations of endothelial markers or homocysteine with response to therapy, although endothelial and leucocyte activation are good measures of renal and cardiovascular involvement in Fabry disease [32].
INFLAMMATORY MARKERS AS SECONDARY BIOMARKERS Among the most prevalent hypothetical constructs used in lysosomal storage disorders is that of inflammation as a mediator, either a cause or a consequence, of lipid storage. A pathway common to all MPS disorders (originally described for MPS VI and MPS VII) has been developed based on inflammatory reactivity of connective tissues correlated with metalloproteinases in chondrocytes [33], but it is not a quantitative measure of severity or responsiveness to therapy. In GM1 gangliosidosis, because it is known that neuronal apoptosis and abnormalities in the central nervous system are secondary to storage, assessment of inflammatory cerebrospinal fluid markers showed correlation with clinical course, but the markers were not responsive to therapeutic interventions [34]. However, in a mouse model of gangliosidoses that showed disease progression with increased inflammatory cells in the microglia, the difference between the GM1 gangliosidosis and GM2 gangliosidosis (Sandhoff, Tay–Sachs, and late-onset Tay–Sachs disease) models was the timing of the onset of clinical signs [35], which is not always taken into consideration. Thus, while inflammation may be postulated to be either a primary or a secondary index of disease activity, not all markers meet the criteria of sensitivity or clinical relevance. Similarly, in an early study of Gaucher disease using macrophage-derived inflammatory markers, there were some cytokines that correlated with disease severity and clinical parameters, but the results were equivocal in many markers [36]. In a knock-out mouse model of types A and B of Niemann–Pick disease, the macrophage inflammatory cytokine MIP-1α was elevated in disease-specific sites and declined with therapy [37], but this marker cannot be disease-specific. On a global level, however, mouse models for the various lysosomal disorders have recently shown a connection between lipid storage in the endosome or lysosome and invariant natural killer T (iNKT)-cell function, indicative of thymic involvement, although these findings would conflict with the theory of elaboration of inflammatory markers in lysosomal storage disorders [38].
MACROPHAGE SURROGATE BIOMARKERS Of the many avenues attempted in the various common and less common lysosomal storage diseases, none is a completely satisfactory biomarker. This is a distinct disadvantage when the alternative may be invasive procedures that are more dangerous than merited by the status of the patient. A class of biomarkers, surrogate in the sense that they measure plasma levels of macrophage-derived lipids or chemokines, has been incorporated into the evaluation initially of Gaucher disease, but now also of Fabry disease and type B Niemann–Pick disease. Examples of this class are chitotriosidase and C-C chemokine ligand 18 (CCL18; also called pulmonary and activation-regulated chemokine, PARC), both of which can be measured in plasma and in urine. Chitotriosidase
in Gaucher disease [39] was considered first a specific marker of disease severity and then a measure of response to therapy [40]. Among the methodological issues with using chitotriosidase is that the enzyme is genetically deficient in 6% of all persons, so genotyping should be done. The surrogate marker CCL18/PARC was then introduced [41] because it had the advantage of being present in everyone, yet it is not nearly as elevated in patients with Gaucher disease, relative to healthy individuals, as chitotriosidase is. A further advantage of CCL18/PARC is that its assay is less difficult than that for chitotriosidase. In male patients with Fabry disease, chitotriosidase levels were found to be significantly elevated but were not correlated with disease severity, although in some cases they may have normalized with therapy [42]. In two siblings with type B Niemann–Pick disease, there were elevated levels of both markers, but not commensurate with clinical severity [43]. Recently, urinary levels of chitotriosidase and CCL18/PARC have been measured in Gaucher disease, but they do not appear to correlate with plasma levels, although there was correlation after exposure to treatment [44]. Interestingly, despite its indirect relationship to disease-specific parameters, the popularity of chitotriosidase as a putative biomarker has led to its use in testing other nonspecific markers [45]. This should not, of course, be the intention of biomarkers (i.e., that they correlate with each other), because then one is caught up in loops of correlation, none of which is related directly to a disease-specific parameter.
BIOMARKERS AND CLINICAL TRIALS Initiation of clinical trials is a costly and time-consuming commitment whose goal is to decrease the time to market of a novel modality that may become a gold standard. This is definitely the case in rare diseases, where the availability of a single therapeutic option that is safe and effective may be the only hope of affected individuals. If a pharmaceutical company undertakes the commitment to a clinical trial in an ultrarare disorder, the candidate modality must have tremendous promise to survive rigorous examination of the preclinical stages. By the time a putative therapy achieves phase II or phase III status, patients too will be highly motivated to see a successful treatment brought to market. Thus, on the one hand there is incentive for the company and for patients to get the treatment into the market, but on the other hand there is awareness that in clinical trials of patients, many hopeful candidate therapies do not meet their primary outcome measures, resulting in dismissal of that option. Choosing outcome measures for clinical trials is both a science and an art in rare diseases because candidate patients are few, there is not always a "dream team" in terms of disease severity, and, as we all know, "stuff happens." Unforeseen and uncontrollable events may prevent a perfectly acceptable drug from getting to market. The current practice is to have secondary as well as primary outcome measures that one can
assess should the primary outcome be equivocal or difficult to interpret once the clinical trial is completed. As implied above, because they are seen as adjuncts in assessing clinical efficacy of therapy, biomarkers are popular as outcome measures in clinical trials, especially with regulatory agencies. In rare diseases, however, one must be cautious in applying biomarkers merely because they are more convenient to assess than disease-specific clinical parameters. Importantly, not all biomarkers are equal in their predictive or prognostic value. This is a critical starting point in evaluating whether to include a biomarker as an outcome measure. Similarly, there is a difference, as implied above, between markers that measure disease-specific events and those (surrogate markers have a different use and can be confusing in this context) that measure events putatively related to a clinical event. In making clinical decisions, one should not rely on putatively related markers but be guided by clinically relevant parameters that correlate with disease severity. In conclusion, biomarkers are a means of better prediction and follow-up, especially from the perspective of regulatory issues involving diagnostics and novel therapeutic options. This is even more cogent in cases where ancillary or additive therapies are considered to "fine tune" previously achieved therapeutic gains. However, one should differentiate between diagnostic markers and prognostic biomarkers before choosing the latter over the former in making clinical decisions.
REFERENCES 1. Parkinson-Lawrence E, Fuller M, Hopwood JJ, Meikle PJ, Brooks DA (2006). Immunochemistry of lysosomal storage disorders. Clin Chem, 52: 1660–1668. 2. Toma L, Dietrich CP, Nader HB (1996). Differences in the nonreducing ends of heparan sulfates excreted by patients with mucopolysaccharidoses revealed by bacterial heparitinases: a new tool for structural studies and differential diagnosis of Sanfilippo’s and Hunter’s syndromes. Lab Invest, 75:771–781. 3. Tomatsu S, Okamura K, Maeda H, et al. (2005). Keratan sulphate levels in mucopolysaccharidoses and mucolipidoses. J Inherit Metab Dis, 28:187–202. 4. Walkley SU (2004). Secondary accumulation of gangliosides in lysosomal storage disorders. Semin Cell Dev Biol, 15:433–444. 5. Chamoles NA, Blanco M, Gaggioli D (2001). Diagnosis of alpha-l-iduronidase deficiency in dried blood spots on filter paper: the possibility of newborn diagnosis. Clin Chem, 47:780–781. 6. Chamoles NA, Blanco M, Gaggioli D (2001). Fabry disease: enzymatic diagnosis in dried blood spots on filter paper. Clin Chim Acta, 308(1–2):195–196. 7. Chamoles NA, Blanco MB, Iorcansky S, Gaggioli D, Specola N, Casentini C (2001). Retrospective diagnosis of GM1 gangliosidosis by use of a newbornscreening card. Clin Chem, 47:2068.
8. Chamoles NA, Blanco MB, Gaggioli D, Casentini C (2001). Hurler-like phenotype: enzymatic diagnosis in dried blood spots on filter paper. Clin Chem, 47:2098–2102. 9. Chamoles NA, Blanco M, Gaggioli D, Casentini C (2002). Gaucher and Niemann–Pick diseases—enzymatic diagnosis in dried blood spots on filter paper: retrospective diagnoses in newborn-screening cards. Clin Chim Acta, 317: 191–197. 10. Chamoles NA, Blanco M, Gaggioli D, Casentini C (2002). Tay–Sachs and Sandhoff diseases: enzymatic diagnosis in dried blood spots on filter paper: retrospective diagnoses in newborn-screening cards. Clin Chim Acta, 318:133–137. 11. Chamoles NA, Niizawa G, Blanco M, Gaggioli D, Casentini C (2004). Glycogen storage disease type II: enzymatic screening in dried blood spots on filter paper. Clin Chim Acta, 347:97–102. 12. Wang D, Eadala B, Sadilek M, et al. (2005). Tandem mass spectrometric analysis of dried blood spots for screening of mucopolysaccharidosis I in newborns. Clin Chem, 51:898–900. 13. Niizawa G, Levin C, Aranda C, Blanco M, Chamoles NA (2005). Retrospective diagnosis of glycogen storage disease type II by use of a newborn-screening card. Clin Chim Acta, 359:205–206. 14. Gelb MH, Turecek F, Scott CR, Chamoles NA (2006). Direct multiplex assay of enzymes in dried blood spots by tandem mass spectrometry for the newborn screening of lysosomal storage disorders. J Inherit Metab Dis, 29:397–404. 15. Umapathysivam K, Hopwood JJ, Meikle PJ (2005). Correlation of acid alphaglucosidase and glycogen content in skin fibroblasts with age of onset in Pompe disease. Clin Chim Acta, 361:191–198. 16. Ramsay SL, Maire I, Bindloss C, et al. (2004). Determination of oligosaccharides and glycolipids in amniotic fluid by electrospray ionisation tandem mass spectrometry: in utero indicators of lysosomal storage diseases. Mol Genet Metab, 83:231–238. 17. Fuller M, Rozaklis T, Ramsay SL, Hopwood JJ, Meikle PJ (2004). Disease-specific markers for the mucopolysaccharidoses. Pediatr Res, 56:733–738. 18. Whitfield PD, Calvin J, Hogg S, et al. (2005). Monitoring enzyme replacement therapy in Fabry disease: role of urine globotriaosylceramide. J Inherit Metab Dis, 28:21–33. 19. Fuller M, Lovejoy M, Hopwood JJ, Meikle PJ (2005). Immunoquantification of beta-glucosidase: diagnosis and prediction of severity in Gaucher disease. Clin Chem, 51:2200–2202. 20. Sukegawa-Hayasaka K, Kato Z, Nakamura H, et al. (2006). Effect of Hunter disease (mucopolysaccharidosis type II) mutations on molecular phenotypes of iduronate-2-sulfatase: enzymatic activity, protein processing and structural analysis. J Inherit Metab Dis, 29:755–761. 21. Conzelmann E, Sandhoff K (1991). Biochemical basis of late-onset neurolipidoses. Dev Neurosci, 13:197–204. 22. Schueler UH, Kolter T, Kaneski CR, Zirzow G, Sandhoff K, Brady RO (2004). Correlation between enzyme activity and substrate storage in a cell culture model system for Gaucher disease. J Inherit Metab Dis, 27:649–658.
23. Zimmer KP, le Coutre P, Aerts HM, et al. (1999). Intracellular transport of acid beta-glucosidase and lysosome-associated membrane proteins is affected in Gaucher’s disease (G202R mutation). J Pathol, 188:407–414. 24. Meikle PJ, Ranieri E, Simonsen H, et al. (2004). Newborn screening for lysosomal storage disorders: clinical evaluation of a two-tier strategy. Pediatrics, 114: 909–916. 25. Ron I, Horowitz M (2005). ER retention and degradation as the molecular basis underlying Gaucher disease heterogeneity. Hum Mol Genet, 14:2387–2398. 26. Sun X, Marks DL, Park WD, et al. (2001). Niemann–Pick C variant detection by altered sphingolipid trafficking and correlation with mutations within a specific domain of NPC1. Am J Hum Genet, 68(6):1361–1372. 27. Braunlin E, Mackey-Bojack S, Panoskaltsis-Mortari A, et al. (2006). Cardiac functional and histopathologic findings in humans and mice with mucopolysaccharidosis type I: implications for assessment of therapeutic interventions in Hurler syndrome. Pediatr Res, 59:27–32. 28. Randall DR, Sinclair GB, Colobong KE, Hetty E, Clarke LA (2006). Heparin cofactor II-thrombin complex in MPS I: a biomarker of MPS disease. Mol Genet Metab, 88:235–243. 29. King B, Savas P, Fuller M, Hopwood J, Hemsley K (2006). Validation of a heparan sulfate–derived disaccharide as a marker of accumulation in murine mucopolysaccharidosis type IIIA. Mol Genet Metab, 87:107–112. 30. Drugan C, Jebeleanu G, Grigorescu-Sido P, Caillaud C, Craciun AM (2002). Biochemical markers of bone turnover as tools in the evaluation of skeletal involvement in patients with type 1 Gaucher disease. Blood Cells Mol Dis, 28:13–20. 31. Ciana G, Addobbati R, Tamaro G, et al. (2005). Gaucher disease and bone: laboratory and skeletal mineral density variations during a long period of enzyme replacement therapy. J Inherit Metab Dis, 28:723–732. 32. Demuth K, Germain DP (2002). Endothelial markers and homocysteine in patients with classic Fabry disease. Acta Paediatr Suppl, 91:57–61. 33. Simonaro CM, D’Angelo M, Haskins ME, Schuchman EH (2005). Joint and bone disease in mucopolysaccharidoses VI and VII: identification of new therapeutic targets and biomarkers using animal models. Pediatr Res, 57:701–707. 34. Satoh H, Yamato O, Asano T, et al. (2007). Cerebrospinal fluid biomarkers showing neurodegeneration in dogs with GM1 gangliosidosis: possible use for assessment of a therapeutic regimen. Brain Res, 1133:200–208. 35. Jeyakumar M, Thomas R, Elliot-Smith E, et al. (2003). Central nervous system inflammation is a hallmark of pathogenesis in mouse models of GM1 and GM2 gangliosidosis. Brain, 126:974–987. 36. Hollak CE, Evers L, Aerts JM, van Oers MH (1997). Elevated levels of M-CSF, sCD14 and IL8 in type 1 Gaucher disease. Blood Cells Mol Dis, 123:201–212. 37. Dhami R, Passini MA, Schuchman EH (2006). Identification of novel biomarkers for Niemann–Pick disease using gene expression analysis of acid sphingomyelinase knockout mice. Mol Ther, 13:556–564. 38. Gadola SD, Silk JD, Jeans A, et al. (2006). Impaired selection of invariant natural killer T cells in diverse mouse models of glycosphingolipid lysosomal storage diseases. J Exp Med, 203:2293–2303.
39. Hollak CE, van Weely S, van Oers MH, Aerts JM (1994). Marked elevation of plasma chitotriosidase activity: a novel hallmark of Gaucher disease. J Clin Invest, 93:1288–1292. 40. Czartoryska B, Tylki-Szymanska A, Gorska D (1998). Serum chitotriosidase activity in Gaucher patients on enzyme replacement therapy (ERT). Clin Biochem, 31:417–420. 41. Boot RG, Verhoek M, de Fost M, et al. (2004). Marked elevation of the chemokine CCL18/PARC in Gaucher disease: a novel surrogate marker for assessing therapeutic intervention. Blood, 103:33–39. 42. Vedder AC, Cox-Brinkman J, Hollak CE, et al. (2006). Plasma chitotriosidase in male Fabry patients: a marker for monitoring lipid-laden macrophages and their correction by enzyme replacement therapy. Mol Genet Metab, 89:239–244. 43. Brinkman J, Wijburg FA, Hollak CE, et al. (2005). Plasma chitotriosidase and CCL18: early biochemical surrogate markers in type B Niemann–Pick disease. J Inherit Metab Dis, 28:13–20. 44. Boot RG, Verhoek M, Langeveld M, et al. (2006). CCL18: a urinary marker of Gaucher cell burden in Gaucher patients. J Inherit Metab Dis, 29:564–571. 45. Moller HJ, de Fost M, Aerts H, Hollak C, Moestrup SK (2004). Plasma level of the macrophage-derived soluble CD163 is increased and positively correlates with severity in Gaucher’s disease. Eur J Haematol, 72:135–139.
26 VALUE CHAIN IN THE DEVELOPMENT OF BIOMARKERS FOR DISEASE TARGETS Charles W. Richard, III, M.D., Ph.D., Arthur O. Tzianabos, Ph.D., and Whaijen Soo, M.D., Ph.D. Shire Human Genetic Therapies, Cambridge, Massachusetts
INTRODUCTION Biomarkers have only recently come into use as an important tool in the development and clinical testing of therapeutic agents. The value of biomarker development was realized following the failure of drugs to achieve success in late-stage clinical trials in the 1990s. In many cases, small molecules and biologics being tested in clinical trials showed efficacy in preclinical testing as well as in phase I and phase II clinical trials, but failed to meet clinical endpoints once tested in expanded phase III trials. Initially, pharmaceutical and biotechnology companies sought to improve clinical trial design through selection of more refined clinical endpoints. This was augmented by an effort to better understand the mechanism of action of the therapies being tested. These initiatives did improve the success rate of clinical trials in general and yielded an understanding in many cases of why therapies did not work in phase III trials. However, it became clear with time that these measures addressed only part of the overall shortcomings in the drug development process.
As more and more clinical trial results became available and were analyzed, the underlying problem with the process of drug development became evident. Clinical trials focused mainly on safety in phase I and early phase II trials. Efficacy data were only obtained in late-stage phase II trials and in expanded phase III trials. This approach squandered the opportunity to obtain meaningful efficacy data in phase I and early phase II testing. While the number of patients in phase I trials is usually smaller than in phase II trials, the opportunity to obtain data in humans was being lost. This realization led to an effort to correlate changes in biological markers or biomarkers in humans with therapies being tested. The ability to monitor changes in the level of naturally occurring cytokines, cell surface molecules, signaling cascades, inflammatory mediators, or metabolic products that correlate with amelioration of disease following treatment gave investigators additional information about the effect of these therapies. This information could then be correlated with clinical outcomes to better understand much earlier in the drug development timeline whether a compound or biologic is effective.
VALUE OF BIOMARKER DEVELOPMENT USING PRECLINICAL MODELS The utility of biomarkers in clinical development is increased through the use of preclinical animal models of disease that recapitulate the major hallmarks of human disease being targeted for drug development. Therefore, biomarker development really does begin in the preclinical stage of drug development and relies heavily on selection and validation of good preclinical models of disease. The selection of good preclinical models is an often overlooked component of drug development. This is the first step in building the value chain of a good biomarker(s) for use in clinical testing. Disease models should faithfully manifest the major aspects of human disease. This should be reflected not only in the development of the major and minor clinical signs that occur in humans but also in the pathogenesis of disease that leads to overt clinical signs. For example, the cytokines TNFα and IL-6 have been identified in preclinical animal models as major drivers in the pathogenesis of autoimmune diseases such as rheumatoid arthritis (RA; [1–3]). The finding that increased levels of these cytokines circulate in mice with RA and that these levels correlate with the severity of disease in these animals identified TNFα and IL-6 as potential biomarkers for human disease. Further, demonstration that these levels decrease on immunosuppressive therapies confirmed their potential usefulness as biomarkers for the testing of novel drugs for RA in the clinic [1–3]. It is often the case that preclinical models for a given disease target do not exist or have been poorly developed. In this situation, efforts need to be directed toward understanding the underlying factors contributing to
disease pathogenesis through the development of animal models that recapitulate the hallmarks of human disease. These basic science exploratory studies can be difficult and time consuming. However, the value of these studies is realized when testing reveals that a therapy can modulate levels of the identified biomarkers and this correlates with a positive effect of the drug on disease endpoints in a clinically relevant animal model. Additional studies that investigate the effect of dose and regimen of a given therapy on modulation of these biomarkers are ultimately the key experiments that will inform on the dosing regimen to be used in human clinical trials. This information is critical, as it is a valuable component in the value chain of drug development. It is now clear that the effort spent on biomarker development in preclinical animal models at the front end of the drug development process facilitates a more informed approach to clinical trial design in humans. This ultimately translates into a higher success rate for therapies tested in the clinic.
STRATEGIES FOR DEVELOPING BIOMARKERS USING PRECLINICAL ANIMAL MODELS There are several strategies that can be employed for the development of biomarkers using preclinical animal models. This often involves utilization of existing information about the pathogenesis of a given disease in relevant models. However, in most cases this requires basic research designed to understand what biomarkers correlate with the development of disease and if these biomarkers can be modulated by therapy in a meaningful way. When considering animal models for biomarker development, it is important to distinguish between the types of models available. The best-case scenario is to utilize (or develop) a model that mimics the major and minor factors that drive the development of disease in humans. The identification of these factors as the cause of disease in these models allows for the ability to monitor and correlate them with disease severity. It then becomes important to determine if these factors respond to standard therapy known to reduce disease in humans. This is the best-case scenario moving forward if searching for new, improved therapeutic agents through head-to-head testing with standard therapies can be achieved. However, it is most often the case that therapeutic agents are being developed for a disease for which there is no current treatment and/or there are no biomarkers of disease pathogenesis. In these situations, basic research is required for biomarker identification. The current approaches are varied and not uniform. Often, a good starting point for this initiative involves thorough research of the existing scientific literature to understand what serum or tissue factors are increased or decreased in patients who manifest disease and whether these factors are predictive of disease or disease severity. In addition, it is important to understand if these factors are a driving force in disease
pathogenesis. If this is the case, these factors could be good candidates as useful biomarkers to monitor the efficacy of potential therapeutic agents. As cited above, TNFα and IL-6 are cytokines that are generally increased in sera taken from patients with RA [4]. These peptides are proinflammatory cytokines that appear early in the cascade of cytokines that ultimately lead to joint inflammation and the pathogenesis of this disease. The ability of drugs known to have a therapeutic effect on RA to decrease the levels of these cytokines in preclinical animal models demonstrated their value as biomarkers for the testing of new therapies. This highlights the usefulness of identifying serum biomarkers that play a central role in the pathogenesis of disease, as their levels typically increase in a manner that correlates with disease progression. In cases where there is no literature to support biomarker identification, the development of genomic and proteomic techniques has created the opportunity for de novo biomarker discovery. If good animal models are available, genome-wide microarray analysis of cells or tissues obtained during the onset or progression of disease could lead to the identification of genes that are upor down-regulated. These data need to be confirmed with additional proteomic studies to validate the potential role of the identified gene products, while additional structure–function studies in animal models need to be done to determine if these gene products correlate with disease progression and/or play a central role in disease pathogenesis. Finally, studies need to be performed to determine if these biomarkers respond to therapies known to affect the disease in animal models. Once these basic research studies are performed in animal models, the identification of potential biomarkers needs to be validated in humans. This can be done in clinical trials in the patient target population using similar molecular techniques. The biomarkers selected need to be amenable to identification in easy-to-obtain clinical specimens such as serum or peripheral blood cells. However, once validated in humans, these biomarkers can serve as a very important tool in the value chain, leading to the testing and evaluation of novel therapeutic agents.
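To make the genome-wide screening step described above more concrete, the sketch below illustrates one common way such expression data are triaged: ranking genes by fold change between diseased and control animals and by a simple two-sample test. It is a minimal illustration only; the gene names, expression values, and thresholds are hypothetical, and a real analysis would use a dedicated microarray framework with normalization and multiple-testing correction rather than this toy filter.

```python
import numpy as np
from scipy import stats

# Hypothetical log2 expression values (rows: genes, columns: animals).
# Gene names and values are invented for illustration.
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
disease = np.array([
    [8.1, 8.4, 7.9, 8.3],   # GeneA: consistently up in diseased animals
    [5.0, 5.2, 4.9, 5.1],   # GeneB: unchanged
    [6.5, 9.0, 4.2, 7.7],   # GeneC: noisy
    [3.1, 3.0, 3.2, 2.9],   # GeneD: consistently down
])
control = np.array([
    [6.0, 6.2, 5.9, 6.1],
    [5.1, 4.9, 5.0, 5.2],
    [6.4, 6.6, 6.5, 6.3],
    [5.0, 5.1, 4.9, 5.2],
])

FC_CUTOFF = 1.0   # |log2 fold change| >= 1, i.e., at least twofold
P_CUTOFF = 0.05   # naive per-gene threshold; real studies correct for multiple testing

for name, d, c in zip(genes, disease, control):
    log2_fc = d.mean() - c.mean()      # difference of means on the log2 scale
    _, p = stats.ttest_ind(d, c)       # two-sample t-test between groups
    flag = "candidate" if abs(log2_fc) >= FC_CUTOFF and p < P_CUTOFF else "-"
    print(f"{name}: log2FC={log2_fc:+.2f}, p={p:.3g}  {flag}")
```

Genes flagged this way would then need the proteomic confirmation and structure-function follow-up described above before being treated as candidate biomarkers.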
VALUE OF BIOMARKER DEVELOPMENT AND USE IN CLINICAL TRIALS The sequencing of the human genome has presented a bewildering array of new targets for drug discovery that are only now being sorted out through more sophisticated systems biology approaches to biological pathway analysis. Most of these novel targets in drug discovery have not been proven pharmacologically in humans, so early readouts of potential clinical effectiveness in human clinical trials through biomarker analysis provide the much-needed confidence that the drug affects the intended target in vivo. Moving beyond biological proof of principle, the ideal biomarker is one that can be measured
more easily, more frequently, and more accurately in humans and predicts early response to treatment or is an early indicator of clinical benefit. Since clinical efficacy is often apparent only after extended study in relatively large populations, identifying the most sensitive early biomarker of improvement in pathophysiology in smaller populations in shorter trials is an important research goal. Ideally, this search for the most sensitive and robust biomarker readout has been incorporated into earlier proof-of-efficacy animal studies, as it is almost too late for the investigative and experimental work to begin at activities immediately before filing the investigational new drug (IND) application and first-in-human clinical trials. Toward this end, most big pharmaceutical companies have adopted the incorporation of biomarker teams (either separate-line functions or matrixed teams) into the late discovery research process. Incorporation of potential biomarkers should be considered as part of all traditional phase IIa trials, but is considered increasingly as part of early, nonregistration, exploratory translational medicine studies. Moving from proof of pathway perturbation to using the biomarker to establish the optimal dosing regimen is an important downstream consideration. Finely tuning dosing regimens to clinical outcome measures is rarely satisfactory, so assessing pharmacodynamic endpoints usually requires a sensitive and easily measured biomarker. Optimal biomarkers are those that can be sampled safely upon repeated measurements and are traditionally thought of as biochemical markers of bodily fluids, especially serum samples, but can include serial noninvasive imaging studies. In some instances, limited tissue biopsy material can be procured for histochemical and immunohistochemical analysis.
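Where a pharmacodynamic biomarker is used to guide the dosing regimen, the analysis frequently comes down to fitting a dose- or concentration-response model to the biomarker readout and reading off a target dose. The sketch below fits a simple Emax model with hypothetical data; the model choice, the numbers, and the 80%-of-maximal-effect target are illustrative assumptions rather than recommendations from the text.

```python
# Minimal sketch: using a pharmacodynamic biomarker readout to guide dose selection.
# All data and the 80%-of-maximal-effect target are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def emax(dose, e0, emax_, ed50):
    """Simple Emax dose-response model for a biomarker (e.g., % change from baseline)."""
    return e0 + emax_ * dose / (ed50 + dose)

# Hypothetical mean biomarker responses observed at each dose in an early trial
doses    = np.array([0.0, 5.0, 10.0, 25.0, 50.0, 100.0])   # mg
response = np.array([2.0, 18.0, 31.0, 52.0, 66.0, 74.0])   # % change from baseline

params, _ = curve_fit(emax, doses, response, p0=[0.0, 80.0, 20.0])
e0, emax_fit, ed50 = params

# Dose predicted to give 80% of the maximal biomarker effect (a common PD target)
target = 0.8
dose_80 = target * ed50 / (1.0 - target)
print(f"ED50 ~= {ed50:.1f} mg; dose for 80% of maximal effect ~= {dose_80:.1f} mg")
```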
USE OF BIOMARKERS FOR PERSONALIZED MEDICINE AND PATIENT STRATIFICATION Biomarkers in clinical trials are also valuable if they can predict which subjects are most likely to respond. This is most obvious in the selection of patients for targeted chemotherapy in subgroups of cancers, but any biomarker that can stratify patients into groups most likely to respond to treatment is potentially valuable. Examples from marketed products that use biomarkers for cancer responder selection based on gene expression include HER2 (trastuzumab), c-kit (imatinib), epidermal growth factor receptor (EGFR; erlotinib, cetuximab), and the Philadelphia chromosome (imatinib). Biomarkers are also useful if patient subpopulations at risk for toxicity can be identified for exclusion from trials of efficacy. Examples from marketed products include the screening of patients prior to therapy for glucose 6-phosphate dehydrogenase deficiency (G6PD; dapsone), dihydropyrimidine dehydrogenase deficiency (DPD; fluorouracil), and ornithine transcarbamylase deficiency (OTC; valproic acid), since deficiency of these enzymes leads to severe toxicity. The promise of pharmacogenetics and the discovery of genetic variants in
DNA that predict efficacy or toxicity has gone largely unfulfilled, but some examples in marketed products do exist. Genetic variations in N-acetyltransferase (NAT; isoniazid), thiopurine methyltransferase (TPMT; 6-MP, azathioprine), UDP-glucuronosyltransferase 1 (UGT1A1; irinotecan), and several liver cytochrome P450 drug-metabolizing enzymes [CYP2D6 (Strattera), CYP2C19, and CYP2C9 (warfarin)] have been proven to cause increased drug exposure, leading to toxicity, and are used for dosage adjustment [5]. Great strides have been made in recent years in automated microarray systems for surveying the entire genome for single-nucleotide polymorphisms and copy-number variation. Large numbers of patients are needed to uncover small genetic effects, so subtle differences in efficacy or rare idiosyncratic toxic reactions will seldom be uncovered in limited phase I/II or other exploratory trials. That said, large efforts are under way by many large pharmaceutical companies to bank DNA from larger phase III and postmarketing trials to conduct genome-wide association studies with increasingly sophisticated statistical genetic analysis that may identify single-nucleotide polymorphisms or copy-number variants that correlate with treatment response or toxicity.
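At its simplest, the association between a genetic variant and a clinical response or toxicity is tested with a contingency table comparing carriers and noncarriers, which also makes clear why large cohorts are needed to detect small effects. A minimal sketch with invented counts:

```python
# Minimal sketch: testing whether carriers of a candidate variant have a higher rate
# of a toxicity than noncarriers. The counts below are invented for illustration only.
from scipy.stats import fisher_exact

#            [toxicity, no toxicity]
carriers    = [ 9,  41]    # e.g., carriers of a reduced-function allele
noncarriers = [12, 438]

odds_ratio, p_value = fisher_exact([carriers, noncarriers], alternative="two-sided")
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.3g}")

# In a genome-wide association study, this test (or a regression equivalent) is repeated
# across hundreds of thousands of variants, so multiple-testing correction is required
# (a genome-wide significance threshold near p < 5e-8 is conventional), which is one
# reason the large phase III and postmarketing cohorts mentioned above are needed.
```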
BIOMARKER DEVELOPMENT IN PARTNERSHIPS WITH REGULATORY AGENCIES The most widespread use of biomarkers in current practice is for internal decision making, although the aspirational goal remains the development of surrogate biomarkers that can substitute for clinically validated endpoints in registration studies. Surrogate biomarkers are typically endpoints in therapeutic intervention trials, although surrogates are sometimes used in natural history or epidemiologic studies. To assist pharmaceutical companies in this effort, the U.S. Food and Drug Administration (FDA) has published a guidance document for pharmacogenomic data submissions that defines the concepts of known valid biomarker, probable valid biomarker, and exploratory pharmacogenomic data for regulatory decision making. The PG guidance document [6] defines known valid biomarkers as those accepted by the scientific community at large to predict clinical outcome, probable valid biomarkers as those having predictive value but not yet replicated or widely accepted, and exploratory biomarkers as those found in exploratory hypothesis generation, often in the context of whole-genome genomic, proteomic, or metabolomic analysis. Perhaps the most effective use of exploratory surrogate biomarkers in clinical drug development occurs in the context of the 21 CFR 314 and 601 Accelerated Approval Rule (1992), which allows for surrogate or nonultimate clinical registration endpoints. Fast-track designation is granted to the sponsor of a development program for a specific indication of a specific drug or biological product "to facilitate the development and expedite the review of new drugs that are intended to treat serious or life-threatening conditions and that demonstrate the potential to address unmet medical conditions."
Accelerated approval is often based on less well-established surrogate endpoints or clinical endpoints. But postmarketing data are then required to verify and describe the drug's clinical benefit and to resolve remaining uncertainty as to the relation of the surrogate endpoint upon which approval was based to clinical benefit, or of the observed clinical benefit to ultimate outcomes. In summary, the continued development of biomarkers and true surrogate markers throughout the drug development value chain in the service of drug registration will remain an important activity to increase efficiency in the drug development process and to understand at the earliest time point in the pipeline whether a new chemical entity is likely to succeed or fail. Academic, pharmaceutical, and regulatory agency partnerships to develop biomarkers collaboratively within such initiatives as the FDA Critical Path Initiative [7] will help to "modernize the scientific process through which a potential human drug, biological product, or medical device is transformed from a discovery or proof of concept into a medical product" [8].
REFERENCES

1. Miyata S, Ohkubo Y, Mutoh S (2005). A review of the action of tacrolimus (FK506) on experimental models of rheumatoid arthritis. Inflamm Res, 54(1):1–9.
2. Moller B, Villiger PM (2006). Inhibition of IL-1, IL-6, and TNF-alpha in immune-mediated inflammatory diseases. Springer Semin Immunopathol, 27(4):391–408.
3. Rose-John S, Waetzig GH, Scheller J, Grotzinger J, Seegert D (2007). The IL-6/sIL-6R complex as a novel target for therapeutic approaches. Expert Opin Ther Targets, 11(5):613–624.
4. Kremer JM, Davies JM, Rynes RI, et al. (1995). Every-other-week methotrexate in patients with rheumatoid arthritis: a double-blind, placebo-controlled prospective study. Arthritis Rheum, 38(5):601–607.
5. Frueh F (2006). Qualifications of genomics biomarkers for regulatory decision making. Presented at the Annual DIA EuroMeeting, Paris, Mar. 7, 2006. http://www.fda.gov/Cder/genomics/presentations/DIA_Eur4.pdf.
6. FDA (2005). U.S. FDA Center for Drug Evaluation and Research Guidance for Industry: Pharmacogenomic Data Submissions. http://www.fda.gov/cder/guidance/6400fnl.pdf (accessed Jan. 11, 2007).
7. FDA (2007). U.S. FDA's Critical Path Initiative. http://www.fda.gov/oc/initiatives/criticalpath (accessed Jan. 11, 2007).
8. Buckman S, Huang SM, Murphy S (2007). Medicinal product development and regulatory science for the 21st century: the Critical Path vision and its impact on health care. Clin Pharmacol Ther, 81(2):141–144.
PART VII LESSONS LEARNED: PRACTICAL ASPECTS OF BIOMARKER IMPLEMENTATION
27 BIOMARKERS IN PHARMACEUTICAL DEVELOPMENT: THE ESSENTIAL ROLE OF PROJECT MANAGEMENT AND TEAMWORK Lena King, Ph.D., DABT CanBioPharma Consulting, Inc., Guelph, Ontario, Canada
Mallé Jurima-Romet, Ph.D. MDS Pharma Services, Montreal, Quebec, Canada
Nita Ichhpurani, B.A., PMP MDS Pharma Services, Mississauga, Ontario, Canada
INTRODUCTION: PHARMACEUTICAL PROJECT TEAMS The research-based pharmaceutical industry is one of the most complex industries in the world. Discovery and development teams constitute a well-established model to manage the complexity and integrated activities to guide projects in pharmaceutical development. Organizational models and composition of these teams vary between companies, depending on the size and business strategy of the company, but they are always multidisciplinary in nature. The discovery team is charged with discovering and developing new leads. This team may include scientists with expertise in disease models,
target identification, high-throughput screening, molecular biology, combinatorial chemistry, medicinal chemistry, and imaging. The development team is generally formed once a decision has been made to fund development of a new pharmaceutical lead for eventual registration. Development teams include preclinical disciplines (pharmacology, pharmacokinetics, and toxicology), pharmaceutical development (pilot and production chemists and/or biopharmaceutical expertise, formulation), regulatory affairs, clinical development, and commercial and marketing expertise. The development team often has a formal project management structure with a project team leader and a project manager. In smaller organizations, a project manager may also serve as the team leader. Project management serves a critical role in supporting and driving forward the drug development process for the drug candidate chosen. Particularly in recent years, the organizational structure of the discovery and development teams has been changing to adapt to internal and external demands, and the decreasing productivity and increasing costs associated with pharmaceutical development. To meet these challenges, capitalize on new technologies, and improve quality of decision making, companies are fostering collaborations between discovery and development scientists. The discovery teams increasingly include scientists with experience in DMPK (drug metabolism and pharmacokinetics), toxicology, clinical development, and project management to streamline or translate the research from discovery into development. Translational research is being proposed as the bridge between the perceived discovery–development silos and is emerging as a cross-functional discipline in its own right. As illustrated in Figure 1, some organizations have created an explicit biomarker or translational research unit that is represented on the development project team. Other organizations have adopted an implicit model in which biomarkers are part of the function of existing discovery and development units. A third option is a hybrid model that partners biomarker work in discovery and development without the creation and funding of a separate biomarker unit [1]. In addition to internal organizational restructuring, partnering between companies and outsourcing some parts of development (or in the case of virtual companies, all of development) to contract research organizations (CROs) is also becoming more common. These partnerships or alliances cover a wide spectrum of transactions and disciplines. Formalized alliance structures with contracts, governance, and team-specific guidance may not be in place for pharmaceutical development teams. However, even when a drug development team includes only partners from one company, it has been suggested that the project team is an implicit alliance and when including external partners may be an explicit alliance [2]. For small discovery startup companies, CROs may provide not only the conduct of studies necessary to help new candidates to progress through discovery and development but often also the essential development expertise and will act in implicit partnership with the sponsor. Thus, the concepts and processes developed for alliances, and their
[Figure 1 Translational research (TMed) organizational models: an implicit model, in which TMed objectives are owned by pre-existing organizational entities; an explicit model, with a clearly identifiable organizational structure dedicated to TMed spanning biomarker discovery, development, and utilization; and a hybrid model, in which responsibilities for TMed are shared between existing and new organizational entities. (Adapted from ref. 1, with permission from Drug Discovery World.)]
success stories, are instructive for drug development teams [3]. In research and development, an alliance provides a venue for access to complementary knowledge, new and different ideas, and processes with shared risk and reward. The following core principles pertain to alliances: 1. Goals and outcomes are shared with equitable sharing of risk and reward. 2. Participants have equal say in decisions, and each participant should have full management support. 3. Decision criteria must be based on what is best for the project rather than for individual participants. 4. The team operates in a culture of open and honest communication. A recent survey of formalized research and development (R&D) alliances evaluated the contribution of alliance design (i.e., the number and type of partners, geographic proximity, R&D knowledge, and capabilities of each partner) and alliance management (governance agreements and processes) to the success of the alliance. The results showed that the alliance could generally
be designed with appropriate and complementary expertise. The number of partners and the presence of competitors among the partners had no overall effect on the success of the alliance. Effective contractual provisions and governance had a positive effect on the measures for alliance success. However, the most pronounced positive predictors of success were the frequency of communication and how ambitious a project was. The more ambitious projects were a strong predictor for success [4]. The success factors identified for other R&D alliances apply also to successful project teams involved in pharmaceutical development (Table 1). Managing the project within budget and with appropriate resources is a major responsibility. For the pharmaceutical industry, the need for cost containment is providing compelling arguments for introducing high-value decision gates earlier in the development process. As illustrated in Table 2, biomarkers are one of the most important and tangible tools for facilitating translational research, moving data-driven decision making earlier into development, and for guiding development to the most appropriate indication and patient subpopulation. Although these additional decision gates can be helpful
for the team, the inclusion of biomarkers adds complexity to the traditional linear model of drug development with a more reiterative process for the project team to manage.

TABLE 1 Success Factors of a Drug Development Project Team

Predictors of Success of R&D Alliances | Successful Drug Development Team
Appropriate number and type of partners, with complementary R&D knowledge and capabilities | "The right partners."
Effective contractual provisions and governance | "Good plan and good execution": a well-understood and management-supported plan.
Excellent and transparent communication | Consultative team interactions. The team leader and project manager guide the team toward decisions that are "on time, within scope and budget." Trust develops between team members, along with the ability to work efficiently and effectively in a context of imperfect, incomplete, and unexpected information. Solutions are sought without attribution of blame. Innovative thinking and ideas are encouraged.
Ambitious projects | With development time lines spanning decades, these are inherently ambitious projects that require champions to obtain resources and management support.
TABLE 2 Biomarkers in the Pharmaceutical Development Cycle

Discovery/Preclinical Stage: Defining mechanism of action; Compound selection; PK/PD modeling; Candidate markers for clinical trials; Better prediction by animal models through translational research
Phase I–IIa: Demonstrating clinical proof of concept; Dose and scheduling optimization; Optimization of patient population; Applications in new therapeutic indications
Phase IIb–III: Minimize trial sizes through accurate inclusion and exclusion; Maximize success rates by early confirmation of efficacy; Potential for primary or secondary surrogate endpoints
Phase IIIb–IV: Differentiation of products in the marketplace through superior profiling of response; Differentiation in subpopulations (gender, race, genetics); Personalized medicine (co-development of a diagnostic)
TEAM DYNAMICS: PHARMACEUTICAL PROJECT TEAMS The development team has members with complementary technical skills. The management of the complex process of pharmaceutical development requires that these highly skilled knowledge workers engage, relate, and commit to a shared goal with defined milestones. These team interactions have to occur in a dynamic environment where (1) studies and experiments continually generate results that may fundamentally change the course of development, (2) management support and priority may be low compared to other projects, (3) team members may be geographically dispersed, and (4) resources for conduct of studies and other activities often are not controlled directly by the team. At its best, the pharmaceutical development team provides an environment that is mutually supportive, respectful, and enables discussion on controversial issues. It is open to new ideas, agile and constructive in addressing new issues, and has goals and strategy supported and understood by management. The project leader and the project manager should strive to generate an environment that is as conducive as possible to provide this ideal. These are not
features specific to biomarker development, but as mentioned below, including novel biomarkers will add to the complexity of the development project and require additional attention and management due to the increased number of communication channels. Following are some of the general principles of a productive team environment: 1. Include and plan for a project kickoff meeting and face-to-face meetings. 2. Define the roles and responsibilities of each team member. 3. Operate in a spirit of collaboration with a shared vision. 4. Practice active listening. 5. Practice transparent decision making; determine how decisions will be made and the role of each team member in the process. 6. Encourage all team members to engage in debate about strategic issues. 7. Spend time and energy to define objectives. 8. Engage and communicate actively with management. 9. Decide but revisit which communication tools are optimal. 10. Recognize and respect differences. 11. Plan for adversity. 12. Plan for the expected as well as the unexpected. There are a number of excellent books that discuss team dynamics and team management [5–7]. Pharmaceutical scientific organizations are beginning to offer continuing education courses in program management, and dedicated pharmaceutical training courses are available [8]. However, effective drug development project leaders and managers do not come out of university programs or training centers. The understanding of how all the complex pieces of drug development come together can best be learned through hands-on experience as a team member, team leader, or project manager. Typically, it takes many years of working within the industry to gain sufficient knowledge of the drug development process to be an effective project team leader or manager.
CONSEQUENCES OF BIOMARKERS IN PHARMACEUTICAL DEVELOPMENT STRATEGIES Biomarkers are not new in pharmaceutical development. The interpretation, clinical significance, and normal variation of established biomarkers are generally well understood and widely accepted (discussed elsewhere in this book). Their utility, normal variation, and significance have been evaluated and corroborated in many different research and clinical studies. However, novel biomarkers that are now emerging may be available at only a single or a few
vendors or laboratories. The assays may be technically and scientifically complex, results may be platform dependent, and limited data may be available on their normal variation and biological significance. Modern computational techniques allow for powerful multiplex analysis, binning of multiple parameters, and analysis of multiple biomarkers on an individual animal or patient basis. These capabilities provide exciting opportunities for advancing the science; however, there are few published or marketed tools for choosing, planning, implementing, and evaluating the risk–cost benefit of biomarkers in pharmaceutical development. The risk–cost benefit for the biomarker may also be dependent on the size of the company, its portfolio, and the financing model. Large pharmaceutical companies' investment decisions for including a novel biomarker strategy may be different from those of startup companies. A larger company may be able to offset the costs of biomarker development and implementation by applying the biomarkers to multiple projects and compounds. A startup company may include a biomarker throughout development despite uncertainty as to its ultimate utility; the company accepts the risk associated with new information emerging during the development process. By contrast, a larger company may require prior assessment of the value of including the biomarker in expediting development and improving development decisions. The integral role of biomarkers in decision making is discussed in Chapter 3 of this book, but this aspect of biomarkers also has implications for project management and teamwork within a drug development team. Following are some of the consequences of employing novel biomarkers or a unique biomarker strategy in pharmaceutical development:
• High levels of investment in infrastructure
• Multiple technological platforms and specialized expertise
• High demands on data management
• Increased complexity in study designs, sample logistics, and study data interpretation
• Uncertainty and ambiguity for strategic decision making
• Confidence and acceptance in translation and interpretation of results may be low
• Lack of precedent for using biomarkers in novel regulatory alternatives such as the exploratory IND
• Ethical issues: for example, tissue banking, privacy, and data integrity
• Evolving regulatory environment with changing requirements and expectations
PROJECT MANAGEMENT The following systematic tools and processes available for project management [9,10] can be applied to the management of biomarker programs (a brief scheduling sketch follows the list):
• Gantt charts
• Contracts, scope documents
• Meeting minutes
• Communication plans
• RACI (responsible, accountable, consulted, informed) charts
• Lessons-learned tools
• Milestone charts
• Decision trees
• Risk analysis logs
• PERT (program evaluation and review technique) charts
• Work breakdown structures
• Budget tracking
• Lean-sigma tools
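Behind several of these tools (PERT charts, Gantt charts, and work breakdown structures) sits the same forward-pass scheduling arithmetic. The sketch below computes the earliest finish and the critical path for a small, entirely hypothetical biomarker work package; the task names and durations are invented for illustration only.

```python
# Minimal sketch of the scheduling logic that PERT/Gantt tools automate: a forward
# pass over a small, hypothetical biomarker work breakdown structure.
tasks = {
    # task: (duration in weeks, [predecessor tasks])
    "select candidate biomarker": (2, []),
    "develop assay":              (8, ["select candidate biomarker"]),
    "qualify assay":              (6, ["develop assay"]),
    "write sampling plan":        (3, ["select candidate biomarker"]),
    "include in protocol":        (2, ["qualify assay", "write sampling plan"]),
}

earliest_finish, critical_pred = {}, {}

def finish(task):
    """Earliest finish of a task = its duration + latest finish among its predecessors."""
    if task in earliest_finish:
        return earliest_finish[task]
    duration, preds = tasks[task]
    pred_finish = max((finish(p) for p in preds), default=0)
    critical_pred[task] = max(preds, key=finish) if preds else None
    earliest_finish[task] = duration + pred_finish
    return earliest_finish[task]

end_task = max(tasks, key=finish)          # task with the latest earliest-finish time
path, t = [], end_task
while t is not None:                       # walk back along critical predecessors
    path.append(t)
    t = critical_pred[t]
print("critical path:", " -> ".join(reversed(path)))
print("earliest finish (weeks):", earliest_finish[end_task])
```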
Process mapping with the team is useful to ensure that all aspects of the biomarker project management are well understood. The level of detail can range from GANTT charts (Figure 2) designed principally to track time lines to program-wide integrated biomarker strategies. An example of the latter for development of an oncology candidate is illustrated in Figure 3. The development strategy has to include open discussion about the advantages and disadvantages of including the biomarker, recognizing that this often has to occur in the absence of clear and straightforward knowledge of the value of the biomarker across species and in a specific disease or subset of patients. Increasingly, project teams are expected to analyze risk associated with different activities and develop contingency plans far in advance of the actual activity occurring. Risks associated with various biomarkers (e.g., timely assay qualification, sampling logistics, patient recruitment, regulatory acceptance) have to be part of this analysis and contingency plan development. There are also numerous stakeholders beyond the development team who influence and may guide the development of the pharmaceutical: • Sponsor (may include different departments with diverse goals/ interests) • Business analysts • Investors (small companies) or shareholders • Regulators • CROs and/or biomarker labs • Investigators • Patients • Patient support/interest groups For the project management team, it is important to identify the stakeholders and evaluate their diverse and potentially competing priorities, particularly their perspectives on the benefit–risk effects of the new pharmaceutical. Novel targets with a drug producing an effect on a pharmacodynamic biomarker may be the last ray of hope for patients with serious or life-threatening diseases. Patients and sometimes physicians dealing with these diseases may have a different and often highly personal benefit–risk perspective compared to
[Figure 2 Sample GANTT chart of a drug development plan incorporating biomarkers.]
[Figure 3 Oncology biomarker map: a decision flow for selecting and characterizing efficacy and toxicity biomarkers, running from information gathering on known markers and the targeted mechanism (apoptosis, tumor invasion, angiogenesis, signal transduction, cell replication), through nonclinical in vitro and in vivo testing in syngeneic or xenograft models, to ADME/Tox biomarker selection and preclinical and clinical GLP-like validation.]
regulators, investors, and sponsors. Concerns about statistical significance, translation of the effect of the pharmacodynamic biomarker to clinical efficacy, and market share may carry little weight for patient advocacy groups under certain situations. Even concerns for safety biomarkers can be viewed as too restrictive: at best, perceived to delay access to potentially valuable medicines, and at worst, to stop their development. Sponsors and investors, eager to see hints of efficacy as early as possible, can sometimes become overly confident about positive biomarker results before statistical analysis, normal variability, or relationships to other clinical efficacy markers are available. This may be more common in small emerging companies that rely on venture capital to finance their drug development programs than in larger established pharmaceutical companies. CROs or laboratories performing the assays and/or statistical analyses may be more cautious in their interpretations of biomarker data, sometimes seemingly unnecessarily so, but are motivated by the need to maintain quality standards as well as a neutral position. Consensus and communication problems are more likely to occur when these perspectives are widely disparate. Although it may be difficult at times, it is essential to achieve a common ground between stakeholders for effective communication.
CHALLENGES ASSOCIATED WITH DIFFERENT TYPES OF BIOMARKERS The definition of a biomarker proposed by the National Institutes of Health working group, "a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention" [11], provides a basis for categorizing biomarkers into efficacy, patient stratification, and safety biomarkers. There are different but overlapping challenges for the team, depending on the category of biomarker. Efficacy Biomarkers Efficacy biomarkers range from pharmacodynamic (PD) biomarkers, markers quantifying drug–target interaction, and markers reflecting the underlying pathology of the disease to those with established links to clinical outcomes accepted as surrogate endpoints for regulatory approval. An effect of a pharmaceutical in development on a biomarker associated with efficacy is guaranteed to generate enthusiasm and momentum in the team. PD biomarkers have a long history in pharmaceutical development and form one of the cornerstones of hypothesis-driven approaches to drug discovery and development. These biomarkers are commonly generated as part of the discovery process. The biomarker may fulfill multiple key criteria in in vitro or animal models: (1) it may be used to characterize the pharmacology
models; (2) it may be used in knock-out or knock-in genetic models to further validate the target; and (3) it may demonstrate a characteristic PD/pharmacokinetic (PK) relationship with the drug under development. The changes reflecting underlying pathology range from largely unknown to those clearly indicative of potential market impact. An example of the latter is atorvastatin administration, resulting in decreases in serum triglycerides in normolipidemic subjects in clinical pharmacology studies [12,13]. This is more an exception than the norm. Typically, interpretation of the clinical significance and potential market impact of a biomarker is less certain, particularly if the pharmaceutical is (1) acting by a novel mechanism of action and (2) targeting a chronic progressive disease where disease modification rather than cure is the outcome anticipated. The rationale for including PD biomarkers is generally easy to articulate to management, and particularly for smaller companies, these biomarkers may be essential for attracting investment. While enthusiasm and willingness to include these types of markers is generally not the issue, they are not without significant challenges in implementation and interpretation in the pharmaceutical development paradigm:
• Technical aspects
  • Stability of the biomarkers
  • Technical complexity of the assay
  • Assay robustness, sensitivity, specificity
  • Throughput of the assay
• Biological samples
  • Access to matrices that can or should be assayed
  • Sample collection volume or amount and timing in relation to dosing
  • Feasibility, cost, and resolution capabilities of imaging modalities for interactions with targets in the central nervous system, testis, poorly vascularized tumors, etc.
• Data interpretation (illustrated in the sketch following this list)
  • Normal values; inter- and intraindividual variability
  • Values in disease versus healthy conditions
  • Diurnal and environmental effects in animals
  • Effects of diet, lifestyle, and concomitant medications in humans
  • Impact on development of no change or unexpected changes in biomarkers in the continuum from discovery to clinical
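For the data interpretation items above, a first-pass characterization of "normal" behavior can be computed directly from repeated baseline measurements: a crude reference interval plus a comparison of within-subject and between-subject variability (the index of individuality used in clinical chemistry). The sketch below uses invented numbers; a real assessment would require far more subjects and would account for diurnal, demographic, and disease-state effects.

```python
# Minimal sketch: characterizing a biomarker's normal variation from repeated baseline
# measurements in untreated subjects (data layout and numbers are hypothetical).
import numpy as np

# rows = subjects, columns = repeated baseline measurements of the biomarker
baseline = np.array([
    [12.1, 11.8, 12.5, 12.0],
    [18.2, 17.5, 18.9, 18.0],
    [ 9.6, 10.1,  9.9,  9.4],
    [14.8, 15.5, 14.9, 15.2],
])

subject_means = baseline.mean(axis=1)
cv_within  = np.mean(baseline.std(axis=1, ddof=1) / subject_means)   # intraindividual CV
cv_between = subject_means.std(ddof=1) / subject_means.mean()        # interindividual CV

# A crude central 95% reference interval from the pooled baseline values
low, high = np.percentile(baseline, [2.5, 97.5])

# Index of individuality: when within-subject variation is small relative to
# between-subject variation (roughly < 0.6), population reference ranges are less
# informative and change from each subject's own baseline is the better readout.
index_of_individuality = cv_within / cv_between
print(f"reference interval ~ {low:.1f}-{high:.1f}")
print(f"index of individuality ~ {index_of_individuality:.2f}")
```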
Patient Stratification Biomarkers The use of patient stratification biomarkers in pharmaceutical development and medical practice forms the foundation of what has been called
personalized, individualized, or stratified therapy. Patient stratification biomarkers focus on patients and/or underlying pathology rather than on the effect of the pharmaceutical on the target. For small-molecule drugs, genotyping for polymorphic drug-metabolizing enzymes responsible for elimination or activation/inactivation of the compound is now an established practice in clinical trials. Results about potential effects attributed to certain genotypes may be reflected in labeling recommendations for dose adjustments and/or precautions about drug–drug interactions [14]. A priori determination of genotype for polymorphic metabolizing enzymes is now included on the label for irinotecan [15] and was recently added to the label for warfarin [16] to guide selection of the dosing regimen. Targeted therapy in oncology is the best-established application of patient stratification biomarkers. The development of Herceptin, the monoclonal antibody trastuzumab, with an indication restricted to breast tumors overexpressing HER2/neu protein [17], is a clinical and commercial success story for this approach. Oncology indications also include examples of the potential of using serum proteomics to classify patients according to the highest potential for clinical benefit. For example, Taguchi et al. [18] used matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS) analysis to generate an eight-peak MALDI MS algorithm of unidentified proteins to aid in the pretreatment selection of appropriate subgroups of non-small cell lung carcinoma patients for treatment with epidermal growth factor receptor inhibitors (erlotinib or gefitinib). As illustrated by the examples above, patient stratification biomarkers encompass a wide range of technologies, including algorithms of unknown proteins. Challenges for the development team are to understand and identify the potential for including patient stratification biomarkers either as part of or as the major thrust in the development process. This is often a major challenge, since the technologies may lie outside the core knowledge areas of the team members, making it difficult to articulate and discuss their value within the team and to communicate effectively to management. These challenges can be particularly pertinent for some of the "omics" technologies, which can be highly platform dependent and rely on complex statistical methodologies to reduce large sets of data to principal components. The results often have little intuitive connection to the underlying targeted disease pathology, which may be one of the reasons that these powerful methodologies are not used more commonly. Some considerations for including patient stratification biomarkers are summarized as follows: • Strategic issues • What is the purpose of including the patient stratification biomarker? • Will it be helpful in reaching go/no go decisions? • Is it required for registration purposes? • What will the implications be for marketing and prescribing practices?
• Practical considerations • Is the biomarker commercially available and accessible? • If a diagnostic biomarker is essential to the development of the pharmaceutical, should co-development be considered? • Are there IP and marketing restrictions? • What are the implications of the biomarker technology on the conduct of the clinical trial? Safety Biomarkers The considerations for safety during development are paramount; not surprisingly, it is one of the most regulated aspects of pharmaceutical development. Safety biomarkers have spurred on interesting and innovative regulatory and industry initiative and collaborations to develop and qualify novel biomarkers. Examples are the guidance of the U.S. Food and Drug Administration (FDA) for voluntary submissions of genomic data [19] and partnerships among government, academia, and industry for qualification of safety biomarkers [20]. Data qualifying the interpretation and significance of changes in safety biomarkers are needed to guide pharmaceutical development as well as evaluation of risk to patients or healthy volunteers in clinical trials. The purpose of safety biomarkers in clinical trials can be (1) to exclude patients at risk of developing adverse effects, (2) to increase sensitivity of the adverse event monitoring, and (3) to evaluate the clinical relevance of toxicity observed in the preclinical studies. Introducing novel or more uncommon biomarkers into a development project to address any of these aspects will not be embraced universally. There may be concerns not only about the added testing burden but also about the sensitivity and specificity of the biomarker, its relevance, and its relationship to well-established biomarkers. Nevertheless, including novel or uncommon biomarkers may be a condition for the conduct of a clinical trial as mandated by either regulatory bodies or institutional review boards. For example, there may be requirements to include sperm analysis in healthy volunteers and adapting experimental genotoxicity assays to humans to address effects observed in preclinical safety studies on the male reproductive tract and in genotoxicity evaluation, respectively. These will directly affect the conduct of the trials, the investigator, his or her comfort level with the assay, and the ability to communicate the significance of baseline values and any changes in the biomarkers to the clinical trial participant. However, novel and uncommon biomarkers will also have strategic and practical implications for the overall development program: • Strategic issues • Will including the safety biomarker be a requirement for the entire pharmaceutical development program? • Are the efficacy and/or PK properties sufficiently promising to warrant continued development? • Can the identified safety concern be managed after approval?
• Practical considerations • What are the implications for the clinical trial program, locations of trials, linking to testing laboratories? • Will additional qualification of the biomarker assay be required as the development advances and for regulatory approval?
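One of the purposes noted above for safety biomarkers in clinical trials, increasing the sensitivity of adverse event monitoring, often takes the form of automated screening of laboratory data against predefined thresholds. The sketch below flags potential drug-induced liver injury signals using the commonly cited Hy's law-type criteria; the upper limits of normal and the subject data are assumptions, and an actual trial would apply its protocol-defined and regulator-agreed criteria.

```python
# Minimal sketch: flagging potential drug-induced liver injury signals in trial lab data
# using the widely cited "Hy's law" style thresholds (ALT >= 3x the upper limit of
# normal together with total bilirubin >= 2x ULN). The ULN values below are assumed
# and would in practice come from the testing laboratory.
ULN_ALT, ULN_BILI = 40.0, 1.2   # assumed upper limits of normal (U/L, mg/dL)

def flag_liver_signal(alt, bilirubin):
    """Return a simple monitoring category for a single post-dose lab panel."""
    if alt >= 3 * ULN_ALT and bilirubin >= 2 * ULN_BILI:
        return "potential Hy's law case - expedited review"
    if alt >= 3 * ULN_ALT:
        return "ALT elevation - enhanced monitoring"
    return "no signal"

# Hypothetical subjects: (subject id, ALT in U/L, total bilirubin in mg/dL)
panels = [("001", 35, 0.8), ("002", 160, 0.9), ("003", 210, 3.1)]
for subject, alt, bili in panels:
    print(subject, flag_liver_signal(alt, bili))
```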
MANAGEMENT OF LOGISTICS, PROCESSES, AND EXPECTATIONS The logistical aspects of biomarker management are often a major undertaking, particularly if these include multisite global clinical trials (Figure 4). In clinical trials, specialized or esoteric assays, sometimes including very few samples, may require processes and systems that are not commonly in place for high-throughput analytes with standard operating procedures (SOPs) and established contract service providers. In addition, managing the logistics requires recognition and integration of the different expertise, experience, expectations, culture, and mindset in each discipline within the team. Regulations and guidelines governing the different disciplines as well as generally accepted practices have a major impact on the culture and mindset. Transitioning from the less regulated discovery process into development more accustomed to good laboratory practices (GLPs), good manufacturing practices (GMPs), and good clinical practices (GCPs) can be a major cultural shift. The regulatory requirements will vary depending on the purpose of the biomarker. Safety biomarkers will require a high degree of formalized regulatory compliance, whereas there are no requirements for GLP compliance for
[Figure 4 Sample logistics: samples from a single trial (serum, plasma, noncoagulated and heparinized blood, white blood cells, urine, and stabilized tissue biopsies) may be routed to multiple laboratories for clinical chemistry, urinalysis, pharmacogenomic assays, target enzyme assays, stimulated cell assays, future proteomics, and LC/MS/MS assays of the parent drug, its metabolites, and pathophysiological substrates and products.]
pharmacodynamic biomarker assays. The question of whether or not to conduct the assay under GLP regulations will need to be considered for all types of biomarkers. In the extensive project management coordination required to include particularly novel biomarkers in clinical trials, the long-term vision for program direction can become lost. The long-term view of the impact of the biomarker results on the pharmaceutical product under development as well as guidance for further discovery efforts should be considered. Questions about the impact of different outcomes have to be considered from both a strategic and a scientific perspective. For example, in which preclinical and clinical studies should the biomarker be included if normal values and variations are largely unknown? What will be the impact on future development of no change or unexpected effects in toxicology studies or in a first-in-human study? If there are changes to the assay or new technologies become available, should they be included to provide additional functional information about the target? Particularly when limited information is available about normal values and variation, adding additional parameters may not be of value for the decision-making process. It may be tempting to include a large number of biomarkers simply because they are available. Moreover, the increase in cost, complexity of the studies, and risk for erroneous results should be weighed carefully against the value added at each step of the development process. The evaluation of whether to include a biomarker in a drug development program may not be straightforward. There is no doubt that biomarkers have proven valuable in pharmaceutical development to provide guidance for dose selection in early clinical studies, to enhance understanding of disease mechanisms and pathobiology, and to support decision making and strategic portfolio considerations. Success stories for the use of biomarkers in translating from discovery concept to clinical development have been published. One example is bortezomib, the first proteasome inhibitor approved for treatment of multiple myeloma. The proteasome is a key component in the ubiquitin–proteasome pathway involved in catabolism of proteins and peptides as well as cellular signaling [21]. Ex vivo determination of proteasome inhibition was used in discovery and continued through toxicology and early- and late-stage clinical studies. Although not a clinical endpoint, proteasome inhibition provided valuable information that the drug interacted with the intended target [22]. However, in contrast to well-publicized success stories such as the example above, it is more difficult to obtain information and find examples of decisions taken when the PD biomarker did not yield the results expected. Why and where in the continuum of development did the PD marker fail, and what were the consequences of its failure? Strong confidence in the assay and the mode of action of the biomarker, as well as expectations about enhanced effects in patients compared to healthy volunteers, may result in progression of the compound despite lack of apparent interaction with the target. Decisions based on biomarkers require making a judgment call taking into account all
the data available. For the development team, the consequences of no effect or an unexpected effect of the drug on the PD marker should be considered and debated openly before the relevant studies are initiated. Questions that should be discussed and understood by the team in the inclusion of biomarkers are as follows: • Cost and logistics • What are the costs and logistics associated with including efficacy biomarkers? • What are the costs associated with not including a biomarker (i.e., progressing a compound without use of a biomarker or panel of biomarkers)? • Confidence in the biomarker • Will the team/management accept go/no go decisions on the basis of the results of the efficacy biomarker? • How many patients are required to obtain meaningful results and/or to demonstrate response? • What degree of change or lack of progression of disease is considered acceptable? These may appear to be relatively simple to answer, but it will take courage and conviction from the team to make decisions to discontinue development based, or at least partially based, on results of unproven efficacy biomarkers. There may be pressures from patient groups or specific patients for access or continuation of clinical development if the drug is perceived to be beneficial. Management may be reluctant to accept the decision if significant resources have been spent in development. SUMMARY The successful launch of a novel pharmaceutical product represents the culmination of years of discovery and development work driven by knowledgeable people passionate about their project and the pharmaceutical. The development process will be challenging, require perseverance, and cannot be successful without coordination and teamwork. Novel biomarkers, organizational structures with multiple stakeholders, and a need to bring data-driven decision-making strategies earlier in development make the paradigm more complex and place higher demands on team communication and project coordination. Effective program leadership together with formalized program management and communication tools and processes facilitate this endeavor. As biomarkers in discovery and development are here to stay, more attention will be paid to best practices for project management and teamwork, as these roles are recognized increasingly to be essential for successful pharmaceutical development.
REFERENCES

1. Hurko O (2006). Understanding the strategic importance of biomarkers for the discovery and early development phases. Drug Discov World, Spring, pp. 63–74.
2. Ahouse J, Fontana D (2007). Negotiating as a cross-functional project manager: lessons from alliance management. Cambridge Healthtech Institute: http://www.healthtech.com/wpapers/WP_pam.asp (accessed Oct. 4, 2007).
3. Bamford JD, Gomes-Casseres B, Robinson M (2002). Mastering Alliance Strategy: A Comprehensive Guide to Design, Management, and Organization. Jossey-Bass, New York.
4. Dyer JH, Powell BC, Sakakibara M, Wang AJ (2006). Determinants of success in R&D alliances. Advanced Technology Program NISTIR 7323. http://www.atp.nist.gov/eao/ir-7323/ir-7323.pdf (accessed Oct. 4, 2007).
5. Means JA, Adams T (2005). Facilitating the Project Lifecycle: The Skills and Tools to Accelerate Progress for Project Managers, Facilitators, and Six Sigma Project Teams. Wiley, Hoboken, NJ.
6. Parker GM (2002). Cross-Functional Teams: Working with Allies, Enemies, and Other Strangers. Wiley, Hoboken, NJ.
7. Wong Z (2007). Human Factors in Project Management: Concepts, Tools, and Techniques for Inspiring Teamwork and Motivation. Wiley, Hoboken, NJ.
8. Tufts Center for the Study of Drug Development. http://csdd.tufts.edu.
9. Atkinson AJ, Daniels CE, Dedrick RL, Grudzinskas CV, Markey SP (2001). Principles of Clinical Pharmacology. Academic Press, San Diego, CA, pp. 351–364.
10. PMI Standards Committee (2004). A Guide to the Project Management Body of Knowledge (PMBOK Guide), 3rd ed. Project Management Institute, Inc., Newtown Square, PA.
11. Biomarkers Definitions Working Group (2001). Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther, 69:89–95.
12. Cilla DD, Gibson DM, Whitfield LR, Sedman AJ (1996). Pharmacodynamic effects and pharmacokinetics of atorvastatin after administration to normocholesterolemic subjects in the morning and evening. J Clin Pharmacol, 36:604–609.
13. Posvar EL, Radulovic LL, Cilla DD, Whitfield LR, Sedman AJ (1996). Tolerance and pharmacokinetics of a single-dose atorvastatin, a potent inhibitor of HMG-CoA reductase, in healthy subjects. J Clin Pharmacol, 36:728–731.
14. Huang S-M, Goodsaid F, Rahman A, Frueh F, Lesko LJ (2006). Application of pharmacogenomics in clinical pharmacology. Toxicol Mechanisms Methods, 16:89–99.
15. Pfizer Inc. (2006). Camptosar (irinotecan) label. http://www.pfizer.com/pfizer/download/uspi_camptosar.pdf (accessed Oct. 4, 2007).
16. Bristol-Myers Squibb Company (2007). Coumadin (warfarin) label. http://www.bms.com/cgi-bin/anybin.pl?sql=PI_SEQ=91 (accessed Oct. 4, 2007).
17. Genentech (2006). Herceptin (trastuzumab). http://www.gene.com/gene/products/information/oncology/herceptin/insert.jsp (accessed Oct. 4, 2007).
18. Taguchi F, Solomon B, Gregorc V, et al. (2007). Mass spectrometry to classify non–small-cell lung cancer patients for clinical outcome after treatment with epidermal growth factor receptor tyrosine kinase inhibitors: a multicohort cross-institutional study. J Natl Cancer Inst, 99:838–846.
19. Goodsaid F, Frueh F (2006). Process map for the validation of genomic biomarkers. Pharmacogenomics, 7:773–782.
20. Predictive Safety Testing Consortium. http://www.c-path.org.
21. Glickman MH, Ciechanover A (2002). The ubiquitin–proteasome proteolytic pathway: destruction for the sake of construction. Physiol Rev, 82:373–428.
22. EPAR (2004). Velcade (bortezomib). http://www.emea.europa.eu/humandocs/PDFs/EPAR/velcade/166104en6.pdf (accessed Oct. 4, 2007).
28 INTEGRATING ACADEMIC LABORATORIES INTO PHARMACEUTICAL DEVELOPMENT Peter A. Ward, M.D., and Kent J. Johnson, M.D. The University of Michigan Medical School, Ann Arbor, Michigan
INTRODUCTION Historically, there has been somewhat of an arm's-length relationship between researchers in academic medical centers and pharmaceutical companies. Traditionally, academic researchers often have taken an "ivory tower" approach: that their research is basic in nature and not meant necessarily to relate to the development of new drugs [1]. However, this attitude has undergone a major change over the past three decades. Academicians in California, Boston, and elsewhere have taken an entrepreneurial approach through the formation of biotech firms such as Genentech. However, it was not until fairly recently that most large pharmaceutical companies actively began developing collaborations with basic scientists in academia rather than relying almost exclusively on their internal research and development programs. In the following discussions we look at reasons for this change, ways in which these collaborations can be fostered, and pitfalls associated with this association. Bridging between academia and the pharmaceutical industry has increased steadily over the past 30 or more years, and this trend has accelerated
significantly with time. Movement in this direction has been based on fundamental scientific discoveries at the universities. These institutions generally lack the background, abilities, and resources to transform such discoveries into diagnostic and/or therapeutic modalities. Large pharma has the opposite problem: namely, more limited fundamental discovery research efforts, less emphasis on scientific research into basic disease processes, but extensive experience in drug design and development, clinical trial design and conduct, and monetary resources that ultimately result in a successful commercial product. In this chapter, we discuss the history and development of strong ties between academic scientists and the pharmaceutical industry. This will include discussions on the advantages of these collaborations and how they are usually structured. In addition, there are examples of these collaborations at both the basic science discovery stage and later in clinical development. This will include an example of scientists (Arul Chinnaiyan, George Wang, Dan Rhodes) who have been involved in fundamental discoveries regarding antigenic epitopes and autoantibodies in prostatic cancer patients, have taken these observations further with the development of spinoff companies, and have projected plans for the future.
HISTORICAL PERSPECTIVE Historically, academic researchers have taken a research approach that stressed National Institutes of Health (NIH)–sponsored basic research. These programs have targeted disease characterization or processes, but were not directed at a specific therapy for a given disease. In fact, many academic centers still differentiate between NIH research dollars and industry funds when ranking academic departments and consider the NIH funding to be the “gold standard” in determining the success and reputation of an investigator. However, several forces are changing this somewhat elitist perspective, as described below. A major factor in this change has been the federal government change in funding levels. NIH dollars are much harder to come by, in particular the R01 primary research award. In fact, NIH funding for grants has diminished over the last few years, and several institutes in the NIH are now funding a much smaller percentage of research grants. Currently, only the top 8% of research applications submitted are funded (NIH Office of Extramural Research). Funding for the NIH has increased only slightly, with the fiscal year 2008 appropriation being increased only 0.9%, to $29.45 billion, compared to $29.3 billion in fiscal year 2007 (S. Forrest, Annual Report on Research and Scholarship, FY 2007 Financial Summary, University of Michigan). This amount has not kept up with inflation and has resulted in less funding for external investigators. For example, in fiscal year 2007 extramural funded R01 research grants decreased to 27,850 from 28,192 in 2006 (NIH Office of Extramural Research). Furthermore, the total amount of money given out in
2007 dropped to $10.04 billion from $10.12 billion in 2006. Overall, the reduction in NIH funding from 1995 to 2007 has resulted in a decline of approximately 10% in real dollars available to researchers (T. Mazzaschi, Research Funding Trends: Surviving an NIH Recession, presented at the 2007 AMSPC Annual Meeting). This cutback in funds has been devastating to many investigators and has even resulted in several laboratory closings. In addition, the reduction in federal funds has had a severe impact on the number of young scientists, particularly physicians, deciding on a research career. There has also been a change in focus by the U.S. government and the NIH. Today, the NIH requires that grants have a "translational component" that provides a direct connection between the research hypothesis and a clinical disease parameter. To demonstrate this, a researcher studying a specific cytokine in an inflammatory process would have a specific aim in the grant for evidence that this cytokine was involved in a human disease. As would be expected, this has resulted in a major change in how basic investigators conduct their research projects. Additionally, it has mandated collaborations with clinicians and provided direct correlations with disease processes. The reduction in available funding, and closer linkage to translating research findings into disease diagnosis or treatments, has led to renewed interest in academic–pharmaceutical company collaborations. This shift has been encouraged further by the increasing need of biotechnology and small-molecule pharmaceutical companies to address more complex and chronic diseases, incorporate new technologies into their drug development programs, and enhance their molecular understanding of disease. Currently, most academics do not have a clear idea of how to solicit research support from such companies, and thus it is still relatively unusual for investigators to get significant industry funding for preclinical research. Many scientists within the pharmaceutical industry also have limited understanding of operations within universities and how to identify/access basic research programs that might be of benefit to drug development. The industry perspective will be described in greater detail below, as well as examples of research collaborations that have been successful.
THE BIOTECH EXPERIENCE

Traditionally, biotech companies have largely been created by entrepreneurial academic scientists who want to commercialize their research findings for the development of drugs. Historically, these companies have been funded by venture capital funds or other sources rather than by the pharmaceutical industry. Such funding provides for the early-stage development of compounds but usually does not allow for the high expenses associated with full clinical development. Phase II clinical proof-of-concept and phase III pivotal safety and efficacy clinical trials involve significantly larger numbers of patients,
clinicians, institutions, and amounts of money than even well-funded startup companies have access to through normal funding sources. To further complicate the current model, large pharmaceutical companies are most interested in new molecules that have demonstrated efficacy and safety in humans (phase IIb or early phase III). Given a choice, the companies are substantially less interested in investing in early-stage preclinical development, due to the dramatically higher risk of attrition from toxicity or lack of effect against the targeted disease. Once a compound shows real promise in patients, a number of pharmaceutical companies will be interested in purchasing the rights to the compound or may invest in co-development of a drug. This model has been quite successful in bringing new drugs to market that are not part of a pharmaceutical company's internal portfolio. However, historically, it has not allowed for the consistent support of preclinical research. Academic researchers are often placed in the unenviable situation of having a molecule with high potential for an unmet medical need but insufficient funds to develop the molecule to a stage where outside organizations see sufficient commercial potential to invest. The unfortunate outcome all too often is that the new approach either languishes or the academic institution assigns rights to the compound at a relatively low price.

The Pharmaceutical Approach

Historically, pharmaceutical companies have relied primarily on their internal discovery scientists to develop most of their new drugs. They have also purchased the rights to compounds developed by biotech firms that show promise in clinical trials. Thus, pharmaceutical companies previously have not been major supporters of external preclinical development, including that conducted in academic laboratories. Pharma often has supported targeted programs in academic medical centers, such as postdoctoral fellowships and occasional research laboratories, but this is not a widespread or reliable long-term funding mechanism for most universities. The nature and culture of academic and industrial groups can also inhibit successful interactions. The priorities, working processes, reward systems, and even ways of communicating differ significantly between the two institutional types.

Recently, however, there has been greater movement to support preclinical research programs in academic laboratories. Increased costs of drug development, high failure rates of new drug moieties, greater acceptance of outsourcing for many roles, and recognition that the next great drug discovery could come from many fields have increased the pressure on pharmaceutical companies to look beyond their own walls. This shift is further supported by mergers within the industry and the need for new drug pipelines larger than can reasonably be achieved based solely on internal resources. For a company to prosper, it is becoming essential to explore all available sources of new medical treatments. Also, competition for new drugs in late-stage development is intense and costly. This last fact is
resulting in more attention to compounds in early clinical trials (phases I and IIa) or even late-stage preclinical evaluations. Another factor supporting the movement to fund academic laboratories is that most companies now recognize that external agreements and collaborations are often more cost-effective than developing and supporting the same expertise internally. Examples of outreach by companies into academic institutions or private organizations include Sandoz and Pfizer partnering with the Scripps Research Institute in La Jolla, California. In fact, today many of the investigational laboratories in pharmaceutical companies have closed. There is a belief that outsourcing these studies provides expertise without the need to maintain large in-house programs.

Advantages of Collaboration

There are a number of advantages associated with pharmaceutical company support for academic laboratories. For the company, the academic laboratory provides cost-effective access to expertise and established research programs that do not have to be duplicated in-house. Since most critical observations in biology, including mechanisms of disease processes, are first elucidated in academic research laboratories, company support of this process provides access to valuable basic research that cannot be duplicated in industry without tremendous internal investment. Such collaborations also allow pharmaceutical scientists access to in-residence stays in academic research laboratories. Furthermore, a pharmaceutical company investing in a faculty member's research has a deeper understanding of the benefits and limitations of the research than does the average person in industry who is looking for licensing opportunities. This often translates into earlier recognition of new therapeutic opportunities and the first right to decide on licensing a new product from the academic laboratory. For academic researchers, this funding provides an important source of research support in addition to the NIH. Industry funding also often provides for the purchase of expensive equipment that would otherwise not be available to the investigator. Through academia–industry collaborations, many university scientists can gain access to specialized instruments or reagents that can be difficult or impossible to obtain within their institution. Additionally, such collaborations make it possible to carry an idea through to an actual therapy.

Potential Pitfalls with the Collaborations

As a research investigator in a university setting, it is important to keep in mind that several conditions must be met for research collaborations with pharmaceutical companies to be successful. First, it is critical that the agreement be structured as a research collaboration and not as a testing service. Although there are situations where short-term fee-for-service testing may be desirable, that is rarely the case for an individual researcher. Industry has
contracts with outside laboratories, such as toxicology and medical reference laboratories, where data ownership and intellectual property concerns are more efficiently defined and negotiated than within the university. This model generally does not work in academic research collaboration because in that situation the investigator is responsible for the analysis of the data and owns the data, usually with the intent of placing this information into the public domain through publications. This is a critical component of successful research collaboration, since ideally the companies are interested in academic expertise in evaluating the data. The quality of research conducted will also be significantly richer when the interaction is a joint research collaboration. By structuring the agreement to allow both parties to benefit, advance the research, and look for options that neither could achieve alone, academic–industry programs become valuable tools for addressing unmet medical needs.

Hand in hand with control over the data is the issue of publishing and intellectual property. When employing a contract laboratory, the company owns all data and intellectual property pertaining thereto. This would not be acceptable to the institution in an academic environment. Most institutions require rights to at least part, if not all, of the intellectual property that comes from these collaborations. This is often a major sticking point between academic centers and pharmaceutical companies. Ideally, there is an agreement in place where the university has rights to patent findings that come from the research collaboration, while the company has access to utilize these data in partnership with the institution. Another major issue that goes hand in hand with intellectual property is the right to publish findings in peer-reviewed journals. In universities it is fundamental that the right to publish be part of any agreement made between research investigators and for-profit companies. For many companies this can be a difficult hurdle, since they are concerned about keeping information confidential, at least until a patent is issued. Usually, what is done is to write a clause into the contract defining when findings can be submitted for publication. This usually means a delay of a few months (but not much longer) before the academic investigators can submit their report for peer review.
COLLABORATIONS BETWEEN PHARMACEUTICAL COMPANIES AND UNIVERSITIES IN PROMOTING BASIC RESEARCH

Traditionally, pharmaceutical companies have concentrated their external investments on compounds that have completed the early stages of the drug development process and for which some patient information is usually already available. Increasingly, however, the pharmaceutical industry is funding collaborations with universities to support early-stage research. Historically, this role has been dominated by government and venture capital funding. The combined factors of cutbacks in NIH and venture capital funding, and risk aversion to early-stage funding by pharmaceutical companies, are leading to an emphasis
on supporting some early research for promising academic investigators and departments. This support of science at the "grassroots" level also provides companies with research capabilities that would not be possible in internal drug development laboratories and allows these companies to embrace the nimble style of biotechnology companies. A recent example is Pfizer's collaboration with the University of California at San Francisco (UCSF). This collaboration will not only fund narrowly defined research projects but will also allow for early proof-of-concept funding for ideas proposed by the academic researchers (Bernadette Tansey, San Francisco Chronicle, June 10, 2008). Thus, this new model of collaboration between academic researchers and a major pharmaceutical company both funds specific projects and utilizes the academic scientists' expertise in developing new ideas and projects. The company can utilize the intellectual capital of these researchers, many of whom are acknowledged leaders in their particular field of study. From the perspective of the university, the individual scientists receive financial support and the ability to see accelerated progression of their discoveries toward medical implementation.

The University of Michigan Experience with Pharma and Joint Research Projects

Like other research institutions, the University of Michigan has had longstanding collaborations with pharmaceutical and biotechnology companies. In general, these companies have provided support for lecture series, specialized symposia, specialty meetings, and consulting. In addition, there has been support for individual investigators. One example is that of the authors of this chapter and the Department of Pathology at the University of Michigan, who have had research collaborations with pharmaceutical companies: primarily Warner-Lambert and, subsequently, Pfizer, as well as other companies to a lesser extent. These research collaborations have focused on areas that provided value to both institutions.

One area of collaboration has been the development by the academic researchers of experimental disease models that can be utilized by pharmaceutical scientists early in the drug discovery process. Examples include models of lung, skin, and gastrointestinal injury. The use of experimental in vitro and in vivo models allows pharmaceutical scientists to evaluate new compounds of interest in models not readily available in their laboratories and to determine activity relative to a particular aspect of a disease or adverse effect. These collaborations have been used extensively to evaluate efficacy and toxicity in the discovery stage of drug development. Another area of collaboration has involved studies comparing animal models with the human diseases that are targeted. The collaborations that these companies have with academic centers such as the University of Michigan allow comparison between the animal findings for molecules of interest and what is seen in the human disease. This is very valuable in determining how
relevant the preclinical animal models and toxicology studies are in predicting what will happen in humans. Examples include models of inflammatory vascular and lung injury, as well as sepsis.

Another area of collaboration has been the identification of biomarkers of disease activity. Pathologists have unique training that bridges basic and clinical science dealing with mechanisms of disease, with a specific interest in developing tests for clinical laboratories to diagnose specific diseases. In this regard, with support from these companies, our laboratories have been able to develop high-throughput technologies, such as antibody arrays, that would otherwise not have been possible because of cost. This support allows the identification of new biomarkers of specific diseases, as well as of toxicity. This information is useful not only to the pharma companies, but also to the medical community at large. Examples of these collaborative activities are cited below, where we show that this technology has great potential to identify new markers of disease in humans [2]. The fact that pathologists typically are responsible for running clinical laboratories has also been of value to the pharmaceutical companies, since these companies may have on-site phase I clinics that evaluate human specimens. Members of our department have provided supervision of laboratory operations and support for College of American Pathologists certification. Our clinical expertise has also been utilized in specific issues that arise in phase II and III testing. Finally, and very importantly, our colleagues in the pharmaceutical industry have provided support for joint postdoctoral programs. This allows for the funding of postdoctoral scientists to work both on basic research studies at the university and on focused projects in a pharmaceutical setting. This has proven very successful, with several high-quality postdoctoral investigators trained in our laboratories going on to positions in industry and academia.
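To make the antibody-array work mentioned above more concrete, the short sketch below illustrates, in schematic form, how spot signals from an internally controlled antibody microarray of the kind described in reference [2] might be normalized and screened for candidate markers. The analyte names, control spots, toy intensities, and fold-change cutoff are hypothetical illustrations, not the authors' actual protocol.

# Hypothetical sketch: normalizing antibody-array spot intensities to on-chip
# control spots and flagging analytes that differ between a disease sample and
# a control sample. Names, values, and the cutoff are illustrative only.

def normalize_to_controls(signals, control_keys):
    """Divide each spot intensity by the mean intensity of the internal control spots."""
    control_mean = sum(signals[k] for k in control_keys) / len(control_keys)
    return {k: v / control_mean for k, v in signals.items()}

def candidate_markers(disease, control, control_keys, fold_cutoff=2.0):
    """Return analytes whose normalized signal differs by at least fold_cutoff between samples."""
    d = normalize_to_controls(disease, control_keys)
    c = normalize_to_controls(control, control_keys)
    hits = {}
    for analyte, value in d.items():
        if analyte in control_keys:
            continue
        ratio = value / c[analyte]
        if ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff:
            hits[analyte] = round(ratio, 2)
    return hits

# Toy spot intensities (arbitrary units) for one disease and one control sample.
disease_sample = {"IgG_control": 1000, "buffer_control": 950,
                  "IL-6": 4200, "TNF-alpha": 800, "CRP": 3900}
control_sample = {"IgG_control": 1100, "buffer_control": 1050,
                  "IL-6": 900, "TNF-alpha": 750, "CRP": 1000}

print(candidate_markers(disease_sample, control_sample,
                        ["IgG_control", "buffer_control"]))
# {'IL-6': 5.15, 'CRP': 4.3} -- candidates flagged for follow-up, not validated biomarkers

In practice, flagged analytes such as these would only be starting points; confirming them as biomarkers requires the kind of clinical validation discussed later in this chapter.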
EXAMPLES OF CORPORATE TIES BETWEEN THE UNIVERSITY OF MICHIGAN AND BIOTECH COMPANIES

Below are two examples of companies that are closely aligned with the research efforts of Arul Chinnaiyan at the University of Michigan Medical School. Ultimate commercialization of intellectual property by the two companies is closely linked to Chinnaiyan and his colleagues.

Compendia Bioscience, Inc.

Compendia Bioscience was incorporated in January 2006. Its chief executive officer (CEO) is Dan Rhodes, who worked with Arul Chinnaiyan and obtained his Ph.D. degree through this association. Compendia Bioscience takes advantage of the research database Oncomine, which has been in place for several
years. This database compiles published microarray information from more than 10,000 microarray studies of tumors and is made available at no charge to persons in academic settings. Users can search the database and extract genomic information that can be used for diagnostic verification, predictions of drug sensitivity, and many other purposes, all of which ultimately bear on diagnostic analysis and clinical decision making in the oncology field. The Oncomine database is also available to the pharmaceutical industry on a fee basis, which involves the payment of annual license fees.

Compendia was set up as a commercial entity with SPARK funding (a Michigan Economic Development Corporation entity in Ann Arbor, Michigan). In late 2007, Compendia received Small Business Innovative Research (SBIR) funding and funding from the 21st Century Job Funds (approximately $1.2 million). The company, which currently has approximately 12 employees, is seen as an important resource to the commercial world, with 14 of the top 20 pharmaceutical companies annually paying license fees to access the database. The fast-track SBIR granted to Compendia is allowing the company to increase its personnel to 17 employees, who are involved in software development and in evaluation of content and updating of the Oncomine databases and related matters. In 2008 Compendia is expected to break even financially. A decision has been made by Compendia not to become directly involved in drug discovery, personalized medicine, or other areas that depend on genomic information to make diagnoses or develop new drugs. The Compendia model is very different from that of most companies, in that no intellectual property is involved in its licensing to pharmaceutical companies. In other words, if the pharmaceutical companies discover information in the Oncomine database that allows for development of a new drug, Compendia does not have any direct financial stake or gain. On the other hand, there is an increasing likelihood that Compendia will be involved in consultative activities with pharmaceutical companies to optimize use of its database.

Armune BioScience, Inc.

Armune BioScience, located in Kalamazoo, Michigan, was formed in 2007 to develop and commercialize diagnostic tests for prostate, lung, and breast cancers. Its CEO is Eli Thomssen. Most of the initial work on which the company is based was done by the Chinnaiyan group at the University of Michigan Medical School. In this setting of fundamental proteomic prostate cancer research, several proteins expressed in low- or high-grade prostatic cancers and in metastatic lesions were identified and monoclonal antibodies against them developed in mice [3,4]. The resulting antibodies are proving to be useful prognostic and diagnostic reagents. The well-known prostate-specific antigen (PSA) test has been widely used for the past decade and has a very high sensitivity (but very poor specificity), with as many as 40% of the test results being
interpreted in a manner that does not correlate with the clinical condition of the patient. This has led to great consternation for both patients and physicians as they try to select the optimal clinical intervention. With respect to lung cancer, no commercial tests are available that allow a serological diagnosis. The leading candidates for studies in this area are peripheral adenocarcinomas, which currently are detected by CT scans or chest x-rays, often with the diagnosis being made only at an advanced clinical stage. Similarly, the accurate diagnosis of breast cancer is chiefly made based on excisional or needle biopsy, with no reliable serological test existing at this time.

The technology advanced by Armune BioScience uses phage protein microarray methods employing beads coated with certain antigenic epitopes and the Luminex platform as the readout. In these assays, the coated beads interact with autoantibodies present in the blood of patients with prostate cancer. As reported by Wang et al. [5], the presence of these autoantibodies to prostate cancer cells represents an extremely specific humoral immune response and increasingly appears to be a reliable and early indicator of the presence of prostate cancer. It is hoped that the development of such diagnostic tests will result in high sensitivity (>90%) together with high specificity (>90%). It is expected that highly specific, highly sensitive, and reliable serological tests will allow the diagnosis of prostatic cancer at an early stage, so that a decision can be made much sooner with respect to whether surgery or radiation therapy should be employed. The technology should also reduce the number of unsuccessful or unnecessary needle biopsies of the prostate. A lung cancer diagnostic test would be the first of its kind and might allow detection much earlier in high-risk groups such as smokers. Such success could result in a much better cure rate than is implied by the current five-year survival rate of <15% with conventional diagnostic technology. Similarly, serological diagnostic tests for breast cancer have the potential for earlier diagnosis and treatment.

Armune BioScience has received $900,000 of seed money, which is being used to establish a laboratory for validating the assay platform that is ultimately to be used for commercial laboratory applications. The likely test system will employ solid-phase (on beads) antigenic epitope peptides known to be reactive with autoantibodies present in the serum of patients with prostate cancer. In the future, a series A offering of $5.5 million and a series B round of $5.7 million are planned. The series A funds will be used to develop a serological test that will identify aggressive prostatic cancer, followed by a similar approach to lung cancer. The series B funds will be used to complete development of the serological assay to be used in CLIA-approved laboratories, involving detection of both prostate and lung cancer. Series C funding is intended to support the conduct of prostate cancer diagnostic test market research studies to validate the product profile and to determine the optimal strategy for positioning of a commercial product.
Accordingly, Armune BioScience currently has a well-developed plan and is being funded by venture capital. Although the scientific strategy appears to be sound, it will probably be several years before an autoantibody-based test can be developed, validated, and ultimately applied successfully in the clinic.
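To put the sensitivity and specificity figures quoted above in context, the short calculation below shows why a PSA-like profile (high sensitivity, poor specificity) yields many positive results that do not match the patient's clinical condition, whereas a test with both values above 90% performs considerably better. The disease prevalence and the "PSA-like" specificity used here are assumed values chosen purely for illustration, not figures from the text.

# Illustrative calculation of positive predictive value (PPV) from sensitivity,
# specificity, and disease prevalence (Bayes' rule). The prevalence and the
# "PSA-like" specificity below are assumed values used only for illustration.

def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result reflects true disease."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 0.10  # assumed fraction of screened men with clinically significant disease

# A PSA-like profile: high sensitivity but poor specificity.
print(round(positive_predictive_value(0.90, 0.40, prevalence), 2))  # 0.14

# The profile targeted for the autoantibody panel: sensitivity and specificity both above 90%.
print(round(positive_predictive_value(0.90, 0.90, prevalence), 2))  # 0.5

Under these assumptions, most positive results from the low-specificity test are false positives, which is consistent with the discordance between test results and clinical condition described above for PSA.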
CONCLUSIONS

There are many areas in which academic research laboratories and pharmaceutical companies can interact with mutually beneficial results. Industry support for investigational studies such as those described above for UCSF and the University of Michigan provides funding for many research activities at academic laboratories and allows companies to access innovative research and clinical expertise that they may not have internally. We have also described collaborations between the University of Michigan and two biotechnology spinoff companies working to develop new ways to diagnose prostate, breast, and lung cancer based on detection of autoantibodies. The seminal discovery of autoantibody signatures as highly specific immune responses to tumor antigens was first described in 2005 by Wang et al. [5]. Such autoantibodies, produced by the immune system, appear to be sensitive and reliable early indicators of the presence of a tumor. Armune BioScience is a relatively traditional young spinoff company, supported by venture capital funding as well as by other sources, and aimed ultimately at the development of serological tests for the diagnosis of prostate, breast, and lung cancer. In contrast, Compendia Bioscience is a company that takes advantage of a proprietary database related to malignant tumors. This company has no commercial product development but has an enormous database that is constantly being upgraded and is available to the academic sector free of charge and on a licensing fee basis to the pharmaceutical industry. Access to a reliable and continuously updated genomic database is likely to provide pharmaceutical companies with new strategies for the development of antitumor drugs. Both companies were established in association with the University of Michigan, which holds equity in each.
REFERENCES

1. Kozlowski RZ (1999). Industrial-academic collaboration: a bridge too far? (Editorial). Drug Discov Today, 4:487–499.
2. Olle E, Sreekumar A, Warner RL, et al. (2005). Development of an internally controlled antibody microarray. Mol Cell Proteom, 79:206–209.
3. Varambally S, Dhanasekaran SM, Zhou M, et al. (2002). The polycomb group protein EZH2 is involved in progression of prostate cancer. Nature, 419:624–629.
4. Tomlins SA, Laxman B, Dhanasekaran SM, et al. (2007). Distinct classes of chromosomal rearrangements create oncogenic ETS gene fusions in prostate cancer. Nature, 448:595–599.
5. Wang X, Yu J, Sreekumar A, et al. (2005). Autoantibody signatures in prostate cancer. N Engl J Med, 353:1224–1235.
29

FUNDING BIOMARKER RESEARCH AND DEVELOPMENT THROUGH THE SMALL BUSINESS INNOVATIVE RESEARCH PROGRAM

James Varani, Ph.D.
The University of Michigan Medical School, Ann Arbor, Michigan
INTRODUCTION

When the Small Business Innovative Research (SBIR) program was established, it was generally thought to be a less competitive road to government funding than were the traditional grant programs administered by the National Institutes of Health (NIH), the National Science Foundation (NSF), and others. There were far fewer applications relative to the pool of dollars available, and the quality of the grants was judged to be, on average, significantly lower than that of applications in traditional venues such as the investigator-initiated (R01) track. The quality issue, in particular, reflected the lack of experience that most people in the private sector had with the government granting process. This is no longer the case. Obtaining funds through the SBIR program is every bit as challenging as being successful through the traditional mechanisms. Thus, there is no single formula that can guarantee success and no single approach that is necessarily better than the others. Any case study provides only a guideline. Still, one can increase the chances of being successful
by understanding the process and by carefully studying past successes as well as past failures. With that in mind, we discuss some of the key features of the SBIR program and the considerations that may increase the likelihood of success in funding a biomarker-development project through it.
THE SBIR PROGRAM

Congress passed the Small Business Innovation Development Act in 1982 with a number of aims in mind. First (and foremost), the program was designed to stimulate technological innovation by funding projects initiated by high-technology companies to solve pressing research and development needs. Another aim was to provide a mechanism for encouraging participation by minority and disadvantaged persons in technological innovation. Finally, a stated goal was to increase commercialization of innovations derived from federal research and development dollars. Doing these things was thought to be necessary to increase the competitiveness of U.S. firms relative to their foreign counterparts. Largely unstated was the worry that the U.S. global lead in technology was being eroded.

Under the act, federal agencies with substantial extramural research and development budgets were required to direct a portion of their extramural funding to private enterprises that fit the federal definition of a small business. While the percentage of the total research and development budgets of the federal agencies that went into the SBIR program was small, the amount of funding available to small, high-tech companies dwarfed anything that had been readily available to them previously. The 11 federal agencies included in the program were the National Institutes of Health (NIH); the National Science Foundation (NSF); the Departments of Agriculture, Defense, Energy, Education, Health and Human Services, and Transportation; the Environmental Protection Agency; the Nuclear Regulatory Commission; and the National Aeronautics and Space Administration. Support for biomarker development efforts would probably come from the NIH, although other agencies have missions with enough overlap that limiting one's scope to the NIH might be shortsighted.

As constituted, the SBIR program is divided into three phases: phase I (feasibility), phase II (reduction to practice), and phase III (commercialization). The governmental agencies provide support for phases I and II. The third phase is expected to occur with private funding. The principal operative unit for the SBIR program is the SBIR grant. The phase I grant provides the awardee with a relatively small amount of funding, currently up to $100,000, to demonstrate the feasibility of the innovative research. The period of support is expected to be between six and 12 months. If feasibility is demonstrated, a phase II grant provides a much larger amount of support to carry the project from the feasibility phase to the point where it can be commercialized. Commercialization is to occur during the third phase with private funding.
Funding in phase II is routinely up to $1,000,000 currently, and there are mechanisms for increasing support beyond that level.

Efforts have been made by the National Research Council (NRC) to evaluate the success of the SBIR program: that is, to what extent it is fulfilling its mandate. Several publications summarizing the findings of the NRC are available [1–5]. Using successful commercialization as an endpoint, a study conducted in the early 1990s found that 12% of companies receiving funding through the SBIR program had commercialized the technology developed with program funds within four years of the phase II award, and 23% had achieved success by six years. A higher percentage of companies at each time point reported optimism with regard to future commercialization. Efforts have been made to elucidate the variables that led to success (i.e., commercialization). Commercialization was achieved at a much higher rate in projects funded through the NIH than in projects funded from any other source. Among other interesting findings, commercialization was more likely to be achieved with product-oriented technology than with service-oriented technology. More success was achieved when the end user of the developed technology was in the private sector rather than in the government sector. Of particular importance to the present discussion, success was higher in those companies with a focus on tangible products than in companies with a research and development focus. Finally, technology that could be protected through patents, copyrights, and so on, achieved a higher level of commercialization than did technologies that were unprotectable in these ways. Other variables that improved commercialization success were a well-developed business and marketing plan for the technology and progress (at any stage) in implementing the plan. Companies that had potential partners as a source of capital, for co-development, or as customers also did better (not surprisingly) than those that did not. Ultimately, however, there was so much variability in the individual projects initially funded and in those that achieved a measure of success that no single criterion had great predictive value by itself. Thus, one of the messages for a company seeking first-time support through the SBIR program is that while factors that have correlated with success in the past should not be discounted, success in the future will depend on the quality of the initial idea and success at the bench (i.e., turning a good idea into a proven product that meets a market need).

A subsequent, larger evaluation occurred approximately 10 years after the initial evaluation. This programmatic review came to many of the same conclusions as the initial review. The SBIR program was determined to be a success in its major mission: funding novel research in the private sector that ultimately had an economic payoff. It also arrived at the conclusion that the quality of the applications submitted (as well as those funded) had increased substantially over time. As with other funding programs within the various participating government agencies, the amount of funds available was judged to be insufficient to support all of the projects worthy of support based on
scientific merit. Summaries of these evaluations are available online at the National Research Council Web site, and many of the studies have been published as monographs.
STRUCTURE OF THE SBIR GRANT PROPOSAL

Phase I Application

The SBIR grant application process proceeds in two steps. The initial step is the phase I application. The application is (relatively) small and uncomplicated. The 15-page work plan introduces the overall research goal of the project, provides a rationale for doing the research, suggests feasibility, and outlines a series of experiments that (if successful) will support feasibility. Phase I funding provides up to $100,000 to demonstrate the merit of the investigation proposed. The length of support for the typical phase I application is usually six to 12 months.

The phase I grant application must provide a rationale for the study and a work plan that describes experiments that, if successful, would provide evidence of feasibility. The rationale is critical. If the reviewers are not convinced that the study is worth funding (regardless of outcome), there will be no enthusiasm for the application. Technically, preliminary data are not required for a phase I application. In practice, however, it is very difficult to argue convincingly that the undertaking is feasible without some preliminary data. Thus, virtually all applications contain some evidence to suggest that the approaches proposed will be successful. The work plan itself must include criteria that define success. The criteria need to be spelled out in such a way that at the end of the phase I studies, it will be possible to justify the investment in phase II. A detailed plan for commercialization is not required for the phase I application. However, if the reviewers of the application cannot see how the novel technology or product being developed can be commercialized, the grant will not receive a fundable score. Therefore, it behooves the investigator to link the proposed studies directly to tangible outcomes.

After introducing the subject and providing background information, a strong rationale for the work, and whatever preliminary data are available, the detailed experimental design and methods are presented. This section needs to be concise (since probably only six to nine pages will be available); more importantly, a longer work plan will stray from the mission, which is to convince review panel members that the studies proposed will (if successful) demonstrate feasibility. It is generally acknowledged that a detailed work plan, no matter how well written and how logical, cannot rescue an application with a weak rationale. On the other hand, if the rationale for the project is solid, the investigator can still "shoot himself in the foot" by failing to provide an adequate research plan.
Phase II Application

At the end of the phase I period of support (normally six to 12 months, or when the investigator believes that feasibility has been demonstrated), a phase II application is submitted for a substantially larger amount of money (up to $1,000,000 or more). The phase II application has a work plan that is up to 25 pages in length. The phase II application has several important differences from the phase I application, in addition to size.

A successful phase II application has several critical elements. First, the value of the project goal has to be convincing. If the review panel feels that the value of the technology being developed will, ultimately, have little scientific or economic impact, there is no chance of redeeming the project, regardless of how well written and convincing the rest of the application. Often, investigators will assume that the project's value is already accepted, based on the fact that "they funded the phase I studies." The reality is that review panel members are less likely to be concerned with the overall project's value during phase I review since the amount of money awarded is small. Not so for phase II. Most reviewers are acutely aware of fund limitations and will do their best not to fund projects that seem to have no long-term payoff. Therefore, although it is critical that the project be justified at phase I, it is even more important to justify it in the phase II application. It goes almost without saying that demonstrating both scientific and economic merit is critical. The successful grant has both. The "Background and Significance" section is the place to provide this information.

The "Progress Report and Preliminary Studies" section is next. It is critical that the progress report justify the continued investment. Some investigators assume incorrectly that completing the experiments precisely as outlined in the initial phase I application and obtaining the hoped-for results in these studies are sufficient. The reality is that regardless of what was proposed in the initial application, the reviewers of the phase II proposal will want to be convinced that the project has merit (as assessed by them independently) and that data presented in the "Preliminary Studies" section strongly support the phase II work plan. Often, this means including a large amount of information that is above and beyond that proposed originally. The reviewers of the phase II application will not care what was proposed in phase I per se. What they will care about is the likelihood of success in phase II. Undoubtedly, much of the data that ends up in the progress report for the phase II application will have been generated with funds over and above what was allocated in the phase I grant.

The "Experimental Design and Methods" section follows. Exactly as in phase I, this is where the investigator lays out the approaches that will be used, the rationale for the approaches, and a brief description of experimental procedures. As with other grants, the rationale for the approaches to be used is more important than the actual methods. The reviewers will not want to see methodological detail unless the methods proposed are, themselves, novel. Remember—only so many pages are allowed. Pages spent describing routine
procedures are pages not available for convincing the reviewers that this proposal will provide critical answers to important questions. Table 1 lays out what is typically seen in the work plan of an SBIR application (phases I and II) submitted to NIH.

TABLE 1  Typical Phase I and Phase II SBIR Application*

Section                                    Phase I       Phase II
Specific Aims                              1 page        1 page
Background and Significance               2–3 pages     3–5 pages
Preliminary Studies (Progress Report)     3–5 pages     7–8 pages
Experimental Design and Methods           6–9 pages     11–14 pages

*The maximum length for a phase I work plan is 15 pages and for a phase II work plan is 25 pages.

The experimental design section of a phase II SBIR application being submitted to NIH is similar to that of an investigator-initiated NIH application (R01). The most important piece of the application is the significance. If the reviewers are not convinced that the project is worthwhile, it will be virtually impossible to resurrect the application in later sections. The second most important section is the progress report and preliminary studies section. If the reviewers are not convinced that the data support the studies proposed, they will not score the application as competitive. The experimental design section is therefore the least important of the three. Having said that, however, if the experimental design section is not well laid out, the reviewers will conclude that carrying out the studies as proposed will not answer the critical questions. An application with such a work plan will not be funded. To reiterate, it is axiomatic that the experimental design section cannot save a noncompetitive application but can sink a grant that is otherwise competitive.

Everything stated above for the typical SBIR grant application can be applied to other, more traditional grant applications. Grant reviewers, particularly those who review applications for NIH, are attuned to this format. Nothing in the work plan will clearly designate this as being an SBIR application. What does distinguish an SBIR application from other grants is the business plan. Technically speaking, there is no need for a detailed business plan for a phase I application. However, somewhere in the grant, it is important to convey to the reviewers how success in the endeavor will be commercialized. Although not a major part of the phase I application, the well-developed business plan is an integral part of the phase II application. It is critical that the application describe in detail how the technology or product being developed will be brought to the marketplace. It is not a question of details but, rather, that the relevant issues have been thought through: identification of the target market, how large the market is, what actually comes to market, some idea of what it will cost to accomplish this, and so on.
For example, one recently funded application had the goal of replacing an animal-derived protein with a fully synthetic counterpart in vaccine manufacturing protocols. Key information in the business plan included an estimate of the size of the target market (i.e., how many vaccine protocols could make use of the synthetic product, how many units of vaccine this would entail, and the percentage of that market that could be expected to utilize the animal-free alternative), how much of the synthetic product would be needed to meet the market need, what regulatory challenges would need to be addressed, and, most important, what the projected cost would be for bringing the replacement moiety to market. Estimates for each had to be convincing (not necessarily accurate or precise). If any of these were not convincing, it would not matter how interesting the science was. The project would not generate strong enthusiasm from reviewers.

In addition to addressing the specifics of any project, the reviewers want to see that the company has the necessary infrastructure to carry the project successfully beyond the end of phase II. No matter how good the technology is, there has to be a plan for commercializing it, and the company requesting support must be capable not only of formulating such a plan but also of carrying it out successfully. While each business plan is unique, there are several features that most successful applications include. Some of the most important issues to address in the business plan are:

• About the project
  • Business opportunity (who or what the market is for the technology)
  • Magnitude of the market
  • Final product or service (what the commercialized product or service will look like)
  • Additional research (toxicology/clinical trials, etc.)
  • Other regulatory hurdles
  • Manufacturing cost estimate
  • Cost of delivering the service
  • Where the money will come from [plan for phase III funding (private sector)]
  • Pricing and sales
  • Marketing
  • Company commitment to project
• About the company
  • Company structure
  • Business experience and qualifications of principals
  • Capitalization
  • Strategic vision
  • Current products and markets
  • Strategic alliances
  • Investment strategies
  • Intellectual property

In summary, virtually all (open competition) grant applications to federal government agencies are highly competitive. Although it might have been thought at one time that seeking funds through the SBIR program provided a less competitive route, this is no longer the case. Only high-quality applications are funded in today's climate. Even when the application is of the highest quality, there is a strong possibility that it will not be funded. There are many reasons for this. Needless to say, a certain percentage of the grant applications reviewed in any study section have been reviewed at least once previously.

Revised Applications

In today's funding climate, very few applications of any kind are funded on the first submission. Anyone seeking support from the SBIR program faces the strong likelihood of having to rewrite and resubmit his or her proposal. Very soon after review of the initial grant is complete, the applicant is given a score. At this point, the applicant will know whether she or he is "in, out, or on the fence." If the score is definitely not fundable, the process of restructuring the grant begins. Most applicants will think about the resubmission even if they are in that large gray area containing the grants that may or may not be funded. The strategy is to prepare for resubmission and happily drop the resubmission effort if funding comes through. In any event, there is not much in the way of concrete changes that can be made until the critique sheet arrives. In the old days, this was the infamous "pink sheet." The paper on which the critique was written was, in fact, pink. Now, like everything else, it is electronic. The critique sheet summarizes the comments of the primary and secondary reviewers and any discussion of the project in the study section. It may arrive as much as six to eight weeks after the review.

The resubmission differs from the initial application in that it contains an additional section: "Introduction to Revised Application" (one page for revised phase I proposals and up to three pages for revised phase II proposals). The Introduction summarizes the major criticisms and indicates how the investigators have altered the proposal in response. It goes without saying that responding fully to the issues raised (even if they seem trivial or incorrect) is absolutely essential. One can point out (politely, one hopes) where the reviewers are incorrect and not change the grant application's content. This may be a workable strategy in some cases. Most of the time, however, altering the application is the better choice.

The introduction summarizes how the revised application has been altered as a result of the initial review. The major changes come in the appropriate sections of the grant itself. Rewriting the background and significance section may be necessary if the reviewers do not appreciate the significance of the
work. Additional preliminary data may be added if the reviewers are not convinced that the work is feasible. New methodology is included to address a challenge to the approach. These changes aside, the revised application, like the initial version, has the same goal: convincing the review panel members that the proposed research has the potential of providing a novel technical solution to a pressing problem.
FUNDING BIOMARKER RESEARCH THROUGH THE SBIR PROGRAM

The discussion above applies to just about any project for which funding through the SBIR program might be sought. The following discussion relates, more specifically, to funding of biomarker development work. Developing a biomarker is, in many respects, similar to developing a therapeutic product. Both involve a focus on a particular disease or condition. While the therapeutic agent is, obviously, a tangible material that needs to be developed and tested in the laboratory, validated in a clinical study, manufactured to specifications, and so on, developing a biomarker involves a similar series of steps. Included are (1) demonstrating the possible relationship between a disease state and expression of the putative marker; (2) establishing that the putative biomarker can be identified and quantified in tissue, plasma, or urine; (3) developing reagents that can be used reliably and conveniently to assess the marker at the appropriate site; and (4) conducting preliminary and definitive clinical investigations to demonstrate the feasibility of the marker and the utility of the reagents developed.

The critical question to ask is this: At what stage in the process of developing a biomarker could one expect to obtain funding through the SBIR program? Another way to phrase the question: What specific question(s) should be put forward in the grant application? At the theoretical end of the spectrum, one might hypothesize that a particular moiety would be a good biomarker (diagnostic or prognostic) for a particular condition, and then craft an application around demonstrating that such was the case. This might be a candidate for SBIR funding since it is logical to assume that if the moiety could be shown to have biomarker potential, "the commercial ramifications would be obvious." However, reviewers of a grant with such a focus might determine this to be too basic in nature. Slightly less theoretical would be a study to demonstrate that a moiety which was already presumed to be associated with a disease state could be identified reliably (and quantified) in tissue, plasma, or urine. Phrasing the question in this way moves the project one step closer to commercialization. The problem with this approach would be, of course, that one would have to know at the start that factor X was, indeed, a biomarker candidate. Moving even closer to the commercialization end of the spectrum, one could argue that the moiety in question was a good candidate and that there was solid evidence that it could be measured in some relevant biological sample. The issue for the grant would
be developing reagents and assay procedures necessary for doing this reliably. Clearly, such a project would not seem too theoretical for the SBIR program. On the other hand, some might consider this to be "product development" and not innovative. Finally, one could propose clinical studies to confirm that demonstrating the presence of a particular moiety in a biologically relevant specimen would have significant predictive value. Such a study would not be thought of as product development, but the "product" aspect (i.e., the commercial potential) would easily be understood. Additionally, such an application would lend itself to SBIR support in that a preliminary assessment could be made in the phase I portion of the grant and a more definitive study during phase II. The downside of such an approach is that by the time the project was funded through the SBIR grant, millions of dollars would already have been invested in it. Funding at this stage is not what most companies are looking for in an SBIR grant. Furthermore, even a phase II award would provide far less money than is required for most clinical studies. Most companies that had gotten to this stage would be less interested in spending the time and effort needed for the amount of money that would be obtained if successful.

Regardless of the stage at which one approaches a government agency for SBIR funding, it is critical to remember that this is a research proposal, not a contract proposal. The two are fundamentally different. The successful grant proposal must have an overall goal (i.e., a work plan tied to a testable hypothesis). Achieving the overall goal must be highly desirable, but not assured. The phase I portion of the SBIR grant is designed, a priori, to provide evidence that a novel idea has merit. If feasibility has already been demonstrated, there is no "testable" hypothesis. Without this, there is no grant. Often, what comes to the SBIR grant reviewers is a "classical" contract proposal. There is a proposed plan of work. The plan will be followed "to the letter" regardless of findings. There is no testable hypothesis. The problem with such a proposal (from the standpoint of the SBIR program) is that, by definition, it is not innovative. The critique of such an application will undoubtedly contain comments such as "this is development rather than research" and "this should be funded through company resources." The end result is no money.

The antithetical issue can be just as problematic. To reiterate, although the reviewers of an SBIR grant will want to see innovative research, the project will not be funded if it consists entirely of research, regardless of how innovative. The successful SBIR application must straddle a fine line between basic research and development. The successful application must present not only an interesting hypothesis and a detailed plan, but also a road map of how verifying the hypothesis will translate directly into tangible benefit.

Another issue that is pertinent to biomarker development (as well as to the development of a therapeutic) is the nature of the problem for which the biomarker or therapeutic is being developed. Conditions that are primarily cosmetic or of a trivial medical nature, as opposed to serious medical conditions, will not elicit much enthusiasm, regardless of how novel the research is.
Even if the technology ultimately developed has the potential to be economically viable, interest will be low. In a like manner, interest will be low if other diagnostic or prognostic markers are already available for the condition under study. Of course, demonstrating that the new biomarker is clearly superior to existing diagnostic or prognostic markers in some way would mitigate this criticism. This is especially true if existing markers are part of a foreign-owned technology.

Disease biomarkers are, by definition, moieties whose detection and/or quantification has predictive value in relation to the disease. In most cases, moieties with significant predictive value have such value because they are related in some way to disease pathophysiology. Thus, it is a good strategy, if possible, to combine mechanistic studies with the more technology-driven aspects of the research program. It would not be a good idea to focus entirely, or even primarily, on the mechanistic studies, as the reviewers would question the tangible benefits of the work. Nonetheless, mechanistic studies in the appropriate context always help in that they raise the overall interest level of the reviewers and provide a measure of credibility to the investigators. It goes almost without saying, of course, that weak mechanistic studies are worse than none at all. Only the investigative team can know how their proposal is likely to be helped or hindered by whatever mechanistic studies they can incorporate into their application.

There are a number of additional issues that should be considered before trying to fund a biomarker study through the SBIR program. As indicated above, the feasibility portion (phase I) of the grant may be limited to $100,000 or less. Any study that involves obtaining and processing clinical specimens quickly becomes very expensive. It is important to understand what can be accomplished with the level of funding expected and whether this will satisfy the reviewers at phase II that feasibility has been demonstrated. One can argue that what was accomplished was limited by the funding available, but this reasoning won't be persuasive to reviewers of the phase II application. The only thing they will want to be convinced of is that feasibility has been demonstrated. Given this reality, much of what is accomplished during phase I in many studies is supported only partially by the phase I award. Cost sharing with retained company earnings is common. This is not necessarily good or bad; the point is to be prepared to partially fund the work independent of the SBIR grant.

Another consideration is timing. Corporate investigations, particularly at the initiation of projects, are fast-paced compared to the workings of any federal grant program. The SBIR program was designed to have a short lag between application and funding, but it is still a slow process. There are typically three deadlines spaced throughout the year. Should a company intend to meet the deadline for a phase I submission in April, much of February and March will be consumed with putting together a high-quality application. That assumes, therefore, that the idea for the project has already been generated and thought through (at least in a preliminary way). Whatever preliminary
data are to be included will also have to have already been generated. If the grant submitted for the April deadline is approved for funding without revision, the earliest time that funding would be available is the latter part of the year. Today, very few grants are funded on the first cycle. What this means is that even a high-quality application for an interesting and supportable project will probably need to be resubmitted at least once. The grant submission and review cycles are constructed so that a revised application cannot be submitted for the deadline immediately following the cycle in which it was originally submitted. The earliest that a revised application could be resubmitted for one initially put forward in April would, therefore, be December. Assuming that the applicant responded (successfully) to all of the initial criticisms and was approved for funding, June would be about the earliest that one could expect the money to arrive. Thus, a gap of one and a half years between conception of the grant and funding is not unreasonable. Often, the criticisms are such that two or more deadlines pass between the initial application and the revised submission. Often, a second resubmission is required.

When funding for the phase I application is attained, the period of support is normally up to one year. This is the period in which the studies needed to demonstrate feasibility are conducted. Often, they may not be completed within the one year of support. In that case, there is no reason why additional experiments, as necessary, cannot be carried out with company funds. It is simply a question of time and money. If the idea has merit, this is often what occurs. At whatever time it is deemed that sufficient evidence for feasibility is in hand, the phase II application is written. Again, assume one or two months for putting the application together to meet one of the three annual deadlines. Again, assume one or even two rewrites before it is approved, bringing the gap between the end of phase I and the start of phase II to between one and a half and two and a half years.

The point is that a small company should not intend to support its R&D effort on funding through the SBIR program alone. Rather, it is better to think of the program as a way to supplement ongoing research activities with new initiatives, and as a way to provide additional resources for essentially untried and inherently risky endeavors.
SUMMARY

In summary, projects that are good bets for SBIR funding straddle a fine line between innovative research and product/technology development. If the reviewers deem the project to be too far to one side or the other, the outcome will be poor. Projects that do well in review also have a well-defined business plan: a plan to take the product or service from successful research to commercialization. Projects that actually go to completion generally have a mix of funding. Often, the SBIR-funded portion is small. Finally, the time from project initiation to successful commercialization is usually quite long when
clinical studies and regulatory approval are mandated. The SBIR program can provide the impetus for project initiation, in some cases cutting the time to successful commercialization. More important, the seed money provided through an SBIR application will, if used in accord with the way the program is designed to work, provide the impetus for certain high-risk approaches that would not be undertaken in the absence of such funding.
REFERENCES 1. SBA (1991). Results of three-year commercialization study of the SBIR program. SBA Small Business: Building America’s Future. U.S. Small Business Administration, Washington, DC. 2. Wessner CW (ed.) (1999). The Small Business Innovative Research Program: Challenges and Opportunities. Board on Science, Technology and Economic Policy. National Research Council, Washington, DC. 3. Wessner CW (ed.) (2008). An Assessment of the SBIR Program. Committee on Capitalizing on Science, Technology, and Innovation: An Assessment of the Small Business Innovation Research Program, National Research Council. National Academies Press, Washington, DC. 4. Wessner CW (ed.) (2007). An Assessment of the Small Business Innovative Research Program at the National Institutes of Health. Committee on Capitalizing on Science, Technology, and Innovation: An Assessment of the Small Business Innovation Research Program, National Research Council. National Academies Press, Washington, DC. 5. Wessner CW (ed.) (2007). SBIR and the Phase III Challenge of Commercialization: Report of a Symposium. Committee on Capitalizing on Science, Technology, and Innovation: An Assessment of the Small Business Innovation Research Program, National Research Council. National Academies Press, Washington, DC.
30 NOVEL AND TRADITIONAL NONCLINICAL BIOMARKER UTILIZATION IN THE ESTIMATION OF PHARMACEUTICAL THERAPEUTIC INDICES Bruce D. Car, B.V.Sc., Ph.D., Brian Gemzik, Ph.D., and William R. Foster, Ph.D. Bristol-Myers Squibb Co., Princeton, New Jersey
INTRODUCTION

Accurate projection of the safety margins of pharmaceutical agents from late discovery and early nonclinical development phase studies, including in vitro and animal studies, to humans is fundamental to the first decision to move compounds forward into the clinic. The robustness of those estimates over time is also central to the ability to conduct proof-of-concept studies in humans in phases IIa and IIb. After sufficient clinical experience, direct human information renders the nonclinical projections redundant. When nonclinical projections are discrepant with safe clinical exposures, discovery strategies for backup compound selection should be adjusted appropriately. The essential elements of this work are well-defined no-effect levels (NOELs) or no-adverse-effect levels (NOAELs) and lowest observed effect levels (LOELs) or IC50s/EC50s, if a molecular off-target is known, both as
unbound drug and plasma protein–bound drug concentrations, expressed as area under the curve (AUC), Cmax, or concentration at a defined time point.
TABLE 1 Tiers of Biomarkers

Toxicity Endpoint | Traditional Biomarkers | Novel or Secondary Endpoint (a)
Hepatocellular necrosis | Histopathology, ALT, AST, SDH, LDH | Markers of apoptosis, gene signature, circulating RNA (1)
Renal tubular injury | BUN | Cystatin C, urinary alpha GST, urinary GGT, urine sediment
Renal glomerular injury | Creatinine | Urine protein electrophoresis
Seizure | Observational finding | Electroencephalographic seizure, repetitive sharp waves
Retinal degeneration | Histopathology | Electroretinography
Systemic phospholipidosis | Histopathology and electron microscopy | Evidence of organ dysfunction or PL storage in leukocytes, metabonomic (2) or transcriptomic (3) profiles
Small intestinal mucous metaplasia of gamma secretase inhibitors | Histopathology and clinical signs of diarrhea | Peripheral blood genomic biomarkers of Notch signaling inhibition (4)
Na channel inhibition | QRS prolongation | Decreased dP/dT determined telemetrically
hERG channel inhibition | QT interval prolongation | Integration of ion-channel effects in Purkinje fiber assay
Mutagenicity | Ames assay positivity | Genomic biomarker of DNA repair
Carcinogenesis | Malignant tumors in chronic toxicity studies or two-year bioassays | Gene signatures predictive for carcinogenicity (5-9)
Teratogenicity in rats or rabbits | Positive findings in development and reproductive segment II studies | In vitro whole embryo culture recapitulating effects at same concentrations

(a) 1, Miyamoto et al., 2008; 2, Delaney et al., 2004; 3, Sawada et al., 2005; 4, Milano et al., 2004; 5, Ellinger-Ziegelbauer et al., 2008; 6, Fielden et al., 2007; 7, Fielden et al., 2008; 8, Nie et al., 2006; 9, Andersen et al., 2008.
A ratio is calculated from the relevant parameter (Cmax, AUC, time above a certain concentration, IC50) between the NOEL or NOAEL values and that same parameter at the projected efficacious human dose. The science of the projection of the clinical dose has become more refined (Huang et al., 2007), allowing preclinically determined therapeutic indices to have greater predictive value. This ratio is the safety margin. A safety margin is considered a therapeutic index if a relevant efficacy or pharmacodynamic endpoint is included in the animal study. The two terms are frequently used interchangeably, although therapeutic indices, by nature of their derivation, should generally be considered better predictors of drug safety. Considerable skill is required for accurate human pharmacokinetic and pharmacodynamic prediction to facilitate accurate estimation of the therapeutic index. Superimposed on this numerical calculation is the toxicologist's understanding of the therapeutic index, which incorporates the severity of the LOELs or other higher-dose effects, their reversibility, how cumulative exposure influences the safety margin estimate over time, and the ability to monitor in clinical evaluation.

A therapeutic index of greater than 1 in at least two nonclinical species indicates that a compound can generally be given to humans safely up to the efficacious concentrations projected. Therapeutic indices below 1, which frequently occur with oncologics, alert the clinician that toxicities should be expected and monitored for at exposures below those projected to have benefit for patients. Drug discovery toxicology and development nonclinical safety groups, together with ADME (absorption, distribution, metabolism, and elimination) and discovery pharmacology groups, influence the progression of compounds through the progressive refinement of estimates of their therapeutic indices.

Traditional therapeutic indices calculated from findings in toxicology studies are broad in nature, including such diverse endpoints as liver necrosis, seizure, or prolongation of the electrocardiographic QT interval. Such endpoints are unambiguous, and may be refined further with an additional tier of biomarker information, such as clinical chemistry, electroencephalography, or advanced electrocardiography (e.g., instrumented animals monitored by telemetry). When novel biomarkers prove to be more sensitive in detection and time relative to traditional biomarkers, they may supplant or be used to supplement traditional approaches. Examples of these differing tiers of biomarkers are provided in Table 1.

When pharmacodynamic or efficacy endpoints are available for a nonclinical species and form the denominator of the therapeutic index equation, the most predictive safety margins may be calculated, assuming that the particular species has exposure, metabolite profiles, and other ADME characteristics similar to those of humans. Creative research in applications or technologies to validate such endpoints in nonclinical species greatly enhances the predictive power of animal models of toxicity. Several examples of these are included in Table 2.
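To make the ratio concrete, the short sketch below computes exposure-based safety margins from hypothetical NOAEL exposures in two nonclinical species and a hypothetical projected efficacious human exposure. The values, species, and function name are invented for illustration only and are not taken from any particular program.

```python
# Illustrative sketch (hypothetical values): exposure-based safety margins
# calculated as the NOAEL AUC in a nonclinical species divided by the AUC
# projected at the efficacious human dose.

def safety_margin(noael_auc, projected_human_auc):
    """Ratio of the no-adverse-effect exposure to the projected efficacious exposure."""
    return noael_auc / projected_human_auc

# Hypothetical AUC values (ng*h/mL) at the NOAEL in each species
noael_auc_by_species = {"rat": 1200.0, "dog": 800.0}

# Hypothetical AUC projected at the efficacious human dose
projected_human_auc = 150.0

for species, noael_auc in noael_auc_by_species.items():
    margin = safety_margin(noael_auc, projected_human_auc)
    flag = "supports progression" if margin > 1 else "expect toxicity below efficacious exposure"
    print(f"{species}: safety margin = {margin:.1f}-fold ({flag})")
```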
TABLE 2 Efficacy Endpoints of Traditional and Novel Pharmacodynamic Biomarkers

Efficacy Endpoint | Traditional Pharmacodynamic Biomarker | Novel Pharmacodynamic Biomarker
Decreased anxiety, depression | Neurobehavioral change | CNS receptor occupancy
Cognition improvements | Learning tasks | Altered phosphorylation of proteins in key pathways
Improving Alzheimer dementia | Peripheral blood exposure and CNS Aβ (ex vivo) | CSF Aβ concentrations
Cancer xenograft regression | Size and weight of xenograft | Altered phosphorylation of proteins in key pathways
Immune modulation of rheumatoid arthritis | Arthritis scores (ACR 20, 50, 70) | Plasma cytokines, FACS, leukocyte transcriptomics
IN VITRO THERAPEUTIC INDICES

When pharmacology targets and homologous or nonhomologous secondary off-targets are molecularly well defined, the temptation to calculate ratios of IC50s at desired efficacy endpoints to safety endpoints leads to the creation of in vitro therapeutic indices. Typically, these are large numbers that lull many a discovery working group into a false sense of security. Two examples are provided based on real outcomes, for which large in vitro ratios would potentially create an illusion of greater safety. For example, an oncology compound had a hERG (IKr; repolarizing K+ current) IC50 value of 35 μM and a target receptor IC50 value of 10 nM, ostensibly providing a 3500-fold safety window. Caveats for the use of these simple formulas are:

• Plasma protein binding, if pharmacologic or toxicologic activity relates to the free fraction
• Relative concentration in tissue, which may exceed plasma by many fold
• Tissue-bound drug, and thus tissue concentration at efficacy and toxicity targets, which can be difficult to determine and may influence the expression of efficacy or toxicity
• The Cmax/trough ratio
• Typically, one uses IC50 values for ion channels, although inhibition of hERG at an IC10 may still produce clinically important prolongation of the QT interval
• The efficacy and toxicity of metabolites

After integrating these various considerations into the calculation of a safety margin, and considering the simple unknown, which is the biological counterpart
to in vitro activity when measured in vivo, the safety margins for QT prolongation due to hERG inhibition were in the five- to tenfold range. In a second example, a compound with an in vitro IC50 for phosphodiesterase 4 (PDE4) of 2 μM was considered safe for a central nervous system efficacy target with an EC50 1000-fold lower (2 nM). At the lowest dosage tested in animals, projected to provide a threefold safety multiple, portal vasculitis was observed in rodents, considered likely to be secondary to PDE4 inhibition at a plasma Cmax of only 70 nM. This toxicity is generally driven by Cmax, and as peak concentrations occur in the portal vasculature during absorption of drug from the small intestine, the potential toxicity can be markedly exaggerated relative to in vitro–determined safety windows. Frequently, the exposure–response relationship for in vitro surrogates of in vivo toxicity can be quite poor. Although this makes prediction of valid safety windows difficult, in vitro–determined numbers are still valuable in permitting the rank ordering of compounds within a chemical series for selection, and for refining in vivo assessments.
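The hERG example above can be worked through numerically. The sketch below is a minimal illustration, under assumed (not source-derived) correction factors for where hERG effects begin and how far plasma Cmax must exceed the target IC50, of how a nominal 3500-fold in vitro ratio can collapse to a single-digit multiple.

```python
# Illustrative sketch with assumed adjustment factors (not taken from the chapter).
herg_ic50_uM = 35.0          # in vitro hERG IC50
target_ic50_uM = 0.010       # target receptor IC50 (10 nM)

nominal_window = herg_ic50_uM / target_ic50_uM   # 3500-fold on paper
print(f"Nominal in vitro window: {nominal_window:.0f}-fold")

# Assumed corrections: channel effects may begin well below the IC50 (an IC10-like
# concentration), and efficacious plasma Cmax may need to exceed the target IC50
# by a substantial multiple once protein binding and tissue distribution are considered.
herg_threshold_uM = herg_ic50_uM / 20.0      # assumed concentration where QT effects begin
efficacious_cmax_uM = target_ic50_uM * 25.0  # assumed Cmax needed for efficacy

adjusted_window = herg_threshold_uM / efficacious_cmax_uM
print(f"Adjusted window under these assumptions: {adjusted_window:.0f}-fold")
```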
NOVEL METABONOMIC BIOMARKERS OF TOXICITY

The application of systems biology technologies, including metabonomics, proteomics, and transcriptomics, to biomarker development is a nascent science, with relatively few examples of the impactful prospective uses of these technologies (Lindon et al., 2004; Robertson, 2005; Car, 2006). The following example describes how understanding the mechanism of rat urinary bladder carcinogenesis, combined with metabonomic profiling of urine, yielded a mechanism-specific biomarker that could be evaluated in studies with patients.

Muraglitazar-Induced Urinary Bladder Transitional Cell Carcinoma

Peroxisome proliferator–activated receptors (PPARs) are nuclear hormone receptors targeted for therapeutic modulation in diabetes. Specifically, PPARα agonism will control dyslipidemia, while PPARγ agonism affords improved glucose homeostasis. Nonclinical and clinical safety issues have prevented PPARα/γ agonists from becoming drugs (Balakumar et al., 2007; Rubenstrunk et al., 2007). The results of two-year rodent carcinogenicity studies, including hemangiosarcoma, liposarcoma, and urinary bladder transitional cell carcinoma, have generally clouded a clear human risk assessment. With widespread distribution of PPARα and PPARγ receptors in tissues, including those transformed in carcinogenesis, a clear separation of the potentially beneficial role of receptor agonism from the potentially adverse contribution to tumor development is complex to research and understand. The investigative approaches directed toward establishing a cogent human risk assessment for dual PPAR agonist–induced urinary bladder transitional cell carcinoma in rodents are described here for the PPARα/γ agonist muraglitazar
(Dominick et al., 2006; Achanzar et al., 2007; Tannehill-Gregg et al., 2007; Waites et al., 2007).

An increased incidence of ventral bladder wall transitional cell papillomas and carcinomas of the urinary bladder was noted in rats at doses as low as eight times the projected human exposure at 5 mg/kg (Tannehill-Gregg et al., 2007). Histopathology and scanning electron microscopy revealed early microscopic injury associated with the presence of calcium phosphate crystals. Crystalluria was confirmed in studies designed to document the fragile and sometimes transient crystals in male rats dosed with muraglitazar. The crystal-induced epithelial injury was hypothesized as initiating the increased turnover of the ventral bladder urothelium, as confirmed in BrdU-labeling experiments, a proliferative response strongly suspected in the genesis of tumor development. To determine the potential role of crystalluria in injury and carcinogenesis, crystals were solubilized in rats through urinary acidification with 1% dietary ammonium chloride. Urinary acidification of male rats dosed with muraglitazar abrogated crystalluria, early urothelial injury, and cell proliferation (urothelial hyperplasia), and ultimately, urinary bladder carcinogenesis. This mode of action is recognized as a nongenotoxic mechanism of urinary bladder carcinogenesis in rats (Cohen, 1999). To evaluate a potential role for pharmacology, the regulation of genes downstream of PPARα and PPARγ in the rat bladder urothelium was evaluated in PPARα/γ agonist–treated crystalluric rats and in acidified-diet, noncrystalluric rats. No changes in gene expression or traditional endpoints were observed, suggesting that PPAR-mediated changes were not directly causative in urothelial proliferation or carcinogenesis (Achanzar et al., 2007).

To investigate further the mechanism of muraglitazar-induced crystalluria, urine samples were collected from treated rats for metabonomic analysis. NMR spectroscopic evaluation of urine from treated compared to control rats revealed a striking reduction in divalent acids, including citrate and 2-oxoglutarate. Subsequent analytical-grade analyses of urinary citrate to creatinine concentrations confirmed and extended these metabonomic findings. It was hypothesized that male-rat-specific decreased urinary excretion of divalent acids, and in particular citrate, contributed to a milieu highly permissive of calcium phosphate crystal formation. Based on the results of studies conducted in rats, a final set of experiments examined the absolute excretion of citrate in urine from humans treated with muraglitazar. No reductions in citrate concentrations were observed across many patients compared to placebo and pretest populations. Therefore, a research strategy based on determining the role of urinary crystallogenesis in rats suggested that muraglitazar was unlikely to pose any risk to humans in inducing the early procarcinogenic change observed in rats. The muraglitazar example demonstrates how preclinical metabonomics evaluations may identify biomarkers with potential clinical impact; however, the potential for this technology to yield specific and sensitive individual or multiple-entity biomarkers is also largely unrealized (Lindon et al., 2004; Robertson, 2005; Robertson et al., 2007).
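The citrate-to-creatinine normalization used in the follow-up analyses can be illustrated with a brief sketch; the concentrations and group labels below are hypothetical and show only the form of the comparison, not actual study data.

```python
# Hypothetical urinary citrate and creatinine concentrations (mg/dL) by group.
groups = {
    "control":      [(45, 60), (50, 55), (48, 62)],
    "muraglitazar": [(12, 58), (15, 61), (10, 57)],
}

for group, samples in groups.items():
    # Normalize citrate to creatinine to correct for differences in urine concentration.
    ratios = [citrate / creatinine for citrate, creatinine in samples]
    mean_ratio = sum(ratios) / len(ratios)
    print(f"{group}: mean urinary citrate/creatinine = {mean_ratio:.2f}")
```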
NOVEL TRANSCRIPTOMIC BIOMARKERS

Disease- and toxicity-specific transcriptional and metabonomic biomarkers are an as yet largely untapped reservoir; however, publications investigating such biomarkers have become increasingly visible in the literature (Fielden and Kolaja, 2006; Foster et al., 2006; Robertson et al., 2007). In a retrospective review of several years of toxicogenomic analyses of drug safety studies, biomarkers of pharmacology were readily identified in 21% of studies (40% of drug targets) (Foster et al., 2006). An unvalidated version of such an mRNA signature is that of proliferation inhibition observed consistently in the liver of rats given oncologics. A set of such genes is illustrated in Table 3. These observations can readily be adapted to transcriptomic signatures and used to determine which doses, for example in a toxicology study, demonstrate compound efficacy. When combined with traditional endpoint data, a therapeutic index may then be derived. This approach is particularly useful when the pharmacology of a compound has not been evaluated in the test species. Although early transcriptomic signatures consistent with previously identified pathology are frequently observed (in approximately 50% of studies of target tissues profiled at times preceding pathology), the target tissues involved are rarely analyzed transcriptionally, such that finding valid predictive signatures will continue to be problematic (Fielden and Kolaja, 2006).

Transcriptomic signatures may also provide insight toward pharmacologic effect in distinct patient groups. The demonstration of increased expression of wild-type and mutant Kras by both immunohistochemistry and transcriptional profiling or real-time polymerase chain reaction (RT-PCR) led to the hypothesis that the selection of certain patients dosed with EGFR inhibitors, based on the expression and presence of mutation, could markedly increase the responder rate of patients (Lièvre et al., 2006; Di Nicolantonio et al., 2008). The ability to triage patient groups and eliminate patients for whom potentially toxic medicines offer no benefit is clearly a huge advance in the practice of oncology.
TABLE 3 Genes Commonly Changed by Diverse Oncologic Agents in Rat Liver

Gene | Transcriptional Change | Gene Function | Gene Name
Rrm2 | Repression | Proliferation | Ribonucleotide reductase M2
Cdc2a | Repression | Proliferation | Cell division cycle 2 homolog A
Cdkn1a | Repression | Proliferation | Cyclin-dependent kinase inhibitor 1A
Ccnb1 | Repression | Proliferation | Cyclin B1
Dutp | Repression | Proliferation | Deoxyuridine triphosphatase
Csnk1a1 | Induced | Cell survival | Casein kinase 1, α1
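A minimal sketch of how a proliferation-inhibition signature such as the one in Table 3 might be converted into a per-dose score is shown below. The gene set follows the table, but the expression values, scoring rule, and threshold are assumptions made for illustration rather than a validated method.

```python
# Expected direction of change for each signature gene (-1 repressed, +1 induced),
# following Table 3; expression data below are hypothetical log2 fold changes
# (treated vs. control rat liver) for three dose groups.
signature = {"Rrm2": -1, "Cdc2a": -1, "Cdkn1a": -1, "Ccnb1": -1, "Dutp": -1, "Csnk1a1": +1}

log2_fold_change = {
    "low dose":  {"Rrm2": -0.1, "Cdc2a": 0.0, "Cdkn1a": -0.2, "Ccnb1": 0.1, "Dutp": 0.0, "Csnk1a1": 0.1},
    "mid dose":  {"Rrm2": -0.8, "Cdc2a": -0.6, "Cdkn1a": -0.5, "Ccnb1": -0.7, "Dutp": -0.4, "Csnk1a1": 0.5},
    "high dose": {"Rrm2": -2.1, "Cdc2a": -1.8, "Cdkn1a": -1.5, "Ccnb1": -2.0, "Dutp": -1.2, "Csnk1a1": 1.4},
}

for dose, changes in log2_fold_change.items():
    # Score each dose by how strongly expression moves in the expected direction.
    score = sum(signature[g] * changes[g] for g in signature) / len(signature)
    active = "signature present" if score > 0.5 else "signature absent"
    print(f"{dose}: signature score = {score:.2f} ({active})")
```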
CONCLUSIONS

Accurate determination of therapeutic indices from nonclinical studies across multiple species, overlain with an understanding of the human risk associated with nonclinically identified liabilities, provides an invaluable tool for advancing compounds with reduced potential for harm and reduced likelihood of attrition for safety concerns. Novel approaches for identifying and validating biomarkers combined with highly refined clinical dose projections will allow toxicologists to predict and therefore avoid clinically adverse outcomes with increasing accuracy.
REFERENCES Achanzar WE, Moyer CF, Marthaler LT, et al. (2007). Urine acidification has no effect on peroxisome proliferator–activated receptor (PPAR) signaling or epidermal growth factor (EGF) expression in rat urinary bladder urothelium. Toxicol Appl Pharmacol, 223:246–256. Andersen ME, Clewell H, Bermudez E, Wilson AG, Thomas RS (2008). Genomic signatures and dose-dependent transitions in nasal epithelial response to inhaled formaldehyde in the rat. Toxicol Sci, 105:368–383. Balakumar P, Rose M, Ganti SS, Krishan P, Singh M (2007). PPAR dual agonists: are they opening Pandora’s Box? Pharmacol Res, 2:91–98. Car BD (2006). Enabling technologies in reducing drug attrition due to safety failures. Am Drug Discov, 1:53–56. Cohen SM (1999). Calcium phosphate-containing urinary precipitate in rat urinary bladder carcinogenesis. IARC Sci Publ, 147:175–189. Delaney J, Neville WA, Swain A, Miles A, Leonard MS, Waterfield CJ (2004). Phenylacetylglycine, a putative biomarker of phospholipidosis: its origins and relevance to phospholipid accumulation using amiodarone treated rats as a model. Biomarkers, 3:271–290. Di Nicolantonio F, Martini M, Molinari F, et al. (2008). Wild-type BRAF is required for response to panitumumab or cetuximab in metastatic colorectal cancer. J Clin Oncol, 26:5705–5712. Dominick MA, White MR, Sanderson TP, et al. (2006). Urothelial carcinogenesis in the urinary bladder of male rats treated with muraglitazar, a PPAR alpha/gamma agonist: evidence for urolithiasis as the inciting event in the mode of action. Toxicol Pathol, 34:903–920. Ellinger-Ziegelbauer H, Gmuender H, Bandenburg A, Ahr HJ (2008). Prediction of a carcinogenic potential of rat hepatocarcinogens using toxicogenomics analysis of short-term in vivo studies. Mutat Res, 637(1–2):23–39. Fielden MR, Brennan R, Gollub J (2007). A gene expression biomarker provides early prediction and mechanistic assessment of hepatic tumor induction by nongenotoxic chemicals. Toxicol Sci, 99:90–100. Fielden MR, Kolaja KL (2006). The state-of-the-art in predictive toxicogenomics. Curr Opin Drug Discov Devel, 9:84–91.
Fielden MR, Nie A, McMillian M, et al. (2008). Interlaboratory evaluation of genomic signatures for predicting carcinogenicity in the rat: Predictive Safety Testing Consortium; Carcinogenicity Working Group. Toxicol Sci, 103(1):28–34. Foster WR, Chen SJ, He A, et al. (2006). A retrospective analysis of toxicogenomics in the safety assessment of drug candidates. Toxicol Pathol, 35:621–635. Huang C, Zheng M, Yang Z, Rodrigues AD, Marathe P (2007). Projection of exposure and efficacious dose prior to first-in-human studies: how successful have we been? Pharm Res, Sept 25. Lièvre A, Bachet JB, Le Corre D, et al. (2006). KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res, 66:3992–3995. Lindon JC, Holmes E, Nicholson JK (2004). Metabonomics: systems biology in pharmaceutical research and development. Curr Opin Mol Ther, 6:265–722. Miyamoto M, Yanai M, Ookubo S, Awasaki N, Takami K, Imai R (2008). Detection of cell-free, liver-specific mRNAs in peripheral blood from rats with hepatotoxicity: a potential toxicological biomarker for safety evaluation. Toxicol Sci, 106(2): 538–545. Milano J, McKay J, Dagenais C, et al. (2004). Modulation of notch processing by gamma-secretase inhibitors causes intestinal goblet cell metaplasia and induction of genes known to specify gut secretory lineage differentiation. Toxicol Sci, 82:341–358. Nie AY, McMillian M, Parker JB, et al. (2006). Predictive toxicogenomics approaches reveal underlying molecular mechanisms of nongenotoxic carcinogenicity. Mol Carcinog, 45(12):914–933. Robertson DG (2005). Metabonomics in toxicology: a review. Toxicol Sci, 85:809–822. Robertson DG, Reily MD, Baker JD (2007). Metabonomics in pharmaceutical discovery and development. J Proteome Res, 6:526–539. Rubenstrunk A, Hanf R, Hum DW, Fruchart JC, Staels B (2007). Safety issues and prospects for future generations of PPAR modulators. Biochim Biophys Acta, 1771:1065–1081. Sawada H, Takami K, Asahi S (2005). A toxicogenomic approach to drug-induced phospholipidosis: analysis of its induction mechanism and establishment of a novel in vitro screening system. Toxicol Sci, 83:282–292. Tannehill-Gregg SH, Sanderson TP, Minnema D, et al. (2007). Rodent carcinogenicity profile of the antidiabetic dual PPAR alpha and gamma agonist muraglitazar. Toxicol Sci, 98:258–270. Waites CR, Dominick MA, Sanderson TP, Schilling BE (2007). Nonclinical safety evaluation of muraglitazar, a novel PPARalpha/gamma agonist. Toxicol Sci, 100:248–258.
31 ANTI-UNICORN PRINCIPLE: APPROPRIATE BIOMARKERS DON’T NEED TO BE RARE OR HARD TO FIND Michael R. Bleavins, Ph.D., DABT Michigan Technology and Research Institute, Ann Arbor, Michigan
Ramin Rahbari, M.S. Innovative Scientific Management, New York, New York
INTRODUCTION

Biomarkers have entered the drug development process with great fanfare, with the impression that this is the first time for a new approach. Projections of solving virtually all major issues faced in drug development, and the potential of new technologies revolutionizing medicine, have been promised by many specialists. In reality, biomarker assays represent the logical progression and integration of laboratory techniques already available within clinical pathology and biochemistry, with additional methods arising from new technologies such as molecular biology, genomics, proteomics, and metabonomics. The emphasis on biomarkers as a new approach often has led to expectations that the ideal biomarker must be novel and/or exotic. To answer the pertinent questions during drug development, teams often embark on the quest for the "unicorn" of biomarkers, sometimes resulting in the best and most practical test being overlooked as the search progresses for elusive methods.
When working with drug development teams, especially those with limited hands-on experience in biomarker applications, it is imperative that the business need and scientific rationale for including a biomarker in the development plan be elaborated prospectively. By clearly defining both the question(s) to be answered and how a biomarker will advance the new drug's development, the likelihood of successful implementation to advance the compound and save time and laboratory resources improves dramatically. In most instances the most expedient biomarker approach will be identified by focusing on what is necessary to advance the compound at its particular stage of development, independent of how that parameter will be measured. Establishing precisely what is needed versus what would be nice to have may reveal opportunities not initially obvious to all participants. Building a new biomarker on an emerging technology generally should be a last resort. Acceptance, reproducibility, quality control, standards, automation, and assessing laboratory-to-laboratory differences become exponentially more complex to characterize with the novelty of technology and lower numbers of groups using those approaches. For translation of a potential biomarker from the bench to the bedside, simpler in all aspects is preferable. Simpler tests to administer and evaluate do not mean less scientifically sound or relevant, and can provide a more solid foundation for acceptance in clinical and regulatory environments. This is not to say that the new technologies aren't braving new territory and having a significant impact on drug safety and development [1]. The emphasis of this chapter is to show how casting a wide net for ideas and approaches can expedite a compound's progress and improve decision making, keeping in mind that even less novel technologies will often be the best choice.

Under ideal conditions, and in the instances where the quest is for a decision-making biomarker, there should be solid prospective agreement to actually affect the drug's progress based on the biomarker results. To that end, scientists should keep in mind that the primary purpose of biomarkers is to enable better decisions. A better decision is one that can be made more confidently, earlier, less invasively, or more efficiently, or one based on a test that is transferable to reference laboratories as the drug enters phase II or later. Therefore, it is essential that people supporting preclinical and clinical teams focus on the test(s) that aid in definitively graduating a compound to its next stage of development or in proving nonviability of the development effort and supporting compound attrition. This needs to be done without concern as to whether the biomarker derives from exciting new technology or is a new application of an established method using conventional science. In reality, if a team will not expedite, realign, or terminate a compound's development based on the biomarker results, inclusion of a non-decision-enabling biomarker is unlikely to serve a purpose other than to increase costs or complicate study design.

Sometimes the appropriate biomarker is a "unicorn" (exciting, rare, and exotic), but more often it is a "horse" (understood, generally accepted, and available) or a "mule" (proven performance, hybrid, and versatile).
In biomarker selection, the primary consideration must be whether the test(s) is the best solution for the situation at hand. As a part of the biomarker development package, the biological rationale and significance (as understood at that stage), in addition to confidence in the platform, have to be evaluated. Another important factor is time; the development of a novel biomarker can be as complicated and time consuming as developing a new chemical entity. This can be an essential criterion in circumstances where multiple companies are working in the same areas or patent timelines are not ideal. Identification, development, testing, characterization, and scaling of a new biomarker, for any one stage of the drug development process, generally requires at least six to nine months. By looking beyond the experience base or capabilities of one group or laboratory, it may be that your search for unicorns ends most appropriately by locating a horse residing in the stable of someone else. This chapter highlights examples of unicorn, horse, and mule biomarker approaches.
UNICORN BIOMARKERS

Advances in technology that enable new applications can be valuable tools in addressing difficult questions of activity, efficacy, and safety for drug developers. The novelty of these approaches, and the high-tech equipment, create excitement and interest. Several new medicines actually owe their development and successful clinical utilization to tools that did not exist 25 years ago. Gleevec (imatinib) was developed by Novartis to treat chronic myeloid leukemia (CML). This molecule, and two subsequent tyrosine kinase inhibitors for CML (Sprycel, Tasigna), were designed using imaging and molecular modeling specifically to target cells expressing the mutant BCR-ABL protein underlying the disease origin. The altered chromosome size arising from the balanced translocation between chromosomes 9 and 22 was a key feature in identifying genetic involvement in leukemia [2,3]. This formation of the Philadelphia chromosome, and the discovery that the translocation resulted in the chimeric BCR-ABL gene, created new opportunities to target the resulting BCR-ABL oncoprotein and its role in hematologic cell proliferation [4,5]. The role of BCR-ABL and several other proteins in cell cycle progression is reviewed by Steelman et al. [6]. Cloning, mutational analysis, sequencing, and animal models [4,7–10] have proven to be valuable tools in designing effective treatments for CML. These techniques have also been important in developing the next-generation medicines for this disease, since mutations in BCR-ABL arise in 50 to 90% of patients and result in resistance to imatinib [11–13]. Genetic testing has identified specific mutations that can be targeted by new tyrosine kinase inhibitors [12–14] and provides better clinical options for patients with these mutations. The use of specific mutational genotyping is an important consideration in developing new drugs, particularly for the T315I mutation, which is currently resistant to all approved tyrosine kinase inhibitor compounds for CML. In addition to BCR-ABL mutational analysis, monitoring
the phosphorylation status of CrkL and Stat5 using flow cytometry, Western blotting, and/or enzyme-linked immunosorbent assay (ELISA) techniques provides biomarkers of compound activity [15,16].

Pharmacogenetics also was an important aspect in the target selection and development of the CCR5 antagonist class of anti-HIV drugs, including the Pfizer, Inc. compound maraviroc (Selzentry). The observation that persons homozygous for the CCR5-Δ32 mutation were resistant to the development of acquired immune deficiency syndrome (AIDS) was shown mechanistically to result from inhibiting the human immunodeficiency virus (HIV) binding to the mutated receptor and entering T-helper lymphocytes [17–20]. Maraviroc binds selectively to the human chemokine receptor CCR5 present on the cell membrane, preventing the interaction of HIV-1 gp120 and CCR5 necessary for CCR5-tropic HIV-1 to enter cells [21]. CXCR4-tropic and dual-tropic HIV-1 are not inhibited by maraviroc. Genetic testing for the CCR5-Δ32 deletion to stratify clinical trial subjects was useful in determining whether small molecules showed differential activity in these groups, for establishing inclusion and exclusion criteria, and for the characterization of safety [22]. These data were also key components in the successful registration of maraviroc in 2007. As HIV treatment has advanced, genetic assays for CCR5 mutations, as well as tropism for the CXCR4 or CCR5 co-receptor, are proving useful in the optimal use of these co-receptor antagonists. In fact, the Selzentry label [21] recommends tropism testing to identify appropriate candidates and states that use of Selzentry is not recommended in patients with dual/mixed or CXCR4-tropic HIV-1, as efficacy was not demonstrated in a phase II study of this patient group. The Trofile assay was used extensively in Pfizer's maraviroc phase III trials and measures the ability of the patient's specific virus envelope gene to effect entry into cells. This biomarker uses amplified RNA to establish a patient's HIV genome, followed by an assessment of that genome to infect CCR5- and CXCR4-expressing cell lines.
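The genotype- and tropism-based stratification described above can be pictured with a simple sketch; the subject records, labels, and eligibility rule are hypothetical and are not intended to reproduce any actual trial's criteria.

```python
# Hypothetical subject records: CCR5 genotype and viral tropism result.
subjects = [
    {"id": "S-001", "ccr5_genotype": "wt/wt",            "tropism": "CCR5"},
    {"id": "S-002", "ccr5_genotype": "wt/delta32",       "tropism": "CCR5"},
    {"id": "S-003", "ccr5_genotype": "wt/wt",            "tropism": "dual/mixed"},
    {"id": "S-004", "ccr5_genotype": "delta32/delta32",  "tropism": "CXCR4"},
]

for s in subjects:
    # A CCR5 antagonist is only expected to help when the virus uses CCR5 for entry.
    eligible = s["tropism"] == "CCR5"
    stratum = f"genotype={s['ccr5_genotype']}, tropism={s['tropism']}"
    print(f"{s['id']}: {'candidate' if eligible else 'exclude'} ({stratum})")
```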
HORSE BIOMARKERS

Factor Xa inhibitors are of therapeutic interest as antithrombotic agents because direct-acting antithrombin drugs often induce bleeding or deficits in fibrin [23–25]. By targeting a specific enzyme in the coagulation cascade, it is generally assumed that toxicity can be reduced and efficacy retained. Having an accurate indicator of bleeding risk is essential in this class of molecules, since the major rate-limiting toxicities tend to be associated directly or indirectly with inhibited blood coagulation. During the development of factor Xa inhibitors, a logical biomarker of both safety and efficacy was available through measurement of factor Xa activity. In fact, development and decision making with this class of drug were expedited by being able to monitor factor Xa activity in both preclinical
efficacy experiments and toxicology studies, as well as in early clinical trials. By adapting the automated human technique available in the clinical pathology coagulation laboratory to rodent and nonrodent species, direct comparisons were possible between the relative doses and plasma concentrations associated with therapeutic inhibition of clotting and exposures likely to cause unacceptably long coagulation times. At the discovery, preclinical safety, and phase I clinical stages, drug development teams had a tool that allowed rapid prioritization of molecules and decisions on dosing. Additionally, because reagents were available commercially at reasonable cost, blood volumes required were small, and instrumentation already existed for the method in clinical pathology laboratories, a reproducible means of determining a factor Xa–inhibiting compound's pharmacodynamic characteristics was readily performed in a variety of research and hospital groups. This biomarker has been a useful tool for developing factor Xa inhibitors, although caution must be exercised to assess compound activity beyond the results being a reflection of plasma drug concentration.

Cholesterol analysis isn't often considered very exotic or cutting edge, but has undergone significant evolution during its application in predicting cardiovascular risk. Total cholesterol has typically been measured using enzymatic, immunochemical, chemical, precipitation, ultracentrifugation, and column chromatography methods [26,27]. Since there can be significant differences in the values obtained for each of the lipoprotein classes using the various techniques, and to provide a standard for comparison, the Centers for Disease Control maintains reference methods for cholesterol, triglycerides, and high-density lipoproteins [28]. Reference methods considered the gold standards for cholesterol fractions have also been developed, validated, and credentialed [29,30]. As the relative roles and significance of the very low-density lipoprotein (VLDL), high-density lipoprotein (HDL), and low-density lipoprotein (LDL) major lipid subgroups became better established in clinical practice, techniques for determining each cholesterol subcategory were developed and integrated into standard clinical laboratory use. Automation of these tests has made lipid analysis easily monitored in routine clinical practice. For many years, enzymatic and chemical methods have comprised the primary approaches to monitoring total cholesterol, triglycerides, and phospholipids. Precipitation is generally used for HDL and LDL, with ultracentrifugation considered the gold standard for assessment of new techniques [27,30]. As the fibrate and statin classes of drugs were integrated into standard clinical practice among cardiologists and general practitioners, cholesterol monitoring became commonplace and continues as a leading indicator of heart disease risk. Otvos et al. [31] published a nuclear magnetic resonance (NMR) approach to cholesterol monitoring that could be applied to clinically practical samples, as well as correlating the various lipoprotein subclasses to coronary artery disease [32–34], diabetes [35,36], and genetic polymorphisms having relevance in heart disease [37–39]. NMR has been an analytical technique since the 1950s, although its primary applications had been in research chemistry
laboratories, and more recently for metabonomic investigations. As cholesterol profiles and cardiovascular risk assessment have developed, many physicians have asked for increasingly detailed lipid evaluations. NMR profiling of serum lipids has demonstrated differences in the relative effects on various lipoprotein classes with available fibrate [40–42] and statin drugs [43–47]. A better understanding of particle-size distribution within lipid classes using NMR allows tailoring of patient therapeutic approaches beyond the broad goals of raising HDL and lowering LDL. Even among persons taking statin drugs, the best choice for one patient may be atorvastatin, whereas another person may achieve a more desirable lipid profile with rosuvastatin or simvastatin. This approach has been particularly useful in assessing changes in LDL particle numbers as well as total LDL content. The ability to select the optimal drug for each person provides an aspect of personalized medicine using well-accepted clinical pathology parameters determined by a technology with proven reliability in an arena outside the standard hospital laboratory, but readily accessible through specialty reference laboratories. The use of NMR to determine cholesterol and lipoprotein categories also has been useful in studying exercise [48,49], arthritis [50], and vascular responses [51].
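The distinction between LDL cholesterol content and LDL particle number that makes NMR profiling clinically useful can be shown with a small sketch; the subclass names, particle concentrations, and discordance cutoffs below are illustrative assumptions, not values from the cited studies.

```python
# Hypothetical NMR-derived LDL subclass particle concentrations (nmol/L).
ldl_subclasses = {"large LDL": 450, "medium LDL": 300, "small dense LDL": 700}

ldl_particle_number = sum(ldl_subclasses.values())   # total LDL particle number (LDL-P)
chemical_ldl_c_mg_dl = 110                            # hypothetical LDL cholesterol by a standard assay

print(f"LDL particle number (LDL-P): {ldl_particle_number} nmol/L")
print(f"LDL cholesterol (LDL-C): {chemical_ldl_c_mg_dl} mg/dL")

# Flag profiles where particle number is high despite an unremarkable LDL-C,
# a pattern often driven by a predominance of small, cholesterol-poor particles.
if ldl_particle_number > 1200 and chemical_ldl_c_mg_dl < 130:
    print("Discordant profile: particle-number data may alter the therapeutic choice")
```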
MULE BIOMARKERS

The anticancer drug trastuzumab (Herceptin) is a case study in stratification to identify patients most likely to respond to a drug using both conventional approaches and newer technologies. In 2007, trastuzumab achieved sales of $1.3 billion in the United States despite serious toxicity considerations and being efficacious in a limited subset of patients. Trastuzumab has primary efficacy in breast cancer patients who are overexpressing the human epidermal growth factor receptor 2 (HER2) protein [52–54]. The drug is a therapeutically engineered monoclonal antibody that targets the HER2 receptor protein [54]. The response rate to trastuzumab is as high as 35% in patients with overexpression of HER2, while the drug lacks a target and is ineffective at tolerated doses in patients who do not have increased HER2 levels. Pretreatment characterization of breast cancer patients to identify HER2-positive cancers and determine a patient's suitability for trastuzumab therapy is now commonly practiced. The serious adverse events that can develop following trastuzumab use (cardiac failure, pulmonary toxicity, infusion reactions) make it highly desirable to minimize exposure of those populations unlikely to benefit from the drug's therapeutic effects. The targeting of persons most likely to respond to the drug also has been a factor in reimbursement by third-party payers and acceptance by governmental health programs.

The primary and U.S. Food and Drug Administration (FDA)–approved approach for determining HER2 expression is immunohistochemistry for HER2 [55]. This "hybrid mule biomarker" combines the well-established but low-specificity technique of histology with the more recent development of
probe antibodies directed against specific proteins. Although neither of these modalities is particularly high tech or exotic, they provided the basis for developing a biomarker indicating HER2 overexpression as molecular biology and proteomic techniques became available. These approaches quantified the protein resulting from the underlying gene amplification, leading to overexpression, and allowed identification and purification of the specific protein involved. As the reason for the progression to cancer became clearer, components of the overall process became suitable biomarker targets. This led, in turn, to the development of specific antibodies suitable for the identification of HER2 in tissue sections. Direct gene amplification and fluorescence in situ hybridization techniques for measuring HER2 expression also exist, but acceptance into clinical practice has been slower, due to their greater complexity and the limited number of laboratories capable of performing them in a diagnostic setting. Although not a perfect biomarker, HER2 expression is a valuable tool for the oncologist, and additional research is being conducted to refine the predictivity of this measure, either by alternative tests or by better characterization of key variables.

Osteoarthritis presents a difficult therapeutic area for pharmaceutical intervention. The progressive nature of the disease, generally late onset in life, and difficulty in reversing existing damage have made development of effective therapy challenging. Additionally, determining clinical efficacy often requires long-term clinical trial designs, even in early drug development assessments where little is known about efficacy or side effects. These characteristics act as significant impediments to safe and rapid screening of new molecules. Clearly, this is an ideal place for a predictive biomarker that would allow a rapid assessment of whether a new drug had activity in the disease processes. Underlying the disease is destruction of joint cartilage by matrix metalloproteinases (MMPs) [56,57], so a mechanistic approach to identifying a biomarker was a logical approach to improving drug development in this area. As the role of collagen degradation and MMP activity became clearer in arthritis [58–61], interest focused on biomarkers that could be applied to animal models for both compound advancement and clinical monitoring as the ultimate endpoint. Nemirovskiy et al. [62] reported development of a translatable biomarker of MMP activity with the goal of its application for MMP inhibitor compound selection and improved diagnosis of osteoarthritis. Similar approaches were under way by other groups within the pharmaceutical industry, both specific pharmaceutical companies and external contract research organizations. The approach taken by Nemirovskiy et al. [62] was to identify specific cleavage products using liquid chromatography–tandem mass spectrometry. By studying these MMP-derived peptides from human articular cartilage, they were able to show that a 45-mer peptide fragment of collagen type II correlated with the pathology of human osteoarthritis and was present in urine and synovial fluid. An immunoaffinity liquid chromatography–tandem mass spectrometry (LC-MS/MS) assay was developed to quantify collagen
type II neoepitope (TIINE) peptides as biomarkers of collagenase modulation. The resulting assay was capable of detecting TIINE peptides in the urine of healthy and afflicted human subjects and preclinical species (rat, rabbit, guinea pig, dog). This LC-MS/MS assay had excellent sensitivity, high throughput, reasonable costs, and robustness. By including immunoaffinity in the technique, a substantial improvement in assay sensitivity over traditional LC-MS/MS was achieved by eliminating much of the background noise associated with the sample matrix. ELISA methods also were developed to measure TIINE concentrations in urine, but proved to have higher intrasample variability, greater sample matrix effects, lower specificity for cleavage products, and to be more difficult to outsource to external laboratories. Although the ELISA technique remains useful as a research tool, the LC-MS/MS assay had significant advantages for clinical translation and implementation.

The TIINE biomarker was also applied to better characterize osteoarthritis in a surgically induced system using the Lewis rat [63]. This preclinical model has proven valuable to study progression and therapeutic intervention in degenerative joint disease. Using immunohistochemical staining of the joints, the authors were able to compare TIINE expression with proteoglycan loss and histological changes. This study showed that TIINE levels increased in intensity and area in lesions that co-localized with the loss of proteoglycan. From these data, Wancket et al. [63] were able to better define the medial meniscectomy surgical model of osteoarthritis, demonstrate a progressive pattern of cartilage damage similar to that seen in human lesions, and further characterize TIINE as a useful biomarker for monitoring cartilage degradation.

Clinically, TIINE has been used to evaluate the mechanism by which doxycycline slows joint space narrowing in patients with knee osteoarthritis. Urinary TIINE and radiographic determinations were conducted over a 30-month period [64]. Although the TIINE measurements were highly reproducible, the authors concluded that high visit-to-visit variability limits the sensitivity of the TIINE assay for detecting changes in clinical monitoring of osteoarthritis, and that increases in urinary TIINE concentration are unlikely to account for doxycycline reductions in joint space narrowing. The value of the TIINE biomarker for other collagen-based diseases remains to be determined. This biomarker does, however, highlight how even with mechanistic approaches, good preclinical correlations, and solid technology, a new technique may not yield a biomarker with strong clinical utility.
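The reproducibility issue raised in that clinical study can be illustrated by normalizing urinary TIINE to creatinine and computing within-subject visit-to-visit variability, as in the sketch below; the data, units, and subjects are invented for illustration and are not drawn from the cited trial.

```python
from statistics import mean, stdev

# Hypothetical urinary TIINE (ng/mL) and creatinine (mg/dL) across repeat visits per subject.
visits = {
    "subject A": [(8.0, 95), (12.5, 140), (6.5, 70)],
    "subject B": [(20.0, 110), (9.0, 60), (15.5, 90)],
}

for subject, samples in visits.items():
    # Creatinine normalization corrects for urine dilution differences between visits.
    normalized = [tiine / creatinine for tiine, creatinine in samples]
    cv_percent = 100 * stdev(normalized) / mean(normalized)
    print(f"{subject}: creatinine-normalized TIINE CV = {cv_percent:.0f}% across visits")
```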
CONCLUSIONS

The best biomarker for any given situation can come from a wide range of sources, so it is critical that no promising option be excluded. Matching the application and the question to be answered is far more important than the platform used for analysis or whether a technique is resident within a particular labo-
ratory or company. It should be recognized, however, that it is difficult to consider approaches that one has no idea exist. Nevertheless, those people providing biomarker support to drug discovery and development teams must keep in mind as wide a range of options as possible. The Internet has made specialized testing by reference laboratories, teaching hospitals, and research groups significantly more accessible. In addition, directories of tests and the groups that perform them are available [65], as well as characterized genomic biomarkers being described on the FDA Web site [66]. Genomic, proteomic, and metabonomic technologies can provide essential information when identifying new biomarkers, but have been slow to be implemented into clinical applications. Although often critical to identifying new targets or biomarker options, the extensive data sets produced, variability in sample and platform conditions, challenges of validating multiplexed measurements and algorithms, and lack of experience have limited their usefulness in clinical trials to a few diseases. The fields are rapidly progressing and hold great promise, especially when specific focused questions are defined prior to conducting the tests. To paraphrase Helmut Sterz, “use of a little grey matter at the beginning can save a lot of white powder, chips, instrumentation, and time.” All too often, the quantity of information obtained from many “omics” experiments cannot be realized effectively due to limits on data mining tools and the realities of clinical trial conduct. People cannot be subjected to the same degree of environmental and genetic control possible with animal studies, and many diseases represent a constellation of effects rather than changes induced by a single cause or gene. Our experience in developing, validating, translating, and implementing new biomarkers has emphasized repeatedly that the question to be answered must drive the technology used. It is also vital that the solution be “fit for purpose” with respect to the parameter being measured, platform selected, and level of assay definition or validation [67]. Sometimes the biomarker must utilize a cutting-edge technology and novel approaches, but more commonly the question can be answered without an exotic assay, often with a test that already exists in someone else’s laboratory. REFERENCES 1. FDA (2008). Pharmacogenomics and its role in drug safety. FDA Drug Saf Newsl, 1(2). http://www.fda.gov/cder/dsn/2008_winter/pharmacogenomics.htm. 2. Nowell PC, Hungerford DAA (1960). Chromosome studies on normal and leukemic leukocytes. J Nat Cancer Inst, 25:85–109. 3. Rowley JD (1973). A new consistent chromosomal abnormality in chronic myelogenous leukemia identified by quinacrine fluorescence and Giemsa staining. Nature, 243:290–293. 4. Daley GQ, Van Etten RA, Baltimore D (1990). Induction of chronic myelogenous leukemia in mice by the P210bcr/abl gene of the Philadelphia chromosome. Science, 247:824–830.
5. Lugo TG, Pendergast AM, Juller AJ, Witte ON (1990). Tyrosine kinase activity and transformation potency of bcr-abl oncogenes products. Science, 247: 1079–1082. 6. Steelman LS, Pohnert SC, Shelton JG, Franklin RA, Bertrand FE, McCubrey JA (2004). JAK/STAT, Raf/MEK/ERK, PI3K/Akt and BCR-ABL in cell cycle progression and leukemogenesis. Leukemia, 18:189–218. 7. Hariharan IK, Harris AW, Crawford M, et al. (1989). A bcr-abl oncogene induces lymphomas in transgenic mice. Mol Cell Biol, 9:2798–2805. 8. Pfumio F, Izac B, Katz A, Shultz LD, Vainchenker W, Coulombel L (1996). Phenotype and function of human hematopoietic cells engrafting immunedeficient CB 17 severe combined immunodeficiency mice and nonobese diabetic-severe combined immunodeficient mice after transplantation of human cod blood mononuclear cells. Blood, 88:3731–3740. 9. Honda H, Oda H, Suzuki T, et al. (1998). Development of acute lymphoblastic leukemia and myeloproliferative disorder in transgenic mice expressing P210bcr/abl: a novel transgenic model for human Ph1-positive leukemias. Blood, 91:2067– 2075. 10. Li S, Ilaria RJ, Milton RP, Daley DQ, Van Etten RA (1999). The P190, P210, and P230 forms of the BCR/ABL oncogene induce a similar chronic myeloid leukemialike syndrome in mice but have different lymphoid leukemogenic activity. J Exp Med, 189:1399–1412. 11. Gorre ME, Mohammed M, Ellwood K, et al. (2001). Clinical resistance to STI-571 cancer therapy caused by BCR-ABL gene mutation or amplification. Science, 293:876–880. 12. Shah NP, Nicoll JM, Nagar B, et al. (2002). Multiple BCR-ABL kinase domain mutations confer polyclonal resistance to the tyrosine kinase inhibitor imatinib (STI571) in chronic phase and blast crisis chronic myeloid leukemia. Cancer Cell, 2:117–125. 13. Branford S, Rudzki Z, Walsh S, et al. (2003). Detection of BCR-ABL mutations in mutations in patients with CML treated with imantinib is virtually always accompanied by clinical resistance, and mutations in the ATP phosphate-binding loop (P-loop) are associated with poor prognosis. Blood, 102:276–283. 14. Shah NP, Tran C, Lee FY, Chen P, Norris D, Sawyers CL (2004). Overriding imatinib resistance with a novel ABL kinase inhibitor. Science, 305(5682): 399–401. 15. Klejman A, Schreiner SJ, Nieborowska-Skorska M, et al. (2002). The Src family kinase Hck couples BCR/ABL to STAT5 activation in myeloid leukemia cells. EMBO J, 21(21):5766–5774. 16. Hamilton A, Elrick L, Myssina S, et al. (2006). BCR-ABL activity and its response to drugs can be determined in CD34+ CML stem cells by CrkL phosphorylation status using flow cytometry. Leukemia, 20(6):1035–1039. 17. Dean M, Carrington M, Winkler C, et al. (1996). Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Science, 273:1856–1862.
18. Carrington M, Dean M, Martin MP, O’Brien SJ (1999). Genetics of HIV-1 infection: chemokine receptor CCR5 polymorphism and its consequences. Hum Mol Genet, 8(10):1939–1945. 19. Agrawal L, Lu X, Quingwen J, et al. (2004). Role for CCR5Δ32 protein in resistance to R5, R5X4, and X4 human immunodeficiency virus type 1 in primary CD4+ cells. J Virol, 78(5):2277–2287. 20. Saita Y, Kodama E, Orita M, et al. (2006). Structural basis for the interaction of CCR5 with a small molecule, functionally selective CCR5 agonist. J Immunol, 117:3116–3122. 21. Pfizer, Inc. (2008). Selzentry label. http://www.fda.gov/cder/foi/label/2007/022128lbl.pdf. 22. Vandekerckhove L, Verhofstede C, Vogelaers D (2008). Maraviroc: integration of a new antiretroviral drug class into clinical practice. J Antimicrob Chemother, 61(6):1187–1190. 23. Rai R, Sprengeler PA, Elrod KC, Young WB (2001). Perspectives on factor Xa inhibition. Curr Med Chem, 8(2):101–119. 24. Ikeo M, Tarumi T, Nakabayashi T, Yoshida M, Naito S, Koike T (2006). Factor Xa inhibitors: new anti-thrombotic agents and their characteristics. Front Biosci, 11:232–248. 25. Crowther MA, Warkentin TE (2008). Bleeding risk and the management of bleeding complications in patients undergoing anticoagulant therapy: focus on new anticoagulant agents. Blood, 111(10):4871–4879. 26. Rifai N (1986). Lipoproteins and apolipoproteins: composition, metabolism, and association with coronary heart disease. Arch Pathol Lab Med, 110:694–701. 27. Bachorik PS, Denke MA, Stein EA, Rifkind BM (2001). Lipids and dyslipoproteinemia. In Henry JB (ed.), Clinical Diagnosis and Management by Laboratory Methods, 20th ed., W.B. Saunders, Philadelphia, pp. 224–248. 28. Myers GL, Cooper GR, Winn CL, Smith SJ (1989). The Centers for Disease Control–National Heart, Lung and Blood Institute Lipid Standardization Program: an approach to accurate and precise lipid measurements. Clin Lab Med, 9:105–135. 29. Myers GL, Cooper GR, Hassemer DJ, Kimberly MM (2000). Standardization of lipid and lipoprotein measurement. In Rifai N, Warnick GR, Dominiczak M (eds.), Handbook of Lipoprotein Testing. AACC Press, Washington, DC, pp. 717–748. 30. Rifai N, Warnick GR (2006). Lipids, lipoproteins, apolipoproteins, and other cardiovascular risk factors. In Burtis CA, Ashwood ER, Bruns DE (eds.), Tietz Textbook of Clinical Chemistry and Molecular Diagnostics, 4th ed., Elsevier Saunders, St. Louis, MO, pp. 903–981. 31. Otvos JD, Jeyarajah EJ, Bennett DW, Krauss RM (1992). Development of a proton NMR spectroscopic method for determining plasma lipoprotein concentrations and subspecies distribution from a single, rapid measure. Clin Chem, 38:1632–1638. 32. Freedman DS, Otvos JD, Jeyarajah EJ, Barboriak JJ, Anderson AJ, Walker J (1998). Relation of lipoprotein subclasses as measured by proton nuclear magnetic resonance spectroscopy to coronary artery disease. Arterioscler Thromb Vasc Biol, 18:1046–1053.
33. Kuller LH, Grandits G, Cohen JD, Neaton JD, Prineas R (2007). Lipoprotein particles, insulin, adiponectin, C-reactive protein and risk of coronary heart disease among men with metabolic syndrome. Atherosclerosis, 195:122–128. 34. van der Steeg WA, Holme I, Boekholdt SM, et al. (2008). High-density lipoprotein cholesterol, high-density lipoprotein particle size, and apolipoprotein A-1: significance for cardiovascular risk. J Am Coll Cardiol, 51:634–642. 35. MacLean PS, Bower JF, Vadlamudi S, Green T, Barakat HA (2000). Lipoprotein subpopulation distribution in lean, obese, and type 2 diabetic women: a comparison of African and White Americans. Obes Res, 8:62–70. 36. Berhanu P, Kipnes MS, Khan M, et al. (2006). Effects of pioglitazone on lipid and lipoprotein profiles in patients with type 2 diabetes and dyslipidaemia after treatment conversion from rosiglitazone while continuing stable statin therapy. Diabetic Vasc Dis Res, 3:39–44. 37. Couture P, Otvos JD, Cupples LA, et al. (2000). Association of the C-514T polymorphism in the hepatic lipase gene with variations in lipoprotein subclass profiles: the Framingham Offspring Study. Arterioscler Thromb Vasc Biol, 20:815–822. 38. Russo GT, Meigs JB, Cupples LA, et al. (2001). Association of the Sst-I polymorphism at the APOC3 gene locus with variations in lipid levels, lipoprotein subclass profiles and coronary heart disease risk: the Framingham Offspring Study. Atherosclerosis, 158:173–181. 39. Humphries SE, Berglund L, Isasi CR, et al. (2002). Loci for CETP, LPL, LIPC, and APOC3 affect plasma lipoprotein size and sub-population distribution in Hispanic and non-Hispanic white subjects: the Columbia University BioMarkers Study. Nutr Metab Cardiovasc Dis, 12:163–172. 40. Ikewaki K, Tohyama J, Nakata Y, Wakikawa T, Kido T, Mochizuki S (2004). Fenofibrate effectively reduces remnants, and small dense LDL, and increases HDL particle number in hypertriglyceridemic men: a nuclear magnetic resonance study. J Atheroscler Thromb, 11:278–285. 41. Ikewaki K, Noma K, Tohyama J, Kido T, Mochizuki S (2005). Effects of bezafibrate on lipoprotein subclasses and inflammatory markers in patients with hypertriglyceridemia: a nuclear magnetic resonance study. Int J Cardiol, 101:441–447. 42. Otvos JD, Collins D, Freedman DS, et al. (2006). LDL and HDL particle subclasses predict coronary events and are changed favorably by gemfibrozil therapy in the Veterans Affairs HDL Intervention Trial (VA-HIT). Circulation, 113:1556–1563. 43. Rosenson RS, Shalaurova I, Freedman DS, Otvos JD (2002). Effects of pravastatin treatment on lipoprotein subclass profiles and particle size in the PLACI trial. Atherosclerosis, 160:41–48. 44. Schaefer EJ, McNamara JR, Taylor T, et al. (2002). Effects of atorvastatin on fasting and postprandial lipoprotein subclasses in coronary heart disease patients versus control subjects. Am J Cardiol, 90:689–696. 45. Blake GJ, Albert MA, Rifai N, Ridker PM (2003). Effect of pravastatin on LDL particle concentration as determined by NMR spectroscopy: a substudy of a randomized placebo controlled trial. Eur Heart J, 24:1843–1847. 46. Soedamah SS, Colhoun HM, Thomason MJ, et al. (2003). The effect of atorvastatin on serum lipids, lipoproteins and NMR spectroscopy defined lipoprotein sub-
classes in type 2 diabetic patients with ischemic heart disease. Atherosclerosis, 167:243–255. 47. Schaefer EJ, McNamara JR, Tayler T, et al. (2004). Comparisons of effects of statins (atorvastatin, fluvastatin, lovastatin, pravastatin, and simvastatin) on fasting and postprandial lipoproteins in patients with coronary heart disease versus control subjects. Am J Cardiol, 93:31–39. 48. Nicklas BJ, Ryan AS, Katzel LI (1999). Lipoprotein subfractions in women athletes: effects of age, visceral obesity and aerobic fitness. Int J Obes Rel Metab Disord, 23:41–47. 49. Yu HH, Ginsburg GS, O’Toole ML, Otvos JD, Douglas PS, Rifai N (1999). Acute changes in serum lipids and lipoprotein subclasses in triathletes as assessed by proton nuclear magnetic resonance spectroscopy. Arterioscler Thromb Vasc Biol, 19:1945–1949. 50. Hurt-Camejo E, Paredes S, Masana L, et al. (2001). Elevated levels of small, low-density lipoprotein with high affinity for arterial matrix components in patients with rheumatoid arthritis. Arthritis Rheum, 44:2761–2767. 51. Stein JH, Merwood MA, Bellehumeur JL, et al. (2004). Effects of pravastatin on lipoproteins and endothelial function in patients receiving human immunodeficiency virus protease inhibitors. Am Heart J, 147:E18. 52. Hudelist G, Kostler W, Gschwantler-Kaulich D, et al. (2003). Serum EGFR levels and efficacy of trastuzumab-based therapy in patients with metastatic breast cancer. Eur J Cancer, 42(2):186–192. 53. Smith BL, Chin D, Maltzman W, Crosby K, Hortobagyi GN, Bacus SS (2004). The efficacy of Herceptin therapies is influenced by the expression of other erbB receptors, their ligands and the activation of downstream signalling proteins. Br J Cancer, 91:1190–1194. 54. Burstein HJ (2005). The distinctive nature of HER2-positive breast cancer. N Engl J Med, 353:1652–1654. 55. Genentech (2008). Herceptin label. http://www.fda.gov/cber/foi/label/2008/103792s517lbl.pdf. 56. Shinmei M, Masuda K, Kikuchi T, Shimomura Y, Okada Y (1991). Production of cytokines by chondrocytes and its role in proteoglycan degradation. J Rheumatol Suppl, 27:89–91. 57. Okada Y, Shinmei M, Tanaka O, et al. (1992). Localization of matrix metalloproteinase 3 (stromelysin) in osteoarthritic cartilage and synovium. Lab Invest, 66(6):680–690. 58. Billinghurst RC, Dahlberg L, Ionescu M, et al. (1997). Enhanced cleavage of type II collagen by collagenases in osteoarthritic articular cartilage. J Clin Invest, 99:1534–1545. 59. Huebner JL, Otterness IG, Freund EM, Caterson B, Kraus VB (1998). Collagenase 1 and collagenase 3 expression in a guinea pig model of osteoarthritis. Arthritis Rheum, 41:877–890. 60. Dahlberg L, Billinghurst RC, Manner P, et al. (2000). Selective enhancement of collagenase-mediated cleavage of resident type II collagen in cultured osteoarthritic cartilage and arrest with a synthetic inhibitor that spares collagenase 1 (matrix metalloproteinase 1). Arthritis Rheum, 43:673–682.
61. Wu W, Billinghurst RC, Pidoux I, et al. (2002). Sites of collagenase cleavage and denaturation of type II collagen in aging and osteoarthritic articular cartilage and their relationship to the distribution of matrix metalloproteinase 1 and matrix metalloproteinase 13. Arthritis Rheum, 46:2087–2094. 62. Nemirovskiy OV, Dufield DR, Sunyer T, Aggarwal P, Welsch DJ, Mathews WR (2007). Discovery and development of a type II collagen neoepitope (TIINE) biomarker for matrix metalloproteinase activity: from in vitro to in vivo. Anal Biochem, 361(1):93–101. 63. Wancket LM, Baragi V, Bove S, Kilgore K, Korytko PJ, Guzman RE (2005). Anatomical localization of cartilage degradation markers in a surgically induced rat osteoarthritis model. Toxicol Pathol, 33(4):484–489. 64. Otterness IG, Brandt KD, Le Graverand MP, Mazzuca SA (2007). Urinary TIINE concentrations in a randomized controlled trial of doxycycline in knee arthritis: Implications of the lack of association between TIINE levels and joint space narrowing. Arthritis Rheum, 56(11):3644–3649. 65. Hicks JM, Young DS (eds.) (2005). DORA 05-07: Directory of Rare Analyses AACC Press, Washington, DC. 66. FDA (2008). Table of validated genomic biomarkers in the context of approved drug labels. Center for Drug Evaluation and Research. http://www.fda.gov/cder/ genomics/genomic_biomarkers_table.htm (updated Sept. 10, 2008). 67. Lee JW, Devanarayan V, Barret YC, et al. (2006). Fit-for-purpose method development and validation of biomarker measurement. Pharm Res, 23(2):312–328.
32 BIOMARKER PATENT STRATEGIES: OPPORTUNITIES AND RISKS Cynthia M. Bott, Ph.D. Honigman Miller Schwartz and Cohn LLP, Ann Arbor, Michigan
Eric J. Baude, Ph.D.* Brinks Hofer Gilson & Lione, P.C., Ann Arbor, Michigan
INTRODUCTION
Biomarkers are used in scientific research to identify therapeutic agents and to measure the performance of a drug before and after regulatory approval. It is possible, and often desirable, for a company, institution, or individual to patent a biomarker for its use in those settings. In general, as the patentee, the company, institution, or individual can sue anyone who infringes the patent. If the patentee prevails in a lawsuit, the patentee may receive money damages and prevent the infringer from using the biomarker. It is equally true that others can obtain patents on biomarkers. Therefore, using a biomarker that someone else has patented creates a risk of patent infringement for the user. Accordingly, it is important to identify patent infringement risks associated with the making and using of biomarkers. These two aspects, patenting biomarkers and identifying patent infringement risks, are discussed in this chapter.
*The views expressed herein are those of the authors and do not necessarily reflect those of Brinks Hofer Gilson & Lione, P.C., Honigman Miller Schwartz and Cohn LLP, or their clients.
PATENTING BIOMARKERS
Introduction to Patents
In exchange for fully describing an invention, a U.S. patent gives inventors the right “to exclude others from making, using, offering for sale, or selling the invention throughout the United States or importing the invention into the United States” for a limited period of time.1 The U.S. Constitution in Article I, section 8, clause 8, provides the power under which Congress enacts patent laws. The original rationale for the provision of patents remains today: to promote the disclosure of ideas to the general public. Patents are granted by most countries of the world, and the process and requirements for getting a patent in any of these countries are essentially the same. The process for obtaining a patent on an invention begins with the inventor submitting a patent application to a government patent office, such as the U.S. Patent and Trademark Office (USPTO). After the inventor files a patent application at the USPTO, a patent examiner reviews the application and determines whether the application meets the requirements for patentability. This process is referred to as patent application examination.
Patentability Requirements
The requirements for the patentability of inventions in the United States are set out in the U.S. Code, Title 35. Four of the major requirements for patentability are: 1. Utility. To qualify for a patent, the invention must have utility; it must be useful for some purpose.2 The invention must have a specific, substantial, and credible use (i.e., it must have a real-world use). The USPTO has promulgated utility guidelines around what is required to meet the standard of a credible, specific, and substantial utility.3 Essentially, a biomarker must generally have an associated biological function that is demonstrated experimentally or by analogy to existing biomarkers. In general, this requirement is easily met. 2. Novelty. A patent claim is novel if it is different from the prior art in any aspect.4 An examiner in the USPTO analyzes the prior art identified in a search to determine if the invention claimed in the application is different from what was known previously. If the prior art falls within the scope of a claim, the examiner will reject the claim as not being novel. Examples of prior art include patents, patent applications, Web site disclosures, public talks, scientific papers, and scientific posters presented at meetings. Therefore, it is important to file a patent application before speaking publicly about or publishing any aspect of the invention. If an invention is disclosed publicly before applying for a patent, it is not
possible to obtain a defensible patent on the invention in most countries outside the United States. In the United States, an inventor may be able to obtain a U.S. patent provided that the public disclosure is less than one year before the patent application is filed. This one-year period is referred to as a grace period. 3. Nonobviousness. An invention is considered nonobvious if someone in the field of technical expertise would have viewed the invention as surprising or unexpected in view of the prior art.5 This requirement is in place to prevent patents from being granted to obvious improvements over the prior art.6 Patent applications are often rejected as being obvious over the prior art because many inventions are combinations of what is already in the public domain. It is often the case that those rejections can be overcome by arguing that the invention offers surprising advantages over the prior art. A skilled patent attorney can offer advice on how to address potential obviousness concerns around an invention before the application is filed. 4. Enablement. A patent application must fully describe how to make and use the invention so that a person of ordinary skill in the technology can carry out the invention without undue experimentation.7 In part, the enablement requirement is met from the application itself, which contains a detailed description of how to carry out the invention. In addition, examples of how actual experiments were carried out are provided in the patent application to help comply with the enablement requirement. Together, the descriptions of the invention and the examples often serve to satisfy the enablement requirement. The Patenting Process The process of getting a patent granted is quite long, generally taking from three to four years and often much longer for inventions in certain technology areas. The starting point for this process is the filing of a patent application with a government patent office such as the USPTO. The patent application must include a description of the invention and one or more claims that clearly define the invention. After the patent application is filed, the invention as set forth in the patent application is reviewed by an examiner at the UPSTO to determine if it meets the requirements for patentability. To determine the patentability of an invention, the examiner performs a search to identify prior art that is relevant to novelty, obviousness, and other patentability requirements in reference to the claims of the application. If the examiner identifies prior art that renders one or more of the claims not novel or obvious, the USPTO will send the patent applicant a document termed an office action. Office actions set out the reasons that a claim was rejected over the prior art. The patent applicant can then respond to the USPTO with reasons why the office action is in error. The exchange of office actions and responses between the applicant and the USPTO can continue until the application is allowed by
the USPTO, or the applicant decides to abandon the application. One of the ways in which applicants can overcome the USPTO’s rejections is to narrow the scope of the claims by amending the claims. Often, the claims can be amended in a manner that still provides valuable coverage for the applicant’s invention and satisfies the USPTO’s requirements for patentability. Patents are territorial in scope, meaning that a patent is enforceable only in the country in which it is granted. For example, the owner of a U.S. patent cannot sue in a German court for infringement of the U.S. patent. The owner of the U.S. patent would have to obtain a German patent as well to provide protection in Germany against potential infringers. Therefore, an inventor must get a patent in every country in which protection of the invention is sought. A process exists to simplify the filing applications in most countries. An international patent law treaty known as the Patent Cooperation Treaty (PCT) provides a unified procedure for filing a patent application to protect inventions in the 138 countries that are parties to the treaty.8 A patent application filed under the PCT is referred to as an international patent application (or simply a PCT application). In addition to providing a unified procedure, a PCT application allows the inventor to delay the filing of patents in each of the desired countries, thereby delaying the outlay of filing fees and translation fees. After filing the PCT application, the process is much like the initial part of prosecution in the United States. An examiner of a PCT application performs a search for prior art and provides an opinion on patentability. PCT applications are published 18 months after the earlier of the filing date or priority date. The PCT application itself, however, does not result in a patent. In each country in which patent protection is desired, a county-specific patent based on the PCT application must be filed. Filing in a specific country is referred to as entry into the national or regional phase. National phase entry must occur at or before 30 months from the earlier of the filing date or priority date of the international application. In most countries, the term of a patent is 20 years less the time from the date that the patent application was filed. For example, if a patent is issued four years after it was filed, the patent would expire in 16 years. The patent owner is responsible for enforcing the patent. To do so the owner files a patent infringement lawsuit in a U.S. federal district court to sue the alleged infringer. Examples of How to Protect a Biomarker Examples of potentially patentable biomarkers are myriad and include DNA sequences with known function, such as genes, gene fragments or control elements, deleterious mutations, functional proteins, hormones, drug metabolites, peptides, polymorphisms such as single-nucleotide polymorphisms, and cellular antigens. Provided that patentability requirements are met, an inventor may be able to secure patent protection for a biomarker in several ways. Imagine the discovery that protein X is expressed in patients with pancreatic
cancer but not in healthy subjects. Protein X might be useful as a biomarker for the detection and diagnosis of the pancreatic cancer or for determining if a patient is responding to drug therapy. The discovery of such a biomarker would be considered an invention. The ways in which the person making this discovery can protect his or her invention depend on several factors. If a biomarker was not known at the time of the invention, the strongest protection is generally provided by a patent covering the biomarker itself. In the example above, the biomarker would be protein X and/or the DNA sequence encoding the protein. The strongest protection of protein X would be a patent that specifically claims protein X. Such a patent is often referred to as a composition of matter patent because of the way that patentable inventions are defined in the patent law. A patent to the biomarker or the DNA sequence could be enforced to prevent others from making, using, offering for sale, or selling the patented biomarker.9 If a biomarker is already known, the biomarker itself would not be patentable because it is not novel.10 Patent protection may still be available for new uses of that biomarker. For example, if protein X in the example above was known but its correlation with pancreatic cancer was not known, the inventor might be able to obtain patent coverage for methods of using the biomarker such as methods of diagnosing pancreatic cancer, assays for measuring the level of the protein in a pancreatic cancer patient, or methods of identifying the effectiveness of chemotherapeutic treatment against pancreatic cancer. Another consideration is whether in a particular situation it makes sense to patent the biomarker. If the inventor is interested in excluding others from using the biomarker, either for competitive reasons or to generate income from licensing, it may be well worth the effort to patent the biomarker. If there is no intention to exploit a patent, it may make more sense to disclose the invention by publication. This disclosure would probably prevent someone else from subsequently patenting that invention.
IDENTIFYING BIOMARKER PATENT INFRINGEMENT RISKS Patents give the patent owner the right to exclude others from practicing the patented invention. The patent, however, does not provide the patent owner with an affirmative right to practice the invention. For example, the owner of a patent that claims a method of using a biomarker can exclude others from the use of that biomarker. If another party owns a composition of matter patent on the biomarker (i.e., a patent to the biomarker itself), the owner of the method of use patent would not be able to make or use the biomarker for any purpose without the permission of the composition of matter patent owner. This second patent is said to dominate the biomarker use patent. These patents are also referred to as blocking patents. If a blocking patent exists, a person or company could be sued for using the biomarker without permission from the owner of the biomarker composition of matter patent. It is important
to understand this distinction in order to avoid the feeling that possession of a patent on an invention protects the patentee from a charge of infringement when using that invention. It is possible to research if a planned use of a biomarker infringes someone else’s patent. This process is known as determining if there is freedom to operate around the use of the biomarker, sometimes referred to as freedom to practice. To determine if there are any patents that pose a risk of infringement, a laboratory scientist or drug development team first needs to search for patents that would cover the making and use of the biomarker. Typically, a fee-based patent search firm is employed to use text-based searching and sequence-based searching to identify potentially relevant patents. It is also possible to conduct some initial searching on one’s own. There are currently several free Internet-based public databases that are available for searching patents such as the USPTO database available at http://www.uspto.gov, http://www.patentlens.net, http://www.google.com/patents, and http://www. freepatentsonline.com. The next step is to interpret if any of the potentially relevant patents identified in the search pose a significant risk of patent infringement to the planned activities. The layperson or scientist may find the scope of patent claims to be difficult to understand. Often, a patent specialist such as a patent attorney is needed to review the search results and provide advice on potential infringement concerns. If there are any potential infringement issues, the scientist and the patent attorney can work together to address those issues. Potential ways to reduce infringement risks include licensing the problematic patent from the owner, designing around the patent, or deciding not to use the biomarker. The potential availability of certain exceptions and exemptions to infringement is also discussed below. Licensing a Patent If there is concern that a planned use might infringe a patent, one way to address this issue is to obtain a license from the patent owner that would allow the licensee to practice the patented invention. Different types of licenses are common in the biomarker arena. The license may involve an up-front payment with no further obligation to the patentee. In other circumstances, during the license term the license terms might require payments such as royalty payments or payments when certain development milestones are met. For example, a license to a diagnostic kit might include a $20,000 payment upon signing the license and a 2% royalty paid annually based on that year’s net sales from the diagnostic kit. Designing Around a Patent For cases in which a license is not available or the terms of the license would not be reasonable, another option is to design around the claims of the patent.
A typical design around strategy involves designing a biomarker in a way that is outside the scope of the blocking patent. A skilled patent attorney can examine the scope of the claims and advise if there is a potential design-around. For example, a patent may claim only the full-length sequence of a protein or a nucleic acid biomarker. If the full-length sequence is not necessary for the use being considered, use of a modified sequence that is shorter than the full-length sequence may not infringe the problematic patent claim. Successful designaround strategies can realize cost savings in research and development costs, legal fees, and potential litigation costs, as well as minimize the delay in commercializing a product or method. Designing around a patent is one of the ways in which the patent system works to promote new advances in technology. Another design-around strategy is sometimes called a geographic designaround. As noted above, patents are territorial and can only be enforced in the countries in which the invention is patented. Therefore, it may be possible to make and use a specific biomarker in a country where the patentee has not yet obtained patent protection or will not be obtaining patent protection. For example, it might be that a desired biomarker has been patented in the United States but not in Canada. If use of the biomarker can be limited to Canada (or to any other country where the invention is not patented), there would not be a need to license that biomarker. Ceasing Use of the Biomarker One option to address a significant infringement risk is to stop using the biomarker. This may not be a preferred option, but in certain cases it may be the only appropriate course of action. Exemption from Infringement Under 35 U.S.C. § 271(e)(1) Under 35 U.S.C. § 271(e)(1), the use of a biomarker to discover a drug or in clinical trials might be exempted from infringement if the use is “reasonably related to the development” of information that might be submitted to support U.S. Food and Drug Administration (FDA) approval. Although the boundaries of the exemption are not completely clear, the courts have exempted the use of a patented compound under 35 U.S.C. § 271(e)(1). The courts, however, have not addressed directly whether the use of a research tool such as a biomarker in drug development would fall under the 35 U.S.C. § 271(e)(1) exemption. Thus, the boundaries of the exemption are somewhat uncertain with regard to research tools. Once the drug is approved, however, further use of a patented biomarker will generally not be exempted from infringement. Experimental Use Exception For private institutions such as companies and private universities, there is generally no immunity from a patent lawsuit under an experimental use excep-
tion.11 This was made clear in the private university arena in the Madey v. Duke case in 2002. The federal circuit court held that the “very narrow and strictly limited experimental use defense” was applicable only if a patented invention was used “solely for amusement, to satisfy idle curiosity, or for strictly philosophical inquiry.”12 Duke’s use of a patented invention was related to obtaining research funding which was in furtherance of their legitimate business. It would seem, then, to be the rare case where even a private university would not be subject to a patent lawsuit if its activities relate to obtaining research grants. Accordingly, the research use exception in the United States does not provide much protection from a patent lawsuit.
Immunity Under the 11th Amendment In general, a state institution such as a state university may not be sued for patent infringement. The 11th amendment of the Constitution prevents a private party from suing a state university for patent infringement.13 Thus, an individual or a for-profit corporation would not be able to sue a public university for use of a patented biomarker. The state immunity does not transfer to a licensee of a patent from a state university. Therefore, if such a biomarker patent is licensed from a state university, the licensee could be subject to a patent lawsuit from the holder of the biomarker patent even though the university would not.
CONCLUSIONS
The discovery and use of biomarkers present opportunities and risks in the patent area. There are opportunities to patent biomarkers and methods of using biomarkers to further commercial ventures. There is also the risk of infringing third-party patents when making or using certain biomarkers. When discovering a biomarker or a new use of a biomarker, it is important to consider capturing biomarker patent opportunities. At the same time, it is important to be conscious that the use of a biomarker can pose risks of infringement, and to seek ways to minimize those risks. The top three questions to keep in mind when working with biomarkers are:
1. Is it desirable to patent the biomarker invention?
2. Is the invention patentable?
a. Is the biomarker unknown?
b. Is the biomarker known but the use is not?
c. Does the invention meet the criteria for patentability?
3. Are there risks to making and using a biomarker? a. Is the use “reasonably related to the development” of information that might be submitted to support FDA approval and therefore exempt from infringement? b. Is there a patent that could block the making and using of the biomarker?
NOTES AND CITATIONS
1. See 35 U.S.C. § 271(a).
2. The requirement that an invention be useful is set forth in 35 U.S.C. § 101.
3. See Revised Interim Utility Guidelines Training Materials at http://www.uspto.gov/web/menu/utility.pdf and the USPTO Utility Guidelines at http://www.uspto.gov/web/offices/com/sol/notices/utilexmguide.pdf.
4. The requirements for determining whether or not an invention is new are set forth in 35 U.S.C. § 102.
5. The requirements for determining obviousness are set forth in 35 U.S.C. § 103(a).
6. In Europe and Japan a similar requirement for nonobviousness is that of “inventive step.”
7. The enablement requirement is set forth in 35 U.S.C. § 112, first paragraph.
8. As of December 2007, 138 countries are parties to the PCT. Separate patent applications must be filed in countries that are not signatories to the PCT to obtain patent protection in those countries. For example, Argentina and Taiwan are not members of the PCT.
9. See 35 U.S.C. § 271(a).
10. See novelty provisions at 35 U.S.C. §§ 102(a)–(g).
11. See Madey v. Duke University, 307 F.3d 1351 (Fed. Cir. 2002).
12. Ibid., 1362.
13. See Fla. Prepaid Postsecondary Educ. Expense Bd. v. Coll. Sav. Bank, 527 U.S. 627 (1999).
PART VIII WHERE ARE WE HEADING AND WHAT DO WE REALLY NEED?
33 IT SUPPORTING BIOMARKER-ENABLED DRUG DEVELOPMENT Michael Hehenberger, Ph.D. IBM Healthcare & Life Sciences, Somers, New York
PARADIGM SHIFT IN BIOPHARMACEUTICAL R&D The biopharmaceutical industry is currently undergoing a transformation driven primarily by the need to move from its proven blockbuster model to a new stratified medicine or nichebuster model [1]. This paradigm shift has been anticipated in white papers such as IBM’s Pharma 2010 report [2] and will have to be accompanied by serious efforts to streamline operations and to address research and development (R&D) productivity issues as described and quantified by DiMasi [3] and DiMasi et al. [4]. In its Critical Path initiative [5], the U.S. Food and Drug Administration (FDA) is guiding industry toward use of biomarkers [6] that will address efficacy and safety issues and hold the promise of increased R&D productivity [7]. In the excellent biomarker review paper by Trusheim et al. [1], clinical biomarkers are defined as biomarkers that associate a medical treatment to a patient subpopulation that has historically exhibited a differential and substantial clinical response. Clinical biomarkers can be based on “genotypes, proteins, metabonomic patterns, histology, medical imaging, physician clinical observations, or even self-reported patient surveys. A clinical biomarker is not defined by its technology or biological basis, but rather by its reliable, predictive
correlation to differential patient responses.” It is generally believed that “biomarker-enabled” drug development will lead to better and earlier decision making and that clinical biomarkers will pave the way toward targeted therapeutics combining “precision drugs” for stratified patient populations with diagnostic tests designed to identify not only “responders” (who will benefit) but also patient cohorts at risk for adverse side effects. Biomarker-enabled R&D is evolving into a new discipline with a strong patient focus. Organizations that believe in biomarker-enabled R&D are investing in tools and making the necessary organizational changes to implement the new concepts and associated processes. Among biomarkers, imaging biomarkers have received particular attention because of the noninvasive nature of imaging technologies and the obvious link to diagnostic procedures and clinical care. Imaging technologies are used increasingly as core technologies in biopharmaceutical R&D in both the preclinical and clinical phases of the R&D process. Disease areas most affected by this paradigm shift are cardiology, oncology, and neurology. Below we discuss in more detail how biomarker-related data types and their increasing volumes are challenging existing information technology (IT) infrastructures, and how IT architectures have to be enhanced and modified to integrate genomic, imaging, and other biomarker data.
PROCESSES, WORKFLOWS, IT STANDARDS, AND ARCHITECTURES
The conventional R&D process—documented extensively by most research-based biopharmaceutical companies—is sequential (Figure 1). After target identification and validation by the “biologists,” the medicinal “chemists” take over and screen extensive libraries of hundreds of thousands of chemical compounds against the target in the hope of eventually finding a suitable candidate for a drug. After the IND (investigational new drug) application to the FDA, the drug candidate is tested in preclinical animal studies before it is handed over to the clinical development organization for clinical trials that proceed through three phases. If the drug candidate survives through phase III, all the supporting information collected will be submitted to the FDA in the form of a new drug application (NDA) dossier that has to be compiled such that FDA’s rules and regulations are respected and followed. After FDA approval, phase IV trials may be conducted to collect postmarketing surveillance data about adverse drug reactions or to position the drug for new indications not yet covered by a given FDA approval. To manage the process effectively and to terminate failed projects as early as possible, R&D organizations have created a set of disciplined processes designed to optimize project portfolios and to track the progress of individual projects by milestones or decision gates (Figure 2). Industry leaders such as
Figure 1 Sequential R&D process. (From Peter Corr, IBM Imaging Biomarkers Summit I, Palisades, New York, December 15–17 2007.)
Figure 2 Decision gates (milestones) to manage sequential R&D processes. (From Terry McCormick, Kathleen Martin, and M. Hehenberger, IBM Institute for Business Value, The evolving role of biomarkers: focusing on patients from research to clinical practice, July 2007, http://www.ibm.com/industries/healthcare/doc/jsp/resource/ insight/.)
Novartis have realized that biomarker-enabled R&D has to be organized differently (personal communication from Werner Kroll, January 2007). In particular, there is an increasing emphasis on parallel processes and accelerated “proof-of-concept” in humans to “learn” quickly and to “confirm” (i.e., conduct extensive clinical trials) only if the learning is yielding promising results (Figure 3). A high-level depiction of this new approach can also be found in the IBM Pharma 2010 report [2]. It is the role of IT standards and architectures to support business strategies and to enable their implementation. IT standards relevant in this context are: • Data standards proposed by the Clinical Data Interchange Standards Consortium (CDISC) [8] such as: • SEND (Standard for Exchange of Nonclinical Data) covering animal data
580
IT SUPPORTING BIOMARKER-ENABLED DRUG DEVELOPMENT
Figure 3 Parallel biomarker-enabled processes up to preclinical development. (From Terry McCormick, Kathleen Martin, and M. Hehenberger, IBM Institute for Business Value, The evolving role of biomarkers: focusing on patients from research to clinical practice, July 2007, http://www.ibm.com/industries/healthcare/doc/jsp/resource/ insight/.)
• SDTM (Study Data Tabulation Model) covering human data • ODM (Operational Data Model) covering study data including EDCgenerated data • ADaM (Analysis Data Model) covering analysis data sets • HL7 Clinical Document Architecture (CDA) [9] • DICOM (Digital Imaging and Communications in Medicine) to transmit medical images [10] • Janus Data Model [11] developed by FDA and IBM [in a collaborative research and development agreement (CRADA)] and implemented by NCI (National Cancer Institute) and the FDA A comprehensive overview of relevant standards along with associated Web sites is shown in Figure 4. Based on the technical challenges for integrating a diverse set of data sources for a biomarker-based clinical data submission, we propose an IT architecture (Figure 5 [12]) that addresses a majority of those requirements. Although this architecture includes software products and assets belonging to IBM, one can logically extend it to fit other vendor products as well. Our approach is to present a general-purpose platform for managing clinical submissions of patient data, enhanced with genomic and imaging data.
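To make these data standards concrete, the short sketch below (written in Python; the folder name and the particular domain list are invented for illustration and are not an official CDISC or FDA utility) inventories the SAS transport (XPT) files of an SDTM-style submission folder and flags expected domains, including the biomarker-oriented PG and IM domains discussed later in this chapter, that are missing.

from pathlib import Path

# Hypothetical subset of SDTM domains, including the biomarker-oriented
# pharmacogenomics (PG) and imaging (IM) domains discussed in this chapter.
EXPECTED_DOMAINS = {
    "DM": "Demographics",
    "AE": "Adverse Events",
    "EX": "Exposure",
    "LB": "Laboratory Test Results",
    "PG": "Pharmacogenomics",
    "IM": "Imaging",
}

def inventory_submission(folder):
    """Report which expected SDTM domain transport files are present."""
    present = {p.stem.upper() for p in Path(folder).glob("*.xpt")}
    for code, label in EXPECTED_DOMAINS.items():
        status = "found" if code in present else "MISSING"
        print(f"{code} ({label}): {status}")

inventory_submission("./sdtm_submission")   # hypothetical folder name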
Figure 4 Health care and life sciences standards organizations (CDISC, HL7, DICOM, ANSI ASC X12N, eCTD/ICH, HUPO PSI, GGF, MGED, and BioMoby, with their associated Web sites).
Figure 5 Proposed IT architecture for biomarker-based clinical development. (From Terry McCormick, Kathleen Martin, and M. Hehenberger, IBM Institute for Business Value, The evolving role of biomarkers: focusing on patients from research to clinical practice, July 2007, http://www.ibm.com/industries/healthcare/doc/jsp/resource/ insight/.)
At the bottom data layer, summarized clinical submission data in CDISC’s SDTM format receive feeds from a clinical data management system that stores case report forms. First, the associated metadata for the SDTM submission (such as the vocabularies used for the adverse event codes, lab codes, etc.) are mapped into the tables of the Janus data model. An extract-transform-load (ETL) tool is then used to improve the scalability and the data validation, cleansing, and transformation needed to load the SDTM data into Janus. One may also need to build a collection of applications and use-case-specific datamarts on top of Janus. These can be designed using star-schema-based dimensional models for optimal query performance. In addition to the clinical submission data in Janus, one would need to establish links to the imaging data that often reside in picture archiving and communication systems (PACS). After extraction, the images can be managed centrally using a standardized imaging broker service (such as that provided by IBM’s content management offering) and stored along with genomic raw and analysis files in a content management repository supported by IBM’s Solution for Compliance in a Regulated Environment (SCORE [13]). Finally, external reference databases such as PubMed, GenBank, dbSNP, and SwissProt are integrated using unstructured
CLINICAL BIOMARKERS AND ASSOCIATED DATA TYPES
583
information management technology provided, for example, by IBM’s WebSphere Information Server/OmniFind middleware. All these content stores can be searched dynamically using a federated warehouse that uses a wrapper-based technology for linking diverse data sources. On top of the federation layer, we propose a data abstraction and query layer powered by the Data Discovery Query Builder (DDQB). DDQB has been designed to expose a user-centric logical data model (based on XML) mapped on top of the physical data model. To support processes and workflows that may include the transfer of clinical biomarker data between pharmas, contract research organizations (CROs), imaging core labs, investigator sites (often academic medical research centers), and ultimately the FDA, the IT architecture should be designed as a service-oriented architecture (SOA). IBM’s SCORE architecture satisfies this requirement and can therefore serve as a basis for the enablement of biomarker-based R&D.
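As a minimal sketch of the datamart idea described above (illustrative only; the table and column names are invented for this example and do not reproduce the actual Janus or SCORE schemas), the following Python fragment uses the built-in sqlite3 module to create a small star schema, with a findings fact table joined to subject and test dimensions, and to run a simple cross-subject aggregation of the kind a reporting datamart would serve.

import sqlite3

# Illustrative star schema: one findings fact table plus two dimensions.
# Names are invented for this sketch and do not reproduce the Janus model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_subject (subject_key INTEGER PRIMARY KEY,
                              usubjid TEXT, studyid TEXT, arm TEXT);
    CREATE TABLE dim_test (test_key INTEGER PRIMARY KEY,
                           test_code TEXT, test_name TEXT, domain TEXT);
    CREATE TABLE fact_finding (subject_key INTEGER, test_key INTEGER,
                               visit TEXT, result_num REAL, result_unit TEXT);
""")
conn.execute("INSERT INTO dim_subject VALUES (1, 'SUBJ-0001', 'STUDY01', 'ACTIVE')")
conn.execute("INSERT INTO dim_test VALUES (1, 'LDLPN', 'LDL particle number', 'LB')")
conn.execute("INSERT INTO fact_finding VALUES (1, 1, 'WEEK 12', 1150.0, 'nmol/L')")

# A simple datamart-style query: mean result per test across subjects.
for test_name, mean_result in conn.execute("""
        SELECT t.test_name, AVG(f.result_num)
        FROM fact_finding f JOIN dim_test t ON t.test_key = f.test_key
        GROUP BY t.test_name"""):
    print(test_name, mean_result)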
CLINICAL BIOMARKERS AND ASSOCIATED DATA TYPES
Among clinical biomarkers, the genomic and imaging data types are creating the greatest IT challenges. By recommending CDISC’s SDTM (see above) as the standard for drug submission, the FDA has taken an important step toward the integration of genomic and imaging data with conventional clinical patient data. CDISC SDTM is an easily extendable model that incorporates the FDA submission data structures. Based on strong collaboration between the biopharmaceutical industry, clinical research organizations, clinical trial investigator sites, IT vendors, and the FDA, SDTM represents the collective thinking of a broad group of stakeholders. Conventional clinical data are categorized into the four classes of the SDTM data model, each subdivided into domains:
1. Events include specific domains covering adverse events, subject disposition, and medical history.
2. Interventions cover exposure (to study drug), concomitant medications, and substance use.
3. Findings contain the assessment information, such as electrocardiogram, lab, physical exam, vital signs, and subject questionnaire data.
4. The other class was created to group specialized categories of information such as trial design, supplemental qualifiers, trial summary, and related records, where related records provide linkages across the different files.
To support biomarker data submission, two new SDTM domains have been added:
Figure 6 Partial sample of the pharmacogenomics SDTM domain (parent and child domain records with variables such as STUDYID, USUBJID, PGSEQ, PGGRPID, PGREFID, PGTESTCD, PGTEST, PGORRES, PGSTRESC, PGSTRESN, PGMETHCD, and PGASSAY).
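As a rough rendering of the kind of content Figure 6 depicts (values are hypothetical and are shown only to illustrate the parent/child structure; this is not actual study data), one parent PG test record and its child genotype results can be represented as follows.

# Hypothetical pharmacogenomics (PG) records in the spirit of Figure 6;
# the variable names follow the figure, the values are invented.
parent_record = {
    "STUDYID": "NSCLC10", "USUBJID": "SUBJ-0001", "PGSEQ": 1,
    "PGREFID": "SPEC001", "PGTESTCD": "CYP2D6", "PGTEST": "CYP2D6 test",
    "PGMETHCD": "MOLGEN", "PGGRPID": "CYP2D6-00001",
}
child_records = [
    # One row per reported genotype/SNP belonging to the parent test.
    {"PGGRPID": "CYP2D6-00001", "PGSEQ": 1,
     "PGTEST": "CYP2D6 GENE.g.-1584C>G", "PGORRES": "g.-1584GG"},
    {"PGGRPID": "CYP2D6-00001", "PGSEQ": 2,
     "PGTEST": "CYP2D6 GENE.g.100C>T", "PGORRES": "g.100TG"},
]

# Join the child results to their parent on the shared group identifier.
for child in child_records:
    if child["PGGRPID"] == parent_record["PGGRPID"]:
        print(parent_record["USUBJID"], child["PGTEST"], child["PGORRES"])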
TABLE 1 Mapping of DICOM Imaging Metadata Tags into SDTM Imaging Domain
• SDTM variable: Unique subject identifier. CDISC notes: Unique subject identifier within the submission. DICOM tag: (0012,0040), Clinical Trial Subject ID; attribute description: The assigned identifier for the clinical trial subject (see C.7.1.3.1.6; will be present if Clinical Trial Subject Reading ID (0012,0042) is absent, may be present otherwise).
• SDTM variable: Sequence number. CDISC notes: Sequence number given to ensure uniqueness within a data set for a subject; can be used to join related records. DICOM tag: (0020,0013), Instance Number; attribute description: A number that identifies this image (this attribute was named an image number in earlier versions of the standard).
• SDTM variable: Imaging reference ID. CDISC notes: Internal or external imaging identifier; example: UUID for an external imaging data file. DICOM tag: (0008,0018), SOP Instance UID; attribute description: Uniquely identifies the SOP instance (see C.12.1.1.1 for further explanation; see also PS 3.4).
• SDTM variable: Test or examination short name. CDISC notes: Short name of the measurement, test, or examination described in IMTEST; can be used as a column name when converting a data set from a vertical to a horizontal format. DICOM tag: (0008,1030), Study Description; attribute description: Institution-generated description or classification of the study (component) performed.
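A minimal sketch of how the Table 1 mapping might be applied in practice is shown below. It assumes the third-party pydicom library is available, and the SDTM-style output variable names (USUBJID, IMSEQ, IMREFID, IMTESTCD) are illustrative rather than a validated IM domain implementation.

from pydicom import dcmread          # third-party library, assumed installed
from pydicom.tag import Tag

# The Table 1 mapping, expressed as DICOM tag -> illustrative SDTM-style name.
TAG_TO_SDTM = {
    Tag(0x0012, 0x0040): "USUBJID",   # Clinical Trial Subject ID
    Tag(0x0020, 0x0013): "IMSEQ",     # Instance Number
    Tag(0x0008, 0x0018): "IMREFID",   # SOP Instance UID
    Tag(0x0008, 0x1030): "IMTESTCD",  # Study Description
}

def dicom_to_im_record(path):
    """Extract the four Table 1 header fields from one DICOM file."""
    ds = dcmread(path, stop_before_pixels=True)
    return {name: (str(ds[tag].value) if tag in ds else None)
            for tag, name in TAG_TO_SDTM.items()}

# Usage with a hypothetical file name:
# print(dicom_to_im_record("subject0001_baseline_ct.dcm"))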
• The pharmacogenomic (PG) and pharmacogenomics results (PR) domains will support submission of summarized genomic (genotypic) data.
• A new imaging (IM) domain will include a mapping of the relevant DICOM metadata fields required to summarize an imaging submission.
The PG domain belongs to the findings class and is designed to store panel ordering information. The detailed test-level information (such as summarized genotype/SNP results) is reported in the PR domain. Figure 6 shows what a typical genotype test might look like in terms of data content and use of the HUGO [14] nomenclature. The PG domain supports the hierarchical nature of pharmacogenomic results, where for a given genetic test (such as EGFR, CYP2D6, etc.) from a patient sample (listed in the parent domain), multiple genotypes/SNPs can be reported (listed in the child domain). To support the use of imaging biomarkers, DICOM metadata tags have to be mapped into the fields of the new IM domain. Table 1 illustrates this mechanism. While the FDA has proposed the SDTM data model for submission data, it is clear that this is only an interchange format for sponsors to submit summary clinical study data to the FDA in a standardized fashion. The FDA identified a need for an additional relational repository model to store the SDTM data sets. The requirement was to design a normalized and extensible relational repository model that would scale up to a huge collection of studies going back into the past and supporting those in the future. Under a CRADA, the FDA and IBM jointly developed this submissions repository, called Janus (named after the two-headed Roman god), which can look backward to support historic retrospective trials and look forward to support prospective trials. The data classification system of CDISC (interventions, findings, and events) was leveraged in the Janus model, with linkages to the subjects (the patients enrolled in the clinical trial) to facilitate navigation across the different tables by consolidating data in three major tables. Benefits resulting from this technique include reduced database maintenance and a simpler data structure that is easier to understand and can support cross-trial analysis scenarios. The ETL (extract–transform–load) process for loading the SDTM domain data sets instantiates the appropriate class table structure in Janus without requiring any structural changes.
DATA INTEGRATION AND MANAGEMENT
As scientific breakthroughs in genomics and proteomics and new technologies such as biomedical and molecular imaging are incorporated into R&D processes, the associated experimental activities are producing ever-increasing volumes of data that have to be integrated and managed. There are two major approaches to solving the challenge of enterprise-wide data access. The
creation of data warehouses [15] is an effective way to manage large and complex data that have to be queried, analyzed, and mined in order to generate new knowledge. To build such warehouses, the various data sources have to be extracted, transformed, and loaded (ETL) into repositories built on the principles of relational databases [16]. Warehousing effectively addresses the separation of transactional and analysis/reporting databases and provides a data management architecture that can cope with increased data demands over time. The ETL mechanism provides a means to “clean” the data extracted from the capture databases and thereby ensures data quality. However, data warehouses require significant effort in their implementation. Alternatively, a virtual, federated model can be employed [17]. Under the federated model, operational databases and other repositories remain intact and independent. Data retrieval and other multiple-database transactions take place at query time, through an integration layer of technology that sits above the operational databases and is often referred to as middleware or a metalayer. Database federation has attractive benefits, an important one being that the individual data sources do not require modification and can continue to function independently. In addition, the architecture of the federated model allows for easy expansion when new data sources become available. Federation requires less effort to implement but may suffer in query performance compared to a centralized data warehouse. Common to both approaches is the need for sorting, cleaning, and assessing the data, making sure they are valid, relevant, and presented in appropriate and compatible formats. The cleaning and validation process would eliminate repetitive data stores, link data sets, and classify and organize the data to enhance their utility. The two approaches can coexist, suggesting a strategy where stable and mature data types are stored in data warehouses and new, dynamic data sources are kept federated. Genomic data are a good example of the dynamic data type. Since genomics is a relatively new field in biopharmaceutical R&D, organizations use and define data their own way. Only as the science behind genomics is better understood can the business definitions be modified to better represent these new discoveries. The integration of external (partly unstructured) sources such as GenBank [18], SwissProt [19], and dbSNP [20], can be complicated, especially if the evolving systems use does not match the actual lab use. Standardized vocabularies (i.e., ontologies) will link these data sources for validation and analysis purposes. External data sources tend to represent the frontier of science, especially since they store genetic biomarkers associated with diseases and best methods of testing that are ever-evolving. Having a reliable link between genetic testing labs, external data sources for innovations in medical science, and clinical data greatly improves the analytical functionality, resulting in more accurate outcome analysis. These links have been designed into the CDISC PG/PR domains to facilitate the analysis and reporting of genetic factors in clinical trial outcomes.
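The coexistence strategy can be illustrated with a small sketch in which the warehouse holds stable, cleaned clinical results and the genomic annotation is fetched dynamically at query time. All table, function, and field names here are hypothetical, and the external lookup is a stub rather than a call to any real GenBank or dbSNP service.

import sqlite3

# Warehouse side: stable, cleaned clinical results (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE lab_results (usubjid TEXT, test TEXT, value REAL)")
warehouse.execute("INSERT INTO lab_results VALUES ('SUBJ-0001', 'LDL-C', 128.0)")

def fetch_variant_annotation(usubjid):
    """Stand-in for a federated, query-time call to an external genomic source.

    A real deployment might consult dbSNP or an internal genotyping system
    here; this stub simply returns a canned annotation.
    """
    return {"gene": "CYP2C19", "allele": "*2"}

def integrated_view(usubjid):
    """Merge warehoused lab data with dynamically fetched genomic data."""
    labs = warehouse.execute(
        "SELECT test, value FROM lab_results WHERE usubjid = ?", (usubjid,)
    ).fetchall()
    annotation = fetch_variant_annotation(usubjid)
    return [{"usubjid": usubjid, "test": t, "value": v, **annotation}
            for t, v in labs]

print(integrated_view("SUBJ-0001"))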
IT SUPPORTING BIOMARKER-ENABLED DRUG DEVELOPMENT
Stakeholder Stakeholder Management Management Discovery Discovery
Development Development
Imaging Imaging Sites Sites
CROs CROs
Regulatory Regulatory Affairs Affairs
Storage Policy Policy Definition Storage Definition
Taxonomy Definition Taxonomy Definition
Compliance Interpretation Compliance Interpretation Search Browser
Info Grid Work Area
Inbox Actions
Preferences Favorites
Reports Admin
Collaboration
Portal Workflow
Reg. Reg. Agencies Agencies
Process Flow
Business Process Management Lifecycle
Search
Assembly
Rendition
Security
Distribution
Import/Export
View/Print
Auditing
Change Ctrl
Retention
E-Signature
Services Services
Application Integration
588
Image Repository Repository Image Business Process Change Process Modeling
SOP Creation
Figure 7
Training
Change Management
Monitoring
IBM’s medical image management solution.
As standards continue to evolve, the need for semantic interoperability is becoming increasingly clear. To use standards effectively to exchange information, there must be an agreed-upon data structure, and the stakeholders must share a common definition of the data content itself. The true benefit of standards is attained when two different groups can reach the same conclusions from the same data because there is a shared understanding of the meaning of the data and of their context of use.

IMAGING BIOMARKER DATA, REGULATORY COMPLIANCE, AND SOA

Under FDA's strict 21 CFR Part 11 [21] guidelines, new drug submissions must be supported by documentation that is compliant with all regulations. As explained above and illustrated in Figure 3, IBM's SCORE software asset has been designed for this purpose. SCORE's flexibility and modular design make it particularly suitable for the management of imaging biomarker data. The FDA requires reproducibility of imaging findings, so that an independent reviewer can reach the same conclusions, or derive the same computed measurements, as the radiologist whose reading is included in a submission. As a result, a unified architecture is required for a DICOM-based imaging data management platform that supports heterogeneous image capture environments and modalities and allows Web-based access to independent reviewers. Automated markups and computations are recommended to promote reproducibility, but manual segmentation or annotations are often needed to compute the imaging findings. A common vocabulary is also needed for the radiological reports that spell out the diagnosis and other detailed findings, as well as for the specification of the imaging protocols.

Figure 7 shows how imaging biomarker data and work flows can be managed in a regulated multistakeholder environment. The solution includes a range of capabilities and services:

• Image repository: stores the image content and associated metadata (a minimal metadata-extraction sketch follows the numbered list below).
• Collaboration layer: provides image life-cycle tasks shared across sponsors, CROs, and investigator sites.
• Image services: provide functionality such as security and auditing.
• Integration layer: provides solutions for integration and interoperability with other applications and systems.
• Image taxonomy definition: develops image data models, including naming, attributes, ontologies, values, and relationships.
• Image storage policy definition: defines and helps to manage policies and systems for image storage and retention.
• Regulatory interpretation: assists interpretation of regulations and guidelines for what is required for compliance.
• Portal: provides a role-based and personalized user interface.

In addition, the solution design incorporates the customized design, implementation, and monitoring of image management processes. It should also be pointed out that the medical image management solution architecture is fully based on principles of service-oriented architecture (SOA) [22]. SOA is taking application integration to a new level. To take full advantage of the principles of SOA, it is important to start with a full understanding of the business processes to be supported by an IT solution. Such a solution must be architected to support the documented business processes. The component business modeling (CBM) methodology [23] identifies the basic building blocks of a given business, leading to insights that will help overcome most business challenges. CBM allows analysis from multiple perspectives, and the intersection of those views offers improved insights for decision making. In the case of biomarker-enabled R&D, CBM will break down the transformed processes and identify the respective roles of in-house biopharmaceutical R&D functions and outside partners such as CROs, imaging core labs, investigator sites, genotyping services, and possible academic or biotech research collaborators. After mapping work flows, it is then possible to define a five-level service-oriented IT architecture that supports the processes and work flows:
1. Discrete: hard-coded application
2. Partial: cross line of business processes
3. Enterprise: cross-enterprise business processes
4. Partner: a known partner
5. Dynamic partner: any trusted partner
In its most advanced form, SOA will support a complex environment with integrated data sources, integrated applications, and a dynamic network of partners.

CONCLUSIONS

Biomarkers are key drivers of the ongoing health care transformation toward the new paradigm of stratified and personalized medicine. In this chapter we focused on the role of IT in supporting the use of biomarkers in biopharmaceutical R&D. When doing so, we need to keep in mind that the benefits desired for patients and consumers will be realized only if the new biomedical knowledge is translated into stratified and personalized patient care. The biopharmaceutical industry will have to participate not only as a provider of drugs and medical treatments, but also as a contributor to the emerging biomedical knowledge base and to the IT infrastructures needed to enable biomarker-based R&D and clinical care. It is therefore critical to define the necessary interfaces between the respective IT environments and to agree on standards that enable data interchanges. IT standards and architectures must support the integration of new biomarker data with conventional clinical data types and the management of the integrated data in (centralized or federated) data warehouses that can be queried and analyzed. Analysis and mining of biomarker and health care data are mathematically challenging but are necessary to support diagnostic and treatment decisions by providers of personalized care. Finally, service-oriented architectures are required to support the resulting processes and work flows covering the various health care stakeholders.

REFERENCES

1. Trusheim MR, Berndt ER, Douglas FL (2007). Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev, 6:287–293.
2. Pharma 2010: The Threshold of Innovation. http://www.ibm.com/industries/healthcare/doc/content/resource/insight/941673105.html?g_type=rhc.
3. DiMasi JA (2002). The value of improving the productivity of the drug development process: faster times and better decisions. PharmacoEconomics, 20(Suppl 3):1–10.
4. DiMasi JA, Hansen RW, Grabowski HG (2003). The price of innovation: new estimates of drug development costs. J Health Econ, 22:151–185.
5. http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html#execsummary.
6. http://www.fda.gov/cder/genomics/PGX_biomarkers.pdf.
7. Lesko LJ, Atkinson AJ Jr (2001). Use of biomarkers and surrogate endpoints in drug development and regulatory decision making: criteria, validation, strategies. Annu Rev Pharmacol Toxicol, 41:347–366.
8. CDISC. http://www.cdisc.org.
9. HL7. http://www.hl7.org.
10. DICOM. http://www.Medical.nema.org.
11. Janus data model. http://www.fda.gov/oc/datacouncil/, and http://crix.nci.nih.gov/projects/janus/.
12. Hehenberger M, Chatterjee A, Reddy U, Hernandez J, Sprengel J (2007). IT solutions for imaging biomarkers in bio-pharmaceutical R&D. IBM Syst J, 46(1):183–198.
13. SCORE. http://www.03.ibm.com/industries/healthcare/doc/content/bin/HLS00198_USEN_02_LO.pdf.
14. http://www.gene.ucl.ac.uk/nomenclature/.
15. Kimball R, Caserta J (2004). The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data. Wiley, Hoboken, NJ.
16. Codd EF (1981). The significance of the SQL/data system announcement. Computerworld, 15(7):27–30. See also http://www.informatik.uni-trier.de/~ley/db/about/codd.html.
17. Haas L, Schwarz P, Kodali P, Kotlar E, Rice J, Swope W (2001). DiscoveryLink: a system for integrated access to life sciences data. IBM Syst J, 40(2):489–511.
18. GenBank. http://www.ncbi.nlm.nih.gov/Genbank/.
19. SwissProt. http://www.ebi.ac.uk/swissprot/.
20. dbSNP. http://www.ncbi.nlm.nih.gov/SNP/.
21. http://www.fda.gov/ora/compliance_ref/part11/.
22. Carter S (2007). The New Language of Business: SOA & Web 2.0. IBM Press, Armonk, NY.
23. http://www.ibm.com/services/us/gbs/bus/html/bcs_componentmodeling.html.
34

REDEFINING DISEASE AND PHARMACEUTICAL TARGETS THROUGH MOLECULAR DEFINITIONS AND PERSONALIZED MEDICINE

Craig P. Webb, Ph.D.
Van Andel Research Institute, Grand Rapids, Michigan

John F. Thompson, M.D.
Helicos BioSciences, Cambridge, Massachusetts

Bruce H. Littman, M.D.
Translational Medicine Associates, Stonington, Connecticut
INTRODUCTION

A chronic disease is really a phenotype representing the combination of symptom patterns and pathological findings that practicing clinicians have classified together as a disease. Like any other trait, each component of the disease exists because of contributions from genetic and environmental factors and the resultant modifications of biological functions that disturb the normal homeostatic state. Thus, the expression of a single chronic disease can be due to different combinations of genetic and environmental factors. Yet, when
physicians treat patients with a drug, they are actually modulating just a single molecular target, or a small subset of targets, and its downstream pathways. In a given patient, the importance of that target will generally be more or less significant than its average importance in the disease population. Since the drug was approved based on an average clinical response, there will be patients with much better than average responses and patients who do not have a satisfactory clinical response at all.

Two important concepts for personalized medicines are hypothesized in this chapter. The first is that there is a distribution of relative expression of abnormal pathway activity for every element contributing to the pathogenesis of a disease. The second is that regardless of the proportion of subjects in a disease population, when a drug target is aligned with the most relevant abnormalities, it is likely that the therapeutic benefit of the drug for the individual patient where that alignment is found will be much greater than the average benefit for the entire disease population. This means that if the drug target and its downstream pathways are abnormally expressed to a large degree in a patient subpopulation, the therapeutic benefit of a drug targeting that pathway will also be more pronounced than that observed for the entire disease population.

These concepts are expressed graphically in Figure 1. The first is illustrated by scenarios 1, 2, and 3, where the three distributions represent the proportion of patients with different degrees of abnormal pathway expression for three different targets or pathways in the same disease population. The second concept is depicted as the solid black curve, where the degree of expression of the abnormal target or pathway is correlated with the potential therapeutic benefit of a drug targeting that pathway. The shaded areas represent the proportion of the disease population with the best clinical response. These same two concepts will also determine the spectrum of relative toxicity (safety outcomes) of drugs across a population in a manner similar to efficacy. Thus, the therapeutic index (benefit/risk) of drugs targeting a specific pathway may become more predictable.

Molecular definitions of disease and personalized medicine therefore have the potential to improve the outcomes of drug treatment, provided that physicians have the appropriate diagnostic tools as well as drugs with targeted and well-understood biological activities. Then, rather than directing treatment toward the mythical average patient, physicians will be able to tailor their prescriptions to the patients who could benefit most and also customize the dose and regimen based on the pharmacogenetic profile of the patient and the absorption, distribution, metabolism, and elimination (ADME) properties of the drug. This should provide patients with safer and more efficacious treatment options.

In this chapter we use three different disease areas to illustrate these principles and the potential of personalized medicine. These concepts are most advanced in oncology, and we start with this example. Here tumors from individual patients are molecularly characterized and drug regimens are
[Figure 1 (diagram; caption below): three panels plot Percent Patients (0–25%) against Target Expression Level in Patients (0–100) for Scenarios 1, 2, and 3, each overlaid with a curve of Clinical Response to Drug ranging from Low to High.]
Figure 1 Principles supporting personalized medicine strategies. Scenarios 1, 2, and 3 are three distributions representing the proportion of patients with different degrees of abnormal pathway expression for three different targets or pathways in the same disease population. The solid black curve shows the correlation between the degree of expression of the abnormal target or pathway and the potential therapeutic benefit of a drug targeting that pathway. The shaded areas represent the proportion of the disease population with increased probability of achieving the best clinical response. (See insert for color reproduction of the figure.)
selected that target the abnormal pathways driving the neoplastic phenotype. However, as described above, the same principles apply to all chronic diseases, and we illustrate this using type 2 diabetes and rheumatoid arthritis, where the science is just approaching a level that will enable personalized medicine strategies.
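The enrichment logic sketched in Figure 1 can be illustrated with a small simulation. Everything in the snippet below is an assumption chosen for illustration (a normally distributed pathway-activity score, a logistic response curve, a cutoff of 70 for the biomarker-selected subgroup); it is not a model taken from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: pathway activity is roughly normal across the
# disease population, and the probability of clinical response rises with
# activity of the targeted pathway (logistic curve).
activity = rng.normal(50, 15, 100_000)             # abnormal pathway activity score
p_response = 1 / (1 + np.exp(-(activity - 60) / 8))

whole_population = p_response.mean()
enriched = p_response[activity > 70].mean()         # biomarker-selected subgroup

print(f"expected response rate, all comers:      {whole_population:.2f}")
print(f"expected response rate, enriched subset: {enriched:.2f}")
```

Whatever the exact shapes of the curves, selecting patients whose target expression falls on the steep part of the response curve raises the expected benefit above the all-comers average, which is the point made graphically in Figure 1.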
ONCOLOGY

In the United States alone in 2007, it is estimated that nearly 1.5 million new cancer diagnoses will be reported and that over 500,000 patients will die from the disease [1]. Although early detection coupled with improved debulking procedures has led to some improvement in the survival of patients diagnosed with early-stage disease, the outcome of patients with advanced metastatic disease remains bleak. Metastatic disease will continue to burden society and the health care system due to an aging population, late-onset recurrence of microscopic metastases [2], and the fact that many tumors remain undetectable in their early stages. Long-term treatments for disseminated disease that maximize antitumor efficacy and patient survival while minimizing patient morbidity remain a primary objective in medical oncology, yet with few exceptions continue to be elusive.

The level of interest in the field of individualized molecular-based treatments has been driven by a number of factors, including public demand [3], regulatory agencies [4], and the possible financial incentives associated with biomarker–drug co-development within the pharmaceutical and biotechnology industry [5]. In oncology in particular, the lack of a statistical demonstration of agent efficacy in phase III trials is the primary reason for late-stage drug failures, which logically result in increasing costs for the oncology treatments that do successfully attain market approval [6]. Biomarker strategies that can accurately identify the responsive tumors and/or the patient population most likely to receive benefit provide a clear conduit to rescue failed or failing drugs, and also provide a realistic approach to enrich patient populations in early clinical trials to maximize the probability of demonstrating drug efficacy. The concept was perhaps best illustrated during the approval of trastuzumab, a monoclonal antibody against ERBB2, a gene that is frequently amplified in breast tumors [7]. The co-development of a biomarker that assesses the mRNA or protein expression of ERBB2 increased the overall response rate from approximately 10% in the overall population to 35 to 50% in the ERBB2-enriched subpopulation [8]. Without a means to enrich for patients with this molecular subset of tumors, trastuzumab may not have gained U.S. Food and Drug Administration (FDA) approval for the breast cancer "indication."

Today, there are an unprecedented number of available therapeutic agents that have been designed to target specific molecular entities irrespective of the selected phenotypic indications. Within our current knowledge base, there are more than 1500 drugs that, with varying degrees of specificity, target defined constituents of molecular networks. Coupled with advances in technology and computer sciences, in the postgenomic era we are now presented with an unprecedented opportunity to apply our knowledge and existing and/or emerging resources to revolutionize medicine into a predictive discipline, where integration of clinical and molecular observations is used to maximize therapeutic index. We are now able to measure the molecular components of biological systems at an extraordinary density using standardized molecular
profiling technologies, which provide a portrait of the perturbed molecular systems within a diseased tissue.

The lack of efficacy of targeted agents in oncology is, somewhat ironically, likely due in part to their specificity, and biomarker-driven selection of drug combinations will be required to target the Achilles heel of the tumor system. Pathways involved in neoplastic transformation and in vivo tumor growth and progression are complex. Biological systems have evolved to provide the ultimate level of plasticity, allowing cells to adapt to or exploit extracellular cues and ensure their long-term survival and/or the survival of the organism as a whole [9]. The system is highly responsive to the cellular context, which includes both temporal (time-dependent) and spatial (location-dependent) factors that, through a series of epigenetic events, influence the formation of the observed phenotype. The complexity of a tumor system is exacerbated by the inherent genomic instability of the tumor cell; alterations in DNA repair mechanisms at the onset of tumorigenesis essentially instigate an accelerated microevolutionary process, in which the interplay between each tumor cell and its microenvironment provides a constantly shifting context and selective milieu that naturally results in cellular heterogeneity [10,11]. The molecular network of the tumor system represents integration between the subsystems of malignant cells and their host microenvironment, which can include a multitude of proteomic and chemical constituents and other cellular systems contributed by endothelial, stromal, and inflammatory cells [12,13]. Collectively, tumor systems are highly adaptive and naturally exhibit significant variation over time, between locations, and across individuals.

The malignant phenotype results from perturbation of many pathways that regulate the tumor–host interaction and affect fundamental cellular processes such as cell division, apoptosis, invasion, and metabolism. The multistage process of tumor etiology and progression is driven by the progressive accumulation of genetic mutations and epigenetic abnormalities that include programs of gene expression. At first glance, the perturbations in individual components (DNA, RNA, protein, and metabolites) of a tumor system that have been identified in association with the various malignant phenotypes appear to reflect a somewhat stochastic process. However, while recent efforts to sequence the human genome have confirmed the large degree of redundancy within signaling networks, they have also revealed the extent to which tumor cells utilize converging pathways to thrive within their selective environment [14,15]. Indeed, a relatively small number of key intracellular switches have been associated with tumorigenesis in preclinical mouse models and in vitro cell lines. This phenomenon, termed oncogene addiction, suggests that targeted agents against these central signaling relays may prove effective [16]. The classical oncogene target should be expanded to include any molecular aberration that is causative with respect to the etiology and/or progression of the disease at the network level. Additional intervention points within conserved tumor systems are being associated with the various phenotypes of cancer, in part through the use of high-throughput functional screens [17].
While these preserved network hubs represent obvious candidates for targeted therapies, redundancy within molecular pathways also provides a predictable path to drug resistance. Given the genetic instability within a tumor and the robustness of molecular networks, the probability that a cell within the average tumor mass will acquire resistance to a single agent with a selective molecular target during the life span of a tumor population would be expected to be very high [18]. This is well illustrated with the drug imatinib, which was developed to target the ABL kinase, constitutively active in patients with chronic myelogenous leukemia due to the Bcr-Abl gene translocation. Resistant tumors have recently emerged that utilize alternative pathways downstream of the drug target to circumvent the absolute requirement for the ABL kinase [19]. Similar results with other targeted agents used in single or minimal combinational modalities are emerging and would indeed be predicted from analysis of the target network, which demonstrates the level of redundancy and robustness within signaling pathways [20]. Essentially, the plasticity within these networks reduces the dependency on single nodes within pathways. Tools such as systems biology will play a critical role in modeling the pathways to de novo and acquired resistance. Coupled with knowledge of drug–biomarker associations, logical targeted approaches can be developed that minimize the probability that a tumor will develop resistance and/or that target key network nodes or hubs to reverse the resistant phenotype. The target for individualized therapy therefore becomes the perturbed tumor system as a whole, against which targeted therapeutics could be combined to maximize disruption of the networks identified.

An increasing number of technologies are available for assessment of the individual molecular components of cellular systems. Biomarkers can be genetic, genomic, proteomic, or metabolomic in nature, and have been used in various aspects of medicine to predict a phenotype of interest. While these have traditionally been developed as individual biomarkers that can readily be validated as an in vitro diagnostic, multivariate assays have recently been developed that simultaneously assess the levels of different biomarkers and provide a molecular signature in association with a phenotype or context [21]. These signature-based tests require integrated informatics, which can generate a mathematical algorithm that is trained and tested on independent sample sets [22]. Global molecular profiling offers many advantages over custom biomarkers, since a profile that accurately captures the underlying system of the disease can be attained and used as a common input for both the discovery and development of diagnostics. Indeed, a major bottleneck in the field of personalized medicine is the time required to develop a validated biomarker; a genome-scale technology that provides a standardized input of raw data, coupled with computational methods that provide consistent algorithm-based predictive outputs, would ultimately permit the rapid development and testing of new molecular signatures associated with any phenotype of interest.
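As a concrete illustration of training and testing a signature-based algorithm on independent sample sets, the sketch below fits a penalized logistic regression to a synthetic expression matrix and scores it on a held-out subset. The data, the number of genes, and the choice of scikit-learn logistic regression are assumptions made for the example; they are not the algorithms behind the commercial assays cited here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for a gene expression matrix: 200 samples x 500 genes,
# where a handful of "signature" genes are shifted in responders.
X = rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)            # 0 = nonresponder, 1 = responder
X[y == 1, :10] += 1.0                       # 10 informative genes

# Train on one set of samples, test on an independent held-out set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression(penalty="l2", max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out AUC: {auc:.2f}")
```

The essential discipline is the same regardless of the model chosen: the samples used to judge a signature must be independent of those used to derive it.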
Despite the proliferation of new technologies that enable detection of specific biomarkers, gene expression profiling represents a relatively standardized platform that has been used extensively to create a depth of empirical data sets in association with various phenotypes. The ability to utilize gene expression profiles of human cancers to identify molecular subtypes associated with tumor progression, patient outcome, and response to therapy is increasingly evident [23]. For example, a multiplexed assay that determines the expression of a number of mRNA transcripts from breast carcinomas has been developed as a commercial test to predict the risk of tumor recurrence [21,24]. With respect to the prediction of optimal cytotoxic or targeted therapies, systematic efforts utilizing gene expression signatures to identify compounds that reverse the diseased genotype hold great promise [25]. In vitro cell line gene expression signatures associated with differential drug sensitivity have also been shown to predict tumor response to some agents in the clinic with varying degrees of accuracy [26,27]. These and other empirically based methods for predicting optimal therapeutics based on the overall genomic signature of the tumor will play a pivotal role in future personalized medicine initiatives. These and other signature-based methods are currently being evaluated within our predictive therapeutics protocol in conjunction with network analysis to determine their feasibility for broad application. While the signature-based approaches outlined above represent a systematic approach for the logical selection of treatments based on the gene expression profile of a tumor, considerable experimentation is required to generate the predictive models. A supplementary approach is to utilize advances in network theory, systems biology, and computer modeling to reconstruct the aberrant molecular network predicted based on the same input of deregulated genes within the tumor. Although the deregulated expression of a molecular target may be associated with differential sensitivity to a targeted agent, gene or protein expression alone does not necessarily equate to target activity. For example, one of the first molecular events that occurs in some cells following stimulation with an extracellular ligand can be the down-regulation of the activated cell surface receptor (reduced protein expression) and/or reduced receptor mRNA transcription [28]. Nonetheless, successive waves of transcriptional events that occur within a tumor cell represent a hallmark of upstream chronic signaling cascades on which the tumor more likely depends. Gene expression profiling has been used successfully to identify the activation status of oncogenic pathways [29], demonstrating the feasibility of utilizing standardized gene expression signatures as a surrogate input for the prediction of network activity. Further analysis of the conceptualized network can predict convergence and divergence hubs within the tumor system, some of which can be targeted with existing therapeutics. This approach does not require comparison to an empirical data set, but rather, relies on network knowledge and graph theory to construct networks from known interactions between system components. Combinational strategies that target key nodes or hubs within
deregulated molecular networks associated with maintenance of the malignant phenotype may maximize therapeutic efficacy and may reduce the probability of a tumor cell utilizing alternative network components to achieve the “resistant” phenotype.
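As a toy version of the graph-based reasoning described above, the sketch below assembles a small interaction network and ranks nodes by betweenness centrality as a crude stand-in for convergence/divergence hubs. The gene names, edges, and choice of centrality measure are illustrative assumptions, not the contents of any curated knowledge base used by the authors.

```python
import networkx as nx

# Toy signaling network: edges are hypothetical interactions, not curated data.
edges = [
    ("EGF", "EGFR"), ("EGFR", "GRB2"), ("GRB2", "SOS1"), ("SOS1", "KRAS"),
    ("KRAS", "RAF1"), ("RAF1", "MEK1"), ("MEK1", "ERK1"),
    ("EGFR", "PIK3CA"), ("PIK3CA", "AKT1"), ("AKT1", "MTOR"),
    ("ERK1", "MYC"), ("MTOR", "MYC"),
]
G = nx.Graph(edges)

# Rank nodes by betweenness centrality as a rough proxy for hubs whose
# removal would most disrupt connectivity between network components.
hubs = sorted(nx.betweenness_centrality(G).items(),
              key=lambda kv: kv[1], reverse=True)
for gene, score in hubs[:5]:
    print(f"{gene}: {score:.2f}")
```

In practice, hub candidates identified this way would then be cross-referenced against a drug–target knowledge base to decide which nodes are actually drugable.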
PREDICTIVE THERAPEUTICS PROTOCOL

The general schema for a predictive therapeutics protocol is outlined in Figure 2. The primary objective of the protocol is to evaluate the merits of the various predictive methodologies outlined above while simultaneously providing information back to treating physicians in a real-time fashion for consideration in the design of a treatment plan. While a full description of the protocol is beyond the scope of this chapter, we have enrolled 50 patients in the first phase, which focused on the development of the critical infrastructure and logistics. From each patient, highly qualified tumor tissue (or isolated tumor cells) is processed using standard operating procedures to create a gene expression profile. The signature is then compared to other well-annotated samples
Figure 2 High-level review of our IRB-approved predictive therapeutics protocol, in which patient tumors are processed using Affymetrix GeneChip technology, after the required consenting and pathology clearance, to generate a gene expression signature that is reflective of the underlying biological context. These samples are processed using standardized procedures to minimize confounding variables that can significantly influence the interpretation of the results. Molecular data are analyzed statistically relative to a wide variety of well-annotated samples within the database, and these intermediate results are applied further to the integrated knowledge base that includes systems biology tools. For example, enriched networks are identified and further refined to categorize significant convergence and/or divergence hubs that represent drugable targets with existing agents with known molecular mechanisms of action. Irrespective of the predictive method employed, each drug is associated with a normalized score for predicted efficacy. A report with these standardized predictions (indicated by the arrow) is provided back to the medical oncologist, who determines a treatment selection using all information available to him or her, which may include the molecular evidence. The patient's administered treatment is captured and the tumor response is assessed using standard clinical criteria. In this fashion, the association between the predicted drug score and the tumor response can be determined. In addition, a section of the patient's tumor is implanted directly into immune-compromised mice to establish a series of tumor grafts, which naturally more closely resemble the human disease at the molecular and histological level relative to established cell-line xenograft models. These tumors are expanded in additional mouse cohorts and alternative predictive methods are tested to prioritize those with the most promise. Over time it is hoped that this approach of predictive modeling from standardized data, experimental testing, and model refinement may provide a means to identify optimal therapeutics with a high degree of confidence in a systematic fashion.
[Figure 2 (flow diagram; caption above): patient enrollment and eligibility check, tumor sample and pathology review, gene expression profiling (about 10 days) leading to a first report, treatment, and assessment of patient response, followed by a second report, second treatment, and a further response assessment; in parallel, tumor implantation into mice for preclinical treatment evaluation (6–9 months).]
within a large database, and deregulated patterns of gene expression are used in conjunction with a knowledge base of known drug–target interactions to infer treatment strategies. We also attempt to establish a xenograft after implantation of a section of the fresh tumor into immune-compromised mice. These tumor grafts are expanded through two generations to create a large colony of mice harboring the patient's tumor, and these are then used to statistically evaluate the different predictive methodologies and their corresponding treatment recommendations. While the preclinical component of the protocol does not typically provide useful information to the treating physician, it represents an excellent resource for prioritizing predictive methodologies and for developing a biomarker strategy for novel therapeutics.

At the outset, it is apparent that this multidisciplinary protocol requires several infrastructural components as well as integrated logistics. These include the development of centralized informatics capabilities that permit full integration of clinical and molecular data, drug–biomarker knowledge, predictive modeling, and reporting. Standardized tissue procurement and pathological characterization, with attention to quality control, are essential to ensure consistency in the raw molecular data that are used to derive treatment predictions. Consistent feedback from the clinical and preclinical treatment outcomes is critical to assess the validity of the predictive methods. Each component of a therapeutic regimen is scored objectively based on the predictive methodology, and the ultimate success of the method is determined by comparing this standardized score with tumor response.

It is important to state that at this time, the results obtained from the clinical arm of the protocol remain anecdotal, due to the underpowered nature of the initial proof-of-feasibility experimental design. However, despite representing a nonvalidated method for drug prioritization, any molecular information that can be provided to the treating physician is deemed valuable, especially for late-stage metastatic or refractory patients who have exhausted their standard-of-care options. In this sense, the protocol serves as a rudimentary clearinghouse where patients are placed onto experimental protocols, including off-label protocols, based on the molecular profile of their disease.

With the multitude of predictive models now available to suggest optimal combinational strategies based on a standardized gene expression signature from an individual tumor, the preclinical tumor grafts provide an invaluable resource for triaging ineffective methodologies. At this time, we are exploring methods that range from rudimentary target expression to more sophisticated signature-based methods and network inference. In general, the molecular similarities between the human tumor and the derived tumor grafts are excellent and represent a significant improvement over classical cell line xenograft models (Figure 3). This implies that the molecular network within the tumor system as a whole is generally maintained in both the human and mouse hosts, although some clear exceptions are noted; for example, reductions in markers of human vasculature and inflammatory cells are evident, as expected. Although it is too early to claim direct equivalence
between the mouse tumor graft and the patient tumor with respect to drug efficacy, early data are promising. In a handful of cases tested to date in which the mouse and the human harboring the same tumor were treated with the same regimen, similar tumor responses have been observed. However, the tumor grafts are used predominantly to test the concept of using derived molecular data (in the mouse system) to predict optimal combinational therapies, and not necessarily to define the best treatment strategy for the donating patient.

To illustrate how the molecular signature of an individual tumor can be modeled to predict target activation and sensitivity to approved drugs, we use a case study from the first phase of our protocol. A 63-year-old male presented with metastatic non-small cell lung carcinoma and, after the necessary consent, enrolled in this research protocol. A biopsy of the tumor was qualified and released by pathology, and the sample was processed within a CLIA/CAP-accredited laboratory that utilizes full-genome Affymetrix GeneChip technology. The standardized gene expression data were compared to a database of other tumors and normal tissues to identify the most significantly deregulated gene transcripts within the patient's tumor. Our informatics solution uses a database of drug–biomarker knowledge that includes the reported molecular targets of more than 1500 drugs, the interaction type, and the effect of the interaction (agonistic or antagonistic). By aligning this knowledge with the deregulated signature within the tumor, potential drugs of interest are quickly identified. Of key importance, each drug is assigned a priority score (weight) based on the indication of the biomarker, which in turn depends on the predictive methodology used. For example, increased expression of a molecular target may indicate the corresponding targeted drug, and this is scored based on the normalized gene expression value. Since target expression does not necessarily equate to target activation status, we have also developed a specific network analysis tool in conjunction with GeneGo (http://www.genego.com), which systematically evaluates the topological significance of nodes within reconstructed networks. Molecular networks are constructed based on the input provided (in this example, overexpressed genes), and an algorithm compares this to the connectivity map of the global network. The significance of each node within the identified network is calculated based on its probability of providing network connectivity, and this is used to score each drug.

In the tumor of this patient, a major divergence point was identified at epidermal growth factor receptor (EGFR) (Figure 4), suggesting that EGFR lies upstream of the transcriptional events observed. In this case, EGFR was also overexpressed at the transcript level relative to other tumors and, on the collective evidence, was therefore assigned a high score. Based on this inference, the medical oncologist confirmed EGFR gene amplification in the tumor using traditional FISH analysis. This patient was treated with a combination of erlotinib, cisplatin (reduced expression of the ERCC1 gene), and bevacizumab (inferred constitutive activation of the VEGF–VEGFR pathway). The patient exhibited a partial response to this
targeted, combinational treatment, and is currently maintained on a noncytotoxic regimen of erlotinib and bevacizumab alone (Figure 5). While these results remain anecdotal in nature, a handful of significant tumor responses have been observed in the first phase of the protocol. Tumor regression has also been observed in several mouse models, where the various systematic methods for predicting optimal drug combinations are being evaluated and prioritized. The key to realizing the full promise of personalized medicine in oncology lies in the ability to predict combinations of agents in a systematic and objective fashion, irrespective of context and historical disease classification. The wealth of drugs currently available for combinational treatments necessitates a bold migration away from traditional diagnostics in which custom biomarkers are developed in parallel with a specific drug, typically in tumor subtypes. In conjunction with the isolation of the postulated tumor stem cell compartment of a cancer [30], standardized technologies that permit the application of genome scale network-based approaches for the prediction of any drug in various combinations may allow specific targeting of the Achilles heel of the molecular system irrespective of the observed phenotype.
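A bare-bones version of the expression-based drug scoring described in the case study above might look like the sketch below. The drug–target map, the z-scores, and the rule "score an antagonist by the overexpression of its target" are simplified assumptions made for illustration; the protocol's actual knowledge base also encodes interaction types, direction of effect, and network-derived weights.

```python
# Sketch of expression-based drug prioritization. The drug-target map and
# z-scores below are hypothetical values chosen only to illustrate the idea.
drug_targets = {
    "erlotinib": ["EGFR"],
    "trastuzumab": ["ERBB2"],
    "bevacizumab": ["VEGFA"],
}

# Tumor vs. reference expression, expressed as z-scores (hypothetical).
tumor_z = {"EGFR": 3.2, "ERBB2": -0.4, "VEGFA": 2.1, "ERCC1": -2.5}

def drug_score(drug):
    """Score an antagonist by the overexpression of its target(s)."""
    zs = [tumor_z.get(t, 0.0) for t in drug_targets[drug]]
    return max(0.0, max(zs))               # ignore under-expressed targets

for drug in sorted(drug_targets, key=drug_score, reverse=True):
    print(f"{drug}: score {drug_score(drug):.1f}")
```

However the score is defined, the key requirement stated above is that it be normalized and computed the same way for every drug, so that predicted scores can later be compared against observed tumor responses.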
TYPE 2 DIABETES

Type 2 diabetes, in many ways like cancer, arises from a complex interplay of environmental and genetic factors that lead to a broad spectrum of conditions characterized by chronically high levels of glucose. While glucose levels are the common denominator for diagnosis of the disease, the simplicity of this
Figure 3 The preclinical arm of our predictive therapeutics protocol allows for the prioritization of methodologies based on their ability to predict optimal combinational designs derived from the networks identified within the tumorgraft system. These tumorgrafts are established directly from the patient’s tumor by implantation into immune-compromised mice, and are characterized by both molecular profiling and histopathology. In this particular example, the data were restricted to include only biomarkers that represent known drug targets. In this fashion, the relative distribution of existing targets can be determined across patient tumors and their corresponding tumor grafts. (A) A heat map following unsupervised hierarchical clustering shows how the tumorgrafts in the mouse host closely resemble their donating human tumor at the genomic level even when the analysis was restricted to utilize only known drug targets. Patient tumors and their derived mouse tumorgrafts are coded with the same color and can be seen to co-cluster based on their overall genomic similarity. Probes encoding EGFR are highlighted to show the distribution of expression of this target across the various tumors. (B) The mean correlation coefficient in a direct comparison of human tumors with mouse tumor grafts is approximately 0.93, demonstrating excellent overall similarity at the biomarker level. Some notable exceptions are evident, such as reduced expression of human targets associated with angiogenesis in the murine host. (See insert for color reproduction of the figure.)
measurement actually masks a complex set of problems that vary among individuals. Some people are beset primarily with dysfunctional pancreatic beta cells, while others may have more significant issues in muscle or liver tissues. Being able to determine the underlying nature of an individual's diabetes will help determine the best course of treatment.
Figure 4 Topological network analysis of the overexpressed genes from a non-small cell lung carcinoma identified a potential key input node at the level of EGFR. The results of these analyses are displayed using MetaCore, a systems biology network tool produced by GeneGo (www.genego.com). The significance of each node to confer system connectivity can be inferred after comparison with the global connectivity map and the drug–target knowledge base applied to select corresponding inhibitors. Among other applications, this type of systems approach, which does not depend on prerequisite empirical data sets, can readily be applied for the discovery of new disease targets, prioritization and/or validation of existing targets, and/or the identification of new indications for compounds that have a known or associated molecular mechanism of action. A key aspect is successful identification of the significant convergence or divergence hubs or nodes within the identified networks. (See insert for color reproduction of the figure.)
[Figure 5 (graph; caption below): CA 125 (U/mL; normal level 0–30), summed RECIST measurements (mm), and summed SUV (g/mL) plotted over time from 12/1/06 to 8/6/07, with the biopsy (12/7/06) and the timing of docetaxel, cisplatin, erlotinib, and bevacizumab treatment indicated. SUV is the sum of measurements from select lesions: right upper lobe, right hilar, right humerus, and right acetabulum.]
Figure 5 Anecdotal evidence of a molecularly targeted combinational treatment in a 63-year-old man with metastatic non-small cell lung carcinoma. In this example, the patient's tumors showed a prolonged partial response to erlotinib (overexpression of EGFR and network-based inference of activated EGFR) in combination with cisplatin and bevacizumab, which were also indicated by the molecular profiling data (low ERCC1 gene expression and evidence of constitutive VEGF–VEGFR network signaling, respectively). These agents were combined with docetaxel, an approved second-line treatment for metastatic NSCLC. The levels of the serum marker CA125, together with the sum of the maximum dimensions of the target lesions (CT scan) and the standard uptake value for the glucose tracer ([18F]DG PET scan), are shown over time. The timing of the respective treatments is also shown.
Approaching diabetes from a more molecular point of view provides the opportunity to personalize treatment. Efforts to do this are still in their infancy, but significant progress is being made, with real results becoming apparent. Metabonomics focuses on analyzing a host of small molecules, generally in the urine but potentially in any bodily fluid, and determining differences among disease states. Most work thus far has been in animal models, but progress is being made with human samples. For example, a fingerprint of thiazolidinedione treatment was detected in humans, although no difference was found when comparing healthy and diseased individuals [31]. A review of this field by Griffin and Nichols [32], with a focus on diabetes and related disorders, highlights both the potential of the field and the issues that remain.
Proteomics and transcriptomics have also been studied extensively as potential means of assessing subtypes of diabetes. For both technologies, diabetes presents a challenge, due to the inaccessibility of the relevant tissues for study in humans. However, in some cases, such as proteins derived from the kidney and other proteins secreted or leaked into the urine, it is possible to carry out such studies. Susztak and Bottinger provided examples of the advances and issues with these technologies [33].

Because DNA is much more readily available than the relevant proteins and mRNA in tissues, genetic studies of diabetes have advanced more rapidly. Both type 1 and type 2 diabetes have long been known to have significant genetic components. Type 1 diabetes has generally been most highly associated with genes involved in the immune response, while type 2 has been less easily addressed, due both to its complex nature and to its variable phenotype, such as age and severity of onset. Frequently, genes responsible for MODY (maturity onset diabetes of the young) have been assumed to play a role in the adult form of the disease, but that connection is weak in some cases. Genetic associations with type 1 diabetes have been strongest with genes in the MHC cluster and, to a lesser extent, with insulin and components of its signaling pathways. Genetic associations with type 2 diabetes have been less well replicated; until the advent of whole genome scans, studies had focused on genes known to be involved in obesity, lipid handling, and signaling, but many of these associations are poorly replicated. One attempt to replicate associations with 134 single-nucleotide polymorphisms (SNPs) across more than 70 genes found that only 12 SNPs in nine genes could be replicated [34]. This suggests both that there are many genes of weak effect and that there are many genes not yet discovered. Advances in selecting new therapeutic targets and the most appropriate patients for each therapy depend on identifying these genes so that the true complexity and subtypes of the disease can be determined.

For both forms of diabetes, whole genome analysis of large cohorts is beginning to make substantial inroads into understanding the etiology of the diseases. The availability of large (multi-thousand) family sets for type 1 diabetes patients and large case–control cohorts for genetic analysis, coupled with cheaper genotyping technology and an in-depth understanding of human genome structure, is allowing previously unsuspected genes to be linked with the disease and setting the stage for a much more detailed molecular understanding of the systems involved and how they may go awry in diabetes. The bottleneck in overall understanding has changed from identifying the appropriate genes to study, which used to be virtually impossible, to functionally characterizing the myriad genes that are now associated with diabetes and placing them in appropriate cellular and molecular pathways. Now that the multitude of genes that lead to diabetes are being uncovered, it will be possible to subdivide the disease into categories and determine whether the different subsets might best be treated by particular therapies.

Any review of novel genes associated with diabetes is certain to be outdated even before it is complete. With the advent of so many whole genome scans,
cross-replication of findings across cohorts has recently become a priority among research groups, and many of the new associations have already been confirmed. Those that have not been confirmed may still be real but suffer difficulties between populations due to differences with respect to disease, risk factors, ancestry, and other confounding variables that make comparisons challenging. Nevertheless, recent reviews can help make sense of the exploding databases [35]. Recent publications of whole genome scans include type 1 diabetes in 2000 British cases [36] and 563 European cases/483 European trios [37], and in type 2 diabetes, 694 French cases [38], 1464 Finnish and Swedish cases [39], 1161 Finnish cases [40], and 1399 Icelandic cases [41]. Although not completely concordant, the genes shared among these studies provide strong evidence that many are involved in the genetic basis of diabetes. The most striking new gene to be associated with type 2 diabetes is TCF7L2, a transcription factor that regulates a number of genes relevant to diabetes [42]. Prior to the first publication on the association of TCF7L2 with diabetes, its role in the regulation of proglucagon had been established [43], but its high level of importance was not apparent. After the initial report, numerous publications emerged that associated it with diabetes, glucose levels, birth weight, and other related phenotypes in many different populations. TCF7L2 is clearly important in diabetes etiology, and its genotype may also be valuable for choosing the best mode of treatment. A retrospective analysis of patients treated with either metformin or a sulfonylurea showed no genetic difference in treatment effect with metformin but a significant genetic effect if treated with a sulfonylurea [44]. Thus, knowing a patient’s TCF7L2 genotype may help guide a physician in a drug treatment decision, but it may also affect how aggressively a patient should be handled. For example, should patients with one or two high-risk alleles be started on medication or lifestyle modification sooner than patients at lower risk? Should their glucose levels be managed more aggressively? Knowing which prediabetic patients are more likely to develop the disease based on TCF7L2 genotype may help motivate physicians and patients to take more aggressive prophylactic approaches. The 14% of patients homozygous for the high-risk allele rs12255372 [42], the most strongly associated SNP, might be willing to adopt more rigorous lifestyle changes than the majority who are at lower risk of disease. Across all disease areas, pharmacogenetic studies have been plagued by a lack of replication for a variety of reasons but generally related to the study size [45]. Appropriately sized studies on the genetics of response to diabetes medications are now just beginning to emerge, but like most other areas, replication is frequently lacking. There are two broad classes of drug response studies. In many cases, individuals have variations in enzymes responsible for the metabolism, uptake, or other aspect of drug handling. A genetic analysis would inform the proper dosage level for a given drug independent of disease subtype. Alternatively, genetics may predict the subtype of diabetes within the patient population and could determine which treatment, whether lifestyle or drug, would be most efficacious for that particular subtype.
The glitazone class of therapeutics acts through peroxisome proliferator-activated receptor (PPAR) gamma, and thus patients with variants in the gene may respond differently to such drugs. One of the first such studies involved extensive resequencing of the gene in 93 Hispanic women [46]. Novel SNPs were identified that were weakly associated with troglitazone response, but the number of subjects was too small to be convincing. In a subsequent study, a much larger population, 3548 individuals at high risk of diabetes, was examined for the P12A and other polymorphisms in PPARγ. This particular amino acid–changing SNP has frequently (though not always) been associated with risk of diabetes. The potential association of P12A with the efficacy of therapeutic intervention (lifestyle, metformin, or troglitazone) on the development of diabetes was assessed, rather than the risk of diabetes itself. Even though PPARγ is the target of troglitazone, no association with treatment effects was observed [47]. This could be explained by any of several possibilities. P12A may have little or no effect on troglitazone action, since it is far from the drug-binding site. The P12A polymorphism may have an early effect on diabetes progression and hence may not affect progression at later times after its early impact. Thus, even knowing that a patient is predisposed to diabetes because of variation in a particular gene may not be useful if the knowledge is not used at the appropriate time in disease progression.

In contrast to the null troglitazone effect with PPARγ, metformin appeared to substantially benefit patients with the E23K variation in KCNJ11, a gene known to be associated with diabetes. Those with the E23 variant of KCNJ11 were less susceptible to progression when treated with lifestyle changes or metformin, while those with the K23 variant only benefited from lifestyle changes [48]. Further replication is required before recommendations can be made for clinical practice, but, if replicated, this information would help guide the most appropriate therapy for particular subgroups of patients.

Other drugs have also been examined for associations with various candidate genes. These studies are often plagued by small numbers of people and/or varying definitions of drug response. In one study with pioglitazone, the population was relatively small (n = 113) and two different measures of drug response were used, resulting in different conclusions [49]. At least one prospective study has been initiated in which diabetic patients have been selected for antioxidant therapy based on genotype [50]. Prospective studies are the gold standard for proving an effect but are not always feasible because of the high cost.

Categorizing patients for disease based on genetic or circulating markers will help in choosing the best therapeutic options once more data are available. In addition, the choice of the most appropriate dose of a drug can depend on variation in ADME genes, as has been shown clearly for drugs such as warfarin [51]. Similarly, the dose of a treatment for diabetes can be affected by variation in genes completely unrelated to the underlying diabetic condition. For example, the OCT-1 gene is not thought to be involved in diabetes but still has an apparent effect on treatment. This gene is important in the uptake of metformin into the liver, where it acts on AMPK. When people with normal
TABLE 1  Patient Genotype as a Potential Guide for Treatment Decisions

TCF7L2   PPARγ    KCNJ11   Diabetes Risk   Lifestyle Modification   Metformin   Percent of Population
aa       PP       KK       Very high       Aggressive               Titrate     1.9
aa       PP       EK/EE    High            Aggressive               Normal      10.8
aa       PA/AA    KK       High            Aggressive               Titrate     0.3
aa       PA/AA    EK/EE    Moderate        Normal                   Normal      1.9
AA/Aa    PP       KK       High            Aggressive               Titrate     10.8
AA/Aa    PP       EK/EE    Moderate        Normal                   Normal      61.4
AA/Aa    PA/AA    KK       Moderate        Normal                   Titrate     1.9
AA/Aa    PA/AA    EK/EE    Lower           Normal                   Optional    10.8
or variant OCT-1 genes were subjected to an oral insulin glucose tolerance test in the presence or absence of metformin treatment, those with normal OCT-1 were found to clear glucose much more effectively and maintain lower insulin levels [52]. Although this study and studies discussed above categorize patients as responders and nonresponders, it may be better to refine the analysis and choose the drug dose based on genetics, as is done with warfarin. As long as safety issues are not in question, patients with a nonresponder genotype may simply require a higher dose of medication, a possibility that could be tested in clinical trials. Thus, genes not directly involved in diabetes but affecting drug action can still be important to understand. A hypothetical example of how a patient’s genotype could be used in guiding treatment decisions is shown in Table 1. For simplicity, the minor allele frequency for each SNP is set at 15%, which is approximately what is observed in a prediabetic population for each of these SNPs. For TCF7L2 and KCNJ11, the minor allele is high risk, whereas the minor allele is low risk for PPARγ. With just these three genotypes, the prediabetic population generally considered to be at a similar high risk can be segregated into groups containing over 10% of the population actually at low risk, 65% at moderate risk, 23% at high risk, and 2% at very high risk. Even within these categories, differential treatment paradigms may be warranted based on individual genotypes. As more information accumulates relating to circulating biomarkers and additional genetic markers, these decision trees can be made much more powerful and personalized.
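The population fractions in Table 1 follow from simple multiplication of independent genotype-class frequencies, which the short sketch below reproduces. It adopts the chapter's simplification of a 15% frequency for the minor genotype class of each gene (aa, PA/AA, KK) and counts risk genotypes (TCF7L2 aa, PPARγ PP, KCNJ11 KK) to assign the risk tier; the tier-assignment rule is inferred from the table rows rather than stated explicitly in the text.

```python
from itertools import product

# Reproduce the population fractions behind Table 1. Following the chapter's
# simplification, the minor genotype class of each gene is given a 15%
# frequency; the risk tier is inferred from how many risk genotypes
# (TCF7L2 aa, PPARγ PP, KCNJ11 KK) a combination carries.
genes = {
    "TCF7L2": {"aa": 0.15, "AA/Aa": 0.85},
    "PPARγ":  {"PP": 0.85, "PA/AA": 0.15},   # minor allele A is protective
    "KCNJ11": {"KK": 0.15, "EK/EE": 0.85},
}
risk_genotypes = {"aa", "PP", "KK"}
tier_by_count = {3: "Very high", 2: "High", 1: "Moderate", 0: "Lower"}

totals = {}
for combo in product(*(g.items() for g in genes.values())):
    freq = 1.0
    n_risk = 0
    for genotype, f in combo:
        freq *= f
        n_risk += genotype in risk_genotypes
    tier = tier_by_count[n_risk]
    totals[tier] = totals.get(tier, 0.0) + freq
    print(" ".join(g for g, _ in combo), f"{100 * freq:4.1f}%", tier)

for tier, frac in totals.items():
    print(f"{tier}: about {100 * frac:.0f}% of the prediabetic population")
```

Running it reproduces the eight row frequencies in Table 1 and risk-group totals close to the roughly 2%, 23%, 65%, and 10% figures quoted above.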
RHEUMATOID ARTHRITIS

Rheumatoid arthritis (RA) is also a complex disease phenotype defined by clinical criteria [53] and with multiple genetic and environmental factors contributing to its pathogenesis, resulting in highly variable degrees of severity
and responsiveness to therapy. Selection of an initial therapeutic regimen for RA patients is currently based on disease duration and disease severity, particularly with respect to progression or status of joint damage assessed radiographically. Most studies suggest that early aggressive treatment with disease-modifying antirheumatic drugs (DMARDs) helps to delay or prevent joint damage and leads to better long-term outcomes. All DMARDs have significant safety issues, and the cost of treatment with biological DMARDs is high, about $10,000 per year. In the United States the most commonly used DMARDs are methotrexate (MTX), sulfasalazine, hydroxychloroquine, and the biologicals targeting TNFα. Newly diagnosed patients are generally started on a DMARD, and future changes in therapeutic regimen are based on clinical response. These include addition of combinations of DMARDs as well as changes in DMARD and use of corticosteroids. While assessment of clinical status often involves the use of biomarkers (diagnostics) as well as clinical parameters, these therapeutic choices, other than those dictated by specific safety concerns, are currently empirical and are not based on prospective genetic or biomarker factors.

Approximately 46% of RA patients achieve an ACR20 response with low-dose MTX (the most common initial DMARD) [54]. TNFα-targeted therapies are more successful at reducing or even halting the progression of joint damage, but again about 29 to 54% of patients do not achieve a satisfactory clinical (ACR20) response [54]. In addition to these agents, other biologicals are available or in development that have other specific targets. Recombinant human IL-1 receptor antagonist (anakinra) competes with IL-1 for stimulation of IL-1 receptors and has moderate efficacy in RA [54]. A recently approved biologic, abatacept (a fusion protein of CTLA-4 and a modified immunoglobulin Fc region), targets T-lymphocyte activation by blocking co-stimulation of T-lymphocyte CD28 by antigen-presenting cell CD80/86 [55]. Rituximab, an anti-CD20 monoclonal antibody previously used for treating B-cell lymphomas, has also been shown to be effective in RA [56,57]. Other biologicals targeting different cytokine pathways, such as tocilizumab, an anti-IL-6 receptor monoclonal antibody [58,59], have also reported efficacy in RA and may become available in the future. Thus, there are multiple treatment possibilities for RA patients with distinctly different molecular targets and mechanisms of action, but currently, biomarkers are not used to select those more likely to have superior efficacy in individual patients.

The heterogeneity of RA is not only apparent from the unpredictable clinical response to approved and experimental treatments; it is also confirmed by studies of RA synovial tissue histology and patterns of gene expression within inflamed joints of RA patients [60]. The Online Mendelian Inheritance in Man (OMIM) database listing for RA (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=180300) includes references to at least 19 specific genes with significant associations with RA susceptibility, disease severity, or response to therapy. Whole genome-wide scans using single-nucleotide polymorphism (SNP) maps have become very cost-effective, enabling the rapid confirmation
of genetic associations with RA [61]. In addition, RNA peripheral blood microarray-based RA studies (transcriptomic studies) are beginning to appear in the literature, and preliminary data from these suggest that patterns of gene expression may predict disease severity and response to specific therapies [62,63]. In this section we describe how biomarker, genomic, and transcriptomic data may be used in the future to help improve clinical outcomes for RA patients.

Some genetic associations with disease are really just associations with markers that tell us that a region of a particular chromosome is associated with RA or some feature of RA. However, with a greater understanding of gene function and the ability to perform whole genome-wide scans, a useful way to classify SNP associations with RA is to infer from this information whether it is likely that the genetic differences will have functional significance and influence T-lymphocyte activation, macrophage function, specific cytokine and inflammatory signaling pathways, and/or generalized inflammatory responses secondary to downstream dysregulation of these pathways. This information, together with an ever-increasing number of targeted therapeutic agents and greater understanding of biomarkers predicting response to older drugs such as MTX, may lead to a more rational basis for treating individual patients or RA subpopulations. Below we describe a number of genetic and biomarker associations that suggest possible strategies to design such individualized therapeutic regimens for RA patients. This is not intended to be a complete list of all such reported associations, but rather, a selection to illustrate how a path to personalized medicine in RA may be investigated further. As such, we are proposing these hypotheses partly to accelerate this type of clinical research and hopefully to improve outcomes and lower the cost of treatment for RA. We have not found any evidence in the literature that biomarkers like these have actually been used prospectively and systematically to test personalized treatment hypotheses in RA.

In Table 2 we classify a number of known genetic and biomarker associations and speculate as to how this information may lead to therapeutic decisions. Using this information, it is also possible to create hypotheses that are easily testable using samples from randomized controlled clinical trials to achieve prospective scientific confirmation. These hypotheses also illustrate how the practice of personalized medicine in RA may evolve, and its potential benefits.

Hypothesis A  If tocilizumab is approved as a DMARD for RA patients, it will likely be on the basis of results similar to those in published phase II and phase III trials. ACR20 response scores at 16 and 22 weeks in two different trials were 63% alone and 74% with MTX [58] and 59% with MTX [59], respectively. Thus, roughly one-third of patients did not achieve an ACR20 clinical response. Yet this compound had significant safety issues, including higher risk for infection, and if it becomes an approved drug, it will probably have a high cost of treatment, similar to other biologics.
TABLE 2  Rheumatoid Arthritis Genetic and Biomarker Associations
(Columns: RA Mechanism; Biomarker / Gene; Function / Role; Clinical Correlation; Possible Treatment Implications.)

RA mechanism: Cytokine and severity of disease
  Biomarker / gene: TNFα gene promoter: G to A SNP at position -308 [68]
  Function / role: TNFα response to stress or inflammatory stimuli: A/A largest TNF response, G/G lowest response [67].
  Clinical correlation: DAS response to TNF-targeted biologic best in G/G (81%) vs. 42% in A/A and A/G [67].
  Possible treatment implications: Higher dose of TNF-targeted agents for A/A and A/G may be needed.

RA mechanism: Cytokine and severity of disease
  Biomarker / gene: IL-10 gene [72]
  Function / role: Anti-inflammatory cytokine.
  Clinical correlation: -2849A/G SNP G allele associated with higher progression rate and more joint damage; promoter SNPs -1082A, -819T, and -592A define a low IL-10 producer haplotype [73].
  Possible treatment implications: IL-10 promoter genotypes or IL-10 haplotypes may correlate with response to IL-10 treatment.

Biomarker / gene: HLA-G: lack of 14-bp polymorphism [71]
  Function / role: Soluble HLA-G is anti-inflammatory, inhibits NK cell activity, and is increased by IL-10 [71].
  Clinical correlation: Significant association between favorable response to MTX and lack of the 14-bp polymorphism of HLA-G (odds ratio 2.46 for methotrexate responsiveness). Methotrexate, a folate antagonist used for the treatment of rheumatoid arthritis, induced the production of soluble HLA-G molecules by increasing IL-10 [71].

RA mechanism: Myeloid cell (macrophage) activation and joint inflammation
  Biomarker / gene: IL-1β and IL-8 mRNA in blood monocytes [78]
  Function / role: Macrophage activation.
  Clinical correlation: Increased activated macrophages in synovium.
  Possible treatment implications: High levels may indicate better treatment response to anakinra, MTX.

Biomarker / gene: IL-6 promoter: G/C polymorphism at position -174 [64]
  Function / role: C/C has low IL-6 response to IL-1β; G/G and G/C higher production of IL-6.
  Clinical correlation: Associated with B-cell neoplasms, Kaposi sarcoma [75,76], and systemic JRA [64].
  Possible treatment implications: Clinical benefit of IL-6 targeted therapies (R4) may be greater in carriers of the G allele; such patients may also benefit more from B-cell targeted therapies (rituximab).

RA mechanism: TCR response
  Biomarker / gene: PTPN22 (a lymphoid-specific intracellular phosphatase)
  Function / role: PTPN22 down-regulates T-cell activation mediated by TCR and CD28 co-stimulation [69].
  Clinical correlation: SNP R620W minor allele 1858T is associated with RA, type 1 diabetes, SLE, and autoimmune thyroiditis and is present in approximately 28% of white patients with RA [69].
  Possible treatment implications: Levels of PTPN22 expression and/or presence of the 1858T SNP may predict good response to T-cell targeted therapy such as abatacept or cyclosporine A.

RA mechanism: Auto-antigen generation
  Biomarker / gene: PADI4 haplotype (peptidylarginine deiminase)
  Function / role: Posttranslational modification enzyme that converts arginine residues to citrulline [74].
  Clinical correlation: A functional haplotype of PADI4 is associated with susceptibility to rheumatoid arthritis [74] and with levels of antibody to citrullinated peptide.
  Possible treatment implications: Gold inhibits myeloid differentiation (R8) and may therefore reduce PAD; estrogen or estrogen receptor antagonists modulate PADI4 activity.

RA mechanism: TCR response and antigen presentation
  Biomarker / gene: HLA-DRB1 beta-chain shared epitope (SE) [77]
  Function / role: May be associated with initial T-cell-driven response to disease initiation factors or auto-antigens; CD80 and CD86 candidate genes also linked to this locus.
  Clinical correlation: SE predisposes to sero-positive RA and more severe disease, including extraarticular manifestations.
  Possible treatment implications: Response to MTX + etanercept best in RA patients with two copies of SE [77]; abatacept blocks T-cell co-stimulation through CD80/CD86.
As described in Table 2, there is a promoter SNP at position -174 of the IL-6 gene that significantly influences the amount of IL-6 produced in response to IL-1 and other inflammatory stimuli [64]. The gene frequency of the C allele is about 0.4 in healthy subjects. In vitro, C/C cell constructs do not increase IL-6 production in response to IL-1 stimulation, compared to a 3.6-fold increase for G/G cell constructs [63]. Thus, it is likely that this SNP has functional significance. It is reasonable to hypothesize that the therapeutic benefit of tocilizumab will differ between populations that do not increase IL-6 signaling much and those with a robust increase in IL-6 during disease flares. If 30 to 40% of RA patients do not achieve a good clinical response with tocilizumab, could these be patients whose disease is not so dependent on this pathway (e.g., C/C genotype) or patients who produce such large amounts of IL-6 (e.g., G/G genotype) that higher doses of tocilizumab would have been needed? These hypotheses are easily tested in the clinic and could lead to a rational personalized medicine treatment regimen that would be more cost-effective and have a better efficacy and safety profile.

Hypothesis B  TNFα-targeted agents are very effective DMARDs in RA. Yet, on average, 40% of patients do not achieve an ACR20 response. Again, these agents cost $8000 to $10,000 per year and have significant safety issues [54]. As noted in Table 2, there is a -308 promoter G to A SNP in the TNFα gene with probable functional significance, since it is associated with outcomes in several infectious diseases and with different clinical outcomes in septic shock [65]. The allele frequencies are reported to be 0.77 for allele G and 0.23 for allele A in a Swedish study [66]. In one published study on RA in which the clinical response to infliximab was compared between -308 G/A genotypes, a disease activity score (DAS28) improvement of 1.2 occurred in 81% of G/G patients but in only 42% of A/A and A/G patients. The clinical improvement based on the DAS28 score was about twice as good in the G/G patients as in the A/A and A/G patients [67]. TNFα promoter SNPs, including the -308 SNP, are also associated with clinical outcomes in RA [68].

If these findings were replicated, how could that lead to a personalized medicine approach that improved outcomes and reduced the overall cost of therapy in a population of RA patients? Using the gene frequency and response information above, 84% of responders would be G/G, 4% would be A/A, and 12% would be A/G. Clearly, A/A and A/G patients would be far better off trying a different type of DMARD first, or perhaps they require a different dose or dose regimen. These clinical differences in response to a TNFα blocking agent could also occur if the amount of TNFα produced during disease flares is much greater in patients with the A allele and the blood level of their TNFα blockers is not high enough to neutralize TNFα activity at these times. In other words, it is possible that the dose of TNFα-targeted agents required for a durable clinical response is really different in these populations.
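Both of these dosing questions turn on how many patients fall into each promoter genotype group. A minimal sketch of those proportions, assuming Hardy–Weinberg equilibrium and the allele frequencies quoted in the text (observed genotype frequencies in any particular RA cohort may differ, as may figures derived directly from the cited studies), is:

    def hardy_weinberg(minor_allele_freq):
        """Return (major homozygote, heterozygote, minor homozygote) proportions."""
        q = 1 - minor_allele_freq
        return q * q, 2 * q * minor_allele_freq, minor_allele_freq ** 2

    # IL-6 -174 G/C promoter SNP, C allele frequency ~0.4 (Hypothesis A)
    gg, gc, cc = hardy_weinberg(0.4)
    print(f"IL-6 -174: G/G {gg:.0%}, G/C {gc:.0%}, C/C {cc:.0%}")    # about 36%, 48%, 16%

    # TNF-alpha -308 G/A promoter SNP, A allele frequency ~0.23 (Hypothesis B)
    gg, ga, aa = hardy_weinberg(0.23)
    print(f"TNFa -308: G/G {gg:.0%}, G/A {ga:.0%}, A/A {aa:.0%}")    # about 59%, 35%, 5%

Even rough proportions of this kind indicate what share of a trial population would be available to test genotype-stratified dosing or drug selection.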
Since the dose and frequency of dosing in the label for these agents are based on the average response in groups given different doses and dosing frequencies, it is quite possible that response rates could be improved using higher doses in patients with the A allele. Patients with the G/G genotype may actually do well with lower doses. If this hypothesis is proven correct, there is an opportunity for improved patient outcomes by using higher doses in a small number of A/A and A/G patients, and for cost savings with improved safety by using lower doses in a much larger number of G/G patients. Clearly, both of these personalized medicine approaches to the use of TNFα-targeted agents can be tested prospectively and could greatly influence patient outcomes.

Hypothesis C  Abatacept was recently approved as a DMARD in RA for patients who do not respond adequately to an earlier DMARD. It blocks the T-lymphocyte co-stimulation needed to fully activate a T-lymphocyte-driven immune response through interaction between CD28 and CD80/86 (B7-1 and B7-2), mimicking natural CTLA-4-mediated down-regulation of immune responses. In controlled trials with MTX background therapy, about 60% of patients achieved an ACR20 response compared to about 30% on MTX alone [55]. Again, because of high cost and significant safety risks, one asks whether there is a testable personalized medicine hypothesis that could improve the probability of response. PTPN22 is a lymphoid-specific phosphatase that down-regulates T-cell activation mediated by TCR and CD28 co-stimulation. This gene has a very strong association with RA, and in particular there is a SNP associated with RA and other autoimmune diseases, such as type 1 diabetes [69]. This SNP, a 1858C-T transition, results in an Arg620-to-Trp amino acid change that alters the protein's function as a negative regulator of T-cell activation. This allele is present in approximately 17% of white people from the general population and in approximately 28% of white people with RA. Other variants of the PTPN22 gene are also probably associated with RA [70]. The relationship between response to abatacept and PTPN22 genotype has not been investigated, but it is likely that the T-lymphocyte co-stimulation pathway is more active in RA patients with a variant of the PTPN22 gene that results in reduced phosphatase function, such as the 1858T SNP. Should these patients receive a drug such as abatacept as first-line treatment for RA instead of waiting to fail another DMARD? Perhaps an alternative biomarker could be a measure of PTPN22 phosphatase activity or gene expression in lymphoid cells. Regardless of the biomarker used, this clinical question can be answered easily in appropriately designed clinical trials.

Hypothesis D  MTX is commonly used as the first DMARD treatment, and it is used increasingly in combination with biologics. Often, the period of time to assess the therapeutic benefit of MTX is prolonged as doses are increased and other drugs are added. For those patients who do not achieve a satisfactory response to MTX, this practice often results in MTX dose escalation into ranges more likely to cause liver damage, pulmonary toxicity, or bone marrow suppression, in addition to allowing further progression of disease and joint
damage. Because of its low cost and acceptable safety profile, a strategy that enriches the population of patients started on MTX with those more likely to respond, and that provides alternative first treatments for patients less likely to respond to MTX, could lead to significant improvements in overall outcomes and treatment costs. Several alternatively spliced HLA-G mRNA isoforms have been described, including a 14-bp polymorphism of the HLA-G gene with the 14-bp sequence deleted, and a significant association has been reported between a favorable response to MTX and a lack of the 14-bp polymorphism of the HLA-G gene, with an odds ratio of 2.46 for MTX responsiveness [71]. This finding, if confirmed, may enable an enrichment strategy for RA patients more likely to respond to MTX. Interestingly, in vitro, MTX also induces the production of soluble HLA-G molecules by increasing IL-10. Promoter polymorphisms of the IL-10 gene have also been reported to have functional significance and associations with RA [72]. Using samples collected from clinical trials with MTX treatment arms, it would be possible to test the hypothesis that good prospective MTX responders can be identified by evaluating these two biomarkers.
CONCLUSIONS

The still experimental practice of personalized medicine in cancer patients described here illustrates all of the necessary components to develop effective personalized medicine treatment strategies that are systematic in nature: diagnostics, targeted agents with well-understood mechanisms of action, an understanding of the molecular pathways important in disease progression, and ways of rapidly assessing clinical success. This has especially been enabled by the power of new genomic, biomarker, and informatics technologies. These technologies have also been applied to other chronic disease states where the potential for personalized medicine also exists. Diabetes and rheumatoid arthritis in many ways are like cancer, with genetic and environmental factors contributing to a very heterogeneous spectrum of disease. As the understanding of what drives these chronic disease phenotypes improves and more homogeneous subpopulations can be identified, treatment regimens will become more personalized. This trend will lead to safer and more efficacious treatments earlier and reduce the burden of disease to individuals and society.
REFERENCES

1. Jemal A, Siegel R, Ward E, Murray T, Xu J, Thun MJ (2007). Cancer statistics, 2007. CA: Cancer J Clin, 57(1):43–66.
2. Vessella RL, Pantel K, Mohla S (2007). Tumor cell dormancy: an NCI workshop report. Cancer Biol Ther, 6(9):1496–1504.
3. Maron BJ, Hauser RG (2007). Perspectives on the failure of pharmaceutical and medical device industries to fully protect public health interests. Am J Cardiol, 100(1):147–151. 4. Goodsaid F, Frueh FW (2007). Implementing the U.S. FDA guidance on pharmacogenomic data submissions. Environ Mol Mutagen, 48(5):354–358. 5. Jain KK (2006). Challenges of drug discovery for personalized medicine. Curr Opin Mol Ther, 8(6):487–492. 6. DiMasi JA, Grabowski HG (2007). Economics of new oncology drug development. J Clin Oncol, 25(2):209–216. 7. O’Donovan N, Crown J (2007). EGFR and HER-2 antagonists in breast cancer. Anticancer Res, 27(3A):1285–1294. 8. Vogel CL, Cobleigh MA, Tripathy D, et al. (2002). Efficacy and safety of trastuzumab as a single agent in first-line treatment of HER2-overexpressing metastatic breast cancer. J Clin Oncol, 20(3):719–726. 9. Huang S (2004). Back to the biology in systems biology: what can we learn from biomolecular networks? Brief Funct Genom Proteom, 2(4):279–297. 10. Wang E, Lenferink A, O’Connor-McCourt M (2007). Cancer systems biology: exploring cancer-associated genes on cellular networks. Cell Mol Life Sci, 64(14):1752–1762. 11. Aranda-Anzaldo A (2001). Cancer development and progression: a non-adaptive process driven by genetic drift. Acta Biotheor, 49(2):89–108. 12. Hanahan D, Weinberg RA (2000). The hallmarks of cancer. Cell, 57–70. 13. Webb CP, Vande Woude GF (2000). Genes that regulate metastasis and angiogenesis. J Neurooncol, 50(1–2):71–87. 14. Balakrishnan A, Bleeker FE, Lamba S, et al. (2007). Novel somatic and germline mutations in cancer candidate genes in glioblastoma, melanoma, and pancreatic carcinoma. Cancer Res, 67(8):3545–3550. 15. Sjoblom T, Jones S, Wood LD, et al. (2006). The consensus coding sequences of human breast and colorectal cancers. Science, 314(5797):268–274. 16. Weinstein IB, Joe AK (2006). Mechanisms of disease: Oncogene addiction: a rationale for molecular targeting in cancer therapy. Nat Clin Pract, 3(8):448–457. 17. Haney SA (2007). Increasing the robustness and validity of RNAi screens. Pharmacogenomics, 8(8):1037–1049. 18. Michor F, Nowak MA, Iwasa Y (2006). Evolution of resistance to cancer therapy. Curr Pharm Des, 12(3):261–271. 19. Hochhaus A, Erben P, Ernst T, Mueller MC (2007). Resistance to targeted therapy in chronic myelogenous leukemia. Semin Hematol, 44(1 Suppl 1):S15–S24. 20. Bublil EM, Yarden Y (2007). The EGF receptor family: spearheading a merger of signaling and therapeutics. Curr Opin Cell Biol, 19(2):124–134. 21. Kaklamani VG, Gradishar WJ (2006). Gene expression in breast cancer. Curr Treat Options Oncol, 7(2):123–128. 22. Webb CP, Pass HI (2004). Translation research: from accurate diagnosis to appropriate treatment. J Transl Med 2(1):35. 23. Rhodes DR, Kalyana-Sundaram S, Tomlins SA, et al. (2007). Molecular concepts analysis links tumors, pathways, mechanisms, and drugs. Neoplasia, 9(5):443–454.
24. Miller LD, Liu ET (2007). Expression genomics in breast cancer research: microarrays at the crossroads of biology and medicine. Breast Cancer Res, 9(2):206. 25. Lamb J, Crawford ED, Peck D, et al. (2006). The Connectivity Map: using gene expression signatures to connect small molecules, genes, and disease. Science, 313(5795):1929–1935. 26. Lee JK, Havaleshko DM, Cho H, et al. (2007). A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery. Proc Nat Acad Sci USA, 104(32):13086–13091. 27. Potti A, Dressman HK, Bild A, et al. (2006). Genomic signatures to guide the use of chemotherapeutics. Nat Med, 12(11):1294–1300. 28. Shtiegman K, Kochupurakkal BS, Zwang Y, et al. (2007). Defective ubiquitinylation of EGFR mutants of lung cancer confers prolonged signaling. Oncogene, 26(49):6968–6978. 29. Bild AH, Yao G, Chang JT, et al. (2006). Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature, 439(7074):353–357. 30. Schulenburg A, Ulrich-Pur H, Thurnher D, et al. (2006). Neoplastic stem cells: a novel therapeutic target in clinical oncology. Cancer, 107(10):2512–2520. 31. Van Doom M, Vogels J, Tas A, et al. (2006). Evaluation of metabolite profiles as biomarkers for the pharmacological effects of thiazolidinediones in type 2 diabetes mellitus patients and healthy volunteers. Br J Clin Pharmacol, 63:562–574. 32. Griffin JL, Nichols AW (2006). Metabolomics as a functional genomic tool for understanding lipid dysfunction in diabetes, obesity and related disorders. Pharmacogenomics, 7:1095–1107. 33. Susztak K, Bottinger EP (2006). Diabetic nephropathy: a frontier for personalized medicine. J Am Soc Nephrol, 17:361–367. 34. Willer CJ, Bonnycastle LL, Conneely KN, et al. (2007). Screening of 134 single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes replicates association with 12 SNPs in nine genes. Diabetes, 56:256–264. 35. Sale MM, Rich SS (2007). Genetic contributions to type 2 diabetes: recent insights. Expert Rev Mol Diagn, 7:207–217. 36. Wellcome Trust Case Control Consortium (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature, 447:661–678. 37. Hakonarson H, Grant SFA, Bradfield JP, et al. (2007). A genome-wide association study identifies KIAA0350 as a type 1 diabetes gene. Nature, 448:591–594. 38. Sladek R, Rocheleau G, Rung J, et al. (2007). A genome-wide association study identifies novel risk loci for type 2 diabetes. Science, 445:881–885. 39. Saxena R, Voight BF, Lyssenko V, et al. (Diabetes Genetics Initiative) (2007). Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science, 316:1331–1336. 40. Scott LJ, Mohlke KL, Bonnycastle LL, et al. (2007). A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316:1341–1345. 41. Steinthorsdottir V, Thorleifsson G, Reynisdottir I, et al. (2007). A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet, 39:770–775.
42. Grant SFA, Thorleifsson G, Reynisdottir I, et al. (2006). Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat Genet, 38:320–323. 43. Yi F, Brubaker PL, Jin T (2005). TCF-4 mediates cell type-specific regulation of proglucagon gene expression by β-catenin and glycogen synthase kinase-3β. J Biol Chem, 280:1457–1464. 44. Pearson EW, Donnelly LA, Kimber C, et al. (2007). Variation in TCF7L2 influences therapeutic response to sulfonylureas. Diabetes, 56:2178–2182. 45. Ioannidis JPA, Trikalinos TA, Ntzani EE, Contopoulos-Ioannidis DG (2003). Genetic associations in large versus small studies: an empirical assessment. Lancet, 361:567–571. 46. Wolford JK, Yeatts KA, Dhanjal SK, et al. (2005). Sequence variation in PPARg may underlie differential response to troglitazone. Diabetes, 54:3319–3325. 47. Florez JC, Jablonski KA, Sun MW, et al. (2007). Effects of the type 2 diabetes associated PPARg P12A polymorphism on progression to diabetes and response to troglitazone. J Clin Endocrinol Metab, 92:1502–1509. 48. Florez JC, Jablonski KA, Kahn SE, et al. (2007). Type 2 diabetes–associated missense polymorphisms KCNJ11 E23K and ABCC8 A1369S influence progression to diabetes and response to interventions in the Diabetes Prevention Program. Diabetes, 56:531–536. 49. Wang G, Wang X, Zhang Q, Ma Z (2007). Response to pioglitazone treatment is associated with the lipoprotein lipase S447X variant in subjects with type 2 diabetes mellitus. Int J Clin Pract, 61:552–557. 50. Levy AP (2006). Application of pharmacogenomics in the prevention of diabetic cardiovascular disease: mechanistic basis and the clinical evidence for utilization of the haptoglobin genotype in determining benefit from antioxidant therapy. Pharm Ther, 112:501–512. 51. Yin T, Miyata T (2007). Warfarin dose and the pharmacogenomics of CYP2C9 and VKORC1: rationale and perspectives. Thromb Res, 120:1–10. 52. Shu Y, Sheardown SA, Brown C, et al. (2007). Effect of genetic variation in the organic cation transporter 1 (OCT1) on metformin action. J Clin Invest, 117:1422–1431. 53. Arnett FC, Edworthy SM, Bloch DA, et al. (1988). The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum, 31:315–324. 54. Olsen NJ, Stein CM (2007). New drugs for rheumatoid arthritis. N Engl J Med, 350:2167–2179. 55. Kremer JM, Dougados M, Emery P, et al. (2005). Treatment of rheumatoid arthritis with the selective costimulation modulator abatacept: twelve-month results of a phase IIb, double-blind, randomized, placebo-controlled trial. Arthritis Rheum, 52:2263–2271. 56. Cohen SB, Emery P, Greenwald MW, et al. (REFLEX Trial Group) (2006). Rituximab for rheumatoid arthritis refractory to anti-tumor necrosis factor therapy: results of a multicenter, randomized, double-blind, placebo controlled, phase III trial evaluating primary efficacy and safety at twenty-four weeks. Arthritis Rheum, 54:2793–2806.
57. Emery P, Fleischmann R, Filipowicz-Sosnowska A, et al. (DANCER Study Group) (2006). The efficacy and safety of rituximab in patients with active rheumatoid arthritis despite methotrexate treatment: results of a phase IIB randomized, double-blind, placebo-controlled, dose-ranging trial. Arthritis Rheum, 54:1390–1400. 58. Maini RN, Taylor PC, Szechinski J, et al. (CHARISMA Study Group) (2006). Double-blind randomized controlled clinical trial of the interleukin-6 receptor antagonist, tocilizumab, in European patients with rheumatoid arthritis who had an incomplete response to methotrexate. Arthritis Rheum, 54:2817–2829. 59. Smolen AB, Rubbert-Roth A, Alecock E, Alten R, Woodworth T (2007). Tocilizumab, A novel monoclonal antibody targeting IL-6 signalling, significantly reduces disease activity in patients with rheumatoid arthritis. Ann Rheum Dis, 66(Suppl II):87. 60. Glocker MO, Guthke R, Kekow J, Thiesen, H-J (2006). Rheumatoid arthritis, a complex multifactorial disease: on the way toward individualized medicine. Med Res Rev, 26:63–87. 61. Docherty SJ, Butcher LM, Schalkwyk LC, Plomin R (2007). Applicability of DNA pools on 500 K SNP microarrays for cost-effective initial screens in genomewide association studies. BMC Genom, 8:214. 62. Edwards CJ, Feldman JL, Beech J, et al. (2007). Molecular profile of peripheral blood mononuclear cells from patients with rheumatoid arthritis. Mol Med, 13:40–58. 63. Lindberg J, Klint E, Catrina AI, et al. (2006). Effect of infliximab on mRNA expression profiles in synovial tissue of rheumatoid arthritis patients. Arthritis Res Ther, 8:R179. 64. Fishman D, Faulds G, Jeffery R, et al. (1998). The effect of novel polymorphisms in the interleukin-6 (IL-6) gene on IL-6 transcription and plasma IL-6 levels, and an association with systemic-onset juvenile chronic arthritis. J Clin Invest, 102:1369–1376. 65. Mira J.-P, Cariou A, Grall F, et al. (1999). Association of TNF2, a TNF-alpha promoter polymorphism, with septic shock susceptibility and mortality: a multicenter study. JAMA, 282:561–568. 66. Rosmond R, Chagnon M, Bouchard C, Bjorntorp P (2001). G-308A polymorphism of the tumor necrosis factor alpha gene promoter and salivary cortisol secretion. J Clin Endocrinol Metab, 86:2178–2180. 67. Mugnier B, Balandraud N, Darque A, Roudier C, Roudier J, Reviron D (2003). Polymorphism at position -308 of the tumor necrosis factor alpha gene influences outcome of infliximab therapy in rheumatoid arthritis. Arthritis Rheum, 8:1849–1852. 68. Fonseca JE, Cavaleiro J, Teles J, et al. (2007). Contribution for new genetic markers of rheumatoid arthritis activity and severity: sequencing of the tumor necrosis factor-alpha gene promoter. Arthritis Res Ther, 9:R37. 69. Begovich AB, Carlton VEH, Honigberg LA, et al. (2004). A missense singlenucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis. Am J Hum Genet, 75: 330–337.
70. Carlton VEH, Hu X, Chokkalingam AP, et al. (2005). PTPN22 genetic variation: evidence for multiple variants associated with rheumatoid arthritis. Am J Hum Genet, 77:567–581. 71. Rizzo R, Rubini M, Govoni M, et al. (2006). HLA-G 14-bp polymorphism regulates the methotrexate response in rheumatoid arthritis. Pharmacogenet Genom, 16:615–623. 72. Lard LR, van Gaalen FA, Schonkeren JJM, et al. (2003). Association of the -2849 interleukin-10 promoter polymorphism with autoantibody production and joint destruction in rheumatoid arthritis. Arthritis Rheum, 48:1841–1848. 73. Summers AM, Summers CW, Drucker DB, Barson A, Hajeer AH, Hutchinson IV (2000). Association of IL-10 genotype with sudden infant death syndrome. Hum Immunol, 61:1270–1273. 74. Suzuki A, Yamada R, Chang X, et al. (2003). Functional haplotypes of PADI4, encoding citrullinating enzyme peptidylarginine deiminase 4, are associated with rheumatoid arthritis. Nat Genet, 34:395–402. 75. Kawano M, Hirano T, Matsuda T, et al. (1988). Autocrine generation and requirement of BSF-2/IL-6 for human multiple myelomas. Nature, 332:83–85. 76. Foster CB, Lehrnbecher T, Samuels S, et al. (2000). An IL6 promoter polymorphism is associated with a lifetime risk of development of Kaposi sarcoma in men infected with human immunodeficiency virus. Blood, 96:2562–2567. 77. Criswell LA, Lum RF, Turner KN, et al. (2004). The influence of genetic variation in the HLA-DRB1 and LTA-TNF regions on the response to treatment of early rheumatoid arthritis with methotrexate or etanercept. Arthritis Rheum, 50:2750– 2756. 78. Schulze-Koops MD, Davis LS, Kavanaugh AF, Lipsky PE (2005). Elevated cytokine messenger RNA levels in the peripheral blood of patients with rheumatoid arthritis suggest different degrees of myeloid activation. Arthritis Rheum, 40:639–647.
35

ETHICS OF BIOMARKERS: THE BORDERS OF INVESTIGATIVE RESEARCH, INFORMED CONSENT, AND PATIENT PROTECTION

Heather Walmsley, M.A.
Lancaster University, Bailrigg, UK
Michael Burgess, Ph.D., Jacquelyn Brinkman, M.Sc., Richard Hegele, M.D., Ph.D., Janet Wilson-McManus, M.T., B.Sc., and Bruce McManus, M.D., Ph.D.
University of British Columbia, Vancouver, British Columbia, Canada
INTRODUCTION

In 2000, the Icelandic Parliament (Althingi) authorized an Iceland-based subsidiary of deCODE Genetics to construct a biobank of genetic samples from the Icelandic population [1–4]. The Althingi also granted deCODE (which had a five-year commercial agreement with the Swiss pharmaceutical company Roche Holdings) a 12-year exclusive commercial license to use the country's medical records, in return for an annual 70 million kronur (approximately 1 million USD in 2007). These records were to be gathered, together with lifestyle and extensive genealogical data, into the Icelandic Health Sector Database. The resulting public outcry and academic critique have been well documented [3,5,6]. Several hundred articles appeared in newspapers [7],
many of them referring to the sale of the "genetic heritage" of the nation (see http://www.mannvernd.is/english/home.html for a list of media articles). A grass-roots lobby group, Mannvernd, emerged to fight the project, complaining principally about the use of "presumed consent" and the commercial aspects of the agreement [4]. Despite these critiques, Iceland was one of the first countries to discuss how to structure a biobank at the political level [8].

When a population geneticist from Stanford University announced plans for a human genome diversity project, he received a similar reception. This project aimed to challenge the ethnocentrism of the Human Genome Project by studying 722 diverse "anthropologically unique" human populations [9]. Indigenous activists were unconvinced. Debra Harry worried that "these new 'scientific findings' concerning our origins can be used to challenge aboriginal rights to territory, resources, and self-determination" [10]. The Canada-based Rural Advancement Foundation International (RAFI), now the ETC Group (Action Group on Erosion, Technology and Concentration), characterized the list of 722 as a list of peoples who had suffered most at the hands of Western "progress" and campaigned against this "bio-colonial Vampire Project." The project has since stimulated productive dialogue about the importance of race and ethnicity to health and genetic research.

In June 2007, UK Biobank opened its doors to donors in Glasgow, the fourth of about 35 planned donation points [11]. The project hopes to recruit a total of 500,000 volunteers aged between 40 and 69. This biobank is a prospective cohort study hoping to contribute to disease risk prediction through the identification of biomarkers [12]. UK Biobank has recognized the need to build public trust and knowledge. This has led to public engagement, although some critics suggest that public acceptance of this project has been carefully cultivated, with varying success, in a context of controversy and distrust [13,14]. The UK is no stranger to human tissue scandals. In 2001 it became known that the organs of deceased children were routinely kept for research purposes at Alder Hey Hospital in Liverpool and Bristol Royal Infirmary without their parents' knowledge or consent [15]. Public outrage led to a near moratorium on tissue banking and research. An expensive system of accreditation of specimen collections by the newly formed Human Tissue Authority eventually followed [16].

These three examples illustrate the increasingly visible role of large biobanking projects within biomedical research. They publicly announce the complexity of international collaborations, commercial involvement, and public–private partnerships that have become the norm in biomedical research. They also reveal major public concerns with the social and ethical implications of these projects: for privacy, indigenous identity and self-determination, ownership and control over body parts, and medical data for individuals and their families.

Traditionally, the interests of patient protection and investigative research have been served jointly by Research Ethics Boards and the guiding principles of biomedical ethics: respect for autonomy, beneficence, nonmaleficence, and
justice. These have been enacted through the process of obtaining informed consent, alongside measures to protect privacy and confidentiality of research participants and guard against discrimination. They have ensured, to a reasonable degree, the ethical enactment, legitimacy, and public acceptance of research projects. Today, however, the demands of biomedical research, of the informed consent process, and of patient protection, especially privacy, are beginning to jostle against each other uncomfortably. They are engaged in an increasingly public struggle, and there appears to be ever-decreasing space in which to maneuver. If biomarker research is to proceed without unnecessary constraint toward improving patient care in a manner that individuals and society at large deem ethical, radical intervention is needed.

This chapter begins by outlining the diversity of social and ethical issues surrounding biomarker-related research and its applications. Focusing on the ever more central process of banking of human biological materials and data, it then traces a recent trend toward large-scale population biobanks. Advances in genomics and computational biology have brought a whole raft of new questions and concerns to the domain of biomedical ethics. The peculiarities of these large biobanks, in the context of divergent legislative frameworks and increasing demands for international networking and collaboration, make such challenges ever starker. Privacy advocates argue that studies using DNA can never promise anonymity to their donors [17,18]. Prospective collections of human DNA and tissues seem doomed either to fail the demands of fully informed consent, or to face the crippling financial and administrative burden of seeking repeated consent. Population biobanks are increasingly conceived as national resources [19]. Indigenous and wider publics are now vocal in their concerns about ownership, commercialization, and privacy: essentially, about who uses their DNA, and how.

We do not set out here to design new governance frameworks for biobanking, or to suggest the best ethical protocols for biomarker research, although these are sorely needed. The aim of this chapter is to suggest legitimate processes for doing so. In our search we veer outside the realm of ethics as traditionally conceived, into the domain of political science. New theories of deliberative democracy facilitate public participation in policy decision making; they aim for deliberation and communicative action rather than strategic action; they have much to offer. Our conclusion is that ethics must embrace politics. Those involved in biomarker-related research are essential, as informers and participants in democratic public deliberation.
BIOMARKERS, ETHICS, AND INVESTIGATIVE RESEARCH

What are the ethics of biomarkers? The application of biomarkers to assess the risks of disease, adverse effects of drugs, and organ rejection and for the development of targeted drugs and treatments is essential. Yet the search for biomarkers of exposure, effect and susceptibility to disease, toxic chemicals,
or pharmaceutical drugs raises many diverse ethical questions. Some of the most common debates surround the impact of developing predictive genetic tests as biomarkers for disease and their use in pharmacogenomics. Neuroimaging, for example, promises much for the identification of biomarkers of diseases such as Alzheimer disease, offering earlier prediction capability than is currently available. But this technology may have unintended social or ethical consequences [20]. It could lead to reduced autonomy for patients at an earlier age if they are not allowed to work or drive. New tests may not be distributed equitably if certain health insurance plans refuse to include the test. Physicians may not be adequately prepared to counsel patients and interpret biomarker test results. Most important, the value of early prediction is questionable for a disease that as yet has no effective treatment.

Ethical concerns surrounding the use of biomonitoring in the workplace or by insurers have also been voiced within the health sciences literature and the wider media [21–23]. Biomarkers offer some hope for monitoring exposure to toxic chemicals in the workplace and protecting the health of employees. In an environment of high exposure to carcinogens, for example, a test could be developed to identify persons with an increased genetic risk of developing cancer from a specific dose, who could, for example, be excluded from the workplace. This would probably reduce the number of workers at risk of developing cancer. There are, however, concerns about discrimination, as well as the reliability of such tests for measuring risk [22]. Is it right for an employer to exclude people from an occupation or workplace on genetic grounds rather than reducing carcinogen exposure for all employees? Some high-risk individuals could spend a lifetime working under high exposure to carcinogens and never develop cancer, whereas some low-risk co-workers might. There are also fears that insurance companies could use biomonitoring methods to exclude people from insurance opportunities on the basis of genetic risk* [21,24,25]. Confidentiality, interpretation of biomarker data, and the problem of obtaining genuinely informed consent emerge as the key ethical tension zones identified by occupational health stakeholders involved in one research project in Quebec [21].

The promise of pharmacogenomics and the ethical issues it raises have also been the subject of lengthy debate.
*In the UK, such fears were voiced by a coalition of 46 organizations in a Joint Statement of Concern presented to a House of Commons Cross Party Group on February 14, 2006. The issue has also been the subject of much debate and policy analysis in the United States, given its system of private health insurance. The Genetic Information Nondiscrimination Act was passed in the U.S. House of Representatives on April 25, 2007. See U.S. National Institutes of Health fact sheet at http://www.genome.gov/page.cfm?pageID=10002328.
The Human Genome Organization (HUGO) Ethics Committee released a statement in 2007 recognizing that "pharmacogenomics has the potential to maximize therapeutic outcomes and minimize adverse reactions to therapy, and that it is consistent with the traditional goals of public health and medical care to relieve human suffering and save lives," but noting many ethical concerns. These include the implications for developing countries and for those seeking access to therapy for neglected diseases, the impact on health care costs and on research priorities, and the fear that pharmacogenomics could reinforce genetic determinism and lead to stigmatization of individuals and groups [26].

Perhaps the widest range of social and ethical issues emerging from biomarker research, however, surrounds the process of collection, storage, and use of human biological samples and associated data for research purposes: to identify new biomarkers of exposure, effect, and susceptibility to disease and pharmacogenomic products. Many genetic and epidemiological studies require access to samples of annotated human blood, tissue, urine, or DNA and associated medical and lifestyle data. Often, they need large numbers of samples, and repeated sampling over many months or years. Often, the outcomes of the research are uncertain, technological advances in research methodologies are unpredictable, and neither can be anticipated. This discussion focuses on ethical issues relating to the biobanking process. The development of large-scale population databases has rendered the ethics of this technology complex, controversial, and publicly visible. Debates about biobanking also reveal the increasing inadequacy of the old ethics guidelines, frameworks, and protocols that have served us for the last 50 years.
POPULATION BIOBANKS AND THE CHALLENGE OF HARMONIZATION

The "banking" of human biological samples for research is not a twenty-first century phenomenon. Human tissue has been gathered and collected for at least 100 years. According to the U.S. National Bioethics Advisory Committee, by 1999, a total of 282 million unique tissue specimens were being held in the United States [27]. The term biobank, however, is relatively new. It appeared in PubMed for the first time in 1996 [28] and was not common nomenclature until the end of the decade. The sequencing of the human genome, advances in computational biology, and the emergence of new disciplines such as biomarker discovery, pharmacogenomics, and nutrigenomics have sparked unprecedented demand for samples of human blood, tissue, urine, DNA, and data. Three-fourths of the clinical trials that drug companies submit to the U.S. Food and Drug Administration for approval now include a provision for sampling and storing human tissue for future genetic analysis [3]. Biobanking has become deserving of its own name and has gained a dedicated society, the International Society for Biological and Environmental Repositories (ISBER), as well as two recent worldwide congresses: the WorldWide BioBank Summits (organized by IBM Healthcare and Life Sciences) and Biobanking and Biorepositories (organized by Informa Life Sciences).
The collection of human samples and data for research has not just accelerated, it has evolved. Four features differentiate biobanks today from those of 20 years ago: the emergence of large population-level biobanks, increased levels of commercial involvement, the desire for international collaborations requiring samples and data to be shared beyond national borders, and finally, the prospective nature of many emerging collections. The increased speed and scale of biobanking has contributed to the increasing public and academic concern with the ethical and social implications of this technology. The rules and practices of research and research ethics developed prior to the consolidation of these trends now inhibit the ability to construct biobanks and related research efficiently. They also provide ineffective protection for individuals and populations.

Small genetic databases containing a limited number of samples, attached to one research project focused on a specific disease, were once standard. Such collections still exist: clinical collections within hospital pathology departments, and case- or family-based repositories for genetic studies of disease. Larger provincial, national, and international repositories are now increasingly common, as is the networking of existing collections. Provincial examples include the CARTaGENE project in Quebec (Canada). National disease-based biobanks and networks include the Alzheimer's Genebank, sponsored jointly by the U.S. National Institute on Aging and the Alzheimer's Association. Examples of national or regional population-level biobanks include the Estonian Genome Project (Estonia), Biobank Japan (Japan), the Icelandic Health Sector Database, UK Biobank (UK), Medical Biobank (Sweden), and the Singapore Tissue Network (Singapore). International collaborations include the European GenomEUtwin Project, a study of twins from Denmark, Finland, Italy, the Netherlands, Sweden, the UK, France, Australia, Germany, Lithuania, Poland, and the Russian Federation (http://www.genomeutwin.org/).

Levels of commercial involvement vary among these biobanks. The Icelandic Biobank was founded as a public–private partnership between the Icelandic government and deCODE Genetics. UmanGenomics was given exclusive rights to commercialize information derived from Sweden's Medical Biobank. The Singapore Tissue Network, by contrast, is publicly funded and will not be involved in commercialization. Biotechnology companies involved in biobanking include Newfound Genomics, which gathers DNA samples from volunteers across Newfoundland and Labrador.

Many of these large population databases are designed as research infrastructures. They do not focus on one specific disease or genetic characteristic, but contain samples from sick and healthy persons, often across several generations. DNA, blood, or other tissues are stored together with health and lifestyle data from medical records, examinations, and questionnaires. These large population databases support research into complex gene interactions involved in multifactorial diseases and gene–gene and gene–environment interactions at the population level.
There are few clinical benefits to individual donors. Benefits are expected to be long term and often cannot be specified at the time of data and tissue collection. It is a major challenge to the requirement of informed consent that persons donating biological and data samples cannot know the specific future research purposes for which their donations will be used.

This proliferation of biobanks, and the advent of population-wide and transnational biobanking endeavors, has triggered a variety of regulatory responses. Some national biobanks have been created in association with new legislation. Estonia and Lithuania enacted the Human Genes Research Act (2000) and the Human Genome Research Law (2002), respectively, possibly motivated by the inadequacy of existing norms, a belief that genetic data and research require different regulation than traditional medicine, as well as by the need for democratic legitimacy [19]. The UK Human Tissue Act (2004), Sweden's Act on Biobanks (2002), and the Norwegian Act on Biobanks (2003) all pertain to the storage of biological samples [29]. Other national initiatives do not treat genetic data as exceptional. They remain dependent on a network of existing laws. A series of national and international guidelines have also been produced, such as the World Medical Association's Declaration on Ethical Considerations Regarding Health Databases (2002) and guidelines from the U.S. National Bioethics Advisory Commission (1999) and the Council of Europe Committee of Ministers (2006). As with national regulation, however, the norms, systems, and recommendations for collection and processing of samples, informed consent procedures, and even the terminology for degrees of anonymization of data differ substantially between guidelines.

Anonymization terminology illustrates the confusion that can result from such diversity. European documents distinguish five levels of anonymization of samples [30]. Within European documents, anonymized describes samples used without identifiers but that are sometimes coded to enable reestablishing the identity of the donor. In most English Canadian and U.S. texts, however, anonymized means that the sample is irreversibly de-identified. Quebec follows the French system, distinguishing between reversibly and irreversibly anonymized samples. In European documents, coded usually refers to instances where researchers have access to the linking code. But the U.S. Office for Human Research Protections (OHRP) uses the word to refer to situations where the researcher does not have access to the linking code [30]. To add to the confusion, UNESCO has been criticized for creating new terms, such as proportional or reasonable anonymity, that do not correspond to existing categories [19].
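To make the vocabulary concrete, the sketch below (all identifiers hypothetical) contrasts a reversibly coded data set, in which a separately held linking key still permits re-identification or withdrawal of a sample, with irreversible anonymization, in which that key is destroyed. It is an illustration of the terminology discussed above, not a recommended de-identification scheme.

    import secrets

    # Hypothetical donors: sample ID -> identifying information held by the biobank.
    donors = {"S-001": "Donor One", "S-002": "Donor Two"}

    # "Coded" (reversible): identifiers are replaced by random codes, and the
    # linking key is retained but stored separately from the research data set.
    linking_key = {sample_id: secrets.token_hex(8) for sample_id in donors}
    research_records = {code: {"genotype": "...", "phenotype": "..."}
                        for code in linking_key.values()}

    def withdraw(sample_id):
        """A donor's record can be removed only while the linking key exists."""
        research_records.pop(linking_key[sample_id], None)

    # "Irreversibly anonymized": the linking key is destroyed, so records can no
    # longer be traced back to donors, and withdrawal is no longer possible.
    linking_key.clear()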
Such confusion has led to repeated calls for harmonization of biobank regulations. The Public Population Project in Genomics consortium (P3G) is one attempt, a nonprofit consortium aiming to promote international collaboration and knowledge transfer between researchers in population genomics. With over 30 charter and associate members, P3G declares itself to have "achieved a critical mass to form the principal international body for the harmonization of public population projects in genomics" (http://www.p3g.org).

Standardization also has its critics, notably among smaller biobanking initiatives. In 2006, the U.S. National Cancer Institute (NCI) launched guidelines spelling out best practices for the collection, storage, and dissemination of human cancer tissues and related biological specimens. These high-level guidelines are a move toward standardization of practice, following revelations in a 2004 survey of the negative impact of diverse laboratory practices on resource sharing and collaboration [31]. The intention is that NCI funding will eventually depend on compliance. The guidelines were applauded in The Lancet by the directors of major tissue banks such as Peter Geary of the Canadian Tumor Repository Network. They generated vocal concerns from other researchers and directors of smaller banks, many of which are already financially unsustainable. Burdensome informed consent protocols and the financial costs of infrastructural adjustments required were the key sources of concern. This is a central problem for biobanking and biomedical ethics: the centrality, the heavy moral weight, and the inadequacy of individual and voluntary informed consent.
INFORMED CONSENT: CENTRALITY AND INADEQUACY OF THE IDEAL

Informed consent is one of the most important doctrines of bioethics. It was introduced in the 1947 Nuremberg Code, following revelations during the Nuremberg trials of Nazi medical experimentation in concentration camps. It developed through inclusion in the United Nations' Universal Declaration of Human Rights in 1948 and the World Medical Association's Declaration of Helsinki in 1964. Informed consent is incorporated in all prominent medical, research, and institutional ethics codes, and is protected by laws worldwide.

The purpose of informed consent in research can be viewed as twofold: to minimize harm to research subjects and to protect their autonomous choice. Informed consent requires researchers to ensure that research participants consent voluntarily to participation in research and that they be fully informed of the risks and benefits. The focus of informed consent has slowly shifted: from disclosure by health professionals toward the voluntary consent of the individual based on the person's understanding of the research and expression of their own values and assessments [32]. Simultaneously, health research has shifted from predominantly individual investigator-designed protocols with specific research questions to multiple investigator and institution projects that gather many forms of data and samples to understand complex phenomena and test emerging hypotheses. Further, informed consent as a protection for autonomy has become important in arguments about reproductive autonomy. Informed consent has been described as representing the dividing line between "good" genetics and "sinful" eugenics [32].
Unprecedented computational power now makes it possible to network and analyze large amounts of information, making large-scale population biobanks and genetic epidemiology studies more promising than ever before. This research context raises the stakes of research ethics, making it more difficult to achieve individual consent and protect privacy while serving as the basis for strong claims of individualized and population health benefits. Large-scale biobanks and cohorts by their very nature cannot predict the exact uses to which samples will be put ahead of time. The ideal of voluntary participation based on knowledge of the research appears to require new informed consent for every emergent hypothesis that was not part of the original informed consent. The practicality of such an ideal approach is less than clear. Genetic testing can also use samples that were not originally collected for genetic studies. Tissue biopsies collected for clinical diagnosis are now providing information for gene expression studies [33]. The precise nature of future technologies that will extract new information from existing samples cannot be predicted.

On the other hand, seeking repeated consent from biobank donors is a costly and cumbersome process for researchers that can impede or even undermine research. Response rates for data collection (e.g., questionnaires) in any large population may vary between 50 and over 90%. The need for renewed consent could therefore reduce participation in a project and introduce selection bias [34]. Repeat consent may also be unnecessarily intrusive to the lives of donors or their next of kin.

Other forms of consent have been suggested and implemented for biobanking purposes. These include consent with several options for research use: presumed consent, broad consent, and blanket consent. Many European guidelines, including a memorandum from the Council of Europe Steering Committee on Bioethics, laws in Sweden, Iceland, and Estonia, and the European Society for Human Genetics guidelines, consider broad consent for unknown future uses to be acceptable as long as such future projects gain approval from Research Ethics Boards and people retain the right to withdraw samples at any time [30]. The U.S. Office for Human Research Protections went one step further in 2004, proposing to broaden the definition of nonidentifiable samples, upon which research is allowed under U.S. federal regulations without the requirement of informed consent.

The problem is that no informed consent mechanism, narrow or broad, can address all ethical concerns surrounding the biobanking of human DNA and data [35]. Such concerns include the aggregate effects of individual consent upon society as a whole and upon family and community members, given the inherently "shared" nature of genetic material. If people are given full choice as to which diseases their samples can be used to research, and they choose only to donate for well-known diseases such as cancer, rare diseases may be neglected. The discovery that Ashkenazi Jews may have particular mutations predisposing them to breast, ovarian, and colon cancer has generated fears that they could become the target of discrimination [36].
Concerns include irreconcilable trade-offs between donor desires for privacy (best achieved by unlinking samples), control over the manner in which their body parts and personal information are used (samples can be withdrawn from a biobank only if a link exists), and access to clinically relevant information discovered in the course of research. For some individuals and communities, cultural or religious beliefs dictate or restrict the research purposes for which their samples can be used. The Nuu-chah-nulth nations of Vancouver Island became angry in 2000 after discovering that their samples, collected years before for arthritis research, had been used for the entirely different purpose of migration research [37,38]. In some cases, a history of colonialism and abusive research makes a group demand that their samples be used for research that benefits their community directly. Complete anonymization of samples containing human DNA is technically impossible, given both the unique nature of a person’s DNA and its shared characteristics. Consequently, in 2003, the Icelandic Supreme Court ruled that the transfer of 18-year-old student Ragnhildur Gudmundsdottir’s dead father’s health data infringed her privacy rights: “The court said that including the records in the database might allow her to be identified as an individual at risk of any heritable disease her father might be found to have had—even though the data would be made anonymous and encrypted” [39]. Reasonable privacy protection in a biobanking context is tough to achieve, extending to the technological capacity to protect privacy through linked or unlinked anonymized samples without risk of error. Informed consent cannot provide a basis for participants to evaluate the likelihood of benefit arising from their participation in a biobank when these merits are contested by experts. Critics of UK Biobank, for example, have little faith in the value and power of such prospective cohort studies, compared to traditional case–control studies, for isolating biomarkers and determining genetic risk factors. Supporters argue that the biobank will be a resource from which researchers can compile nested case–control studies. Critics claim that it will only be useful for study of the most common cancers, those that occur with enough frequency among donors. Others claim that even UK Biobank’s intended 500,000 participants cannot provide reliable information about the genetic causes of a disease without a study of familial correlations [12]. Informed consent is inadequate as a solution for ensuring that the impacts of biobanking and related research will be beneficial to individuals and society, will uphold the autonomy of the individual, or will facilitate justice. Given its historical importance and bureaucratic and legal dependence [40], it is not surprising that informed consent remains central to contemporary discussions of ethical and social implications of biobanking, biomarkers, and biomedical research. Unfortunately, the substance of such debates centers upon the inadequacy of both ideal and current procedures. As Hoeyer points out in reference to Medical Biobank run by Uman Genomics in northern Sweden, informed consent offers an illusion of choice without real consideration of the
implications of such choices, “by constructing a diffuse arrangement of donors who can only be semiaccountable agents” [41].
SCIENCE, ETHICS, AND THE CHANGING ROLE OF THE PUBLIC Novel and innovative norms and models for biobank management have been proposed by bioethics, social science, and legal practitioners and theorists in recent years, in an attempt to deal with some of these issues. Alternative ethical frameworks based on social solidarity, equity, and altruism have been suggested [42,43]. These formed the basis for the recent Human Genome Organisation Ethics Committee statement on pharmacogenomics [26]. Onora O’Neil has argued for a two-tiered consent process in which public consent for projects is solicited prior to individual consent for donation of samples [44]. The charitable trust model has also been proposed for biobanking, as a way of recognizing DNA both as a common heritage of humanity and as uniquely individual, with implications for family members. “All information would be placed in a trust for perpetuity and the trustees overseeing the information would act on behalf of the people who had altruistically provided information to the population collection. They would be accountable to individuals but could also act as representatives for the community as a whole” [45]. It is not clear, however, whether such models could ever gain widespread public endorsement and legitimacy without direct public involvement in their design. Appeals to the need for community consultation [45] and scientific citizenship [46] may be more suited to the current mood. There is growing awareness globally, among government, policymakers, regulators, and advocacy groups alike, of the importance of public engagement, particularly in relation to emerging technologies. In the UK, crises over bovine spongiform encephalopathy (BSE), otherwise known as “mad cow disease,” and genetically modified (GM) crops have forced the government to proclaim the value of early public participation in decision making [47,48]. A statement by the UK House of Lords Select Committee in 2000 concluded that “today’s public expects not merely to know what is going on, but to be consulted; science is beginning to see the wisdom of this and to move out of the laboratory and into the community to engage in dialogue aimed at mutual understanding” [49]. In Canada, the provincial government of British Columbia pioneered a Citizens’ Assembly in 2003, charging 160 citizens with the task of evaluating the existing electoral system. A new BC Conversations on Health project aims to improve the health system by engaging in “genuine conversation with British Columbians” during 2007. Indeed, public consultations have become the norm for soliciting public support for new technologies. In the UK these have included Weekends Away for a Bigger Voice, funded by the National Consumer Council in 2001 and the
highly publicized government-funded GM Nation consultation in 2002. In Canada, notable examples include the 1999 Canadian Citizen’s Conference on Biotechnology and the 2001 Canadian Public Consultation on Xenotransplantation. In Denmark, more than 20 consensus conferences have been run by the Danish Board of Technology since 1989, on topics as diverse as genetically modified foods, electronic surveillance, and genetic testing [50]. In New Zealand, the government convened a series of public meetings in 2000 as part of its Royal Commission on genetic modification. UK Biobank marketing is careful to assert that the project has “undergone rigorous review and consultation at all levels” (http://www.ukbiobank.ac.uk/about/what.php). Traditional public consultations have their limitations, however. Past examples of consultations have either been unpublicized or restricted to stakeholder involvement, undermining the claim to be representative of the full range of public interests [8]. Some critics suspect consultations of being a front to placate the public, a means of researching market strategy, and speeding product development [51] or as a mechanism for engineering consent [13]. GM Nation is one example of a consultation that has been criticized for “capture” by organized stakeholder groups and as misrepresentative of the public it aimed to consult [52].
PROMISING FUTURE DIRECTIONS: PUBLIC CONSULTATION AND DELIBERATIVE DEMOCRACY The use of theories and practices of deliberative democracy within such public consultations are a more recent and innovative trend. Deliberation stands in opposition to the aggregative market model of representational democracy and the strategic behavior associated with voting. It offers a model of democracy in which free and equal citizens exchange reasons through dialogue, and shape and alter their preferences collectively, and it is rapidly gaining in popularity, as evidenced by the growth of nonprofit organizations such as the Everyday Democracy (http://www.everyday-democracy.org), AmericaSpeaks (http://www.americaspeaks.org/), and National Issues Forums (http://www. nifi.org/) throughout the United States. Origin stories of this broad deliberative democracy “movement” are as varied as its incarnations, and practice is not always as closely linked to theory as it could be. But most theorists will acknowledge a debt to the work of either (or both) Habermas and Rawls. Habermas’s wider program of discourse ethics provides an overarching rationale for public deliberation [53]. This asserts that publicly binding norms can make a legitimate claim to rationality—and thus legitimacy—only if they emerge from free argument between all parties affected. Claims about what “any reasonable person” would accept as right can only be justified by putting them to the test. This is then a far cry from the heavily critiqued [13,54] model of public consultation as a tool for engendering public trust or engineering acceptance of a new technology.
By asking participants to consider the perspectives of everyone, deliberation orients individuals away from consideration of self-interest and toward consideration of the common good. Pellizzoni characterizes this governance virtue as one of three key virtues of deliberative democracy [55]. The second is civic virtue, whereby the process of deliberation produces more informed, active, responsible, cooperative, and fair citizens. The third is cognitive virtue, the notion that discussion oriented to understanding rather than success enhances the quality of decisions, gives rise to new or unarticulated points of view, and allows common understanding of a complex problem that no single person could understand in its entirety. Deliberative democracy is not devoid of challenges when applied to complex issues of science and technology, rich as they can be in future uncertainties and potential societal impact. But it offers much promise as a contribution to biobanking policy that can provide legitimate challenges to rigidly structured research ethics.
CONCLUSIONS Biomarker research is greatly advanced by good-quality annotated collections of tissues, or biobanks. Biobanks raise issues that stretch from evaluation of the benefits and risks of research through to the complexity of informed consent for collections for which the research purposes and methods cannot be described in advance. This range of ethical and organizational challenges is not managed adequately by the rules, guidelines, and bureaucracies of research ethics. Part of the problem is that current research ethics leaves too much for the individual participant to assess before the relevant information is available. But many other aspects of biobanks have to do with how benefits and risks are defined, achieved, and shared, particularly those that are likely to apply to groups of individuals with inherited risks, or those classified as having risks or as being more amenable to treatment than others. These challenges raise important issues of equity and justice. They also highlight tradeoffs between research efficiency and benefits, privacy and individual control over personal information, and tissue samples. These issues are not resolvable by appeal to an existing set of rules or ethical framework to which all reasonable people agree. Inevitably, governance decisions related to biobanks will need to find a way to create legitimate policy and institutions. The political approach of deliberative democracy may hold the most promise for wellinformed and representative input into trustworthy governance of biobanks and related research into biomarkers. Acknowledgments The authors thank Genome Canada, Genome British Columbia, and the Michael Smith Foundation for Health Research for their essential support.
We also appreciate the support and mutual commitment of the University of British Columbia, the British Columbia Transplant Society, Providence Health Care, and Vancouver Coastal Health, and all participants in the Biomarkers in Transplantation initiative.
REFERENCES 1. Sigurdsson S (2001). Ying-yang genetics, or the HSD deCODE controversy. New Genet Soc, 20(2):103–117. 2. Sigurdsson S (2003). Decoding broken promises. Open Democracy. www. opendemocracy.net/theme-9-genes/article_1024.jsp (accessed June 1, 2004). 3. Abbott A (2003). DNA study deepens rift over Iceland’s genetic heritage. Nature, 421:678. 4. Mannvernd, Icelanders for Ethics in Science and Medicine (2004). A landmark decision by the Icelandic Supreme Court: the Icelandic Health Sector Database Act stricken down as unconstitutional. 5. Merz JF, McGee GE, Sankar P (2004). “Iceland Inc.”?: On the ethics of commercial population genomics. Soc Sci Med, 58:1201–1209. 6. Potts J (2002). At least give the natives glass beads: an examination of the bargain made between Iceland and deCODE Genetics with implications for global bioprospecting. Va J Law Technol, Fall, p. 40. 7. Pálsson G, Rabinow P (2001). The Icelandic genome debate. Trends Biotechnol, 19:166–171. 8. Burgess, M, Tansey J. (2009). Democratic deficit and the politics of “informed and Inclusive” consultation. In Einseidel E, Parker R (eds.), Hindsight and Foresight on Emerging Technologies. UBC Press, Vancouver, British Columbia, Canada. 9. Morrison Institute for Population and Resource Studies (1999). Human Genome Diversity Project: Alghero Summary Report. http://www.stanford.edu/group/ morrinst/hgdp/summary93.html (accessed Aug. 2, 2007). 10. Harry D, Howard S, Shelton BL (2000). Indigenous people, genes and genetics: what indigenous people should know about biocolonialism. Indigenous Peoples Council on Biocolonialism. http://www.ipcb.org/pdf_files/ipgs.pdf. 11. BBC (2007). Volunteers join £61m health study. BBC News, July 16, 2007. http:// news.bbc.co.uk/2/hi/uk_news/scotland/glasgow_and_west/6900515.stm (accessed Sept. 24, 2007). 12. Barbour V (2003). UK Biobank: a project in search of a protocol? Lancet, 361:1734–1738. 13. Peterson A (2007). Biobanks “engagements”: engendering trust or engineering consent? Genet Soc Policy, 3:31–43. 14. Peterson A (2005). Securing our genetic health: engendering trust in UK Biobank. Sociol Health Illness, 27:271–292. 15. Redfern M, Keeling J, Powell M (2001). The Royal Liverpool Children’s Inquiry Report. House of Commons, London.
16. Royal College of Pathologists’ Human Tissue Advisory Group (2005). Comments on the Draft Human Tissue Authority Codes of Practice 1 to 5. The Royal College of Pathologists, London, Sept. 28. 17. Lin Z, Owen A, Altman R (2004). Genomic research and human subject privacy. Science, 305:183. 18. Roche P, Annas G (2001). Protecting genetic privacy. Nat Rev Gene, 2: 392–396. 19. Cambon-Thomsen A, Sallée C, Rial-Sebbag E, Knoppers BM (2005). Population genetic databases: Is a specific ethical and legal framework necessary? GenEdit, 3:1–13. 20. Illes J, Rosen A, Greicius M, Racine E (2007). Prospects for prediction: ethics analysis of neuroimaging in Alzheimer’s disease. Ann NY Acad Sci, 1097: 278–295. 21. Caux C, Roy DJ, Guilbert L, Viau C (2007). Anticipating ethical aspects of the use of biomarkers in the workplace: a tool for stakeholders. Soc Sci Med, 65:344–354. 22. Viau C (2005). Biomonitoring in occupational health: scientific, socio-ethical and regulatory issues. Toxicol Appl Pharmacol, 207:S347–S353. 23. The Economist (2007). Genetics, medicine and insurance: Do not ask or do not answer? Aug. 23. http://www.economist.com/science/displaystory.cfm?story_id= 9679893 (accessed Aug. 31, 2007). 24. Genewatch UK (2006). Genetic discrimination by insurers and employers: still looming on the horizon. Genewatch UK Report, Feb. 14. http://www.genewatch. org/uploads/f03c6d66a9b354535738483c1c3d49e4/GeneticTestingUpdate2006.pdf (accessed Aug. 31, 2007). 25. Rothenberg K, et al. (1997). Genetic information and the workplace: legislative approaches and policy challenges. Science, 275:1755–1757. 26. Human Genome Organisation Ethics Committee (2007). HUGO Statement on Pharmacogenomics (PGx): Solidarity, Equity and Governance. Genom Soc Policy, 3:44–47. 27. Lewis G (2004). Tissue collection and the pharmaceutical industry: investigating corporate biobanks. In Tutton R, Corrigan O. (eds.), Genetic Databases: Socioethical Issues in the Collection and Use of DNA. Routledge, London. 28. Loft S, Poulsen HE (1996). Cancer risk and oxidative DNA damage in man. J Mol Med, 74: 297–312. 29. Maschke KJ (2005). Navigating an ethical patchwork: human gene banks. Nat Biotechnol, 23:539–545. 30. Elger BS, Caplan AL (2006). Consent and anonymization in research involving biobanks. Eur Mol Biol Rep, 7:661–666. 31. Hede K (2006). New biorepository guidelines raise concerns. J Nat Cancer Inst, 98:952–954. 32. Brekke OA, Thorvald S (2006). Population biobanks: the ethical gravity of informed consent. Biosocieties, 1:385–398. 33. Cambon-Thomsen A (2004). The social and ethical issues of post-genomic human biobanks. Nat Rev Genet, 5:866–873.
34. Hansson MG, Dillner J, Bartram CR, Carlson JA, Helgesson G (2006). Should donors be allowed to give broad consent to future biobank research? Lancet Oncol, Mar, 7. 35. Burgess MM (2001). Beyond consent: ethical and social issues in genetic testing. Nat Rev Genet, 2:147–151. 36. Weijer C, Emanuel E (2000). Protecting communities in biomedical research. Science, 289:1142–1144. 37. Baird L, Henderson H (2001). Nuu-Chah-Nulth Case History. In Glass KC, Kaufert JM (eds.), Continuing the Dialogue: Genetic Research with Aboriginal Individuals and Communities, pp. 30–43. Proceedings of a workshop sponsored by the Canadian Commission for the United Nations Educational, Scientific, and Cultural Organization (UNESCO), Health Canada, and the National Council on Ethics in Human Research, pp. 26–27, Jan. 2001, Vancouver, British Columbia, Canada. 38. Tymchuk M (2000). Bad blood: management and function. Canadian Broadcasting Company, National Radio. 39. Abbott A (2004). Icelandic database shelved as court judges privacy in peril. Nature, 429:118. 40. Faden RR, Beauchamp TL (1986). A History and Theory of Informed Consent. Oxford University Press, New York. 41. Hoeyer K (2004). Ambiguous gifts: public anxiety, informed consent and biobanks. In Tutton R, Corrigan O (eds.), Genetic Databases: Socio-ethical Issues in the Collection and Use of DNA. Routledge, London. 42. Chadwick R, Berg K (2001). Solidarity and equity: new ethical frameworks for genetic databases. Nat Rev Genet, 2:318–321. 43. Lowrance W (2002). Learning from Experience: Privacy and Secondary Use of Data in Health Research. Nuffield Trust, London. 44. O’Neil O (2001). Informed consent and genetic information. Stud History Philos Biol Biomed Sci, 32:689–704. 45. Kaye J (2004). Abandoning informed consent: the case for genetic research in population collections. In Tutton R, Corrigan O (eds.), Genetic Databases: Socioethical Issues in the Collection and Use of DNA. Routledge, London. 46. Weldon S (2004). “Public consent” or “scientific citizenship”? What counts as public participation in population-based DNA collections? In Tutton R, Corrigan O (eds.), Genetic Databases: Socio-ethical Issues in the Collection and Use of DNA. Routledge, London. 47. Bauer MW (2002). Arenas, platforms and the biotechnology movement. Sci Commun, 24:144–161. 48. Irwin A (2001). Constructing the scientific citizen: science and democracy in the biosciences. Public Understand Sci, 10:1–18. 49. House of Lords Select Committee on Science and Technology (2000). Science and Society, 3rd Report. HMSO, London. 50. Anderson J (2002). Danish Participatory Models: Scenario Workshops and Consensus Conferences, Towards More Democratic Decision-Making. The Pantaneto Forum 6. http://www.pantaneto.co.uk/issue6/andersenjaeger.htm (accessed Oct. 22, 2007).
51. Myshkja B (2007). Lay expertise: Why involve the public in biobank governance? Genet Soc Policy, 3:1–16. 52. Rowe G, Horlick-Jones T, Walls J, Pidgeon N (2005). Difficulties in evaluating public engagement activities: reflections on an evaluation of the UK GM Nation public debate about transgenic crops. Public Understand Sci, 14:331–352. 53. Habermas J (1996). Between Facts and Norms: Contributions to a Discourse Theory of Law and Democracy. MIT Press, Cambridge, MA. 54. Wynne B (2006). Public engagement as a means of restoring public trust in science: Hitting the notes, but missing the music? Commun Genet, 9:211–220. 55. Pellizzoni L (2001). The myth of the best argument: power, deliberation and reason. Br J Sociol, 52:59–86.
36

PATHODYNAMICS: IMPROVING BIOMARKER SELECTION BY GETTING MORE INFORMATION FROM CHANGES OVER TIME

Donald C. Trost, M.D., Ph.D.
Analytic Dynamics, Niantic, Connecticut
INTRODUCTION

The purpose of this chapter is to introduce some approaches to thinking about biological dynamics. Pathodynamics is a term used by the author to describe a quantitative approach to disease that includes how the biological system changes over time. In many ways it is analogous to thermodynamics [1,2] in that it deals with the macroscopic, measurable phenotypic aspects of a biological system rather than the microscopic aspects such as those modeled in mathematical physiology [3]. For the purposes of this chapter, macroscopic will refer to measurements that do not involve the destruction of the biological system being studied but may vary in scale (i.e., cell, tissue, organ, and body). For example, clinical measurements would be macroscopic whether invasive or not, but histopathology, cell lysis, or genotype would be microscopic. Another way to view this is that when the dynamics of the system being studied result from something greater than the sum of the parts, usually from networks, it is macroscopic; otherwise, studying the parts of the system is microscopic. One of the problems in macroscopic biology is that many of the underlying system characteristics are immeasurable. This is why the biological
variation is just as important as, if not more important than, the mean behavior of a system, because it represents the system changes that cannot be measured directly.

The term parameter is used rather loosely in biology and other sciences and has a very specific meaning in mathematics. For the purposes here, it will be defined in a statistical manner as a quantity which is a characteristic of the system that is not directly measurable but can be estimated from experimental data and will be represented by Greek symbols. Quantities being measured will be referred to as variables. The letter y will designate a random, or stochastic, variable, one that has a probability distribution, and the letter x will represent a deterministic, or controlled, variable, one that is measured "exactly," such as gender, temperature, and pH. The letters s, t, and T will refer to time and will be either deterministic or random, depending on the experimental context. Biomarkers are really mathematical functions of parameters. In modeling, parameters appear as additive coefficients, multiplicative coefficients, and exponents. Sometimes parameters are deterministic, sometimes random, and sometimes functions of time or variables. Probably the most common parameter in biology is the population mean of a random variable. The sample mean is an estimate of this parameter; it is also a statistic, a quantity that is a function of the data. However, under the definitions above, some statistics do not estimate parameters and are known as nonparametric statistics. Nonparametric statistics are generally used for hypothesis testing and contain little or no mechanistic biological information [4–7].

Analogy to Thermodynamics

Thermodynamics is a macroscopic view of physics that describes the flow of energy and the disorder of matter (i.e., entropy) [1,2]. The former is reflected in the first law of thermodynamics, which says that

$$\Delta\text{energy} = \Delta\text{work} + \Delta\text{heat}$$

which is basically a law of the conservation of energy. The second law has various forms but generally states that entropy changes of a system and its exterior never decrease. Aging and disease (at least some) may be examples of increasing entropy. In chemistry, classical thermodynamics describes the behavior of molecules (particles) and changes in the states of matter. Equilibrium occurs at the state of maximum entropy [i.e., when the temperature is uniform throughout the (closed) system]. This requires constant energy and constant volume. In a system with constant entropy and constant volume, equilibrium occurs at a state of minimum energy. For example, if constant entropy occurs at a constant temperature, equilibrium occurs when the Helmholtz free energy is at its minimum. In all cases, the particles continue to move; this is a dynamics part. For nonequilibrium states, matter and energy will flow to reach an
equilibrium state. This is also a dynamics part. Modern thermodynamics is a generalization of classical thermodynamics that relates state variables for any system in equilibrium [2], such as electromagnetism and fluid dynamics. The open question is whether or not these concepts apply to biological systems.

For this chapter, the pathodynamics concept is that the states of a biological system can be measured and related in a manner similar to thermodynamics. At this time only the simplest relationships are being proposed. The best analogy seems to be viewing a biological system, in particular a warm-blooded (constant-temperature) mammal, as a particle in a high-dimensional space whose states are measured via (clinical) laboratory tests. The dimension of this space will be defined by the information contained in these biomarkers (see below). This particle is in constant motion as long as the system is alive and the microscopic behavior of the biology is generally unobservable, but the probabilistic microscopic behavior of this particle is observable through its clinical states. The probabilistic macroscopic properties are described by the probability distributions of the particle. These distributions are inherently defined by a single biological system, and how they are related to population distributions for the same species is unknown. Hopefully, there will be some properties that are invariant among individuals so that the system behavior can be studied with large enough sample sizes to get sufficient information about the population of systems.

One way to visualize this concept of pathodynamics is to imagine a single molecule moving in a homogeneous compressible fluid [8]. A drop of this fluid represents the probability distribution and is suspended in another fluid medium. The fluids are slightly miscible, so that there is no surface on the drop, but there is a cohesive (homeostatic) force that pulls the particles toward the center of the drop while the heat in the system tends to diffuse the particles out into the medium. Dynamic equilibrium occurs when the drop is at its smallest volume, as measured by levels of constant probability density. The conservation of mass is related to the total probability for the particle (mass = 1), which is analogous to the mass of the drop. The drop becomes distorted when external forces on it cause distortion of the shape and may even fragment the drop into smaller drops or change the drop so that holes appear. These external forces are due to factors such as external environment, disease, and therapies. A goal of pathodynamics is to infer the presence of an external force by observing the motion of the particle and finding the correspondence between the force and known causes.

The Concept of Time

Since dynamics relates to the changes over time, some mention of time is appropriate. Everyone has a general concept of time, but in mathematical and physical thinking, time is a little more complicated [9,10]. There are two main classes of time: continuous (analog) and discrete (digital). Although physicists use the reversibility of time, which at least makes the mathematics easier, Prigogine [10] argues that at the microscopic level, the world is probabilistic
and that in probabilistic (stochastic) systems, time can only go forward. Furthermore, he argues that “dynamics is at the root of complexity that [is] essential for self-organization and the emergence of life.” There are many kinds of time. In the continuous category are astronomical time (ordinary time), biological time (aging), psychological time, thermodynamic time, and information time [11]. Discrete time is more difficult to imagine. Whenever digital data are collected, the observations occur only at discrete times, usually equally spaced. Even then, the process being observed runs in continuous time. This is probably the most common case. However, discrete time is really just a counting process and occurs naturally in biology. Cell divisions, heartbeats, and respirations are a few of the commonly observed discrete biological clocks. In this chapter only continuous-time processes are discussed. An example of discrete-time pathodynamics can be found elsewhere [12].
BROWNIAN MOTION

Diffusion

One of the most basic continuous stochastic processes is Brownian motion. The concept of Brownian motion was born when a biologist named Robert Brown observed pollen under a microscope vibrating due to interactions with water molecules [13]. This concept should be familiar to biologists. Brownian motion has been modeled extensively [14,15]. Standard Brownian motion (B_t) is a Gaussian stochastic process with mean and variance

$$\mu_t = 0 \qquad \sigma_t^2 = t$$

respectively. In the laboratory, Brownian motion is a good model for diffusion (e.g., immunodiffusion), where the concentration of the protein is related to the time of observation and the diffusion coefficient. In an open system where time goes forever or when the particles have no boundary, the diffusion of a particle is unbounded because the variance goes to infinity. To make it a useful concept in biology, the particle needs to be either in a closed container or stabilized by an opposing force.

Homeostasis: Equilibrium Pathodynamics

In biological systems, diffusion occurs only on a microscopic level and is not usually measurable macroscopically. However, in a probabilistic model of pathodynamics, the particle representing a person's clinical health state can be thought of as a microscopic object (e.g., a molecule in a fluid drop), while the probability distribution is the macroscopic view. The reader should note that
the concepts of microscopic biology as defined in the Introduction and this imaginary microscopic particle of pathodynamics are different uses of the microscopic/macroscopic dichotomy.

The Ornstein–Uhlenbeck (OU) stochastic process [16] is a stationary Gaussian process with conditional mean and variance given y_0:

$$\mu_t = \mu + e^{-\alpha t}(y_0 - \mu) \qquad \sigma_t^2 = \sigma^2\bigl(1 - e^{-2\alpha t}\bigr)$$

respectively, where y_0 is the baseline value, μ is the equilibrium point (mean of y_t), and σ² is the variance of y_t. The autocorrelation between two measurements of y is

$$\mathrm{Corr}[y_s, y_t] = e^{-\alpha(t - s)}$$

for times t ≥ s. In thermodynamic terms, the average fluctuation of a particle is

$$d\mu_t = -\alpha(\mu_t - \mu)\,dt$$

Read "dx" as a small change in x. This ordinary differential equation (ODE) is analogous to the stochastic differential equation (SDE) [14,15] that generates the OU process,

$$dy_t = -\alpha(y_t - \mu)\,dt + \sqrt{2\alpha}\,\sigma\,dB_t$$

which has a biological variation term that is driven by Brownian motion. In statistical physics, this is called the Langevin equation, or the equation of motion for a Brownian particle [17]. Here a link between thermodynamics and pathodynamics will be attempted. Using the Einstein theory of fluctuations [2,17], the change in entropy is

$$\Delta S = -\frac{1}{2\sigma^2}(y_t - \mu)^2$$

Since this quantity is always zero or negative, this says that in the equilibrium state the entropy is decreasing with time, which suggests that there is some organizing force acting on the system. This is the homeostatic force

$$F_H = \frac{1}{\sigma^2}(y_t - \mu)$$

In statistical terms under near-equilibrium conditions, the drag in the system increases as the autocorrelation increases or as the variance decreases.
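To make the homeostasis analogy concrete, an OU path can be generated directly from its exact Gaussian transition distribution given above. The sketch below is illustrative only; the parameter values (μ, σ, α, the time step, and the seed) are arbitrary choices, not values taken from this chapter.

```python
import numpy as np

def simulate_ou(mu=100.0, sigma=5.0, alpha=0.1, y0=100.0, dt=1.0, n_steps=200, seed=0):
    """Simulate an Ornstein-Uhlenbeck path with the exact Gaussian transition:
    y_{t+dt} | y_t ~ N(mu + exp(-alpha*dt)*(y_t - mu), sigma^2*(1 - exp(-2*alpha*dt)))."""
    rng = np.random.default_rng(seed)
    y = np.empty(n_steps + 1)
    y[0] = y0
    decay = np.exp(-alpha * dt)                 # autocorrelation between successive points
    sd_step = sigma * np.sqrt(1.0 - decay**2)   # conditional standard deviation of one step
    for i in range(n_steps):
        y[i + 1] = mu + decay * (y[i] - mu) + sd_step * rng.standard_normal()
    return y

path = simulate_ou()
print("long-run mean ~", path[50:].mean())                                  # hovers near mu (homeostasis)
print("lag-1 autocorrelation ~", np.corrcoef(path[:-1], path[1:])[0, 1])    # close to exp(-alpha*dt)
```

The mean-reverting drift term plays the role of the homeostatic pull toward μ, while the Brownian term supplies the biological variation.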
When the time rate of change of ΔS is equal to the product −JF, where J is the current or flow of the particles in the system, the system is in an equilibrium state. Solving this equation gives

$$J_H = \frac{1}{2}(y_t - \mu)$$

which is the entropy current. In homeostasis the average force and the average flow are zero and the average of the change in entropy is −½. This can all be generalized to multivariate biomarkers by making the measurements and their means into vectors y and μ, respectively, and by making α and σ² into symmetric positive definite matrices A and Σ, respectively. The usual thermodynamic parameters are embedded in these statistical parameters and can be determined as needed for specific biological uses.

To illustrate this physical analog a little further, suppose that a drop of particles were placed in a well in the center of a gel, the external medium, and allowed to diffuse for a fixed time t_H. If Fick's law applies, there is a diffusion coefficient D and the Stokes–Einstein relation holds such that

$$D = \frac{k_B T}{\gamma}$$

where k_B is Boltzmann's constant, T is the absolute temperature of the system, and γ is the viscosity (friction) coefficient. Now suppose that the particles are charged and an electrical field is applied radially inward at t_H so that the field strength is proportional to the distance from the center of the sample and is equal to the force of the diffusion. This means that the distribution of the particles is in steady state and that the particles experience no acceleration. This leads to the Langevin equation with the friction coefficient in the system at γ = 1/ασ². It turns out that at this equilibrium state σ² = 2Dt_H, and then by substitution

$$\alpha = \frac{1}{2 k_B t_H T}$$

and

$$\alpha\sigma^2 = \frac{D}{k_B T}$$

As long as T is constant, α is constant and inversely proportional to T, which may represent physical temperature or some biological analog but is assumed to be constant as well; and ασ² is proportional to D/T.

Signals of Change: Nonequilibrium Pathodynamics

Changes from equilibrium are the signals of interest in pathodynamics. In the simplest case, the signal can be an observation that occurs outside the dynamic reference interval ("normal limits"). This interval can be constructed by estimating its endpoints at time t using
$$\mu + e^{-\alpha(t-s)}(y_s - \mu) \pm 1.96\,\sigma\sqrt{1 - e^{-2\alpha(t-s)}}$$

where s is the time of a previous measurement and would be zero if it is the baseline measurement. In either case, two measurements are required to identify a dynamic signal when the parameters do not change with time. Some care needs to be exercised when estimating this interval [18]. The interval clearly will be shorter than the usual one, μ ± 1.96σ, will be different for each person, and will be in motion unless there is no autocorrelation. A value outside this interval would indicate a statistically significant deviation from homeostasis with a probability of a false positive being 0.05 for each pair of time points. Simultaneous control of the type I error requires multivariate methods. Nonequilibrium states might be modeled by allowing one or more parameters to change with time. If μ is changing with time, this is called convection, meaning that the center of gravity of the system is changing, resulting in a flow, or trajectory. If α or σ is changing, the temperature or the diffusion properties are changing. When the particle is accelerating, an acceleration term needs to be added to the Langevin equation. The Fokker–Planck equation [17] provides a way to construct the steady-state probability distributions of the particle along with the transition probability states for the general Langevin equation. It is possible for a new equilibrium distribution to occur after the occurrence of a disease or after a therapeutic intervention, which would indicate a permanent residual effect. In this paradigm, a "cure" would occur only if the particle distribution returned to the healthy normal homeostasis state. The observation of the dynamics of the particle may suggest that an external (pathological) force is acting on the system. Any changes in "thermodynamic" variables may form patterns that lead to diagnostic criteria. The construction of optimal criteria and the selection and measurement of biomarkers are discussed in the next section.
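As a small worked example, the dynamic reference interval above can be computed for a new observation given a previous one. This is only a sketch with made-up parameter values; in practice μ, σ, and α would have to be estimated from data, with the cautions noted in reference [18].

```python
import numpy as np

def dynamic_reference_interval(y_s, mu, sigma, alpha, t, s):
    """95% dynamic reference interval at time t, given the value y_s observed
    at an earlier time s, under the OU (homeostasis) model."""
    decay = np.exp(-alpha * (t - s))
    center = mu + decay * (y_s - mu)
    half_width = 1.96 * sigma * np.sqrt(1.0 - decay**2)
    return center - half_width, center + half_width

# hypothetical numbers: population mean 100, SD 5, alpha = 0.05 per day,
# baseline value 104 on day 0, next measurement scheduled for day 7
lo, hi = dynamic_reference_interval(y_s=104.0, mu=100.0, sigma=5.0, alpha=0.05, t=7.0, s=0.0)
print(f"dynamic interval: ({lo:.1f}, {hi:.1f})")   # narrower than the static 100 +/- 9.8
# a day-7 value outside this interval would be flagged as a deviation from homeostasis
```

Note how the interval is centered on the individual's predicted value rather than the population mean, which is the practical payoff of using the dynamics.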
INFORMATION FROM DATA

Parameter Estimation and the Mean Squared Error

The mean squared error (MSE) is just what it says: the average of the squared difference between the estimated parameter and the true parameter. The square root of the MSE is often referred to by engineers as the root mean square (RMS). In general,

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Variance}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$$
The accent mark over the parameter means that it is estimated from the data; a parameter without the accent mark is a theoretical, or unknown, quantity.
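The decomposition is easy to check numerically. The sketch below uses a deliberately simple setup (Gaussian data and a shrunken sample mean chosen only for illustration, not a recommended estimator) to show that the simulated MSE matches the variance plus the squared bias.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 10.0, 2.0, 5, 200_000

samples = rng.normal(theta, sigma, size=(reps, n))
estimates = 0.9 * samples.mean(axis=1)        # a biased (shrunken) estimator of theta

mse = np.mean((estimates - theta) ** 2)
variance = estimates.var()
bias_sq = (estimates.mean() - theta) ** 2
print(f"MSE {mse:.3f} ~= variance {variance:.3f} + bias^2 {bias_sq:.3f}")
```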
The MSE shows up in statistical estimation theory in the Cramér–Rao inequality [19–21]:

$$\mathrm{MSE}(\hat{\theta})\, I(\theta) \geq 1$$

The second term, I(·), in this inequality is called the Fisher information. The statistician tries to construct an estimator of the unknown parameter θ to make this product as close to 1 as possible. An estimator that makes the product equal to 1 is an unbiased, minimum-variance estimator (UMV) and derives the maximum Fisher information from the data. Unfortunately, these estimators do not always exist for a given situation. The estimation strategy is usually to minimize the MSE when the UMV estimator is not available. There may be situations where a biased estimator will produce an MSE smaller than the unbiased approach. The maximum likelihood (ML) estimator is probably the most common estimation procedure used today, although it requires that a model for the probability can be written explicitly. It has the nice property that a function of an ML estimate is the ML estimate of the function, allowing the ML parameter estimates to be plugged directly into the function. This property may not be true for other estimation procedures. Since the mathematics is easier when a sample size (n) is large (a euphemism for asymptotic, i.e., when n is "near" infinity), ML estimates are favored because for large samples, they are also Gaussian and UMV even when the underlying distribution of the data is not Gaussian. However, an extremely large n may be needed to get these properties, a fact often overlooked in statistical applications. This is particularly true when estimating some parameter other than the mean, such as the standard deviation or the coefficient of variation (CV). For the analysis of experiments, ordinary least squares (OLS) estimation is used most often. Analysis of variance (ANOVA) and ordinary regression are special cases of OLS estimation. OLS estimation is unbiased if the correct model is chosen. If the residuals (difference between the observation and the model) are Gaussian, independent, and have the same variance, then OLS is equivalent to ML and UMV. When the variance is not constant, such as in experiments where the analytical variation (measurement error) is related to the size of the measurement, various methods, such as a logarithm transformation, need to be used to stabilize the variance; otherwise, OLS does not have these good properties, and signals can be missed or falsely detected.

Fisher Information

Fisher information (I) provides a limit on how precisely a parameter can be known from a single measurement. This is a form of the uncertainty principle seen most often in physics. In other words, what statistic gives you the best characterization of your biomarker? When measurements are independent, such as those measured on different experimental units, the information is
additive, making the total information equal to nI. If you can afford an unlimited number of independent measurements, which is infinite information, the parameter can be known exactly, but this is never the case. For dynamic measurements (i.e., repeated measurements over time on the same experimental unit), although the measurement errors are generally independent, the measurements are not. It is usually cheaper to measure the same subject at multiple times than to make the same number of measurements, one per subject. In addition, information about the time effects within a person cannot be obtained in the latter case. This is key information that is not available when the measurements are not repeated in time. As illustrated in the examples below, this additional information can have major effects on the characteristics of the signal and the ability to detect it. The Fisher information is generally a theoretical quantity for the lower bound of the MSE that involves a square matrix of the expectations of pairwise second-order partial derivatives of the log-likelihood. For those interested, this theory can be found in many books [19–21] and is not covered here. However, some of the results given below for the OU model are used to illustrate the information gain. For this purpose, an ANOVA model will be used for the means at equally spaced time points μ_t. In such an experimental design, time changes can be detected using specific orthogonal (independent) contrasts. These can be written as follows:

Constant effect: $\frac{1}{\sqrt{5}}(\mu_0 + \mu_1 + \mu_2 + \mu_3 + \mu_4)$

Linear effect: $\frac{1}{\sqrt{10}}(-2\mu_0 - \mu_1 + \mu_3 + 2\mu_4)$

Quadratic effect: $\frac{1}{\sqrt{14}}(2\mu_0 - \mu_1 - 2\mu_2 - \mu_3 + 2\mu_4)$

Cubic effect: $\frac{1}{\sqrt{10}}(-\mu_0 + 2\mu_1 - 2\mu_3 + \mu_4)$

Quartic effect: $\frac{1}{\sqrt{70}}(\mu_0 - 4\mu_1 + 6\mu_2 - 4\mu_3 + \mu_4)$

The Fisher information for these sums of means, assuming that μ_0 = 0, estimated from an OU process is compared in Figure 1 to the information when the time points are independent. To reduce the complexity in graphing the relationships, the OU information is plotted against λ = α · Δt. If the approximate value of α is known from previous experiments, Δt can be chosen to achieve improved efficiency in detecting the desired time relationship of the experimental response. All efficiency appears to be lost for λ greater than 4, while the information appears to increase exponentially when it is less than 1. The maximum relative efficiencies are 1, 1.5, 2.5, 9, and 69 for the constant, linear, quadratic, cubic, and quartic effects, respectively. Estimating a constant response is not at all efficient using dynamic measures.
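The contrast vectors themselves are easy to write down and probe numerically. The sketch below only compares the variance of each normalized contrast under an OU correlation structure with its variance under independence, which is a crude surrogate for the information gain; it is not the exact Fisher-information calculation (with μ0 = 0) behind Figure 1, and the value of λ is an arbitrary example.

```python
import numpy as np

# normalized orthogonal polynomial contrasts for five equally spaced time points
contrasts = {
    "constant":  np.array([1, 1, 1, 1, 1]) / np.sqrt(5),
    "linear":    np.array([-2, -1, 0, 1, 2]) / np.sqrt(10),
    "quadratic": np.array([2, -1, -2, -1, 2]) / np.sqrt(14),
    "cubic":     np.array([-1, 2, 0, -2, 1]) / np.sqrt(10),
    "quartic":   np.array([1, -4, 6, -4, 1]) / np.sqrt(70),
}

lam = 0.5                                   # lambda = alpha * delta-t (arbitrary example)
idx = np.arange(5)
corr_ou = np.exp(-lam * np.abs(idx[:, None] - idx[None, :]))   # OU correlation matrix
corr_ind = np.eye(5)                                           # independent time points

for name, c in contrasts.items():
    var_ou = c @ corr_ou @ c               # contrast variance under OU correlation (unit marginal variance)
    var_ind = c @ corr_ind @ c             # equals 1 for a normalized contrast
    print(f"{name:9s} variance ratio independent/OU = {var_ind / var_ou:.2f}")
```

Qualitatively, the higher-order (more sign-alternating) contrasts benefit most from positive autocorrelation, while the constant contrast does not, which is consistent with the ordering seen in Figure 1.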
Figure 1 Fisher information gain from dynamic measurements relative to independent measurements as a function of λ, with curves for the constant, linear, quadratic, cubic, and quartic effects. (See insert for color reproduction of the figure.)
It is not immediately clear why a linear relationship would be less efficient than the others, although it is well known that the best design in this case is to put the observations at beginning and end, with none in the middle. Since it was assumed that the beginning was zero, only one point is needed to estimate a linear response. For in vivo biomarkers, it would be difficult to imagine a constant or linear response. For those not inclined mathematically, it should be noted that all continuous functions of time can be approximated with a polynomial in time if the degree is sufficiently large. This means that additional contrasts, extensions of those above, can be used to get a better estimate of the time relationship as long as the degree is less than the number of time points. If the change with time is not well modeled by low-degree polynomials, a regression using the specific time-dependent function of interest should be used to save error degrees of freedom. It seems apparent from Figure 1 that the Fisher information of dynamic measurements is increased as the degree of polynomial or autocorrelation increases. Besides obtaining additional information about the curvature of the mean time effect, the autocorrelation parameter α contains information that is seldom estimated or used. For the OU process, this parameter combined with the variance provides a measure of the biological variation. It is proportional to the homeostatic force needed to maintain the dynamic equilibrium. The Fisher information for α is mathematically independent of the mean and variance.
Figure 2 Fisher information for α (log10 scale) as a function of time between measurements (Δt), with curves for α = 0.005, 0.01, 0.025, 0.05, and 0.1.
The relationship between the base 10 logarithm of α-information as a function of Δt for various values of α is shown in Figure 2. The closed circles are the autocorrelation half-life for each α and the open circles represent a loose upper bound for Δt at λ = 3, which is an autocorrelation of 0.05. Those serious about capturing information about α should seriously consider λ < 1. For a given α, the information approaches 2/α² as Δt goes to zero. Obviously, the larger α is, the less information about the autocorrelation is available in the data, necessitating larger m or smaller Δt to get equivalent information. The time units are arbitrary in the figure but must match the time unit under which α is estimated.

Shannon Information

Shannon information is about how well a signal is communicated through some medium or channel; in this case it is the biological medium. The measurement of variables to estimate parameters that indicate which signal was transmitted is the signal detection process. Pathological states and the body represent a discrete communication system where the disease is the signal transmitted, which may affect any number of biological subsystems that act as the communication channels, and the biomarker is the signal detected in one or more biomarker measurements. The disease is then diagnosed by partitioning the biomarker space into discrete, mutually exclusive regions (R) in a way that minimizes the signal misclassification. Information, or communication, theory is usually applied to electronic systems that can be designed to take
advantage of the optimal properties of the theory. In biology, it is mostly a reverse-engineering task. The signal is the health or disease state of the individual, which is transmitted through various liquid and solid tissues with interacting and redundant pathways. In this paradigm, biomarkers are the signal detection instruments and algorithms. Rules, usually oversimplified, are then constructed to determine which signal was sent based on the biomarker information. An elementary background in information theory is given by Reza [22]. Shannon information is really a measure of uncertainty or entropy and describes how well signals can be transmitted through a noisy environment. Probability models are the basis of this type of information, which complements Fisher information rather than competing with it. The general framework for Shannon information follows. As a reminder, Bayes' theorem, where P[·] is a probability measure and P[A|B] = P[event A given event B], states that P[A|B] = P[A and B]/P[B]. In this particular representation, some elemental probabilities will be used: the probability that the signal was sent, for example, the prevalence or prior probability,

$$\pi_i = P[S_i]$$

the probability that the signal was received,

$$q_j = P[D_j]$$

and the probability that a particular diagnosis was made given that the signal was sent,

$$q_{ij} = P[D_j \mid S_i] = \int_{R_j} dP_i(\mathbf{Y})$$
The last expression is rather ominous, but in words it is the probability that the multivariate biomarker Y, a list (vector, array) of tests, is in the region Rj given that signal Si was sent through a noisy channel, properly modeled by the probability function Pi. This P can be either continuous or discrete or both and is generally a function of unknown parameters and time. It reflects both the biological variability and the analytical variability. Since this function contains the parameters, it is where the Fisher information applies. In its crudest form, qij is just a proportion of counts. Although the latter is simple and tractable for a biologist, it is subject to losing the most information about the underlying signal and does not lend itself readily to the incorporation of time used in pathodynamics. Generally, any categorization of continuous data will result in information loss. Table 1 shows the correspondence between signals (S) and diagnoses, or decisions (D), in terms of the probability structure. If the number of decision
TABLE 1  Noisy Discrete-Signal Multichannel Communication Probability Table

                               Decision
Signal     D1         D2         ...     Dl         Total
S1         π1q11      π1q12      ...     π1q1l      π1
S2         π2q21      π2q22      ...     π2q2l      π2
...        ...        ...        ...     ...        ...
Sk         πkqk1      πkqk2      ...     πkqkl      πk
Total      q1         q2         ...     ql         1
classes is not equal to the number of underlying signals, inefficiencies are likely to occur. However, in biology it is not always possible to optimize this, especially if some of the signals are unknown. The mathematical objects under the control of the biologist are Y, P, R, and D. Since the rest of this book is mostly about Y, and in most cases y, a single biomarker, this chapter is mostly about P, R, and D. In other words, the biologist can optimize the qij's only when D is specified. The one exception is when the experiment can be designed so that the πi's are known and individuals can be selected to make them equally likely. A few calculations from information theory are presented here. For applications below, S1 will represent the "normal" state and D1 will represent the "normal" classification. The others will represent "abnormal" or pathological states and classifications. Pharmacological or other therapeutic effects will be considered as abnormal states except when a cure results (i.e., the subject reverts back to the normal state). The idea of entropy comes from thermodynamics, and it has been shown that thermodynamic entropy and Shannon entropy (information) are related. There are three primary types of Shannon entropy (average uncertainty): the entropy of the source,

$$H(S) = -\sum_{i=1}^{k} \pi_i \log_2 \pi_i$$

the entropy in the receiver,

$$H(D) = -\sum_{j=1}^{l} q_j \log_2 q_j$$

and the communication system entropy,

$$H(S, D) = -\sum_{i=1}^{k} \sum_{j=1}^{l} \pi_i q_{ij} \log_2 \pi_i q_{ij}$$
The base 2 logarithm is used here because the units of information are in bits (binary digits). With an ideal, noise-free communication channel, H(S) = H(D) = H(S,D). This means that qii = 1 and qij = 0 for all i ≠ j.
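These quantities are straightforward to compute from a joint probability table such as Table 1. The sketch below uses an invented two-signal, two-decision table purely to illustrate the formulas; the numbers have no clinical meaning.

```python
import numpy as np

# hypothetical joint table: rows = signals S_i, columns = decisions D_j,
# entries are pi_i * q_ij and must sum to 1
joint = np.array([[0.85, 0.05],    # S1 ("normal"), prior 0.90
                  [0.02, 0.08]])   # S2 ("abnormal"), prior 0.10

pi = joint.sum(axis=1)             # P[S_i], the row margins
q = joint.sum(axis=0)              # P[D_j], the column margins

H_S = -np.sum(pi * np.log2(pi))         # entropy of the source
H_D = -np.sum(q * np.log2(q))           # entropy in the receiver
H_SD = -np.sum(joint * np.log2(joint))  # communication system entropy
print(f"H(S) = {H_S:.3f}  H(D) = {H_D:.3f}  H(S,D) = {H_SD:.3f} bits")
```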
TABLE 2  Biological Channel Capacity (Bits) by the Number of Possible Signals Transmitted and the Probability of an Abnormal Signal Being Sent

                              Probability of an Abnormal Signal (1 − π1)
Signals (k)   0.99   0.95   0.9    0.8    0.7    0.6    0.5    0.4    0.3    0.2    0.1    0.05   0.01   0.001
2             0.08   0.29   0.47   0.72   0.88   0.97   1.00   0.97   0.88   0.72   0.47   0.29   0.08   0.01
3             1.07   1.24   1.37   1.52   1.58   1.57   1.50   1.37   1.18   0.92   0.57   0.34   0.09   0.01
5             2.06   2.19   2.27   2.32   2.28   2.17   2.00   1.77   1.48   1.12   0.67   0.39   0.10   0.01
10            3.22   3.30   3.32   3.26   3.10   2.87   2.58   2.24   1.83   1.36   0.79   0.44   0.11   0.01
25            4.62   4.64   4.60   4.39   4.09   3.72   3.29   2.80   2.26   1.64   0.93   0.52   0.13   0.02
50            5.64   5.62   5.52   5.21   4.81   4.34   3.81   3.22   2.57   1.84   1.03   0.57   0.14   0.02
75            6.23   6.19   6.06   5.69   5.23   4.70   4.10   3.45   2.74   1.96   1.09   0.60   0.14   0.02
100           6.64   6.58   6.44   6.03   5.52   4.95   4.31   3.62   2.87   2.05   1.13   0.62   0.15   0.02
A measure of the information transmitted is

$$I(S, D) = \sum_{i=1}^{k} \sum_{j=1}^{l} \pi_i q_{ij} \log_2 \frac{q_{ij}}{q_j}$$
When qij = qj, for i ≠ j, the log term is zero; no information about Si is transmitted. The channel capacity (C) is then defined as the maximum of I(S,D) over all possible values of the πi’s. For a noise-free system, this maximum occurs when all signals are equally likely (i.e., C = log2k). In the 2 × 2 case, C = 1 bit; in general, for biological systems, C = I(S,D), because the πi’s are fixed by nature. Table 2 shows the maximum amount of information possible for the case where πi = (1 − π1)/(k − 1) for the abnormal signals. If the biologist has some idea of the prevalence of the signal of interest, this table can give some idea of how feasible (futile) searching for biomarkers might be. This table shows that when several relatively rare signals are being sent via the same channel, very little information is available even in the best of conditions. Table 3 is a slightly different way of looking at the same question. These are the values in Table 2 divided by log2k. Similar theory is available when S and
TABLE 3  Biological Communication Efficiency (%) by the Number of Possible Signals Transmitted and the Probability of an Abnormal Signal Being Sent

                              Probability of an Abnormal Signal (1 − π1)
Signals (k)   0.99    0.95    0.9     0.8     0.7     0.6     0.5     0.4     0.3     0.2     0.1     0.05    0.01    0.001
2             8.1     28.6    46.9    72.2    88.1    97.1    100.0   97.1    88.1    72.2    46.9    28.6    8.1     1.1
3             67.6    78.0    86.4    96.0    99.8    99.1    94.6    86.5    74.5    58.2    35.9    21.2    5.7     0.8
5             88.8    94.2    97.7    100.0   98.2    93.5    86.1    76.3    63.8    48.3    28.8    16.6    4.3     0.6
10            96.9    99.3    100.0   98.1    93.3    86.5    77.8    67.4    55.2    40.8    23.7    13.4    3.4     0.4
25            99.5    100.0   99.0    94.5    88.1    80.1    70.9    60.4    48.6    35.3    20.0    11.1    2.7     0.3
50            99.9    99.6    97.8    92.4    85.3    76.9    67.5    57.0    45.5    32.7    18.3    10.0    2.4     0.3
75            100.0   99.3    97.2    91.3    83.9    75.4    65.9    55.5    44.1    31.5    17.5    9.6     2.3     0.3
100           100.0   99.1    96.9    90.7    83.1    74.5    64.9    54.5    43.2    30.8    17.0    9.3     2.2     0.3
D are continuous rather than discrete, but this does not seem relevant for biomarkers since diseases are treated as discrete entities [23]. Some might argue that hypertension and hypercholesterolemia are continuous diseases, but under the paradigm in this chapter, they are just biomarkers, for which some arguments can be made that they are surrogates for particular diseases.
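Because Table 2 assumes an ideal, noise-free channel (qii = 1) with the abnormal signals sharing the abnormal probability equally, the transmitted information reduces to the source entropy H(S), so values of that kind can be regenerated with a few lines of code. The sketch below is offered only as a check on the idea, not as part of the original tabulation.

```python
import numpy as np

def noise_free_capacity(k, p_abnormal):
    """I(S,D) in bits for a noise-free channel with k signals, where the k-1
    abnormal signals share the total abnormal probability equally (as in Table 2)."""
    pi = np.full(k, p_abnormal / (k - 1))   # abnormal signals S2..Sk
    pi[0] = 1.0 - p_abnormal                # pi_1, the "normal" signal
    return float(-np.sum(pi * np.log2(pi))) # equals H(S) when q_ii = 1

for k in (2, 3, 5, 10):
    row = [round(noise_free_capacity(k, p), 2) for p in (0.99, 0.5, 0.1, 0.001)]
    print(k, row)   # e.g., k=2 gives [0.08, 1.0, 0.47, 0.01], matching Table 2
```

In a real biological channel the qij's are noisy and the transmitted information will be lower than these ceilings, which is the point the chapter makes about rare signals.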
MEASURES OF DIAGNOSTIC PERFORMANCE

Much has been written about this topic. In this section we point out some issues with the current wisdom, and the reader can decide if changes are needed. In addition, the impact of the measurement of time changes on Shannon information is discussed here. In standard statistical hypothesis testing, there are two types of inference errors: type I (α) is where the null hypothesis (H0) is rejected when it is true, and type II (β) is where the null hypothesis (H0) is accepted when it is false. With the null hypothesis symbolized by signal 1 (S1) and the alternative hypothesis (H1) by signal 2 (S2), Table 4 shows the error relationship for the decision that signal 1 was detected (D1) or signal 2 was detected (D2).
TABLE 4  Hypothesis-Testing Outcome Probabilities

                Decision
Signal     D1          D2
S1         1 − α       α
S2         β           1 − β
TABLE 5  Two-Channel Signal Detection Probabilities

                Decision
Signal     D1 (H0)        D2 (H1)        P[Si]
S1         π1(1 − α)      π1α            π1
S2         π2β            π2(1 − β)      π2
P[Dj]      q1             q2             1
In customary frequentist statistical practice, α is chosen to be fixed at 0.05, and a fixed sample size (n) is estimated to attain a power = 1 − β for some specified value in the interval (0.75, 0.95). The reality in biomedical science is that the power is fudged to get a sample size that the scientist can afford. It should really be set at some conventional value like α, so that experiments are comparable and have the same probability of missing the signal. Everything in hypothesis testing is focused on controlling α and letting β float. Unless a biomarker whose characteristics are fully understood is being used to prove efficacy in a phase III trial, this is probably not the best way to evaluate the biomarker. Table 4 looks very similar to Table 1 but includes only the qij's. Table 5 is the proper setup for evaluating the information. This is a Bayesian-like framework because it requires the prior probability for each hypothesis. If it is assumed that π1 = π2 = 0.5, then Tables 4 and 5 are equivalent, a state of ignorance in many cases. However, most biologists are not ignorant about the underlying system; a bad guess for these probabilities would probably be better for evaluating biomarkers than assuming equality because it can be very misleading to assume that the maximum information is transmitted if, in fact, it is not (see Tables 2 and 3). Much money can be wasted when numbers are misused. Diagnostic tests (biomarkers) are usually evaluated in terms of true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). Table 6 is a typical setup for this evaluation. Again these are just the qij's obtained by counting in most cases.
TABLE 6  Outcomes for a Binary-Decision Diagnostic Test

                    Decision
Signal        D1 (−)         D2 (+)         Total
S1            TN             FP             TN + FP
S2            FN             TP             FN + TP
Total         TN + FN        FP + TP        n
sensitivity = TP / (TP + FN) = 1 − β

specificity = TN / (TN + FP) = 1 − α
It is clear that these are just q11 and q22, respectively. Obviously, a biomarker such as aspartate aminotransferase (AST, SGOT) carries information about many diagnoses; therefore, looking at each diagnosis one at a time seems less than optimal, could be grossly misleading, and probably should be avoided when possible. Mathematically, there is no reason to limit the diagnostic categories to two. However, with more than two outcomes the terms sensitivity and specificity become somewhat meaningless. Perhaps a term such as D-specificity would be appropriate for the generalization, where D is replaced by the particular disease name. The D-specificity is a measure of how well the biomarkers Y detect the specific signal under the set of decision rules R. This is an important metric for the biomarker developer. For biomarker application, however, the biomarker's utility must be evaluated in light of the prior probabilities of the signals being sent. A common way, preferred by clinicians, to get at this issue is through the positive and negative predictive values, PPV and NPV, respectively [5]. For the normal–abnormal case, PPV = P[S2|D2]. Bayes' theorem can be applied to get

PPV = π2(1 − β) / [π1α + π2(1 − β)]

Similarly, NPV = P[S1|D1], which is

NPV = π1(1 − α) / [π1(1 − α) + π2β]
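To make these quantities concrete, the short R sketch below computes sensitivity, specificity, PPV, and NPV from assumed values of α, β, and the priors; the numerical values are hypothetical and chosen purely for illustration.

# Sketch: predictive values from assumed error rates and priors (illustrative values only)
alpha <- 0.05   # false-positive rate (1 - specificity)
beta  <- 0.20   # false-negative rate (1 - sensitivity)
pi1   <- 0.90   # prior probability of signal S1 (e.g., "normal")
pi2   <- 0.10   # prior probability of signal S2 (e.g., "abnormal")

sensitivity <- 1 - beta
specificity <- 1 - alpha

# Positive and negative predictive values via Bayes' theorem
ppv <- pi2 * (1 - beta) / (pi1 * alpha + pi2 * (1 - beta))
npv <- pi1 * (1 - alpha) / (pi1 * (1 - alpha) + pi2 * beta)

round(c(sensitivity = sensitivity, specificity = specificity, PPV = ppv, NPV = npv), 3)

Note how, with a rare signal S2, even a fairly specific test can have a modest PPV; this is exactly the dependence on the priors discussed above.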
Comparing these formulas to Table 5, each is the diagonal term for the corresponding decision divided by the sum of its column. Both will have the value 1
in a noise-free system. For more than two outcomes, these can be generalized to D-predictive values (DPV), where DiPV = P[Si|Di] and

DjPV = πj qjj / ∑_{i=1}^{k} πi qij
This is the probability that the clinician would use if considering only one diagnosis against all others. More commonly, a clinician is considering a number of diagnoses simultaneously; this is called the differential diagnosis. A list of probabilities would be appropriate for this situation. Since only one D is chosen in the information framework, a probability for each potential signal given D can be created to arrange the differential diagnosis in order of decreasing probability. The D-differential value can be defined as DjDV = P[Si|Dj] and calculated as

DjDV = πi qij / ∑_{i=1}^{k} πi qij
which is just the proportion for each cell in the Dj column for each S. These are the numbers that a clinician would use to order the differential diagnosis according to probability. Therefore, the context for the utility of a biomarker really depends on how it will be used.

The ROC (receiver operating characteristic) curve is a common way to evaluate biomarkers. It combines specificity and sensitivity. Figure 3 illustrates how the ROC curve is constructed. The panels show the probability density p(y) for a continuous Gaussian biomarker y. S1 has mean 0 and standard deviation 1 in all cases. For cases A and D, S2 has mean 4 and variance 1; for cases B and C, the means are 0.01 and 2, respectively. The vertical black line represents the partition determined by some optimality rule: the y values to the left of the line represent R1, and the values to the right, R2. If the signal observed is on the left side (D1), the signal is called S1; if on the right side (D2), it is called S2. The total channel noise (N) is given by

N = π1α + π2β

These are the off-diagonal terms in Table 5. One optimization rule is to choose the cut point z so that N is minimized. This leads to the relationship

π1 p1(z) / [π2 p2(z)] = 1

which can be rewritten in a simpler form
Figure 3 Construction of diagnostic rules for various probability structures. (A) Signal S1 has prior probability 0.5, mean 0, and standard deviation 1; S2 has prior probability 0.5, mean 4, and standard deviation 1. (B) S1 has prior probability 0.5, mean 0, and standard deviation 1; S2 has prior probability 0.5, mean 0.1, and standard deviation 1. (C) S1 has prior probability 0.5, mean 0, and standard deviation 1; S2 has prior probability 0.5, mean 1, and standard deviation 1. (D) S1 has prior probability 0.9, mean 0, and standard deviation 1; S2 has prior probability 0.1, mean 4, and standard deviation 1. (See insert for color reproduction of the figure.)
log(π1/π2) + log p1(z) − log p2(z) = 0

log(π1/π2) − ½ z² + ½ (z − μ)² = 0
The solution z of this equation is the point at which the prior-weighted densities π1p1 and π2p2 cross. In cases A, B, and D, the black line is placed at this point. For cases A, B, and C, π1 = π2 = ½; in case D, π1 = 9/10 and π2 = 1/10.
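The cut point can also be found numerically. The R sketch below solves π1 p1(z) = π2 p2(z) for two unit-variance Gaussian densities using the prior and mean settings of cases A and D; uniroot() is used instead of the closed-form solution only to keep the sketch general.

# Sketch: cut point where pi1*p1(z) = pi2*p2(z) for two unit-variance Gaussian densities
cut_point <- function(pi1, pi2, mu1, mu2) {
  f <- function(z) pi1 * dnorm(z, mu1, 1) - pi2 * dnorm(z, mu2, 1)
  uniroot(f, interval = c(mu1, mu2))$root
}

cut_point(0.5, 0.5, 0, 4)   # case A: equal priors give a cut point at 2
cut_point(0.9, 0.1, 0, 4)   # case D: unequal priors shift the cut point toward S2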
This approach is known in statistics as discriminant analysis and in computer science as supervised learning. Sometimes classification is done with logistic regression, which is equivalent when

P[S2|z] = π2 p2(z) / [π1 p1(z) + π2 p2(z)]
        = 1 / {1 + π1 p1(z) / [π2 p2(z)]}
        = 1 / (1 + exp{log(π1/π2) + log p1(z) − log p2(z)})
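The correspondence with logistic regression can be checked empirically. The R sketch below simulates the two-Gaussian situation of case A with equal priors and fits a logistic model with glm(); the sample sizes and the random seed are arbitrary choices made only for illustration.

# Sketch: logistic regression as a classifier for two Gaussian signals (illustrative parameters)
set.seed(1)
n1 <- 200; n2 <- 200                                  # equal group sizes imply pi1 = pi2 = 1/2
y  <- c(rnorm(n1, mean = 0), rnorm(n2, mean = 4))     # S1 ~ N(0,1), S2 ~ N(4,1)
s  <- c(rep(0, n1), rep(1, n2))                       # 0 = S1, 1 = S2

fit <- glm(s ~ y, family = binomial)

# Estimated cut point where P[S2 | y] = 1/2, i.e., where the linear predictor is zero
zhat <- -coef(fit)[1] / coef(fit)[2]
zhat    # should be near 2, the midpoint of the two means for equal priors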
The cut point is the z where P[S2|z] = ½, which is the same as above as long as the logistic model has this form. Usually, the first log term is estimated from the data, which will give a result that is not optimal for the real prior probabilities. The receiver operating characteristic (ROC) curve is constructed by sweeping the black line from right to left and calculating α and β at each point. The results are shown in Figure 4 for all four cases. The positively sloped diagonal line represents the case in which the two distributions are exactly the same. The letters for the cases in Figure 3 are placed at the respective optimal points. For equally likely priors, they fall on the negatively sloped diagonal line.
Figure 4  ROC curves (1 − β plotted against α) for cases A to D.
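For readers who wish to reproduce such a curve, the R sketch below sweeps the cut point across the case A densities, computes α and 1 − β at each point, and approximates the area under the curve; it is a minimal illustration rather than the code used to produce Figure 4.

# Sketch: ROC curve for case A (S1 ~ N(0,1), S2 ~ N(4,1)) by sweeping the cut point z
z     <- seq(-4, 8, by = 0.01)
alpha <- 1 - pnorm(z, mean = 0, sd = 1)   # P[decide S2 | S1 sent]
power <- 1 - pnorm(z, mean = 4, sd = 1)   # 1 - beta = P[decide S2 | S2 sent]

plot(alpha, power, type = "l", xlab = expression(alpha), ylab = expression(1 - beta))
abline(0, 1, lty = 2)                     # reference line for identical distributions

# Area under the curve by trapezoidal integration over the sweep
auc <- sum(-diff(alpha) * (head(power, -1) + tail(power, -1)) / 2)
auc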
A statistical test can be performed to see whether the area under the ROC curve is significantly greater than ½. Unfortunately, this only indicates that the biomarker can detect one of two signals under ideal conditions. A better test might be one that can detect whether the length of the vector

( P1²(z) + [1 − P2(z)]² )^(1/2)
from the lower right corner to the optimal point extends significantly beyond the positive diagonal line. It should be noted that this length measure depends both on the data observed and on the priors. If one of the two signals is rare, it would be very hard to show significance. This is not true for the general test of ROC area. With respect to evaluating a biomarker, Figures 3 and 4 demonstrate some interesting aspects. Case A represents a situation where a single test discriminates well between S1 and S2 but is not noise-free. It would be difficult to do better than this in the real world. What is striking is that case B, which shows essentially no visible separation, actually has an ROC area greater than ½ that could be detected with sufficiently large n. Typical ROC curves look more like case C, perhaps slightly better. The densities in this case show only modest separation and would not make a very impressive biomarker. However, the ROC curve area test might tend to get people excited about its prospects. If one of the signals is rare, it becomes essentially undetectable, but the ROC area test gives no indication of this. Case C is a good candidate for adding another test to make it a multivariate biomarker. In higher dimensions, greater separation may occur (i.e., more information might be transmitted). However, the ROC curve cannot handle more than two outcomes, at least not in a visually tractable way, although collapsing cells by summing in Table 1 to the 2 × 2 case might work in some cases. This takes us back to I(S,D) as a measure of the biomarker's utility, since it works for any number of signals and any number of tests. A generalization of the discriminant function would minimize the total noise, obtained by summing the off-diagonal elements of Table 1:

N = ∑_{i=1}^{k} ∑_{j=1}^{k} πi qij − ∑_{i=1}^{k} πi qii
This minimization would probably be difficult using calculus as before and would require a numerical analysis approach, too involved to describe here. This minimization would determine the R's. As the number of tests increases from one, the partitions (the edges of the R's) are, respectively, a point, a line, a plane, and flat surfaces of higher dimension, assuming that the parameters of the P's differ only in the means and that the P's are Gaussian. When the other parameters differ among the P's, the surfaces are curved. Without some involved mathematics, it is difficult to know whether the minimal-noise optimization is equivalent to the maximal-information optimization. It seems entirely feasible that communication systems with the same total noise could have different diagonal elements, one of which might give the most information about all the signals in the system. The answer is unknown to the
author and is probably an open research question for biological systems. Another level of complication, but one that has more real-world relevance, is an optimization that minimizes cost. This is discussed at an elementary level elsewhere [24]. Multiplying each term in Table 1 by its cost, and minimizing the total expected cost, is a classical decision theory approach [25–27].

The previous discussion of diagnostic performance has said little about time effects. The time variable is contained in P and, in general, will add another dimension for each time point measured. This presents a severe dimensionality problem, both for the biologist and for the analyst, since each measurement of the biomarker on the same person creates a new dimension. If the measurement times are not the same for all cases, the entire optimization process may depend on which times are chosen. The biggest problem with dimensionality is that it usually involves a growing number of parameters. To obtain precise (high-Fisher-information) estimates of all the parameters simultaneously, the sample size requirement grows much faster than the dimension. Here is a place where invariance plays a key role. If some parameters can be shown to be biological constants analogous to physical constants, they can be estimated once through data pooling and reused going forward. If the parameters vary with time, the time function for the parameter needs to be determined and reused similarly. Functions and parameters are very compact objects for storing such reusable knowledge.

GENERAL STRATEGY FOR DEVELOPING AND USING DYNAMIC BIOMARKERS

To summarize, there are many types of information, only two of which were described here. In general, dynamic biomarkers will have more information than static biomarkers, and multivariate biomarkers will have more information than univariate biomarkers. These ideas are foreign to most biologists and will take some time to spread among them. The standard entities used to evaluate biomarkers, such as sensitivity, specificity, and ROC curves, have questionable or limited utility but can easily be modified to fit the Shannon information framework. Some steps for biomarker development and implementation are given here as a guideline; they probably contain significant gaps in knowledge, requiring further study:

1. Choose the signals (S) or the surrogates (D) that are relevant (biology).
2. Choose the best models (P) for the given S (mathematics/statistics) and all available biomarkers (Y) (biology).
3. Choose the best decision rules (R) given P (mathematics/statistics).
4. Choose the best subset, or subspace, of Y (statistics).

The mathematical entities P, R, and Y determine the diagnosis of S. Currently, most of the effort is focused on Y.
MODELING APPROACHES

Signal Types

This section deals with some very specific aspects of the models for P(Y|t). Most people think of signals as being electrical, probably because most of the terminology and usage comes from electrical engineering. However, the mathematics is completely general. Signals can be static or dynamic, meaning something measurable with a constant value or a time-varying value, respectively. Biomarkers are just biological signals. Signals are classified as analog (continuous) or digital (discrete). Birth and death are discrete signals; blood pressure and serum glucose levels are continuous signals. Dynamic signals vary over time. Time can also be classified as continuous or discrete, as described above. Anything that changes in time and space is a dynamic system. This generally implies that the space has more than one dimension, but it does not have to be physical space. The space of biomarkers is an example, where each univariate biomarker (signal) defines a spatial dimension. A continuous system is modeled with a set of differential equations in which the variables defining the space usually appear in more than one equation. A discrete system is modeled with a set of difference equations.

Mathematical Models of Dynamic Signals

Mathematical models are usually hypothesized prior to the experiment and then verified by the experimental data. Historically, these models have been deterministic differential equations; pharmacokinetics is an example. Following are some typical mathematical models (ordinary differential equations) and their solutions:

Linear model:       dx/dt = β1                   ⇒  x = β0 + β1 t
Quadratic model:    dx/dt = β1 + 2β2 t           ⇒  x = β0 + β1 t + β2 t²
Exponential model:  dx/dt = β1 x                 ⇒  x = e^(β0 + β1 t)
Sigmoidal model:    dx/dt = (β1/β2) x (β2 − x)   ⇒  x = β0 β2 / [β0 + (β2 − β0) e^(β1 t)]
Sine model:         dx/dt = β1 (β2² − x²)^(1/2)  ⇒  x = β0 + β2 sin(β1 t)

The middle column is the hypothesized model of the velocity (first time derivative) of x, a dynamic quantity. Solving such models involves finding a method that ends up with x by itself on the left of the equal sign and a function of time on the right.
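As a simple numerical check of the correspondence between a velocity model and its solution, the R sketch below integrates the exponential model dx/dt = β1x with an Euler scheme and compares the result to the closed-form solution; the parameter values and step size are arbitrary choices for illustration.

# Sketch: Euler integration of the exponential model dx/dt = beta1 * x
beta0 <- 0.5; beta1 <- 0.2
dt    <- 0.001
times <- seq(0, 10, by = dt)

x <- numeric(length(times))
x[1] <- exp(beta0)                           # initial condition x(0) = exp(beta0)
for (i in 2:length(times)) {
  x[i] <- x[i - 1] + beta1 * x[i - 1] * dt   # Euler step for dx/dt = beta1 * x
}

closed <- exp(beta0 + beta1 * times)         # closed-form solution
max(abs(x - closed))                         # discretization error shrinks as dt decreases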
More complicated types of models are used by mathematical biologists to describe and predict the behavior of microscopic biological processes such as intracellular metabolism or nerve conduction [3].

Statistical Models

Statistical models are models that have at least one random (stochastic) variable. Going forward, mechanistic biological models will probably be a mixture of deterministic and stochastic variables. Most statistical models describe the behavior of the mean of P as a function of other variables, including time. Unless there is a specific experimental design model such as analysis of variance (ANOVA), statistical models tend to be constructed after the behavior of the data is known. A serious loss of information can occur if the wrong model is chosen. Here are some typical statistical models of data varying in time, where T is the total time of observation:

Mean model:           yt = μt + εt
ANOVA model:          yt = μt + ε
Straight-line model:  yt = β0 + β1 t + ε
Polynomial model:     yt = β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ⋯ + ε
Trigonometric model:  yt = β0 + β1 cos(2πt/T) + β2 sin(2πt/T) + β3 cos(4πt/T) + β4 sin(4πt/T) + ⋯ + ε
Log-linear model:     yt = e^(β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ⋯) + ε
Sigmoidal model:      yt = β0 β2 / [β0 + (β2 − β0) e^(β1 t)] + ε
Sine model:           yt = β0 + β2 sin(β1 t) + ε
OU model:             yt = e^(−α(t − s)) ys + μ(1 − e^(−α(t − s))) + √(2α) σ e^(−α(t − s)) ∫_s^t e^(αu) dBu + ε

In all of these models it is generally assumed that ε is the random measurement error, or residual difference between the mean of the model and the observed y. This error is assumed to be Gaussian with mean zero and variance σε², and each measurement error is statistically independent of all the others. Since the rest of the model is usually deterministic, in most fields of measurement it is the error term that induces y to be a random variable. However, in biology there are many sources of biological variation as well. In the OU model, where the biological Brownian motion B also contributes variation in a complicated way, mechanistic interpretation is much easier through the SDE. It should be noted that all the parameters in these models are also independent of time, except in the first two models.
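Because several of the models above are linear in the unknown parameters, they can be fit directly by ordinary least squares, as discussed below. A minimal R sketch for the trigonometric model, using simulated data with arbitrary coefficients, might look as follows.

# Sketch: OLS fit of the trigonometric model (linear in the parameters)
set.seed(2)
T.obs <- 30
t.obs <- 0:T.obs
y <- 3 + 1.0 * cos(2 * pi * t.obs / T.obs) + 0.5 * sin(2 * pi * t.obs / T.obs) +
     rnorm(length(t.obs), sd = 0.25)

fit <- lm(y ~ cos(2 * pi * t.obs / T.obs) + sin(2 * pi * t.obs / T.obs))
coef(fit)   # estimates of beta0, beta1, and beta2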
The “mean model” is not really a model; it represents simply the calculation of the mean and standard deviation at each time point, a typical biologist's approach. These are descriptive statistics and do not lend themselves to efficient statistical analyses. The next five models are all special cases of the linear model; that is, they are linear in the unknown parameters, not in the variables. ANOVA is a model that parameterizes the mean for each combination of the experimental factors; there are many equivalent ways to parameterize these means. The fundamental difference between the mean model and ANOVA is that for the latter the standard deviations are assumed to be equal, leading to a more precise estimate of the means if the assumption is true. In the representation of ANOVA in the example below, the cell-mean model is used, meaning that a mean is estimated for each combination and any modeling of those means is done using contrasts of those means (shown above). Given the data and a linear model, a system of equations, one for each measurement, can be solved using linear algebra to get OLS estimates of the parameters. The last three models are nonlinear and require iterative methods to solve for the parameters, which means repeated guessing guided by mathematical techniques such as calculus. This requires a computer and a stopping rule to determine when the guess is close enough. Sometimes it is not possible, or is very difficult, to get the algorithm to find the solution. This is related to the nature of the model and the nature of the data. Statisticians prefer to avoid nonlinear models for this reason. Note that the OU model is nonlinear but can be “linearized” in special cases.

Monte Carlo Data Generation

Most software packages have random number generators. These are actually pseudorandom numbers because all of the algorithms are deterministic [28]. Monte Carlo simulations are just the generation of these numbers to suit a particular purpose. The two most common uses of Monte Carlo methods are the generation of random data to evaluate or display statistical results and the solution of deterministic equations that are too difficult to solve using mathematical methods. Commercial simulators use Monte Carlo methods as well. All of the data used in the examples below were generated by this method. Each of the example experiments used the same data sets for a given autocorrelation and response percentage, starting with the first observation in each group. The underlying probability distribution was the OU model using different means, where the control (no-effect) group (n = 500) had all constant parameters and the experimental group (n = 500) had the same parameters except that the mean was a quadratic model in time deviating from the controls after t = 0, with δ assigned randomly as 0 or 1 with probability p of responding to the intervention:

Parameter     Control Group     Experimental Group
μ             β0                β0 + δ(β1 t + β2 t²)
The values for β0, β1, and β2 were 3, 1/5, and −2/300, respectively; σ was ½; and α was either 3, to simulate independence, or 0.03, to simulate a value similar to those observed previously in liver tests [29,30]. The response proportion was varied among 1.0, 0.5, and 0.1 to show the effect of partial responses, which are common in biology. A measurement error was added with a CV of 10% (i.e., σε = 0.1μ), except that the added noise was based on the actual measured value, not the average measured value, where it was assumed that the raw data were log-Gaussian, like an enzyme [18,31]. The parameters of the quadratic response model were chosen so that the maximum is 3σ from the control group mean halfway between the ends of the time interval. The first 50 samples in each group of the simulated data are shown in Figures 5 and 6 for α = 3 and α = 0.03, respectively.
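The R sketch below is a simplified version of such a data generator: it simulates OU trajectories around the control and experimental group means described above and adds an approximately 10% measurement error. It is not the exact program used to produce Figures 5 and 6 (in particular, the log-Gaussian treatment of the measurement error is not reproduced), and the function and variable names are illustrative assumptions.

# Sketch: simulated OU biomarker data with a quadratic treatment effect (simplified
# version of the setup described above; measurement-error details are approximated)
simulate_group <- function(n, m, alpha, sigma, p, beta = c(3, 1/5, -2/300), cv = 0.10) {
  times <- seq(0, 30, length.out = m + 1)        # baseline plus m follow-up times
  dt    <- diff(times)[1]
  delta <- rbinom(n, 1, p)                       # responder indicator
  out   <- matrix(NA, n, m + 1)
  for (i in 1:n) {
    mu <- beta[1] + delta[i] * (beta[2] * times + beta[3] * times^2)
    y  <- numeric(m + 1)
    y[1] <- rnorm(1, mu[1], sigma)               # start from the stationary distribution
    for (j in 2:(m + 1)) {
      cond_mean <- mu[j] + (y[j - 1] - mu[j - 1]) * exp(-alpha * dt)
      cond_sd   <- sigma * sqrt(1 - exp(-2 * alpha * dt))
      y[j] <- rnorm(1, cond_mean, cond_sd)       # OU transition between time points
    }
    out[i, ] <- y * (1 + rnorm(m + 1, 0, cv))    # roughly 10% CV measurement error
  }
  out
}

set.seed(3)
control      <- simulate_group(n = 500, m = 30, alpha = 0.03, sigma = 0.5, p = 0)
experimental <- simulate_group(n = 500, m = 30, alpha = 0.03, sigma = 0.5, p = 1)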
Figure 5 Examples of the simulated data for α = 3 and responses 100%, 50%, and 10%.
Figure 6 Examples of the simulated data for α = 0.03 and responses 100%, 50%, and 10%.
In Figure 5 at 100% response, the quadratic response is very clear and diminishes with decreasing response; no statistician is needed here. In Figure 6 the signals have a very different appearance, even though only α differs. The strong autocorrelation mutes the variation in the signal and causes it to lag in time. There is little visual difference among the different responses. A look at the plots might suggest that there is no signal at all, but the mean signal is exactly the same as in Figure 5. Standard statistical methods currently used in drug development may not find this signal. The examples below are intended to illustrate some of the issues. It will obviously take more sophisticated methods, not used in this chapter, to detect this signal properly under autocorrelation. The following statistical models were used for the experimental effect, with t set to zero at all time points on the right-hand side for the control group:
Model 1:   yt = μt + ε
Model 2:   yt = γ ys + μt + ε
Model 3:   yt = γ y0 + μt + ε
Model 4:   yt − y0 = μt + ε
Model 5:   yt − y0 = γ(ys − y0) + μt + ε
Model 6:   yt = β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ε
Model 7:   yt = γ ys + β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ε
Model 8:   yt = γ y0 + β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ε
Model 9:   yt − y0 = β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ε
Model 10:  yt − y0 = λ(ys − y0) + β0 + β1 t + β2 t² + β3 t³ + β4 t⁴ + ε
These models are compared in the simulated experiments below. Models 1 to 10 used the lm(·) function in R [32], which is OLS. Models 11 to 20 are the same models, respectively, except that the linear mixed-effect model lme(·) in R with restricted maximum likelihood estimation was used, allowing random baselines (intercepts). Model 1 is ANOVA, model 2 is ANOVA covariate-adjusted for the previous observation, model 3 is ANOVA covariate-adjusted for the baseline, model 4 is ANOVA for the change from baseline, and model 5 is ANOVA for the change from baseline that is covariate-adjusted for the previous change from baseline. Models 6 to 10 replaced the means with fourth-degree polynomials in time. These models were chosen because they are typical of what a statistician or a biologist might apply to similar data sets to study biomarkers when the underlying time-dependence structure is not known. The focus is on the mean response, which is typical; no attempt was made to extract the biological variation component σ or the analytical variation component σε. The nonconstant variance of the residuals was also ignored. If there is no measurement error and the biology is known to follow an OU model, model 2 or 7 would be the "best" to estimate α, μ, and σ, although transformations would be needed to get the correct estimates. For example, α = −(log γ)/(t − s) only if t − s is constant, which was the case below. When γ is not in the model, a value of "na" is shown in the results; when it is negative, "*" is shown. The mean would have to be transformed similarly, but this was not done below in the calculation of the bias, since it is not likely that the analyst would know the proper transformation. If the underlying process is known to be OU and the baseline of both groups has the same distribution as the control group over time, covariance adjustment for the baseline loses information about α and increases the model bias, possibly masking the signal. Modeling the change from baseline underestimates the variance because it assumes that the variance of the baseline distribution is zero. This approach might give less biased results but is likely to have p-values that are too small, meaning that false signals may be detected.
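As an illustration of how such fits might be set up, the R sketch below fits model 6 by OLS with lm() and its mixed-model counterpart (model 16) with lme() from the nlme package. Here 'dat' is a hypothetical data frame with columns y, time, group, and id; these names, and the exact coding of the treatment-only time covariate, are assumptions made for the sketch.

# Sketch: fitting model 6 (quartic polynomial) by OLS and its mixed-model counterpart
# with a random baseline; 'dat' is a hypothetical data frame (y, time, group, id)
library(nlme)

# Time enters only for the experimental group, as described in the text
dat$tx <- ifelse(dat$group == "experimental", dat$time, 0)

m6  <- lm(y ~ tx + I(tx^2) + I(tx^3) + I(tx^4), data = dat)            # OLS (model 6)
m16 <- lme(y ~ tx + I(tx^2) + I(tx^3) + I(tx^4), random = ~ 1 | id,
           data = dat, method = "REML")                                # random intercepts (model 16)

summary(m6)
summary(m16)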
Statistical tests for signal detection in the experiments below are based on the differences in parameters between the two groups: linear effect, quadratic effect (quad), cubic effect, and quartic effect. Hierarchical statistical testing was performed to learn about the mathematical form of the biological response. An overall test for the deviation from baseline was done to see if any signal was present. This is the test that counts, since it controls the type I error. The subsequent tests were secondary and were not corrected for multiple testing. Second, a test for a fourth-degree polynomial (poly) was done; if it is significant, it means that at least one coefficient in the polynomial is not zero. Then each degree of the polynomial is tested to determine the functional shape of the curve. In the simulated data, only the true quadratic term is nonzero; any other significant findings are spurious. In all cases presented below, the estimates of σ are too low, resulting in p-values that are too small. To evaluate the Fisher information for each model, the MSE was calculated using the estimated variance (Var) of the time function integrated over time from 0 to 30 and the integrated bias relative to the true polynomial. This represents the mean squared area between the true time function and the estimated time function. In most experimental cases the bias can never be known, but it can be reduced using techniques such as bootstrapping [21]. When using OLS, it is assumed that the estimated parameters are unbiased. In a real statistical experiment, the behavior of the particular estimation procedure would be simulated hundreds or thousands of times and the distributions of the results would be studied. Here only a single simulation was done for illustration, and no scientific conclusion should be inferred from this presentation.

EXAMPLE EXPERIMENTS

In Vitro Experiments

This section is intended to explore the behavior of an OU process when only a few experiments are run but the response can be measured at many equally spaced time points. Assay development is not discussed here but can be found in many papers and textbooks, such as that of Burtis and Ashwood [33]. All the experiments that follow assume that the assay has been "optimized" according to the existing standards. One point that needs to be stressed here is the relationship between the true value of the biomarker μ and the variation of the measurement error σε, assuming that no assay bias is present. Many use the concept of the CV = σε/μ being constant. If this relationship really holds, the statistical analyst needs to know that and approximately what the constant is. Prior to using a biomarker, it is important to define the mathematical relationship between σε and μ so that the most information can be gained by the modeling. It should also be noted that since μ is a function of time, the measurement variance will also be a function of time. Most statistical procedures assume that the measurement variance is constant, including those in these examples. This assumption will tend to hide signals.
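One way the integrated variance, bias, and MSE reported in Tables 7a to 9d could be computed is sketched below in R, using a grid approximation to the integrals over t in [0, 30]. The exact averaging convention used for the tables is not spelled out, so this is an assumed convention consistent with MSE = Var + Bias²; m6 refers to the hypothetical OLS fit from the previous sketch.

# Sketch: grid approximation to the integrated Var, Bias, and MSE of a fitted time
# curve relative to the true quadratic response (averaging convention assumed)
true_curve <- function(t) 3 + (1/5) * t - (2/300) * t^2   # true experimental-group mean
tgrid <- seq(0, 30, by = 0.1)

pred <- predict(m6, newdata = data.frame(tx = tgrid), se.fit = TRUE)

bias <- mean(pred$fit - true_curve(tgrid))   # integrated (average) bias
vbar <- mean(pred$se.fit^2)                  # integrated (average) variance
mse  <- vbar + bias^2                        # MSE = Var + Bias^2
c(Var = vbar, Bias = bias, MSE = mse)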
In the laboratory, it is relatively easy to obtain measurements at frequent, equally spaced time intervals. If a good biological model is available, this is the place to define the best (maximum Fisher information) mathematical model of the response. If such a model is not known, it should be explored first. This section looks at two simulation experiments comparing the size of α (3 vs. 0.03) with 100% response. Each has n = 5 experimental units per group, and each unit has m = 30 follow-up measurements. These could represent chemical reaction experiments, cell measurements, well measurements, or any similar in vitro biological model for a biomarker. The raw data are shown in Figures 5 and 6, and the statistics are shown in Tables 7a and 7b. When α is large, the measurements are effectively independent and there should be no relationship to the previous measurements, including the baseline. However, if the baselines do not come from the same distribution, the comparability of the results is called into question. Here in Table 7a, only models 1, 5, and 7 gave reasonable estimates of α. In the strong autocorrelation case, Table 7b shows that models 2, 5, 7, 10, 12, 15, 17, and 20 gave estimates of α that, although not very precise, were the correct order of magnitude. In Table 7a, every model detected the overall signal, the polynomial, and the intended quadratic effect. However, there are many p-values showing significance where there should not be any. This can lead to spurious conclusions about the nature of the biomarker. In the experiment shown in Table 7b, several models gave reasonable indications of the magnitude of the autocorrelation but did not always find the underlying signal. In the first case, all the biases show overestimates of the time effect, while in the second case the biases are in the opposite direction and have a larger magnitude, making the MSE larger (less information). The larger, negative bias is exacerbated by the fact that the time model parameter estimates need to be divided by 1 − e^(−α(t − s)), which is always less than 1. This correction was not done because the estimates of α are not good enough to make the correction reasonable to use. Additionally, the analyst would not be estimating α, making these results more representative of current practice. When t − s is not constant, software for the estimation procedure is not readily available.

In Vivo Experiments

This section is meant to illustrate experiments of the size used in animal studies but may also apply to early clinical development. Here there are only m = 4 follow-up measurements, but the number per group (n = 36) was scaled up so that approximately the same number of observations is available. This results in comparable degrees of freedom for the model errors. In Tables 8a and 8b, the results are analogous to those in Tables 7a and 7b. However, in Table 8a the estimates of α are generally all bad, leading to the conclusion that autocorrelation is present when it really is not.
na 1.29 * na 6.39 na 1.58 * na * na * * na * na * * na *
na, not available. *, negative estimate.
a
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
240 239 239 240 239 295 294 294 295 294 231 230 9 231 230 286 285 286 286 286
Error df
0.59 0.57 0.50 0.50 0.50 0.57 0.56 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
σ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.08 0.06 0.02 0.02 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Linear
α
Model
a
Statistical Results for α = 3, p = 1, m = 30, and n = 5
TABLE 7a
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.54 0.52 0.67 0.64 0.64 0.01 0.03 0.06 0.04 0.04 0.00 0.00 0.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.29 0.44 0.34 0.32 0.32 0.02 0.04 0.06 0.05 0.05 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0110 0.0111 0.0080 0.0080 0.0090 0.0040 0.0091 0.0035 0.0031 0.0070 0.0080 0.0090 0.0080 0.0080 0.0090 0.0233 0.0308 0.0040 0.0036 0.0079
Var
0.2746 0.1452 0.2802 0.2795 0.2787 0.2430 −0.0166 0.0381 0.0626 0.0993 0.2746 0.2767 0.2802 0.2795 0.2819 0.2307 0.2905 0.0397 0.0640 0.1103
Bias
0.0864 0.0322 0.0865 0.0861 0.0867 0.0630 0.0093 0.0050 0.0070 0.0169 0.0834 0.0856 0.0865 0.0861 0.0884 0.0766 0.1152 0.0056 0.0077 0.0201
MSE
na, not available.
a
na 0.02 0.61 na 0.01 na 0.02 0.61 na 0.01 na 0.07 0.61 na 0.08 na 0.10 0.56 na 0.10
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
a
α
240 239 239 240 239 295 294 294 295 294 231 230 9 231 230 286 285 286 286 286
Error df
0.51 0.12 0.44 0.49 0.12 0.47 0.12 0.41 0.45 0.12 0.26 0.12 0.26 0.26 0.12 0.25 0.12 0.25 0.25 0.25
σ 0.00 0.23 0.00 0.00 0.32 0.00 0.02 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.20 0.00 0.00 0.15 0.00 0.02 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.00 0.13 0.00 0.00 0.08 0.00 0.04 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.22 0.00 0.00 0.00 0.00 0.00
Linear
Statistical Results for α = 0.03, p = 1, m = 30, and n = 5
Model
TABLE 7b
0.24 0.27 0.19 0.25 0.29 0.03 0.10 0.03 0.07 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.77 0.13 0.69 0.69 0.13 0.08 0.14 0.07 0.13 0.16 0.00 0.00 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.45 0.32 0.41 0.49 0.34 0.11 0.15 0.09 0.16 0.16 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0085 0.0005 0.0062 0.0078 0.0005 0.0028 0.0003 0.0021 0.0025 0.0003 0.0023 0.0005 0.0023 0.0023 0.0005 0.0129 0.0011 0.0122 0.0126 0.0012
Var
MSE 0.3658 0.0686 0.3623 0.3629 0.0669 0.0670 0.9204 0.0865 0.1059 0.9333 0.3596 0.0781 0.3584 0.3573 0.0794 0.3055 0.8869 0.2960 0.3108 0.8980
Bias −0.5978 −0.2610 −0.5967 −0.5959 −0.2578 −0.2534 −0.9592 −0.2905 −0.3215 −0.9659 −0.5978 −0.2787 −0.5967 −0.5959 −0.2809 −0.5409 −0.9411 −0.5327 −0.5461 −0.9470
na, not available.
a
280 279 279 280 279 283 282 282 283 282 209 208 71 209 208 212 211 212 212 212
0.66 0.58 0.51 0.51 0.51 0.66 0.57 0.51 0.51 0.51 0.49 0.58 0.49 0.49 0.51 0.49 0.57 0.49 0.49 0.49
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.05 0.00 0.05 0.05 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.97 0.00 0.00 0.00 0.00 0.00
Linear
na 0.09 0.00 na 0.27 na 0.64 0.00 na 0.27 na 0.09 0.00 na 1.92 na 0.09 0.00 na 0.27
σ
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Error df
α
Model
a
Statistical Results for α = 3, p = 1, m = 4, and n = 36
TABLE 8a
0.00 0.00 0.00 0.00 0.00 0.11 0.00 0.06 0.06 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.43 0.47 0.21 0.21 0.47 0.23 0.00 0.16 0.16 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.34 0.05 0.26 0.25 0.15 0.22 0.01 0.15 0.14 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0024 0.0021 0.0014 0.0014 0.0018 0.0054 0.0062 0.0033 0.0033 0.0056 0.0013 0.0021 0.0013 0.0013 0.0018 0.0124 0.0062 0.0042 0.0040 0.0056
Var
MSE 0.0451 0.0041 0.0420 0.0420 0.0266 0.0056 0.1925 0.0085 0.0084 0.0389 0.0441 0.0041 0.0418 0.0418 0.0266 0.0126 0.1925 0.0093 0.0092 0.0389
Bias −0.2068 −0.0452 −0.2013 −0.2013 −0.1574 0.0124 −0.4316 −0.0717 −0.0718 −0.1825 −0.2068 −0.0452 −0.2013 −0.2013 −0.1574 0.0124 −0.4316 −0.0717 −0.0718 −0.1825
na 0.01 0.02 na 0.02 na 0.01 0.02 na 0.02 na 0.01 0.02 na 0.04 na 0.01 0.02 na 0.05
na, not available.
a
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
280 279 279 280 279 283 282 282 283 282 209 208 71 209 208 212 211 212 212 212
Error df
0.62 0.31 0.43 0.44 0.31 0.61 0.31 0.43 0.44 0.31 0.27 0.31 0.27 0.27 0.29 0.27 0.31 0.27 0.27 0.27
σ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.00 0.92 0.00 0.00 1.00 0.86 0.93 0.80 0.80 0.94 0.00 0.15 0.00 0.00 0.00 0.00 0.21 0.00 0.00 0.91
Linear
α
Model
a
Statistical Results for α = 0.03, p = 1, m = 4, and n = 36
TABLE 8b
0.26 0.00 0.11 0.11 0.00 0.70 0.59 0.59 0.59 0.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.49 0.95 0.33 0.33 0.94 0.78 0.52 0.69 0.69 0.53 0.00 0.37 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.71 0.15 0.59 0.60 0.16 0.86 0.55 0.80 0.80 0.56 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0021 0.0005 0.0010 0.0010 0.0006 0.0047 0.0012 0.0023 0.0024 0.0013 0.0005 0.0005 0.0004 0.0004 0.0005 0.0155 0.0012 0.0064 0.0065 0.0020
Var
MSE 0.2096 0.1244 0.2086 0.2086 0.1255 0.4935 0.7737 0.4910 0.4910 0.7682 0.2080 0.1244 0.2080 0.2080 0.1377 0.5043 0.7737 0.4951 0.4952 0.7092
Bias −0.4556 −0.3520 −0.4556 −0.4556 −0.3535 −0.6992 −0.8789 −0.6991 −0.6991 −0.8757 −0.4556 −0.3520 −0.4556 −0.4556 −0.3704 −0.6992 −0.8789 −0.6991 −0.6991 −0.8409
The quadratic signal is missed in many models in Table 8b. A second design was analyzed in which there was only one follow-up time (m = 1). This is typical of designs that incorporate time but want independent measurements. In comparing Tables 8e to 8h with their counterparts Tables 8a to 8d, respectively, the signals detected are much weaker and, of course, there is no information about α, even though estimates were calculated from the regression coefficients for the baseline term.
Clinical Trials

Clinical trials generally have much larger sample sizes, especially phase III trials. In this section the same designs and models are used, but the sample size is increased. In Tables 9a and 9b the response is 50% in the experimental treatment group and n = 200. In Tables 9c and 9d the response is only 10% in the experimental group and n = 500. The latter case is more typical of clinical safety data. With larger sample sizes, the properties of the models would be expected to improve: the p-values may get smaller, and the MSEs, at least the variance component, should be reduced because more samples should produce more information. Here the bias seems to be unaffected for both response categories. This generally means that if the wrong model is chosen, more measurements will not make it better. Tables 9a and 9b look very similar to those above, but Tables 9c and 9d show some notable features. First, because the models do not estimate the proportion p of responders, the models for the experimental treatment group are a weighted average of 10% quadratic response and 90% no response. This should cause the bias to increase, which it generally does. When autocorrelation is present and the response rate is low (Table 9d), the signal is lost completely (i.e., no information is available). It remains to be seen whether better statistical procedures can find this signal.
DISCUSSION

Overview of Results

Biomarker experiments with repeated measures over time do not ensure that additional information will be obtained, even though, theoretically, it is guaranteed. The experimental design has to be correct, the biomathematical model of the time response has to be correct, and the statistical modeling procedure must be an efficient estimator of that model. If any one of these parts is broken, information can be lost or destroyed completely. For efficacy biomarkers, this means wasted money or missed opportunities. For safety biomarkers, it leads to late attrition or a market recall.
na 0.10 0.00 na * na 0.10 0.00 na * na * 0.00 na * na * 0.00 na *
na, not available. *, negative estimate.
a
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
280 279 279 280 279 283 282 282 283 282 209 208 71 209 208 212 211 212 212 212
Error df
0.74 0.66 0.54 0.54 0.54 0.74 0.66 0.54 0.54 0.54 0.56 0.55 0.54 0.54 0.54 0.55 0.55 0.54 0.54 0.54
σ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.34 0.08 0.27 0.27 0.06 0.04 0.00 0.01 0.01 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Linear
α
Model
a
Statistical Results for α = 3, p = 0.5, m = 4, and n = 36
TABLE 8c
0.00 0.00 0.00 0.00 0.00 0.17 0.01 0.07 0.07 0.15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.40 0.58 0.22 0.22 0.17 0.21 0.02 0.10 0.09 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.30 0.09 0.16 0.16 0.21 0.19 0.02 0.08 0.08 0.15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0030 0.0025 0.0016 0.0016 0.0017 0.0068 0.0060 0.0036 0.0036 0.0043 0.0017 0.0019 0.0016 0.0016 0.0017 0.0155 0.0175 0.0036 0.0036 0.0046
Var
MSE 0.0851 0.0406 0.0826 0.0826 0.0957 0.2607 0.5010 0.2869 0.2878 0.2417 0.0838 0.0862 0.0826 0.0826 0.0979 0.2694 0.2546 0.2869 0.2878 0.2323
Bias −0.2865 −0.1953 −0.2847 −0.2847 −0.3065 −0.5039 −0.7036 −0.5322 −0.5331 −0.4872 −0.2865 −0.2904 −0.2847 −0.2847 −0.3101 −0.5039 −0.4869 −0.5322 −0.5331 −0.4772
na 0.01 0.00 na 0.01 na 0.01 0.00 na 0.02 na 0.01 0.00 na 0.02 na 0.01 0.00 na 0.03
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
na, not available.
a
α
Model
a
280 279 279 280 279 283 282 282 283 282 209 208 71 209 208 212 211 212 212 212
Error df
0.61 0.30 0.42 0.42 0.30 0.61 0.30 0.42 0.42 0.30 0.26 0.30 0.26 0.26 0.29 0.26 0.30 0.26 0.26 0.26
σ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.01 0.49 0.00 0.00 0.77 0.67 0.14 0.40 0.40 0.14 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.00 0.00 0.00
Linear 0.29 0.01 0.07 0.07 0.01 0.93 0.39 0.80 0.80 0.40 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00
Quad
p-Value
TABLE 8d Statistical Results for α = 0.03, p = 0.5, m = 4, and n = 36
0.93 0.09 0.99 0.99 0.11 0.95 0.52 0.86 0.86 0.53 0.02 0.00 0.91 0.80 0.00 0.06 0.00 0.00 0.00 0.00
Cubic 0.78 0.71 0.72 0.72 0.70 0.94 0.58 0.86 0.86 0.59 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00
Quartic 0.0020 0.0005 0.0009 0.0009 0.0005 0.0045 0.0011 0.0022 0.0021 0.0012 0.0004 0.0005 0.0004 0.0004 0.0005 0.0151 0.0011 0.0059 0.0058 0.0016
Var
MSE 0.1946 0.1215 0.1956 0.1957 0.1253 0.5195 0.7838 0.4662 0.4651 0.7591 0.1930 0.1215 0.1951 0.1951 0.1281 0.5300 0.7838 0.4700 0.4688 0.7222
Bias −0.4389 −0.3479 −0.4412 −0.4413 −0.3533 −0.7176 −0.8847 −0.6812 −0.6804 −0.8706 −0.4389 −0.3479 −0.4412 −0.4413 −0.3573 −0.7176 −0.8847 −0.6812 −0.6804 −0.8489
na, not available.
a
64 63 64 67 66 67 64 63 64 67 66 67
0.65 0.52 0.52 0.65 0.55 0.55 0.23 0.18 0.18 0.23 0.19 0.19
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.28 0.27 0.28 0.08 0.07 0.08 0.00 0.00 0.00 0.00 0.00 0.00
Linear
na 0.01 na na 0.02 na na 0.01 na na 0.02 na
All
1 3 4 6 8 9 11 13 14 16 18 19
σ
αa
Model
Error df
Statistical Results for α = 3, p = 1, m = 1, and n = 36
TABLE 8e
0.00 0.00 0.00 0.39 0.50 0.54 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.27 0.20 0.20 0.48 0.69 0.74 0.00 0.00 0.00 0.00 0.00 0.01
Cubic 0.07 0.02 0.02 0.46 0.70 0.77 0.00 0.00 0.00 0.00 0.00 0.02
Quartic 0.0090 0.0058 0.0057 0.0212 0.0155 0.0152 0.0090 0.0058 0.0057 0.0212 0.0155 0.0152
Var
MSE 0.0385 0.0355 0.0355 0.0212 0.0217 0.0242 0.0385 0.0355 0.0355 0.0212 0.0217 0.0242
Bias −0.1718 −0.1726 −0.1726 0.0054 −0.0788 −0.0946 −0.1718 −0.1726 −0.1726 0.0054 −0.0788 −0.0946
na, not available.
a
64 63 64 67 66 67 64 63 64 67 66 67
0.59 0.42 0.43 0.60 0.42 0.42 0.21 0.15 0.15 0.21 0.15 0.15
0.08 0.01 0.02 0.16 0.01 0.02 0.00 0.00 0.00 0.00 0.00 0.00
All 0.08 0.01 0.02 0.16 0.01 0.02 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.05 0.04 0.07 0.97 0.87 0.84 0.00 0.00 0.00 0.80 0.19 0.10
Linear
na 0.03 na na 0.03 na na 0.03 na na 0.03 na
σ
1 3 4 6 8 9 11 13 14 16 18 19
Error df
α
Model
a
Statistical Results for α = 0.03, p = 1, m = 1, and n = 36
TABLE 8f
0.51 0.88 0.68 0.71 0.77 0.81 0.00 0.24 0.00 0.00 0.02 0.05
Quad
p-Value
0.35 0.33 0.38 0.64 0.67 0.72 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.67 0.77 0.83 0.61 0.64 0.68 0.00 0.02 0.10 0.00 0.00 0.00
Quartic 0.0074 0.0038 0.0039 0.0177 0.0086 0.0089 0.0074 0.0038 0.0039 0.0177 0.0086 0.0089
Var
MSE 0.2599 0.2072 0.1956 0.5030 0.4864 0.4851 0.2599 0.2072 0.1956 0.5030 0.4864 0.4851
Bias −0.5025 −0.4510 −0.4378 −0.6966 −0.6912 −0.6901 −0.5025 −0.4510 −0.4378 −0.6966 −0.6912 −0.6901
na * na na * na na * na na * na
1 3 4 6 8 9 11 13 14 16 18 19
na, not available. *, negative estimate.
a
α
a
64 63 64 67 66 67 64 63 64 67 66 67
Error df
0.88 0.58 0.59 0.88 0.59 0.60 0.31 0.20 0.21 0.31 0.21 0.21
σ
0.41 0.09 0.10 0.24 0.02 0.03 0.00 0.00 0.00 0.00 0.00 0.00
All 0.41 0.09 0.10 0.24 0.02 0.03 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.37 0.97 0.77 0.46 0.53 0.48 0.00 0.75 0.02 0.00 0.00 0.00
Linear
Statistical Results for α = 3, p = 0.5, m = 1, and n = 36
Model
TABLE 8g
0.74 0.12 0.18 0.51 0.72 0.64 0.01 0.00 0.00 0.00 0.01 0.00
Quad
p-Value
0.81 0.86 0.83 0.48 0.72 0.63 0.05 0.15 0.09 0.00 0.01 0.00
Cubic 0.27 0.09 0.10 0.43 0.67 0.57 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0167 0.0074 0.0075 0.0384 0.0173 0.0178 0.0167 0.0074 0.0075 0.0384 0.0173 0.0178
Var
MSE 0.1662 0.0829 0.0954 0.4539 0.4908 0.4799 0.1662 0.0829 0.0954 0.4539 0.4908 0.4799
Bias −0.3866 −0.2748 −0.2965 −0.6445 −0.6881 −0.6798 −0.3866 −0.2748 −0.2965 −0.6445 −0.6881 −0.6798
na 0.00 na na 0.01 na na 0.00 na na 0.01 na
1 3 4 6 8 9 11 13 14 16 18 19
na, not available.
a
α
Model
a
64 63 64 67 66 67 64 63 64 67 66 67
Error df
0.62 0.43 0.43 0.61 0.44 0.44 0.22 0.15 0.15 0.21 0.15 0.15
σ 0.17 0.01 0.01 0.16 0.04 0.04 0.00 0.00 0.00 0.00 0.00 0.00
All 0.17 0.01 0.01 0.16 0.04 0.04 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.10 0.00 0.00 0.08 0.36 0.40 0.00 0.00 0.00 0.00 0.00 0.00
Linear 0.67 0.84 0.84 0.10 0.59 0.65 0.00 0.11 0.11 0.00 0.00 0.00
Quad
p-Value
TABLE 8h Statistical Results for α = 0.03, p = 0.5, m = 0, and n = 36
0.88 0.38 0.38 0.10 0.68 0.74 0.23 0.00 0.00 0.00 0.00 0.01
Cubic 0.21 0.73 0.73 0.11 0.73 0.80 0.00 0.01 0.01 0.00 0.01 0.04
Quartic 0.0082 0.0040 0.0039 0.0184 0.0096 0.0095 0.0082 0.0040 0.0039 0.0184 0.0096 0.0095
Var
MSE 0.2149 0.2368 0.2366 0.5061 0.4921 0.4917 0.2149 0.2368 0.2366 0.5061 0.4921 0.4917
Bias −0.4546 −0.4825 −0.4823 −0.6983 −0.6946 −0.6944 −0.4546 −0.4825 −0.4823 −0.6983 −0.6946 −0.6944
α
na 0.08 0.00 na * na 0.08 0.00 na * na 0.08 0.00 na * na 0.08 0.00 na *
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
a na, not available. *, negative estimate.
a
1592 1591 1591 1592 1591 1595 1594 1594 1595 1594 1193 1192 399 1193 1192 1196 1195 1196 1196 1196
Error df
0.72 0.61 0.51 0.51 0.51 0.72 0.61 0.51 0.51 0.51 0.51 0.61 0.51 0.51 0.51 0.51 0.61 0.51 0.51 0.51
σ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.47 0.00 0.67 0.69 0.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Linear 0.00 0.00 0.00 0.00 0.00 0.07 0.00 0.02 0.02 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
Statistical Results for α = 3, p = 0.5, m = 4, and n = 200
Model
TABLE 9a
0.78 0.00 0.87 0.88 0.90 0.23 0.00 0.12 0.12 0.13 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00
Cubic 0.64 0.07 0.57 0.57 0.58 0.26 0.00 0.14 0.14 0.15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0005 0.0004 0.0003 0.0003 0.0003 0.0012 0.0009 0.0006 0.0006 0.0007 0.0003 0.0004 0.0003 0.0003 0.0003 0.0028 0.0009 0.0006 0.0006 0.0007
Var
MSE 0.0706 0.0355 0.0692 0.0692 0.0694 0.2319 0.5383 0.2655 0.2665 0.2651 0.0704 0.0355 0.0692 0.0692 0.0701 0.2336 0.5383 0.2655 0.2665 0.2617
Bias −0.2648 −0.1876 −0.2626 −0.2625 −0.2630 −0.4804 −0.7330 −0.5147 −0.5156 −0.5142 −0.2648 −0.1876 −0.2626 −0.2625 −0.2643 −0.4804 −0.7330 −0.5147 −0.5156 −0.5109
na 0.01 0.00 na 0.02 na 0.01 0.00 na 0.02 na 0.01 0.00 na 0.03 na 0.01 0.00 na 0.04
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
na, not available.
a
α
a
1592 1591 1591 1592 1591 1595 1594 1594 1595 1594 1193 1192 399 1193 1192 1196 1195 1196 1196 1196
Error df
0.64 0.30 0.41 0.41 0.30 0.64 0.30 0.41 0.41 0.30 0.26 0.30 0.26 0.26 0.30 0.26 0.30 0.26 0.26 0.26
σ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.00 0.53 0.00 0.00 0.42 0.70 0.00 0.06 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Linear 0.43 0.00 0.01 0.01 0.00 0.90 0.01 0.38 0.38 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
Statistical Results for α = 0.03, p = 0.5, m = 4, and n = 200
Model
TABLE 9b
0.60 0.10 0.96 0.96 0.12 0.82 0.02 0.41 0.41 0.02 0.00 0.00 0.20 0.02 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.87 0.05 0.61 0.61 0.06 0.74 0.02 0.37 0.37 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0004 0.0001 0.0002 0.0002 0.0001 0.0009 0.0002 0.0004 0.0004 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0030 0.0002 0.0010 0.0010 0.0003
Var
MSE 0.1810 0.1238 0.1856 0.1856 0.1299 0.6977 0.8165 0.5589 0.5591 0.7793 0.1807 0.1238 0.1855 0.1855 0.1326 0.6999 0.8165 0.5596 0.5598 0.7586
Bias −0.4250 −0.3517 −0.4307 −0.4307 −0.3603 −0.8348 −0.9035 −0.7474 −0.7475 −0.8827 −0.4250 −0.3517 −0.4307 −0.4307 −0.3641 −0.8348 −0.9035 −0.7474 −0.7475 −0.8708
a na, not available. *, negative estimate.
3992 3991 3991 3992 3991 3995 3994 3994 3995 3994 2993 2992 999 2993 2992 2996 2995 2996 2996 2996
0.72 0.61 0.50 0.50 0.50 0.72 0.61 0.50 0.50 0.50 0.51 0.61 0.50 0.50 0.50 0.51 0.61 0.50 0.50 0.50
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.09 0.47 0.10 0.10 0.08 0.88 0.94 0.46 0.47 0.46 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Linear
na 0.08 0.00 na * na 0.08 0.00 na * na 0.08 0.00 na * na 0.08 0.00 na *
σ
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Error df
α
Model
a
Statistical Results for α = 3, p = 0.1, m = 4, and n = 500
TABLE 9c
0.00 0.00 0.00 0.00 0.00 0.38 0.34 0.13 0.13 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
0.46 0.92 0.17 0.17 0.16 0.31 0.20 0.10 0.10 0.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.47 0.18 0.26 0.26 0.27 0.32 0.16 0.12 0.12 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0002 0.0001 0.0001 0.0001 0.0001 0.0005 0.0003 0.0002 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0011 0.0003 0.0002 0.0002 0.0002
Var
MSE 0.1420 0.1296 0.1406 0.1406 0.1411 0.8031 0.9030 0.8532 0.8523 0.8497 0.1419 0.1296 0.1406 0.1406 0.1411 0.8037 0.9030 0.8532 0.8523 0.8497
Bias −0.3766 −0.3597 −0.3748 −0.3749 −0.3754 −0.8959 −0.9501 −0.9236 −0.9231 −0.9217 −0.3766 −0.3597 −0.3748 −0.3749 −0.3754 −0.8959 −0.9501 −0.9236 −0.9231 −0.9217
na 0.01 0.00 na 0.03 na 0.01 0.00 na 0.03 na 0.01 0.00 na 0.03 na 0.01 0.00 na 0.03
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
na, not available.
a
α
a
3992 3991 3991 3992 3991 3995 3994 3994 3995 3994 2993 2992 999 2993 2992 2996 2995 2996 2996 2996
Error df
0.6365 0.3033 0.3978 0.3978 0.299 0.6362 0.3033 0.3977 0.3977 0.299 0.2621 0.3033 0.2621 0.2621 0.299 0.262 0.3033 0.262 0.262 0.262
σ 0.81 0.86 0.43 0.43 0.81 0.73 0.77 0.29 0.29 0.71 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All 0.81 0.86 0.43 0.43 0.81 0.73 0.77 0.29 0.29 0.71 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Poly 0.53 0.83 0.33 0.33 0.99 0.88 0.67 0.80 0.80 0.67 0.00 0.00 0.00 0.00 0.59 0.00 0.00 0.00 0.00 0.00
Linear 0.74 0.69 0.61 0.61 0.66 0.80 0.49 0.67 0.67 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quad
p-Value
Statistical Results for α = 0.03, p = 0.1, m = 4, and n = 500
Model
TABLE 9d
0.87 0.43 0.80 0.80 0.46 0.79 0.42 0.66 0.66 0.44 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Cubic 0.98 0.87 0.97 0.97 0.89 0.79 0.39 0.67 0.66 0.41 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Quartic 0.0002 0.0000 0.0001 0.0001 0.0000 0.0004 0.0001 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0012 0.0001 0.0004 0.0004 0.0001
Var
MSE 0.1468 0.1427 0.1466 0.1466 0.1432 0.9583 0.9878 0.9597 0.9597 0.9839 0.1466 0.1427 0.1466 0.1466 0.1432 0.9592 0.9878 0.9599 0.9599 0.9839
Bias −0.3829 −0.3777 −0.3828 −0.3828 −0.3784 −0.9788 −0.9938 −0.9796 −0.9796 −0.9919 −0.3829 −0.3777 −0.3828 −0.3828 −0.3784 −0.9788 −0.9938 −0.9796 −0.9796 −0.9919
Translational Aspects of Pathodynamics

Signals are relatively easy to isolate and model under in vitro conditions, due mostly to the ability to control the environment and to the lack of exposure to the biological system network. If the biomarker cannot be characterized and modeled in the laboratory, the chance of getting meaningful information in vivo is slim to none. Once a laboratory model is established, an animal model can be chosen and the pathodynamics work starts all over again until the biomarker can be characterized and modeled in vivo. As all biologists know, just because it works in one species does not mean it will work in another. Before going into humans, a pathodynamic model for the biomarker should be studied in several species. The similarities and differences in these interspecies results should provide guidance about the applicability in humans.

For translation across species, the pathodynamic models must have some characteristics that are invariant. Without this invariance, all the preclinical work will probably be a waste of time and money. Mathematical physics would not exist without laws of invariance such as conservation of mass and energy. The same will probably hold true in biology. In the context of this chapter, P (the probability structure), Y (the biomarkers), and D (the disease or decision space) have to be invariant in some sense. Having invariance in P is not as strong a requirement as it seems. The simplest type is that the distribution is of the same form but the parameters of the distribution model show species-specific variation. A more complicated type of invariance is that the topologies of the interspecies probability spaces are equivalent; this just means that any "physical" distortion of the response distribution does not create or remove any holes. At the greatest extreme, a type of invariance would be present if there is a one-for-one matching of probability objects between species (i.e., when a particular probability object is present in one species with disease, or response, D, there is always a probability object, not necessarily similar, in another species that represents D). The bottom line for biologists is that this will probably require more mathematics than most can presently do.

This approach to translation is relatively worthless without laboratory standards. When biomarkers are developed and applied, either the exact assay methods and standards must be applied in every experiment or there needs to be a mathematical transformation that makes them equivalent. Currently, in the clinic, these standards do not exist. Therefore, preclinical experiments using the same methods may work fine, but when the clinical trial is run, variation in methods and sample handling may distort or destroy the information.
Future Needs in Method Development

Biologists need to get involved directly in pathodynamics to achieve an efficient merger of the biology and the mathematics. It is a rare mathematician or
statistician who has biology training and intuition. Once the biologist gets involved in developing the models, progress will accelerate. Remember, the OU model is basically the simplest case of a pathodynamic model. As is illustrated in the examples above, simpler models will suffer information loss. Therefore, standard experimental design and analysis may not be sufficient. The second issue is whether the OU model is correct. Preliminary research suggests that it is not [34], but only minor modifications may be needed for modeling homeostasis (i.e., dynamic equilibrium). The models for disease or therapeutic effects are mostly unknown. Chronic effects may be directional diffusion or slow convection, while acute effects are likely to generate trajectories such as liver injury [31]. The mathematics of statistical physics [17] is likely to be needed.

It seems clear from the examples presented here that the statistical estimation algorithms currently in common use are not efficient in a Fisher information sense when autocorrelation is present. This has been handled in economic applications for equally spaced measurement times, but biology is not quite so regular, especially in clinical trials, even under the strictest protocols. The communication/information theory and decision theory presented here is only an introduction. Optimal information and decision algorithms need to be developed in the context of pathodynamics. Such algorithms may be synergistic with Fisher information optimization or may have some conflict. How the information will be used should determine the optimization approach. In this chapter, biomarkers have been defined as functions of parameters, as vectors of tests, and as signals. These are just aspects of the same mathematical object called a probability distribution function P(Y). The parameters are an integral part of the probability model, the vector Y represents the measurements that get combined in the model, and change in these measurements with time is the signal.
REFERENCES

1. Klotz IM, Rosenberg RM (1994). Chemical Thermodynamics: Basic Theory and Methods. Wiley, New York.
2. Kondepudi D, Prigogine I (1998). Modern Thermodynamics: From Heat Engines to Dissipative Structure. Wiley, New York.
3. Keener J, Sneyd J (1998). Mathematical Physiology. Springer-Verlag, New York.
4. Box GEP, Hunter WG, Hunter JS (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. Wiley, New York.
5. Woolson RF, Clarke WR (2002). Statistical Methods for the Analysis of Biomedical Data, 2nd ed. Wiley, Hoboken, NJ.
6. Mendenhall W, Sincich T (1995). Statistics for Engineering and the Sciences, 4th ed. Prentice Hall, Upper Saddle River, NJ.
7. Puri ML, Sen PK (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York.
8. Thompson PA (1972). Compressible-Fluid Dynamics. McGraw-Hill, New York.
9. Hawking SW (1988). A Brief History of Time: From the Big Bang to Black Holes. Bantam Books, New York.
10. Prigogine I (1996). The End of Certainty: Time, Chaos, and the New Laws of Nature. Free Press, New York.
11. Frieden BR (1998). Physics from Fisher Information. Cambridge University Press, Cambridge, UK.
12. Trost DC (2008). A method for constructing and estimating the RR-memory of the QT-interval and its inclusion in a multivariate biomarker for torsades de pointes risk. J Biopharm Stat, 18(4):773–796.
13. Brown R (1828). A brief account of microscopic observations made in the months of June, July, and August, 1827, on the particles contained in the pollen of plants; and on the general existence of active molecules in organic and inorganic bodies. Philos Mag, 4:161–173.
14. Karatzas I, Shreve SE (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Springer-Verlag, New York.
15. Øksendal B (1998). Stochastic Differential Equations: An Introduction with Applications, 5th ed. Springer-Verlag, Berlin.
16. Uhlenbeck GE, Ornstein LS (1930). On the theory of the Brownian motion. Phys Rev, 36:823–841.
17. Reichl LE (1998). A Modern Course in Statistical Physics, 2nd ed. Wiley, New York.
18. Trost DC (2006). Multivariate probability-based detection of drug-induced hepatic signals. Toxicol Rev, 25(1):37–54.
19. Hogg RV, Craig A, McKean JW (2004). Introduction to Mathematical Statistics, 6th ed. Prentice Hall, Upper Saddle River, NJ.
20. Bickel PJ, Doksum KA (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, San Francisco.
21. Stuart A, Ord JK (1991). Kendall's Advanced Theory of Statistics, vol. 2, Classical Inference and Relationship, 5th ed. Oxford University Press, New York.
22. Reza FM (1994). An Introduction to Information Theory. Dover, Mineola, NY.
23. Kullback S (1968). Information Theory and Statistics. Dover, Mineola, NY.
24. Williams SA, Slavin DE, Wagner JA, Webster CJ (2006). A cost-effectiveness approach to the qualification and acceptance of biomarkers. Nat Rev Drug Discov, 5:897–902.
25. Wald A (1971). Statistical Decision Functions. Chelsea Publishing, New York.
26. Blackwell DA, Girshick MA (1979). Theory of Games and Statistical Decisions. Dover, Mineola, NY.
27. Chernoff H, Moses LE (1987). Elementary Decision Theory. Dover, Mineola, NY.
28. Knuth DE (1981). The Art of Computer Programming, vol. 2, Seminumerical Algorithms, 2nd ed. Addison-Wesley, Reading, MA.
29. Trost DC (2007). An introduction to pathodynamics from the view of homeostasis and beyond. Presented at the Sixth International Congress on Industrial and Applied Mathematics, Zürich, Switzerland, July 16–20.
30. Rosenkranz GK (2009). Modeling laboratory data from clinical trials. Comput Stat Data An, 53(3):812–819.
31. Trost DC, Freston JW (2008). Vector analysis to detect hepatotoxicity signals in drug development. Drug Inf J, 42(1):27–34.
32. R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.
33. Burtis CA, Ashwood ER (eds.) (1999). Tietz Textbook of Clinical Chemistry, 3rd ed. W.B. Saunders, Philadelphia.
34. Trost DC, Overman EA, Ostroff JH, Xiong W, March PD (in press). A model for liver homeostasis using a modified mean-reverting Ornstein–Uhlenbeck process.
37

OPTIMIZING THE USE OF BIOMARKERS FOR DRUG DEVELOPMENT: A CLINICIAN'S PERSPECTIVE

Alberto Gimona, M.D.
Merck Serono International, Geneva, Switzerland
INTRODUCTION

Drug development is currently facing many challenges, from the ever-increasing cost of developing drugs to the declining number of new drug approvals. Bringing a drug to market requires on average 12 years and $1 billion. Approximately 53% of compounds entering phase II fail, resulting in amortized costs of approximately $0.8 billion per registered drug (DiMasi et al., 2003). The majority of these costs are due to failures encountered during drug development, which account for approximately 75% of the cost of registration (Figure 1). According to the Pharmaceutical Research and Manufacturers of America (PhRMA), the U.S. biopharmaceutical industry spends $49.3 billion per year on drug research and development. According to the 2004 estimate of the U.S. Food and Drug Administration (FDA), only 8% of drugs entering clinical trials had a legitimate chance of reaching the market.

Figure 1  Components of the overall costs for one approved drug. (The figure tracks cumulative costs across the pipeline, from basic research, target identification and validation, screening, and optimization through preclinical development, phases I to III, and regulatory review; cumulative total costs reach about $880 million, of which about $655 million are attrition costs.)

The reasons for drug development failure vary: from 1990 to 1999, the most common reason for failure was related to pharmacokinetic (PK) and bioavailability issues (accounting for approximately 40% of failures), while from 2000 to 2008 the leading cause of failure was lack of efficacy (approximately 30%), and PK and bioavailability failures diminished to approximately 10% (Figure 2). As shown in Figure 1, late-stage failures tend to be extremely expensive. It is now well established that tools such as biomarkers are able to predict the efficacy of a compound at an early stage of development. This can drastically increase the efficiency of the development process, resulting in increased productivity. These lessons demonstrate the need to decrease the time required for drug approval as well as to reduce late-stage failures. Biomarkers, including imaging biomarkers, can address these needs.

In early drug development, biomarkers can be instrumental in improving the decision-making process, introducing the concept of "fail early, fail fast" and thus allowing drug companies to concentrate their resources on the most promising drug candidates. In addition, a biomarker strategy applied during late-stage development would allow a drug candidate to achieve regulatory approval much earlier in cases where resources are limited. Fundamental differences do exist, depending on whether the biomarker strategy is applied to early- or late-stage drug development; implementing a biomarker strategy during early drug development may represent a risk that a company can more readily absorb. Indeed, biomarkers at the early stage are validated through the various phases of drug development and would be used solely to determine whether or not to proceed in development with the drug candidate in question. Conversely, a biomarker that is validated during the late phase of drug development could be used for drug approval purposes.
Figure 2  Comparison of the reasons for drug failure in 1991 vs. 2000. (Bar chart of the percentage of NCE projects failing, by reason for attrition during clinical development; categories include clinical safety, efficacy, PK/bioavailability, formulation, commercial, toxicology, cost of goods, and unknown/other.)
DEFINITION AND CLASSIFICATION OF BIOMARKERS

In 2001 the Biomarkers Definitions Working Group, sponsored by the National Institutes of Health, defined a biomarker as "a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention"; a clinical endpoint as "a characteristic or variable that reflects how a patient feels or functions, or how long a patient survives"; and a surrogate endpoint as "a biomarker that is intended to substitute for a clinical endpoint."

There are important differences among biomarkers. Type 0 biomarkers are natural history markers of a disease and tend to correlate longitudinally with known clinical references, such as symptoms. Type I biomarkers capture the effect of an intervention in accordance with the mechanism of action of the drug, even though the mechanism might not be known to be associated with the clinical outcome. Type II biomarkers are considered surrogate markers, since a change in such a marker predicts clinical benefit (Frank and Hargreaves, 2003).

A hierarchy also exists among clinical endpoints.
A clinical endpoint can be defined as an intermediate endpoint, which is a clinical endpoint that is not the ultimate outcome but is nonetheless of real clinical benefit; the ultimate outcome is a clinical endpoint such as survival, onset of serious morbidity, or symptomatic response that captures the benefits and risks of an intervention (Lesko and Atkinson, 2001).

Classification can also be made according to the specificity of a biomarker for the intended therapeutic response. A linked drug biomarker demonstrates a strict correlation between the pharmacological action of the drug and its effect on the disease. Danhof et al. (2005) have proposed classifying biomarkers into three distinct categories: (1) pharmacological and (2) toxicological markers, both observed in healthy subjects, and (3) pathological biomarkers, observed in subjects affected by disease.

It is also worth reviewing the description of biomarkers in the 2003 FDA "Guidance for Industry: Exposure–Response Relationships":

• Biomarkers are considered valid surrogates for clinical benefit (e.g., blood pressure, cholesterol, viral load).
• Biomarkers are thought to reflect the pathologic process and at least be candidate surrogates (e.g., brain appearance in Alzheimer's disease, brain infarct size, various radiographic/isotopic function tests).
• Biomarkers reflect drug action but are of uncertain relation to a clinical outcome (e.g., inhibition of ADP-dependent platelet aggregation, ACE inhibition).
• Biomarkers may be more remote from the clinical benefit endpoint (e.g., degree of binding to a receptor or inhibition of an agonist).
Classification Based on Mechanism of Action

The COST B15 working group 2, "Markers of Pharmacological and Toxicological Action," has proposed a conceptually similar classification. Based on the location of the biomarker in the chain of events from underlying subject genotype or phenotype to clinical scales, the following types of biomarkers have been defined:

Type 0: genotype or phenotype
Type 1: concentration
Type 2: target occupancy
Type 3: target activation
Type 4: physiologic measures or laboratory tests
Type 5: disease processes
Type 6: clinical scales
This classification is not universally accepted since the type 0 biomarker relating to a subject’s genotype or phenotype can be considered a covariate
rather than a biomarker. Similarly, the type 6 biomarker, such as a clinical scale, can be regarded as a measurement of a clinical endpoint and not a biomarker. These classifications have been proposed in an attempt to reconcile disagreements on the potential role of biomarkers.
Classification Based on Clinical Applications

The paradigm is now shifting from the classical model of clinical care toward the development and application of biomarkers in different therapeutic areas according to their clinical application, which can be classified as follows:

• Preventive biomarkers, which identify people at high risk of developing disease
• Diagnostic biomarkers, which identify a disease at the earliest stage, before clinical symptoms occur
• Prognostic biomarkers, which stratify the risk of disease progression in patients undergoing specific therapy
• Predictive biomarkers, which identify patients who respond to specific therapeutic interventions
• Therapeutic biomarkers, which provide a quantifiable measure of response in patients who undergo treatment
Classification According to Measurement Scale

From a mathematical perspective, biomarkers can also be classified on the basis of the measurement scale they utilize:

• Graded response, which is a quantifiable biomarker that is causally linked to drug treatment and temporally related to drug exposure (e.g., blood pressure, cholesterol). Usually such endpoints are chosen based on the pharmacodynamic response.
• Challenge response, which is a quantifiable graded response to a standardized exogenous challenge, modified by the administration of the drug (e.g., a challenge test in asthma). Usually these markers are based on the mechanism of action (MoA) of the drug, and the response is a continuous variable.

Other types of responses can be observed with biomarkers:

• Categorical response is usually a "yes" or "no" response for a clinically relevant outcome based on disease progression, regardless of MoA (e.g., response based on tumor size, incidence of an AE); such an event is generally not linked to the MoA of the drug.
• Time-to-event response is usually a clinically relevant outcome regardless of the MoA, such as survival time or time to relapse. It is a censored continuous clinical variable that can be measured only once for each patient.
• Event frequency/rate of response is the frequency of clinical events related to drug exposure (e.g., MRI lesions in multiple sclerosis); it is usually a censored continuous variable.

DEVELOPMENT OF BIOMARKERS

The development of biomarkers can be divided artificially into two steps: (1) evaluation/qualification of the candidate biomarker and (2) validation of the biomarker to become a surrogate endpoint.

Evaluation and Qualification of Biomarkers

Many disease biomarkers are well characterized and are used extensively in drug development. However, there is frequently a need to develop new biomarkers, especially in new therapeutic areas and/or when dealing with innovative therapeutic approaches. Development of new biomarkers should start at the preclinical stage, with the intent of having a new biomarker available when the lead candidate enters the human development stage. The objective of biomarker development should be clearly defined, such as the need for markers related to disease progression or to the pharmacological effect of the drug, or for markers indicating therapeutic activity.

In the evaluation phase, the candidate biomarker should be measured against the following attributes that define a biomarker (Lesko and Atkinson, 2001):

• Clinical relevance, which may theoretically reflect a physiologic or pathologic process or activity over a relatively short period of time. Ideally, this effect should be related to the MoA of the drug and to the clinical endpoint. This obviously requires an understanding of the pathophysiology of a disease and of a drug's mechanism of action, taking into consideration the fact that diseases frequently have multiple causal pathways.
• Sensitivity and specificity to treatment effects, defined as the ability to detect the intended measurement or change in the target patient population.
• Reliability, defined as the ability to measure the biomarker analytically with accuracy, precision, robustness, and reproducibility.
• Practicality, defined as noninvasiveness or only modest invasiveness.
• Simplicity, for routine utilization without the need for sophisticated equipment or operator skill, extensive time commitment, or high measurement cost.
Validation of Biomarkers

The validation of a biomarker is a work in progress that ends when the biomarker is validated as a surrogate endpoint. During the development of a biomarker, aside from the characteristics mentioned above, the investigator should take into account the risk of a false positive or false negative result [which occurs when the value(s) of specific biomarker(s) does not reflect a positive change in the clinical endpoint(s)]. During the validation process, the assay that is used must be highly reliable and reproducible.

As far as demonstrating the predictive value of the candidate as a surrogate endpoint for the clinical outcome is concerned, regulatory guidance does not specify which methodology should be used in validating biomarkers as surrogate endpoints. It is well recognized that developing a single biomarker as a surrogate endpoint can become rather cumbersome for a pharmaceutical sponsor (Lesko, 2007). To complicate matters further, a biomarker may become a surrogate endpoint for efficacy but not for toxicity. Indeed, there are few biomarkers of toxic effects, such as QTc prolongation predicting torsades de pointes or an increase in aminotransferases predicting liver failure. Biomarkers may also be misleading when they capture a short-term beneficial effect but miss a long-term deleterious effect. As a consequence, the benefit/risk ratio can rarely be evaluated based on a surrogate marker alone; hence, biomarkers are used as surrogate endpoints only in areas with critical unmet medical needs.

In the process of biomarker validation, the following properties should be evaluated: (1) the feasibility of the surrogate marker in predicting the clinical outcome, and (2) the statistical relationship between the biomarker and the clinical outcome. This should first be demonstrated by the natural history of the disease, then by adequate and well-controlled clinical trials that estimate the clinical benefit obtained by changing the specific surrogate endpoint. It should be noted that during the biomarker validation process it is insufficient to show only that the biomarker correlates with the clinical endpoint; it is also necessary to demonstrate that the effect of treatment on the surrogate endpoint predicts the treatment effect on the clinical endpoint. In rare cases, a biomarker is elevated to the status of surrogate endpoint based solely on the results obtained with one drug. A meta-analysis of multiple clinical trials with different drugs and different stages of disease may be required to determine the consistency of effects and strengthen the evidence that a change in the biomarker level results in an effect on the clinical outcome.

With the FDA Modernization Act of 1997, the U.S. Food and Drug Administration (FDA) gained a legal basis for using surrogate endpoints in ordinary and accelerated drug approvals. Indeed, the FDA was given explicit authority to approve drugs for the "treatment of a serious or life-threatening condition … upon a determination that a product has an effect on a clinical endpoint or on a surrogate endpoint that is reasonably likely to predict clinical benefit," leading to market access of new drugs and drug products.
The standards for linking a biomarker to a clinical outcome are higher for ordinary approvals than for accelerated approvals. This difference is based on consideration of many factors, including the degree of scientific evidence needed to support biomarker surrogacy, public health needs, the relative risk/benefit ratio, and the availability of alternative treatments. For ordinary approvals there are relatively few approved surrogate endpoints, such as "lower cholesterol and triglycerides" for coronary artery disease; "lower arterial blood pressure" for stroke, heart attacks, and heart failure; "increased cardiac output" for acute heart failure; "reduced HIV-RNA load and enhanced CD4+ cells" for AIDS; "lower glycosylated hemoglobin" for diabetes; and "reduced tumor size" for solid tumors.

Oncology is an interesting example of this practice. In oncology, survival is the ultimate clinical outcome. However, approvals in the United States in the field of oncology from 1990 to 2002 show that tumor response was the approval basis in 26 of 57 regular approvals, supported by relief of tumor-specific symptoms in 9 of these 26 regular approvals (Table 1). Relief of tumor-specific symptoms provided critical support for approval in 13 of 57 regular approvals; approvals were based on tumor response in 12 of 14 accelerated approvals.

In Europe, regulatory awareness is increasing and some initiatives are ongoing, such as the EMEA/CHMP Biomarkers Workshop held in 2006. However, while biomarker development is encouraged during early-stage development, there is significant hesitancy in accepting biomarkers as surrogate endpoints for drug approval.
TABLE 1  Summary of Endpoints for Regular Approval of Oncology Drug Marketing Applications, January 1, 1990 to November 1, 2002

Parameter^a                                        Endpoint
Total                                                    57
Survival                                                 18
RR                                                       26
  RR alone                                               10
  RR + decreased tumor-specific symptoms                  9
  RR + TTP                                                7
Decreased tumor-specific symptoms                         4
DFS                                                       2
TTP                                                       1
Recurrence of malignant pleural effusion                  2
Occurrence of breast cancer                               2
Decreased impairment of creatinine clearance              1
Decreased xerostomia                                      1

Source: Johnson et al. (2003).
^a RR, response rate; TTP, time to progression; DFS, disease-free survival.
As an example, a review of the European guidelines reveals that outside the oncology and musculoskeletal fields, biomarkers in general, and imaging biomarkers in particular, are not considered surrogate endpoints. A list of imaging biomarker endpoints accepted as primary endpoints for regulatory submissions, and of those suggested as endpoints in early development, is given in Table 2.

Overall, it is very complex to validate biomarkers to the point where they become surrogate endpoints, but the value of developing a biomarker resides in the information that can be obtained during such development. Among these benefits are the possibility of defining the population who may benefit from the drug candidate, the screening of patients for adverse events, the possibility of enriching the population for proof-of-concept studies, the possibility of stratifying patients, the selection of doses for pivotal trials, and the potential for dose adjustments at the patient level.
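To make the "statistical relationship between the biomarker and the clinical outcome" discussed under Validation of Biomarkers more concrete, the sketch below applies one commonly cited heuristic, the proportion of treatment effect explained (often attributed to Freedman and colleagues), to simulated trial data. The data, effect sizes, and variable names are invented for illustration; this is not an analysis from this chapter, and the heuristic has well-known limitations as a formal validation criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical trial: treatment z, candidate surrogate biomarker s, clinical outcome y.
z = rng.integers(0, 2, n)                      # 1 = active drug, 0 = placebo (assumed design)
s = 1.0 * z + rng.normal(0, 1, n)              # treatment shifts the biomarker
y = 0.8 * s + 0.1 * z + rng.normal(0, 1, n)    # outcome acts mostly through the biomarker

def ols(cols, resp):
    """Least-squares coefficients for resp on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(resp))] + list(cols))
    beta, *_ = np.linalg.lstsq(design, resp, rcond=None)
    return beta

beta_unadjusted = ols([z], y)[1]     # treatment effect on outcome, ignoring the biomarker
beta_adjusted = ols([z, s], y)[1]    # treatment effect after adjusting for the biomarker

pte = 1 - beta_adjusted / beta_unadjusted
print(f"unadjusted effect {beta_unadjusted:.2f}, adjusted {beta_adjusted:.2f}, PTE {pte:.2f}")
```

A proportion near 1 is consistent with, but does not by itself establish, surrogacy; demonstrating consistency of effects across trials and drugs, as described above, remains necessary.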
TABLE 2  Imaging Biomarkers: Review of Accepted Primary or Secondary Endpoints in the CHMP Guidelines

Condition | Imaging Accepted as a Primary Endpoint | Imaging Accepted as a Secondary Endpoint or Early in Development
Fungal infections | Relevant imaging is part of the clinical outcome | —
Cancer | — | Imaging and functional imaging
Osteoporosis | X-ray | DEXA for BMD
Juvenile idiopathic arthritis | X-ray | —
Psoriatic arthritis | X-ray | —
Osteoarthritis | X-ray | MRI and ultrasound (US)
Ankylosing spondylitis | X-ray | X-ray/MRI/DEXA/US
Crohn disease | — | Endoscopy
Prophylaxis of thromboembolic disease | Ultrasound (detection of DVT) and venography | —
Peripheral arterial obstructive disease | — | Hemodynamic measurements
Ischemic stroke | — | Neuroimaging techniques
Treatment of venous thrombotic disease | Ultrasound (detection of DVT) and venography; angiography for pulmonary embolism | —
Incontinence | — | Urodynamic studies or x-ray videography
Anxiety | — | Functional neuroimaging
Multiple sclerosis | — | MRI
Panic disorders | — | Neuroimaging
Acute stroke | — | MRI
REASONS FOR FAILURE OF A BIOMARKER

There are several examples of biomarker failure. One of the most recent is the approval of gefitinib for non-small cell lung cancer (NSCLC). Gefitinib was originally approved based on tumor response as opposed to overall survival; in a postmarketing survival trial that included approximately 1700 patients, there was no benefit over placebo on overall survival. Other examples include bone mineral density (BMD) in osteoporosis for fluoride treatment. However, the Cardiac Arrhythmia Suppression Trial (CAST) provides the best-known example of a biomarker failure. This study was based on the hypothesis (supported by statistical association and a plausible biological mechanism) that suppression of arrhythmias would prevent sudden death after myocardial infarction. The study demonstrated a worse outcome on mortality for patients receiving active treatment compared with those receiving placebo.

The theoretical background for biomarker failure is given by Frank and Hargreaves (2003). The reasons biomarkers can lead to erroneous conclusions have been divided into the following five categories (Figure 3):
Figure 3  Reasons why biomarkers have failed to become surrogate endpoints. (Panels A to E diagram the relationships among disease, intervention, surrogate endpoint, and true clinical outcome; in panel E the true clinical outcome cannot be measured at this stage of disease or does not discern a unique treatment benefit.)
1. Changes in the biomarker reflect the effect of treatment, but these changes are irrelevant to the pathophysiology of the indicated disease (false positive).
2. Changes in the biomarker reflect an effect of treatment on an element of the pathophysiology, but this element is clinically irrelevant (false positive).
3. Changes in the biomarker reflect clinically relevant changes in pathophysiology but do not capture the mechanistic effect of the treatment (false negative).
4. Changes in the biomarker reflect one effect of the treatment, but there are other, more relevant effects on outcome that are not captured (false negative or positive).
5. The biomarker may not correlate well with classical clinical assessors because the biomarker is more sensitive or because the classical assessor is irrelevant to a subset of the patient population, a novel mechanism, or a new indication.

It is important to consider this theoretical framework while developing a biomarker or using biomarkers in decision making. Using more than one biomarker and reviewing the consistency of the data across biomarkers may decrease the risk of reaching an incorrect conclusion. The classical example comes from the osteoporosis field, where BMD alone may provide misleading results, while BMD in combination with other biomarkers of bone metabolism (such as osteocalcin and collagen cross-links) provides a much more robust basis for decision making.
IMAGING AS A BIOMARKER TO OPTIMIZE DRUG DEVELOPMENT

Imaging (e.g., x-ray) has been used in clinical practice for over a century, but mainly for diagnostic purposes. The value of imaging as a biomarker has been recognized only recently. In clinical development, the use of imaging must take into consideration the scope of clinical studies, which are usually multicenter and multinational (at least in confirmatory development), and there are constraints such as cost, effort, and resources, as well as the need to standardize techniques across centers to maximize the signal-to-noise ratio.

The most common techniques used in clinical development include x-ray imaging, digitized imaging [including DEXA (dual-energy x-ray absorptiometry)], computed tomography (CT) scans, nuclear imaging such as positron-emission tomography (PET) and single-photon-emission CT (SPECT), ultrasound, magnetic resonance imaging (MRI), MR spectrometry, and functional MR. Some of these tools are available only in specialized centers and are therefore suited mainly to small studies during early development.
Imaging biomarkers can be used, for example, for the assessment of bioactivity (not only through change in anatomical shape but also through change in functional status), for the evaluation of the disposition of drugs, for the measurement of tissue concentrations of drugs, to characterize the number of receptors, binding efficiency, and receptor occupancy, and as prognostic indicators as well as for the assessment of molecular specificity.

IMAGING AS A MARKER OF BIOLOGICAL ACTIVITY

Imaging techniques are used to evaluate the biological activity of a drug candidate by performing pre- and posttreatment measurements (and, in many instances, measurements during treatment). These include:

• Oncology: CT scan or MRI for the measurement of the size of solid tumors
• Neurology: MRI for the measurement of multiple sclerosis lesions and for the evaluation and quantification of brain atrophy in Alzheimer disease (e.g., brain boundary shift integral or ROI-based MRI)
• Musculoskeletal diseases: x-ray and DEXA to evaluate vertebral fractures and BMD in osteoporosis; x-ray and MRI for the evaluation of erosions and joint space in rheumatoid arthritis and psoriatic arthritis; x-ray and MRI for the evaluation of joint space and cartilage volume in osteoarthritis; and the same techniques to evaluate spinal changes in spondyloarthritis

In addition to the use of imaging to capture anatomical changes, imaging can be used to perform a functional evaluation of a tissue or organ before and after treatment. Classical examples include:

• Oncology: the evaluation of tumor metabolism with fluorodeoxyglucose (FDG)-PET, which may be helpful as an early predictor of a later anatomical response. Indeed, the reduction in tumor size with imatinib was preceded by decreased tumor glucose uptake by a median of 7 weeks (Stroobants et al., 2003). Another example is the measurement of tumor blood flow and blood volume with CT: after bevacizumab (anti-VEGF antibody) treatment, decreased blood flow and volume in colorectal cancer were observed as early as 12 days after initiation of treatment (Miller et al., 2005).
• Neurology: FDG-PET was used in Alzheimer disease (AD) to evaluate the regional cerebral metabolic rate, and SPECT was used to evaluate blood flow in AD. In multiple sclerosis, PET and SPECT have shown a reduction in cerebral metabolism and blood flow. It should be noted, however, that these approaches
have not been tested and qualified to measure the potential effect of a therapeutic intervention (Bakshi et al., 2005).
IMAGING TO EVALUATE DISPOSITION OF DRUGS

An interesting use of imaging is to estimate the concentration of a drug in different tissues or organs, or to evaluate the disposition of a drug into otherwise inaccessible compartments. An example of the first use is the antifungal drug fluconazole. Using PET imaging and [18F]fluconazole, an in vivo pharmacokinetic profile was characterized: concentrations were measured over time in multiple organs, such as the brain, muscle, heart, and bowel. Based on these data, it was possible to test fluconazole only for infections in organs in which adequate concentrations (to exert an antifungal effect) were achieved (Pien et al., 2005).

Imaging can also be used to characterize the number of receptors, the binding efficiency, and the receptor occupancy. An interesting example is aprepitant, an NK1 receptor antagonist used for the prevention of nausea and vomiting with chemotherapy. Using an 18F-labeled ligand with high affinity and specificity for the NK1 receptor, PET was used to image the displacement of this ligand by systemically administered aprepitant, thereby demonstrating the ability of aprepitant to cross the blood–brain barrier.

Imaging Biomarkers as Prognostic Indicators and Assessment of Molecular Specificity

As mentioned above, FDG-PET can be used as an early indicator of bioactivity and can therefore help separate responders from nonresponders at earlier time points than anatomical imaging. The case of imatinib is instructive in this sense: FDG-PET identified the biological response a median of 7 weeks earlier than did CT scans. In addition, all patients with metabolic responses were clinical responders later during the trial (Stroobants et al., 2003). Similar examples are available with chemotherapy in breast cancer (Schelling et al., 2000), in NSCLC (Weber et al., 2003), and in gastroesophageal cancer (Weber et al., 2001).

Another interesting development is the use of imaging to assess molecular specificity. In oncology, MR spectrometry can evaluate the molecular content of tissues and assist diagnosis; it is used extensively in brain, breast, and prostate cancers. In neurology, the brain β-amyloid content can be measured in AD patients using PET techniques (Sorensen, 2006).
DISCUSSION AND CONCLUSIONS

The need for the pharmaceutical industry to increase its productivity and, in particular, the need to concentrate limited resources on the most promising drug candidates have led to the current emphasis on the development and use of biomarkers.
Since 2001, as a result of the Biomarkers Definitions Working Group, there is clarity on the definitions of biomarkers, surrogate markers, and clinical endpoints. Many classifications exist to guide the development of new biomarkers. The evaluation/qualification and eventually the validation of biomarkers is a long and complex process that needs to be supported by many stakeholders, such as the pharmaceutical industry, regulatory authorities, and academic centers; none of these entities alone can succeed easily in such an endeavor. A few biomarkers have reached the status of surrogate endpoint and can be used for registration purposes, notably in the areas of cardiovascular disease, diabetes, rheumatology, and oncology. Imaging can be particularly useful in the early stages of development in the evaluation of bioactivity and drug disposition, among many other possibilities.

In the evaluation of biomarkers and in decision making based on biomarkers, investigators and scientists should always consider the possibility that the biomarker may fail to predict the clinical outcome, leading to potential false positives and false negatives. It is reassuring, however, that even when a biomarker fails, significant learning is achieved, which eventually will benefit not only the scientific community in general but the pharmaceutical industry as well.

REFERENCES

Bakshi R, Minagar A, Jaisani Z, Wolinsky JS (2005). Imaging of multiple sclerosis: role in neurotherapeutics. NeuroRx, 2:277–303.
Biomarkers Definitions Working Group (2001). Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther, 69:89–95.
Danhof M, Alvan G, Dahl SG, Kuhlmann J, Paintaud G (2005). Mechanism-based pharmacokinetic–pharmacodynamic modeling: a new classification of biomarkers. Pharm Res, 22:1432–1437.
DiMasi JA, Hansen RW, Grabowski HG (2003). The price of innovation: new estimates of drug development costs. J Health Econ, 22:151–185.
European Medicines Agency (2006). Report on the EMEA/CHMP Biomarkers Workshop. EMEA/522496/2006.
FDA Center for Drug Evaluation and Research (2003). Guidance for Industry: Exposure–response relationships: study design, data analysis, and regulatory applications. http://www.fda.gov/cder/guidance/index.htm.
FDA (2004). The "Critical Path Initiative"–innovation–stagnation: challenge and opportunity on the critical path to new medical products. http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html.
Frank R, Hargreaves R (2003). Clinical biomarkers in drug discovery and development. Nat Rev Drug Discov, 2:566–580.
Johnson JR, Williams G, Pazdur R (2003). End points and United States Food and Drug Administration approval of oncology drugs. J Clin Oncol, 21:1404–1411.
Lesko LJ, Atkinson AJ (2001). Use of biomarkers and surrogate endpoints in drug development and regulatory decision making: criteria, validation, strategies. Annu Rev Pharmacol Toxicol, 41:347–366.
Lesko LJ (2007). Paving the critical path: how can clinical pharmacology help achieve the vision? Clin Pharmacol Ther, 81:170–177.
Miller JC, Pien HH, Sahani D, Sorensen AG, Thrall JH (2005). Imaging angiogenesis: applications and potential for drug development. J Natl Cancer Inst, 97:172–187.
Pien HH, Fischman AJ, Thrall JH, Sorensen AG (2005). Using imaging biomarkers to accelerate drug development and clinical trials. Drug Discov Today, 10:259–266.
Schelling M, Avril N, Nährig J, et al. (2000). Positron emission tomography using 18F-fluorodeoxyglucose for monitoring primary chemotherapy in breast cancer. J Clin Oncol, 18:1689–1695.
Sorensen AG (2006). Magnetic resonance as a cancer imaging biomarker. J Clin Oncol, 24:3274–3281.
Stroobants S, Goeminne J, Seegers M, et al. (2003). 18FDG-positron emission tomography for early prediction of response in advanced soft tissue sarcoma treated with imatinib mesylate (Glivec). Eur J Cancer, 39:2012–2020.
Weber WA, Ott K, Becker K, et al. (2001). Prediction of response to preoperative chemotherapy in adenocarcinomas of the esophagogastric junction by metabolic imaging. J Clin Oncol, 19:3058–3065.
Weber WA, Petersen V, Schmidt B, et al. (2003). Positron emission tomography in non-small-cell lung cancer: prediction of response to chemotherapy by quantitative assessment of glucose use. J Clin Oncol, 21:2651–2657.
38

NANOTECHNOLOGY-BASED BIOMARKER DETECTION

Joshua Reineke, Ph.D.
Wayne State University, Detroit, Michigan
ADVANTAGES OF NANOTECHNOLOGY TO BIOMARKERS

Nanotechnology refers to the fabrication, manipulation, use, and study of the phenomena of materials with at least one dimension in the nanoscale (<100 nm). The recent revolution in nanotechnology will benefit a large diversity of applications, including many consumer products, engineering, information technology, and medicine. When applied to biomarker science, nanotechnology will influence how we utilize current biomarkers and enable the discovery and development of many new biomarkers.

Major challenges facing the detection of biomarkers that are being addressed with nanotechnology approaches include multiplexing ability, specificity, rapid or real-time detection, label-free detection, determination of early therapy response, small sample sizes, and portability. Additionally, nanodevices are likely to be suitable for reliable scale-up and massive parallelization at low cost by utilizing processes similar to those employed in the electronics industry [1,2]. A further potential asset of nanotechnology is the detection of biomarker density and distribution, particularly in cases where biomarker concentration, and not the biomarker itself, is specific to a pathologic condition.
Recent developments in genetics and proteomics have produced a large number of biomarkers [3–5], dramatically increasing the potential applications of biomarkers. However, for the potential of these recent developments to be realized, better biomarker detection systems must be engineered; and because multiple biomarkers are of particular interest in complex diseases such as cancer, where single-biomarker detection is insufficient [6], these detection systems must have multiplexing ability.

In this chapter we review nanoparticles and nanodevices currently being developed for the enhanced detection of biomarkers and focus on the diversity of applications to biomarker research. Specific methods of detection system fabrication are not discussed despite many exciting recent discoveries, but descriptions may be found in the relevant referenced material. Following this review is a discussion of nanotoxicology issues and other challenges that nanotechnology will present to the biomarker field.
NANOPARTICLES

Nanoparticles can be fabricated and tailored to specific morphologies and chemistries, can incorporate various imaging agents, and can carry targeting ligands, allowing virtually unlimited application. This versatility has led to intensified research into nanoparticle-based detection systems for biomarkers. Nanoparticles may also be applied to biomarkers for concentrating, amplifying, and protecting the biomarker both in vivo and ex vivo [7]. With the development of quantum dot nanoparticles, the ability to detect biomarkers via fluorescence has great potential. Additionally, biomarker science will benefit from magnetic resonance imaging (MRI) agents based on nanoparticles. These and a few other novel nanoparticle-based biomarker detection systems are discussed below.

Quantum Dots

Several characteristics distinguish colloidal semiconductor quantum dots (qdots) from commonly used fluorophores. Qdots are single crystals 2 to 10 nm in diameter whose size and shape can be controlled precisely by the duration, temperature, and ligand molecules used during their synthesis, yielding qdots with composition- and size-dependent fluorescent emission (Figure 1) [8]. Absorption of a photon above the semiconductor band gap results in excitation. For nanocrystals smaller than the Bohr exciton radius, energy levels are quantized, an effect called quantum confinement (hence the name quantum dots). Radiative recombination of an exciton results in the emission of a photon in a narrow, symmetric energy band. A complete description of qdots can be found in a paper by Alivisatos [8]. The narrow emission wavelengths of qdots, their tunability based on composition and size, and their long fluorescence lifetime make them far superior to commonly used fluorophores.
Figure 1  Emission maxima and sizes of qdots of different composition. Qdots can be synthesized from various types of semiconductor materials (CdS, CdSe, CdTe, CdTe/CdSe, CdHgTe/ZnS, InP, InAs, PbSe) characterized by different bulk bandgap energies. The curves represent experimental data from the literature on the dependence of peak emission wavelength on qdot diameter. The range of emission wavelength is 400 to 1350 nm, with size varying from 2 to 9.5 nm. All spectra are typically around 30 to 50 nm (full width at half maximum). Inset: representative emission spectra for some materials. (From ref. 9, with permission.) (See insert for color reproduction of the figure.)
Qdots possess a large number of advantageous qualities over traditional fluorescent probes, including photostability and increased intensity [8,9], lower required excitation power [10], increased binding specificity [11], less binding interference and steric hindrance [10,12–18], high surface area for multifunctionality, and the ability to escape reticuloendothelial system (RES) clearance [11]. Qdots can be surface modified to meet a diverse set of biomarker applications, as illustrated in Figure 2. The ability to conjugate qdots to antibodies specific to a single biomarker, and to select qdots of a specific emission wavelength, allows simultaneous detection and accurate quantification of multiple targeted biomarkers [20,21]. Figure 3 illustrates an example of in vivo imaging of qdots targeted to a tumor and of three qdots with differing emission wavelengths imaged in vivo. Several areas of untapped qdot potential are their use as customizable donor pairs for fluorescence resonance energy transfer (FRET) experiments [16] and deep-tissue visualization with multiphoton imaging [10]. Recently, qdots have been developed for imaging in the near-infrared (IR) range to allow in vivo imaging without interference from autofluorescence [9,19].

Despite the many advantages of qdots, their toxicity is problematic for in vivo and living-cell applications, particularly if ultraviolet light is to be used [22].
Figure 2  Qdot peptide toolkit. The light blue segment contains cysteines and hydrophobic amino acids assuring binding to the qdot and is common to all peptides. S, solubilization sequence; P, PEG; B, biotin; R, peptide recognition sequence; Q, quencher; D, DOTA; X, any unspecified peptide-encoded function. Qdot solubilization is obtained by a mixture of S and P. Qdots can be targeted with B, R, or other chemical moieties. Qdot fluorescence can be turned on or off by attaching a Q via a cleavable peptide link. In the presence of the appropriate enzyme, the quencher is separated from the qdot, restoring the photoluminescence and reporting on the enzyme activity. For simultaneous PET and fluorescence imaging, qdots can be rendered radioactive by D chelation of radionuclides; for simultaneous MRI and fluorescence imaging, qdots can be rendered magnetic by D chelation of nuclear spin labels. (From ref. 9, with permission.) (See insert for color reproduction of the figure.)
Surface coating is often utilized to protect from toxicity and has yielded many successful results. Qdots encapsulated with an ABC triblock copolymer not only resulted in less cytotoxicity, but advantageously prevented aggregation and fluorescence loss after in vivo exposure (a major and not fully understood barrier to in vivo imaging with qdots [11,23]). These coated qdots were successful in both passive and active tumor targeting, allowing imaging of multiple biomarkers [21].
Figure 3 In vivo imaging of qdots. (a) Spectrally resolved image of a mouse bearing C4-2 human prostate tumors following injection of qdots functionalized with antibodies for prostate-specific membrane antigen. (b) Images on the right show qdots emitting green, yellow, or red light. The image on the left illustrates the in vivo imaging of the multicolor qdots at three injection sites. (From ref. 21, with permission.) (See insert for color reproduction of the figure.)
The utility of qdots for in vivo imaging in live animals was demonstrated further in their use for nonspecific uptake studies and lymph node mapping [19,24]. Additionally, qdots have been used for the in vivo imaging of blood vessels with increased contrast [10]. Targeted in vivo imaging of prostate cancer was achieved following intravenous administration of qdots functionalized with antibodies to prostate-specific membrane antigen [21]. Other cancer biomarkers have been targeted and imaged by qdots, including HER2 [13] and erbB family receptor signaling events [25].
Recent developments in near-IR qdots lend great potential to the use of qdots for imaging of deep-tissue tumors and for applications of real-time imaging to aid in surgical procedures [9]. Qdots' largest potential for the future of biomarker science may come from the ability to give qdots an on–off switch that responds to chemical cues. This was illustrated by functionalizing qdots with a fluorescence-quenching molecule that could be cleaved in the presence of a chemical species or enzyme [16]. Continued research and development of qdots should produce versatile in vivo and in vitro multiplexed biomarker detection systems.

Nanoparticle-Based MRI Agents

There are many nanoparticle formulations utilized in the clinic and in research for the enhancement of MRI contrast, such as gadolinium nanoparticles [26] and iron oxide nanoparticles [27–32]. Nanoparticles are advantageous in MRI applications, due to their increased circulation times and high efficacy-to-safety ratio relative to iodinated imaging agents [33]. An intense area of biomarker and drug discovery research is the detection of the progression, regression, and dormancy of tumors mediated by angiogenesis. A number of investigators have utilized integrin-targeted nanoparticles as enhanced MRI agents to visualize angiogenesis [29,34–36] and have even shown the detection of signals from very low picomolar concentrations of epitopes [37]. These studies indicate great promise for future clinical applications, yet present a great need for modeling software to fully interpret the biomarker information gained. As an alternative to targeted imaging agents, "smart" MRI nanoparticles have been designed to be taken up by all cells but to have increased contrast only after a specific enzymatic reaction, inherent to a particular pathology, exposes gadolinium atoms to free water, enhancing image contrast [38].

Nanoparticles may also be used for enhanced intraoperative imaging. Dextran-coated paramagnetic iron oxide nanoparticles used as MRI contrast agents during the surgical treatment of brain tumors significantly improved intraoperative permanence of imaging, inflammatory targeting, and low-level detectability over conventional MRI contrast agents [39].

The use of nanoparticles as enhanced MRI agents is also advantageous for detecting biomarkers in in vitro cell culture assays. As an example, telomerase activity (a biomarker of replicative potential [40]) was detected by MRI with the use of biologically smart nanoparticles that switch their magnetic state when annealed with telomerase-synthesized TTAGGG sequences [41].

Novel Nanoparticle-Based Biomarker Detection Systems

Quantum dot–based detection systems and nanoparticle-based MRI agents hold great potential in biomarker research.
However, novel nanoparticle systems that combine some of the assets, or address some of the limitations, of these systems will have the greatest utility in the future advancement of biomarker research.

A major limitation of qdots is their cytotoxicity [9,22,42]. Cytotoxicity is often circumvented by surface modification of the qdots, even though surface modifications result in adverse changes in imaging properties. Nanodiamonds have been proposed as a noncytotoxic alternative to qdots, due to their long-term stability without photobleaching or blinking. Nanodiamonds can be surface modified without changes in fluorescence properties, since the signal originates solely from internal defects. Preliminary studies show enhanced imaging properties and the ability to track single particles over time [43]. Nanodiamonds are already mass-produced for the surface-finishing industry, making reproducible scale-up procedures straightforward. However, the range and specificity of imaging wavelengths need to be investigated further.

Photostable nanoparticles have also been produced via the encapsulation of metal–organic luminophores within silica [44]. This extends the range and versatility of luminophores for biomarkers beyond what has been established with qdots. The silica surface of these nanoparticles allows easy covalent surface attachment for targeted imaging. Antibody-bound silica nanoparticles with an encapsulated luminophore were able to bind selectively to, and therefore identify, leukemia cells [44].

The fabrication of novel polyacrylamide "nano-PEBBLE" sensors by Xu et al. [45] allows biomarker detection within single cells. The nano-PEBBLE sensors described are nontoxic polyacrylamide nanoparticles containing glucose oxidase and oxygen-sensitive fluorophores that are readily taken up by cells. Intracellular glucose concentration could be determined in real time from fluorophore intensity, which responds to the oxygen consumed by the glucose oxidase reaction. The major advantages over free sensing dyes are the lack of interference from nonspecific protein binding, protection from cytotoxic dyes (allowing a greater range of dyes to be used in biomarker applications), and the ability to detect biomarkers for which no specific dyes exist (due to the synergistic effect of the materials selected for encapsulation). Nano-PEBBLE sensors have been developed to measure real-time intracellular concentrations of calcium, potassium, oxygen, and pH [46–49].

Additional nanoparticle systems that are relevant to biomarker detection and are not covered here include liquid perfluorocarbon MRI nanoparticles targeted to thrombi [50], neovascular integrins [35], and tumors [35]; low-density lipid nanoparticles [51] and lipid- and polymer-shelled nanoparticles [52] as enhanced ultrasound agents; "bio-barcode" DNA [53] and prostate-specific antigen [54] detection with oligonucleotide-modified gold nanoparticles; silica nanoparticles containing pH-sensitive fluorescent dye for rapid, reversible pH detection [55]; antibody-bound silica-coated magnetic nanoparticles for cell sorting [56]; silver dendrimer nanoparticles for specific, nontoxic cell staining [57]; and targeted nanoscale Raman probes [20].
Additionally, in vivo stem cell tracking was performed by first pretreating cells with magnetic nanoparticles [58,59].

Nanoparticles for Enhanced Biomarker Detection Devices

Most of the proposed applications of nanoparticles to biomarker science involve the enhanced imaging of pathological tissue morphologies or the specific binding to biomarkers, yielding information on biomarker location, concentration, and distribution, as discussed above. However, nanoparticles are readily being used to enhance current in vitro biomarker detection assays as well. Nanoparticles are being used in clinical and research settings for the enhancement of cell sorting assays, immunostaining assays, proteomic and genetic microarrays, and surface plasmon resonance assays [60,61]. Nanoparticles have even been proposed as selective protein-harvesting agents for serum proteomics of low-molecular-weight proteolytic fragments found in trace amounts in various cancers [62,63].
NANODEVICES

The greatest utility of the biomarker detection nanodevices discussed below is the ability to perform real-time, selective, multiplexed, and tag-free detection of biomolecules in small sample volumes. The nanoscale of these devices also increases portability and reduces the costs associated with the larger reagent volumes used in conventional assays.

Nanowires

Nanowires are nanoscale sensing wires that can be functionalized to bind proteins of interest and have great multiplexing potential [64]. When a biomolecule of interest binds a nanowire, there is a drop in its conductance. The ability of nanowires to act as nanoscale field-effect biotransistors allows their use as real-time sensors of multiple binding events, as illustrated in Figure 4, which also shows a scanning electron image of a single silicon nanowire between two electrodes. Nanowires made from silicon are advantageous due to their ability to be easily tuned and surface modified for specific biomolecule detection.

Figure 4  Nanowire detection of a single-target biomolecule using conductance-based measurements. A biomolecule is immobilized selectively by antibody interaction, causing a change in conductance, recorded on the right. Inset: SEM image of a single silicon nanowire (scale bar = 500 nm). (From refs. 100 and 67, with permission.) (See insert for color reproduction of the figure.)

Hahm and Lieber [65] utilized silicon nanowires for the specific detection of a DNA mutation site in the cystic fibrosis transmembrane conductance regulator. The nanowire construct allowed real-time detection and was ultrasensitive, detecting the mutation at femtomolar concentrations, and the authors also found high device-to-device reproducibility. Simultaneous multiplexed detection of four cancer biomarkers at femtomolar concentrations from undiluted serum samples was achieved by Zheng et al. [66]; real-time detection results for a multiplexed silicon nanowire device designed to detect three biomarkers are shown in Figure 5.
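To give a concrete sense of the conductance-versus-time traces shown in Figures 4 and 5, the toy model below assumes that the conductance change of a functionalized nanowire is proportional to the fraction of its surface receptors occupied, with simple first-order (Langmuir) binding kinetics. The rate constants, analyte concentration, and conductance values are invented for illustration and are not taken from the cited studies.

```python
import numpy as np

# Toy model: conductance change proportional to receptor occupancy theta(t),
# with first-order Langmuir association/dissociation kinetics.
k_on = 1e5        # association rate constant, 1/(M*s)       (assumed)
k_off = 1e-3      # dissociation rate constant, 1/s          (assumed)
conc = 1e-9       # analyte concentration, 1 nM              (assumed)
g0 = 2000.0       # baseline conductance, nS                 (assumed)
dg_max = -300.0   # conductance change at full occupancy, nS (assumed)

k_obs = k_on * conc + k_off
theta_eq = k_on * conc / k_obs                 # equilibrium occupancy

t = np.linspace(0.0, 4000.0, 401)              # seconds after analyte injection
theta = theta_eq * (1.0 - np.exp(-k_obs * t))  # occupancy rising toward equilibrium
conductance = g0 + dg_max * theta

for ti, gi in zip(t[::100], conductance[::100]):
    print(f"t = {ti:6.0f} s   G = {gi:7.1f} nS")
```

In a multiplexed array such as that of Figure 5, each functionalized wire would follow its own version of this curve, with buffer injections returning the trace toward baseline as bound analyte dissociates.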
The sensitivity of the nanowire device was demonstrated further by the same group when they detected the binding, activity, and inhibition of telomerase from unamplified extracts of as few as 10 cells. Detection is as sensitive as current telomeric repeat amplification protocol (TRAP) assays but does not require polymerase chain reaction (PCR) amplification and labeling. Silicon nanowires were also used by Wang et al. [67] to measure ATP binding (and nonbinding) to an immobilized leukemia biomarker, Abl. Although this work also demonstrates the ability of nanowires for biomarker detection, the group went a step further in utilizing the device to measure the biomarker response to known inhibitors, lending potential to drug discovery and drug efficacy research.
Figure 5 Multiplexed detection of cancer marker proteins. (a) Multiplexed protein detection by three silicon-nanowire devices in an array. Devices 1, 2, and 3 are fabricated from similar nanowires and then differentiated with distinct mouse antibody receptors specific to three different cancer markers. (b) Conductance versus time data recorded for the simultaneous detection of PSA, CEA, and mucin-1 on a p-type silicon-nanowire array in which NW1, NW2, and NW3 were functionalized with mouse antibodies for PSA, CEA, and mucin-1, respectively. The solutions were delivered to the nanowire array sequentially as follows: (1) 0.9 ng/mL PSA, (2) 1.4 pg/mL PSA, (3) 0.2 ng/mL CEA, (4) 2 pg/mL CEA, (5) 0.5 ng/mL mucin-1, (6) 5 pg/mL mucin-1. Buffer solutions were injected following each protein solution at points indicated by black arrows. (From ref. 66, with permission.) (See insert for color reproduction of the figure.)
Detection in the nanomolar range and in a concentration-dependent manner (allowing quantification) makes this sensitive method well suited to drug discovery and efficacy studies for any of the tyrosine kinases implicated in many cancers and other diseases. Silicon nanowires have been developed for a number of other applications not discussed here, including pH sensing [64] and virus detection [68].

Nanocantilevers

Nanocantilevers are a promising new approach to multimolecular sensing. Nanocantilever arrays, illustrated in Figure 6, consist of many nanoscale diving-board-like beams whose surfaces can carry covalently attached antibodies. When a biomolecule of interest binds, the beam bends, and the deflection can be detected either by laser-light observation or by changes in resonant-vibration frequency [69–71].
Figure 6 Nanocantilever array. The biomarker proteins are affinity-bound to the cantilevers and cause them to deflect. The deflections can be observed directly with lasers. Alternatively, the shift in resonant frequencies caused by the binding can be detected electronically. (From ref. 1, with permission.) (See insert for color reproduction of the figure.)
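The resonant-frequency readout mentioned in the caption can be made quantitative with the standard harmonic-oscillator approximation: for fixed stiffness k and effective mass m, f = (1/2π)√(k/m), so a small adsorbed mass Δm lowers the frequency by roughly Δf ≈ −f0·Δm/(2m). The sketch below simply inverts that relation; the cantilever parameters are illustrative assumptions, not values taken from the studies cited.

def added_mass_from_shift(f0_hz, delta_f_hz, m_eff_kg):
    """Estimate adsorbed mass from a resonant-frequency shift, assuming a
    harmonic oscillator with constant stiffness: f = (1/(2*pi))*sqrt(k/m).

    For small shifts, delta_f/f0 ~= -(1/2)*(delta_m/m_eff), so
    delta_m ~= -2 * m_eff * delta_f / f0.
    """
    return -2.0 * m_eff_kg * delta_f_hz / f0_hz

# Illustrative numbers only: a 10 MHz cantilever with ~1 ng effective mass
# showing a 50 Hz downward shift after protein capture.
f0 = 10e6          # resonant frequency (Hz)
delta_f = -50.0    # measured shift (Hz); negative means the frequency decreases
m_eff = 1e-12      # effective mass (kg), i.e., about 1 ng
dm = added_mass_from_shift(f0, delta_f, m_eff)
print(f"Estimated adsorbed mass: {dm * 1e18:.1f} fg")   # 1 kg = 1e18 femtograms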
The utility of cantilever detection has been demonstrated by multiplexed DNA assays to detect BRCA1 mutations, indicative of early breast cancer onset [72]. Additionally, nanocantilevers were able to detect and quantitate prostate-specific antigen at clinically significant concentrations [70]. A nanomechanical cantilever array was used by Arntz et al. [2] for the multiplexed detection of the cardiac biomarkers creatine kinase and myoglobin. Both biomarkers could be detected at micromolar concentrations in plasma samples. Further development of this system is being investigated for the early and rapid diagnosis of myocardial infarction.
Nanomechanical cantilever detection systems are not limited to the detection of biomarkers through hybridization and antibody interactions. Environmentally sensitive polymers coated onto nanocantilevers can also produce detectable cantilever stresses in response to sample chemistries. For instance, pH-sensitive poly(methacrylic acid) coated onto nanocantilevers was used to make highly sensitive, small-volume pH sensors [73]. As new environmentally responsive polymers are developed, this application may grow rapidly. Similarly, analyte vapors in the gas phase changed the surface stress of nanocantilevers in a device termed an artificial nose [74].

Nanoarrays

Currently, microarrays are widely used in research and the clinic for molecular diagnostics, genotyping, and biomarker-guided therapy. Development of nanoarrays will advance these applications by allowing higher degrees of multiplexing, higher specificity, greater portability, and reduced costs associated with sample and reagent volumes. Advanced methods have already been developed for the fabrication of nanoarrays for proteomic profiling and diagnostics [75–78].
Figure 7 SEM images of arrays of multiwalled carbon nanotubes on (a) UV-lithography-patterned and (b) e-beam-patterned Ni spots. Panels are 45° perspective views with scale bars of (a) 2 μm and (b) 5 μm. (From ref. 79, with permission.)
Carbon nanotube electrode arrays were used by Li et al. [79] for ultrasensitive DNA detection. Scanning electron images of the arrays (Figure 7) demonstrate the precision and versatility of carbon nanotube electrode array fabrication. Nanoelectrodes are highly desirable because electrode performance, in terms of speed and spatial resolution, scales inversely with electrode radius. Oligonucleotides covalently bound to the termini of the carbon nanotubes were able to detect target DNA sequence hybridization at levels of only a few attomoles. The specificity and versatility of carbon nanotube electrode arrays allow statistically significant, multiplexed detection of biomolecules at clinically relevant concentrations.
A carbon nanotube–based array sensor was employed to detect glucose by binding glucose oxidase to the termini of the carbon nanotubes [80]. Direct electron transfer from the enzyme through the carbon nanotube to a platinum transducer allowed real-time, sensitive detection of the reaction and of glucose presence. This technology may be expanded for the detection of many other biomarkers with electron-producing enzymatic reactions.
Gold nanoarrays with immobilized anti-Escherichia coli antibodies have been developed for the early detection of bacteria during human kidney infections [81]. A novel aspect of this work was that the antibodies were all arranged in a post configuration, as opposed to the random configuration used in most studies. This allowed the detection of bacteria at concentrations two orders of magnitude below the detection limits of current methods. Additionally, the rapid detection method eliminated the need to run bacterial cultures overnight, as is common practice in the clinic.
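The scaling argument for nanoelectrodes can be made concrete with two standard relations for a disk electrode: the diffusion-limited steady-state current, i_ss = 4nFDCr, falls only linearly as the radius r shrinks, whereas the diffusional time to reach steady state falls roughly as r²/D, so smaller electrodes respond far faster at a modest cost in absolute current. The short sketch below evaluates both relations; the analyte and electrode values are illustrative assumptions, not parameters from the cited studies.

# Rough scaling of disk-electrode behavior with radius (illustrative only).
F = 96485.0          # Faraday constant, C/mol
n = 1                # electrons transferred per molecule (assumed)
D = 7e-10            # diffusion coefficient, m^2/s (typical small molecule)
C = 1e-3             # analyte concentration, mol/m^3 (i.e., 1 micromolar)

def steady_state_current(radius_m):
    """Diffusion-limited steady-state current at a disk electrode: 4*n*F*D*C*r."""
    return 4 * n * F * D * C * radius_m

def diffusional_response_time(radius_m):
    """Order-of-magnitude time to reach steady state: r**2 / D."""
    return radius_m ** 2 / D

for r in (10e-6, 100e-9):   # 10 um microelectrode versus 100 nm nanoelectrode
    i = steady_state_current(r)
    t = diffusional_response_time(r)
    print(f"r = {r * 1e9:8.0f} nm:  i_ss = {i * 1e12:8.4f} pA,  t ~ {t * 1e3:8.3f} ms")

Under these assumed values, shrinking the radius from 10 μm to 100 nm reduces the steady-state current by a factor of 100 but shortens the diffusional response time by a factor of 10,000, which is the essence of the speed advantage noted above.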
Other Novel Nanodevices for Biomarker Detection

Although the majority of research into nanodevices for biomarker detection falls into the three categories discussed above (nanowires, nanocantilevers, and nanoarrays), a plethora of other novel detection devices have been investigated, as discussed briefly here. Additionally, nanotechnology methods contribute to biomarker detection in a more general sense, such as the ability to make precise nanoscale pores and channels for the utilization of picoliter volumes [82–84].
In addition to their application in nanowires and nanoarrays, carbon nanotubes can be utilized as nano-biomarker probes for the precise detection of specific genes following a DNA hybridization assay [85]. The use of carbon nanotubes as probes has great potential, due to their intense Raman scattering (making it possible to detect a single nanotube) [86,87] and near-infrared fluorescence [88–91]. Nanotube-based sensors were first developed for the detection of NO2 and NH3 gases [92] but have since become more sophisticated, allowing sensitive detection of biomarkers. Surface-modified carbon nanofiber electrodes have been developed for small-volume detection of glucose [93–95] and are being developed for glutamate and lactate [93]. Carbon nanoelectrodes coupled with oxidases may also be utilized for sensitive detection of cholesterol, alcohol, lactate, acetylcholine, choline, hypoxanthine, and xanthine in a variety of biological fluids [95]. Nanotubes have also been developed as sensors for autoimmune disease [96] and single-nucleotide polymorphisms (SNPs) [97], and for reversible, small-volume pH detection [94].
Nanogap actuators have been proposed for protein detection [98]. Antibodies immobilized in a nanogap actuator provide protein-binding specificity. Once bound, the target protein can be detected on the basis of rigidity measurements that yield information on protein presence, concentration, and size.
The diverse range of nanodevices discussed above will address key issues in biomarker research, including multiplexing ability, specificity, real-time detection, and tag-free detection. As a result, they will greatly aid diagnostics, treatment monitoring, and drug discovery.
NANOTOXICOLOGY

Many of the nanotechnology-based biomarker detection systems discussed, nanoparticle-based systems in particular, offer the potential for real-time in vivo analysis and imaging. Although this is very exciting from the standpoint of the various applications that may benefit, the toxicology of these systems must be investigated further. The field of nanotoxicology has recently intensified and has so far raised many more questions than it has answered [99]. Two conclusions that are agreed upon are that the biological interaction with nanomaterials
differs from that with bulk materials, and that further research in nanotoxicology is needed. The potential toxicity of nanomaterials represents the largest barrier to the in vivo, ex vivo, and in vitro cell culture biomarker assays currently being developed.
Qdots have been the most intensely researched system for biomarker detection. However, their cytotoxicity is well known and requires surface modification, which can alter imaging properties both advantageously and disadvantageously. Qdots are particularly toxic when ultraviolet light sources are used [22]. This has led to the investigation of other materials with similar imaging properties and less cytotoxicity, such as nanodiamonds [43]. Regardless, the clearance of these nanoparticles is not clearly understood, and qdots have been observed in the lymph and bone marrow months after their administration [24]. On the other hand, some advances in nanotechnology-based biomarker detection address issues of toxicity that arise from conventional biomarker assays. For example, the nano-PEBBLEs discussed earlier protect cells from cytotoxic fluorophores and therefore extend the range of fluorophores that can be utilized [45].
CHALLENGES FOR NANOTECHNOLOGY-BASED BIOMARKER DETECTORS

Nanotoxicology is the greatest challenge facing nanotechnology-based biomarker detection systems. However, these systems also face many other challenges, will intensify some current challenges, and will present new challenges to biomarker research. The second major challenge is biofouling of biomarker detection systems. Although biofouling is not a new challenge, it is more critical for nanodevices, where it could be exacerbated by the nanoscale of the detection systems. The need for biofouling-resistant materials will therefore be intensified.
The potential of nanotechnology applications to biomarker research can be realized fully only if mathematical models are developed further. Understanding nanocantilever beam stresses, identifying biological signatures from nanocantilever and nanowire devices, and accurately reconstructing in vivo imaging data in three dimensions are all areas in great need of better-developed mathematical models. Additionally, further optimization of all the detection systems discussed is needed to ensure reproducibility, statistically relevant data, and prevention of false positives. For instance, it is not fully understood why qdots with excellent imaging characteristics in vitro tend to lose fluorescence when used in vivo. Finally, the transit mechanisms of nanoparticle systems must be understood in order to prevent undesired pathway susceptibility, such as endosomal entrapment. As nanotechnology-based biomarker detection systems develop, further development will also be needed in optical detection systems and in the sensitivity and selectivity of electronic devices.
Despite all the barriers discussed here, the factors that will truly determine whether nanoparticles and nanodevices
become standard tools in the clinical biomarker toolbox are the cost, ease of integration with current infrastructure, and performance variability [100].
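As one example of the kind of model called for above in connection with nanocantilever beam stresses, a Stoney-type relation is often used as a first-order estimate of static bending: a differential surface stress Δσ on one face of a cantilever of length L, thickness t, Young's modulus E, and Poisson ratio ν produces a tip deflection of roughly Δz ≈ 3(1−ν)L²Δσ/(Et²). The sketch below is a minimal illustration with assumed, not measured, values.

def cantilever_tip_deflection(delta_sigma, L, t, E, nu):
    """Stoney-type estimate of static tip deflection for a cantilever whose
    top surface carries a differential surface stress delta_sigma (N/m):

        dz ~= 3 * (1 - nu) * L**2 * delta_sigma / (E * t**2)

    A common first-order model for biomolecular adsorption on one face.
    """
    return 3.0 * (1.0 - nu) * L ** 2 * delta_sigma / (E * t ** 2)

# Illustrative silicon cantilever (values assumed, not from the cited work):
L = 500e-6        # length, m
t = 1e-6          # thickness, m
E = 170e9         # Young's modulus of silicon, Pa
nu = 0.28         # Poisson ratio
for sigma in (1e-3, 5e-3, 2e-2):                  # surface stress, N/m
    dz = cantilever_tip_deflection(sigma, L, t, E, nu)
    print(f"delta_sigma = {sigma:6.3f} N/m  ->  tip deflection ~ {dz * 1e9:6.1f} nm")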
FUTURE OF NANOTECHNOLOGY-BASED BIOMARKER DETECTORS

Nanotechnology will present many solutions and many barriers to biomarker research. For nanoparticle systems, nanotoxicology research will need to identify safe, nontoxic materials for cell culture and in vivo use. Particular emphasis should be placed on understanding nanoparticle biodistribution, clearance, and the molecular mechanisms and pathways involved in their transit. Biofouling-resistant materials and further development of mathematical models are necessary for the full potential of nanodevices in biomarker research to be realized. However, nanotechnology-based biomarker systems able to yield real-time, multiplexed results with high accuracy and specificity will change the utilization of current biomarkers and allow the discovery of new biomarkers.
REFERENCES 1. Ferrari M (2005). Cancer nanotechnology: opportunities and challenges. Nat Rev Cancer, 5(3):161–171. 2. Arntz Y, Seelig J, Lang H, et al. (2003). Label-free protein assay based on a nanomechanical cantilever array. Nanotechnology, 14:86–90. 3. Sander C (2000). Genomic medicine and the future of health care. Science, 287(5460):1977–1978. 4. Etzioni R, Urban N, Ramsey S, et al. (2003). The case for early detection. Nat Rev Cancer, 3(4):243–252. 5. Srinivas PR, Kramer BS, Srivastava S (2001). Trends in biomarker research for cancer detection. Lancet Oncol, 2(11):698–704. 6. Wulfkuhle JD, Liotta LA, Petricoin EF (2003). Proteomic applications for the early detection of cancer. Nat Rev Cancer, 3(4):267–275. 7. Medina C, Santos-Martinez MJ, Radomski A, Corrigan OI, Radomski MW (2007). Nanoparticles: pharmacological and toxicological significance. Br J Pharmacol, 150(5):552–558. 8. Alivisatos A (1996). Semiconductor clusters, nanocrystals and quantum dots. Science, 271(5251):933–937. 9. Michalet X, Pinaud FF, Bentolila LA, et al. (2005). Quantum dots for live cells, in vivo imaging, and diagnostics. Science, 307(5709):538–544. 10. Larson DR, Zipfel WR, Williams RM, et al. (2003). Water-soluble quantum dots for multiphoton fluorescence imaging in vivo. Science, 300(5624):1434–1436. 11. Akerman ME, Chan WC, Laakkonen P, Bhatia SN, Ruoslahti E (2002). Nanocrystal targeting in vivo. Proc Natl Acad Sci USA, 99(20):12617–12621.
12. Dubertret B, Skourides P, Norris DJ, Noireaux V, Brivanlou AH, Libchaber A (2002). In vivo imaging of quantum dots encapsulated in phospholipid micelles. Science, 298(5599):1759–1762. 13. Wu X, Liu H, Liu J, et al. (2003). Immunofluorescent labeling of cancer marker Her2 and other cellular targets with semiconductor quantum dots. Nat Biotechnol, 21(1):41–46. 14. Jaiswal JK, Mattoussi H, Mauro JM, Simon SM (2003). Long-term multiple color imaging of live cells using quantum dot bioconjugates. Nat Biotechnol, 21(1):47–51. 15. Ishii D, Kinbara K, Ishida Y, et al. (2003). Chaperonin-mediated stabilization and ATP-triggered release of semiconductor nanoparticles. Nature, 423(6940):628–632. 16. Medintz IL, Clapp AR, Mattoussi H, Goldman ER, Fisher B, Mauro JM (2003). Self-assembled nanoscale biosensors based on quantum dot FRET donors. Nat Mater, 2(9):630–638. 17. Dahan M, Levi S, Luccardini C, Rostaing P, Riveau B, Triller A (2003). Diffusion dynamics of glycine receptors revealed by single-quantum dot tracking. Science, 302(5644):442–445. 18. Rosenthal SJ, Tomlinson I, Adkins EM, et al. (2002). Targeting cell surface receptors with ligand-conjugated nanocrystals. J Am Chem Soc, 124(17): 4586–4594. 19. Yezhelyev M, Gao X, Xing Y, Al-Hajj A, Nie S, O’Regan R (2006). Emerging use of nanoparticles in diagnosis and treatment of breast cancer. Lancet Oncol, 7:657–667. 20. Gao X, Cui Y, Levenson RM, Chung LW, Nie S (2004). In vivo cancer targeting and imaging with semiconductor quantum dots. Nat Biotechnol, 22(8):969–976. 21. Kim S, Lim YT, Soltesz EG, et al. (2004). Near-infrared fluorescent type II quantum dots for sentinel lymph node mapping. Nat Biotechnol, 22(1):93–97. 22. Derfus A, Chan W, Bhatia S (2004). Probing the cytotoxicity of semiconductor quantum dots. Nano Lett, 4(1):11–18. 23. Ness JM, Akhtar RS, Latham CB, Roth KA (2003). Combined tyramide signal amplification and quantum dots for sensitive and photostable immunofluorescence detection. J Histochem Cytochem, 51(8):981–987. 24. Ballou B, Lagerholm BC, Ernst LA, Bruchez MP, Waggoner AS (2004). Noninvasive imaging of quantum dots in mice. Bioconjug Chem, 15(1):79–86. 25. Lidke DS, Nagy P, Heintzmann R, et al. (2004). Quantum dot ligands provide new insights into erbB/HER receptor-mediated signal transduction. Nat Biotechnol, 22(2):198–203. 26. Oyewumi MO, Yokel RA, Jay M, Coakley T, Mumper RJ (2004). Comparison of cell uptake, biodistribution and tumor retention of folate-coated and PEG coated gadolinium nanoparticles in tumor-bearing mice. J Control Release, 95(3):613–626. 27. Schellenberger EA, Bogdanov A Jr, Hogemann D, Tait J, Weissleder R, Josephson L (2002). Annexin V-CLIO: a nanoparticle for detecting apoptosis by MRI. Mol Imaging, 1(2):102–107.
28. Harisinghani MG, Barentsz J, Hahn PF, et al. (2003). Noninvasive detection of clinically occult lymph-node metastases in prostate cancer. N Engl J Med, 348(25):2491–2499. 29. Winter PM, Morawski AM, Caruthers SD, et al. (2003). Molecular imaging of angiogenesis in early-stage atherosclerosis with alpha(v)beta3-integrin-targeted nanoparticles. Circulation, 108(18):2270–2274. 30. Perez JM, Simeone FJ, Saeki Y, Josephson L, Weissleder R (2003). Viral induced self-assembly of magnetic nanoparticles allows the detection of viral particles in biological media. J Am Chem Soc, 125(34):10192–10193. 31. Zhang Y, Sun C, Kohler N, Zhang M (2004). Self-assembled coatings on individual monodisperse magnetite nanoparticles for efficient intracellular uptake. Biomed Microdevices, 6(1):33–40. 32. Yan F, Xu H, Anker J, et al. (2004). Synthesis and characterization of silicaembedded iron oxide nanoparticles for magnetic resonance imaging. J Nanosci Nanotechnol, 4(1–2):72–76. 33. Rabin O, Manuel Perez J, Grimm J, Wojtkiewicz G, Weissleder R (2006). An x ray computed tomography imaging agent based on long-circulating bismuth sulphide nanoparticles. Nat Mater, 5(2):118–122. 34. Sipkins DA, Cheresh DA, Kazemi MR, Nevin LM, Bednarski MD, Li KC (1998). Detection of tumor angiogenesis in vivo by alphaνbeta3-targeted magnetic resonance imaging. Nat Med, 4(5):623–626. 35. Anderson SA, Rader RK, Westlin WF, et al. (2000). Magnetic resonance contrast enhancement of neovasculature with alpha(ν)beta(3)-targeted nanoparticles. Magn Reson Med, 44(3):433–439. 36. Winter PM, Caruthers SD, Kassner A, et al. (2003). Molecular imaging of angiogenesis in nascent Vx-2 rabbit tumors using a novel alpha(nu)beta3-targeted nanoparticle and 1.5 tesla magnetic resonance imaging. Cancer Res, 63(18):5838–5843. 37. Morawski AM, Winter PM, Crowder KC, et al. (2004). Targeted nanoparticles for quantitative imaging of sparse molecular epitopes with MRI. Magn Reson Med, 51(3):480–486. 38. Louie A, Huber M, Ahrens E, et al. (2000). In vivo visualization of gene expression using magnetic resonance imaging. Nat Biotechnol, 18:321–325. 39. Neuwelt EA, Varallyay P, Bago AG, Muldoon LL, Nesbit G, Nixon R (2004). Imaging of iron oxide nanoparticles by MR and light microscopy in patients with malignant brain tumours. Neuropathol Appl Neurobiol, 30(5): 456–471. 40. Hayflick L (1997). Mortality and immortality at the cellular level: a review. Biochemistry (Mosc), 62(11):1180–1190. 41. Grimm J, Perez JM, Josephson L, Weissleder R (2004). Novel nanosensors for rapid analysis of telomerase activity. Cancer Res, 64(2):639–643. 42. Kirchner C, Liedl T, Kudera S, et al. (2005). Cytotoxicity of colloidal CdSe and CdSe/ZnS nanoparticles. Nano Lett, 5(2):331–338. 43. Fu CC, Lee HY, Chen K, et al. (2007). Characterization and application of single fluorescent nanodiamonds as cellular biomarkers. Proc Natl Acad Sci USA, 104(3):727–732.
44. Santra S, Zhang P, Wang K, Tapec R, Tan W (2001). Conjugation of biomolecules with luminophore-doped silica nanoparticles for photostable biomarkers. Anal Chem, 73(20):4988–4993. 45. Xu H, Aylott JW, Kopelman R (2002). Fluorescent nano-PEBBLE sensors designed for intracellular glucose imaging. Analyst, 127(11):1471–1477. 46. Clark HA, Hoyer M, Philbert MA, Kopelman R (1999). Optical nanosensors for chemical analysis inside single living cells: 1. Fabrication, characterization, and methods for intracellular delivery of PEBBLE sensors. Anal Chem, 71(21):4831–4836. 47. Clark HA, Kopelman R, Tjalkens R, Philbert MA (1999). Optical nanosensors for chemical analysis inside single living cells: 2. Sensors for pH and calcium and the intracellular application of PEBBLE sensors. Anal Chem, 71(21):4837–4843. 48. Brasuel M, Kopelman R, Miller TJ, Tjalkens R, Philbert MA (2001). Fluorescent nanosensors for intracellular chemical analysis: decyl methacrylate liquid polymer matrix and ion-exchange-based potassium PEBBLE sensors with real-time application to viable rat C6 glioma cells. Anal Chem, 73(10):2221–2228. 49. Xu H, Aylott JW, Kopelman R, Miller TJ, Philbert MA (2001). A real-time ratiometric method for the determination of molecular oxygen inside living cells using sol-gel-based spherical optical nanosensors with applications to rat C6 glioma. Anal Chem, 73(17):4124–4133. 50. Yu X, Song S-K, Chen J, et al. (2000). High-resolution MRI characterization of human thrombus using a novel fibrin-targeted paramagnetic nanoparticle contrast agent. Magn Reson Med, 44:867–872. 51. May DJ, Allen JS, Ferrara KW (2002). Dynamics and fragmentation of thick shelled microbubbles. IEEE Trans Ultrason Ferroelectr Freq Control, 49(10):1400–1410. 52. Bloch S, Wan M, Dayton P, Ferrara KW (2004). Optical observation of lipid- and polymer-shelled ultrasound microbubble contrast agents. Appl Phys Lett, 84:631–633. 53. Nam J, Mirkin C (2004). Bio-barcode-based DNA detection with PCR-like sensitivity. J Am Chem Soc, 126:5932–5933. 54. Nam JM, Thaxton CS, Mirkin CA (2003). Nanoparticle-based bio-bar codes for the ultrasensitive detection of proteins. Science, 301(5641):1884–1886. 55. Gao F, Wang L, Tang L, Zhu C (2005). A Novel nano-sensor based on rhodamine-b-isothiocyanate-doped silica nanoparticle for pH measurement. Microchim Acta, 152:131–135. 56. Yoon TJ, Yu KN, Kim E, et al. (2006). Specific targeting, cell sorting, and bioimaging with smart magnetic silica core-shell nanomaterials. Small, 2(2):209–215. 57. Lesniak W, Bielinska AU, Sun K, et al. (2005). Silver/dendrimer nanocomposites as biomarkers: fabrication, characterization, in vitro toxicity, and intracellular detection. Nano Lett, 5(11):2123–2130. 58. Frank JA, Miller BR, Arbab AS, et al. (2003). Clinically applicable labeling of mammalian and stem cells by combining superparamagnetic iron oxides and transfection agents. Radiology, 228(2):480–487.
59. Kraitchman DL, Heldman AW, Atalar E, et al. (2003). In vivo magnetic resonance imaging of mesenchymal stem cells in myocardial infarction. Circulation, 107(18):2290–2293. 60. McFarland A, Duyne R (2003). Single silver nanoparticles as real-time optical sensors with zeptomole sensitivity. Nano Lett, 3:1057–1062. 61. Haes A, Van Duyne R (2003). A nanoscale optical biosensor: sensitivity and selectivity of an approach based on the localized surface plasmon resonance spectroscopy of triangular silver nanoparticles. J Am Chem Soc, 124:10596–10604. 62. Geho DH, Lahar N, Ferrari M, Petricoin EF, Liotta LA (2004). Opportunities for nanotechnology-based innovation in tissue proteomics. Biomed Microdevices, 6(3):231–239. 63. Liotta LA, Ferrari M, Petricoin E (2003). Clinical proteomics: written in blood. Nature, 425(6961):905. 64. Cui Y, Wei Q, Park H, Lieber CM (2001). Nanowire nanosensors for highly sensitive and selective detection of biological and chemical species. Science, 293(5533):1289–1292. 65. Hahm J-I, Lieber C (2004). Direct ultrasensitive electrical detection of DNA and DNA sequence variations using nanowire nanosensors. Nano Lett, 4(1):51–54. 66. Zheng G, Patolsky F, Cui Y, Wang WU, Lieber CM (2005). Multiplexed electrical detection of cancer markers with nanowire sensor arrays. Nat Biotechnol, 23(10):1294–1301. 67. Wang WU, Chen C, Lin KH, Fang Y, Lieber CM (2005). Label-free detection of small-molecule-protein interactions by using nanowire nanosensors. Proc Natl Acad Sci USA, 102(9):3208–3212. 68. Patolsky F, Zheng G, Hayden O, Lakadamyali M, Zhuang X, Lieber CM (2004). Electrical detection of single viruses. Proc Natl Acad Sci USA, 101(39):14017–14022. 69. Hansen KM, Ji HF, Wu G, et al. (2001). Cantilever-based optical deflection assay for discrimination of DNA singlenucleotide mismatches. Anal Chem, 73(7):1567–1571. 70. Wu G, Datar RH, Hansen KM, Thundat T, Cote RJ, Majumdar A (2001). Bioassay of prostate-specific antigen (PSA) using microcantilevers. Nat Biotechnol, 19(9):856–860. 71. Su M, Li S, Dravid V (2003). Microcantilever resonance-based DNA detection with nanoprobes. Appl Phys Lett, 82(20):3562–3564. 72. Chen H, Han J, Li J, Meyyappan M (2004). Microelectronic DNA assay for the detection of BRCA1 gene mutations. Biomed Microdevices, 6(1):55–60. 73. Bashir R, Hilt J, Elibol O, Gupta A, Peppas N (2002). Micromechanical cantilever as an ultrasensitive pH microsensor. Appl Phys Lett, 81(16):3091–3093. 74. Baller MK, Lang HP, Fritz J, et al. (2000). A cantilever array-based artificial nose. Ultramicroscopy, 82(1–4):1–9. 75. Demers LM, Ginger DS, Park SJ, Li Z, Chung SW, Mirkin CA (2002). Direct patterning of modified oligonucleotides on metals and insulators by dip-pen nanolithography. Science, 296(5574):1836–1838.
76. Lee KB, Park SJ, Mirkin CA, Smith JC, Mrksich M (2002). Protein nanoarrays generated by dip-pen nanolithography. Science, 295(5560):1702–1705. 77. Lee KB, Lim JH, Mirkin CA (2003). Protein nanostructures formed via direct write dip-pen nanolithography. J Am Chem Soc, 125(19):5588–5589. 78. Bruckbauer A, Zhou D, Kang DJ, Korchev YE, Abell C, Klenerman D (2004). An addressable antibody nanoarray produced on a nanostructured surface. J Am Chem Soc, 126(21):6508–6509. 79. Li J, Ng HT, Cassell A, et al. (2003). Carbon nanotube nanoelectrode array for ultrasensitive DNA detection. Nano Lett, 3(5):597–602. 80. Sotiropoulou S, Chaniotakis NA (2003). Carbon nanotube array-based biosensor. Anal Bioanal Chem, 375(1):103–105. 81. Basu M, Seggerson S, Henshaw J, et al. (2004). Nano-biosensor development for bacterial detection during human kidney infection: use of glycoconjugatespecific antibody-bound gold NanoWire arrays (GNWA). Glycoconj J, 21(8–9):487–496. 82. Chu W-H, Chin R, Huen T, Ferrari M (1999). Silicon membrane nanofilters from sacrificial oxide removal. J Microeletromech Syst, 8(1):34–42. 83. Desai T, Hansford D, Kulinski L, et al. (1999). Nanopore technology for biomedical applications. Biomed Microdevices, 2(1):11–40. 84. Han J, Craighead HG (2000). Separation of long DNA molecules in a microfabricated entropic trap array. Science, 288(5468):1026–1029. 85. Hwang E-S, Cao C, Hong S, et al. (2006). The DNA hybridization assay using single-walled carbon nanotubes as ultrasensitive, long-term optical labels. Nanotechnology, 17:3442–3445. 86. McCreery R (2002). Photometric standards for Raman spectroscopy. In Chalmers J, Griffiths P (eds.), Handbook of Vibrational Spectroscopy. Wiley, Hoboken, NJ. 87. Dresselhaus MS, Dresselhaus G, Jorio A, Souza Filho AG, Pimenta MA, Saito R (2002). Single nanotube Raman spectroscopy. Acc Chem Res, 35(12):1070–1078. 88. Barone PW, Baik S, Heller DA, Strano MS (2005). Near-infrared optical sensors based on single-walled carbon nanotubes. Nat Mater, 4(1):86–92. 89. Barone PW, Parker RS, Strano MS (2005). In vivo fluorescence detection of glucose using a single-walled carbon nanotube optical sensor: design, fluorophore properties, advantages, and disadvantages. Anal Chem, 77(23):7556–7562. 90. Wray S, Cope M, Delpy DT, Wyatt JS, Reynolds EO (1988). Characterization of the near infrared absorption spectra of cytochrome aa3 and haemoglobin for the non-invasive monitoring of cerebral oxygenation. Biochim Biophys Acta, 933(1):184–192. 91. O’Connell MJ, Bachilo SM, Huffman CB, et al. (2002). Band gap fluorescence from individual single-walled carbon nanotubes. Science, 297(5581):593–596. 92. Kong J, Franklin NR, Zhou C, et al. (2000). Nanotube molecular wires as chemical sensors. Science, 287(5453):622–625. 93. Zhang X, Wang J, Ogorevc B, Spichiger U (1999). Glucose nanosensor based on Prussian-Blue modified carbon-fiber cone nanoelectrode and an integrated reference electrode. Electroanalysis, 11(13):945–949.
94. Besteman K, Lee J-O, Wiertz F, Heering H, Dekker C (2003). Enzyme-coated carbon nanotubes as single-molecule biosensors. Nano Lett, 3(6):727–730. 95. Lin Y, Lu F, Tu Y, Ren Z (2004). Glucose biosensors based on carbon nanotube nanoelectrode ensembles. Nano Lett, 4(2):191–195. 96. Chen RJ, Bangsaruntip S, Drouvalakis KA, et al. (2003). Noncovalent functionalization of carbon nanotubes for highly specific electronic biosensors. Proc Natl Acad Sci USA, 100(9):4984–4989. 97. Woolley AT, Guillemette C, Li Cheung C, Housman DE, Lieber CM (2000). Direct haplotyping of kilobase-size DNA using carbon nanotube probes. Nat Biotechnol, 18(7):760–763. 98. Lee W, Cho Y-H (2004). Nanomechanical protein detectors using electrothermal nanogap actuators. 17th IEEE International Conference on Micro Electro Mechanical Systems, Maastricht, The Netherlands. IEEE Press, Piscataway, NJ, pp. 629–632. 99. Donaldson K, Stone V, Tran CL, Kreyling W, Borm PJ (2004). Nanotoxicology. Occup Environ Med, 61(9):727–728. 100. Portney NG, Ozkan M (2006). Nano-oncology: drug delivery, imaging, and sensing. Anal Bioanal Chem, 384(3):620–630.
INDEX
Abatacept, 612, 615, 617 Absolute risk, 266 Absorption, distribution, metabolism and elimination (ADME), 34, 504, 543, 594, 610 Accelerated Approval Rule, 490 Acceptance criteria, 189, 198, 202–204, 206– 208, 252, 255, 275 Accreditation, 84, 464–465, 626 Acquired immune deficiency syndrome (AIDS), 17, 414, 554, 700 ACR20, 612–613, 616–617 Actemra, 129 Activated partial thromboplastin time (aPTT), 415, 447–449, 456–457, 459 Acute coronary syndrome (ACS), 129, 544 Acute interstitial nephritis, 338 Acute phase reactants, 121, 128 Adaptive design, 240, 364, 372 Adipokines, 115 Adverse event(s), 80, 228–229, 233, 240, 290, 315, 331–332, 363, 369, 434, 439, 441, 508, 556, 582–583, 701 Adverse reaction(s) (AR), 179, 226–229, 435, 629 Adverse responses, 291, 294, 435 Alanine aminotransferase (ALT), 291, 440 Aldosterone, 371, 415 Algorithm(s), 83, 105, 130, 137–138, 143, 146, 149, 180, 203, 278, 280, 283, 316, 365, 389, 392, 507, 559, 598, 603, 654, 667, 689
Allele frequency(s), 611, 616 Alzheimer disease (AD), 18, 38, 86, 109, 127, 628, 696, 704 Alzheimer’s Disease Neuroimaging Initiative (ADNI), 86 American Association for Cancer Research, 23 American Association of Bioanalysts (AAB), 465 American Association of Blood Banks (AABB), 465 American Board for Clinical Chemistry, 469 American Board of Bioanalysts, 469 American Board of Medical Genetics, 469 American Board of Medical Microbiology, 469 American Board of Pathology, 469 American College of Radiology (ACR), 84 American Society for Clinical Oncology, 23, 150 American Society for Histocompatibility and Immunogenetics (ASHI), 465 Aminoglycoside(s), 295, 336, 338, 342 Amyotrophic lateral sclerosis, 24 Anaemia, see Anemia Anakinra, 129, 612, 614 Analysis Data Model (AdaM), 580 Analysis of variance (ANOVA), 268, 270– 272, 282, 650–651, 666–667, 670 Anaphylaxis, 330 Anemia, 10, 328, 330, 336
Angiogenesis, 67–70, 77, 114–115, 504, 605, 714 Angiography, 129, 701 Angiotensin, 337, 371, 415 Angiotensinogen, 115 ANOVA, 268, 270–272, 282, 650–651, 666– 667, 670. See also Analysis of variance Antibodies to citrullinated protein antigens (ACPA), 405–406, 411 Antibody array(s), 124, 522 Anticoagulant, 192, 445, 447–448, 450–451, 454, 457, 459, 460 Antisera, 323, 330, 341, 343 Apolipoprotein, 128, 130, 341 Apoptosis, 67–68, 341, 439–440, 479, 504, 597 Armune BioScience, 523–525 Arrhythmias, 10, 371 Arthritis, 129, 400, 405–407, 415, 544, 634, 701, 704 Artifact, 76–77, 79, 102–103, 256, 454 Asthma, 127, 415, 697 Atorvastatin, 556. See also Statins Attrition, 188, 376–377, 438, 445–446, 460, 518, 548, 677, 694–695 AUC, 107, 111–112, 306, 311–312, 317, 417, 542 Auscultation, 6 Autoantibody(s), 168, 171, 324–325, 401, 404–405, 408–409, 516, 524–525 Autoantigen(s), 161, 172, 323, 615 Autoimmune , 128–129, 133, 155, 157, 402–403, 486, 615, 617, 721 Autoimmunity 331 Autoradiography, 136 Bayesian approach, 256, 281–282, 364, 658 Bayesian information, 280 Bayesian method, 277, 281 Bayesian modeling, 281, 283 Bayesian simulation, 255 Bayesian statistics, 240, 280 BCR-ABL, 17, 239, 368–369, 553, 598 Below the levels of quantitation (BLQ), 274–275 Benefit/risk, 594, 699. See also Risk/cost benefit and Risk-benefit Bilirubin, 292 Bioanalytical drug assay guidance, 208 Bioanalytical laboratory, 193, 196, 198, 200, 204, 207–208, 210 Bioanalytical method, 189, 207, 416 Bioanalytical validation, 207–208, 293, 380 Bioassay(s), 11, 252, 329, 459, 542
Bioavailability, 302–303, 693–695 Biobank, 395, 625–627, 629–637 Biodistribution, 68–69, 723 Bioinformatics, 21, 26, 86–87, 105–106, 110, 232, 363, 388–389 Bioinformatics Organization, 86 Biomarker Consortium, 86, 210 Biomarker Qualification Pilot Process, 182, 380 Biomarkers Definitions Working Group, 695 Biomarkers Technical Committee, 296 Biospecimens, 25 Bisphosphonates, 311, 427 Blank(s), 194, 198–199, 209, 217–218 Blanket consent, 633 Blocking patent(s), 659, 571 Blood pressure, 5–6, 37, 85, 115, 368, 371, 405, 407, 416, 436, 665, 696–697, 700 Blood urea nitrogen (BUN), 23, 181, 315, 335, 338, 344, 346, 350, 542 Bone marrow, 138, 438–439, 478, 617, 722 Bone mineral density (BMD), 426, 701–704 Brain, 9, 46–47, 76, 81, 193, 341, 343, 425, 451–456, 504, 696, 704–705, 714 BrdU, 546 Buprenorphine, 423–424 C-reactive protein, 130 Caco-2, 36 Calcitonin, 308, 310 Calcium, 301–316, 318–319, 336, 347, 546, 715 Calibration, 67, 80, 124, 218, 393, 450, 467 Canadian Citizen’s Conference on Biotechnology, 636 Canadian Public Consultation on Xenotransplantation, 636 Canadian Tumor Repository Network, 632 Cancer antigen 125 (CA-125), 63, 130 Cancer Biomedical Informatics Grid (caBIG), 87 Cancer Genome Atlas, 21 Carcinoma, 86, 139, 507, 524, 545–546, 599, 603, 606–607 Cardiology, 8, 55, 578 Cardiotoxicity, 144, 296 Cardiovascular disease, 18, 26, 127, 129, 187, 400, 405, 407–410, 478, 706 Cardiovascular effects, 304 Cardiovascular function, 303 Cardiovascular morbidity/mortality, 407–408 Cardiovascular parameters, 436 Cardiovascular risk, 555–556
Clinical Laboratory Improvement Act (CLIA), 127, 208, 464–465, 467–471, 524, 603 Clinical Laboratory Improvement Amendments (CLIA) of 1988, 473 Clinical Laboratory Standards Institute (CLSI), 199–200, 208, 234 Clinical pathology, 290–292, 308, 446, 551, 555 Clinical registries, 18 Clinical relevance, 57, 149, 220, 255, 259, 479, 504, 508, 698 Clinical significance, 210, 250, 378, 500, 506 Clinical utility, 25–26, 225–228, 241, 335, 464 Clusterin, 23, 116, 181, 339, 341–342, 350 Cmax, 417, 542, 544–545 Code of Federal Regulations (CFR), 208, 232–235, 242, 244, 490, 588 Co-development, 17, 26, 179, 229, 231–232, 234, 236–244, 499, 508, 518, 529, 596 Coefficient of variation (CV), 102, 218–219, 262, 650 Cognitive virtue, 637 Coincidence, 258 Collaborative Research and Development Agreement (CRADA), 580, 586 Collagen type II neoepitope (TIINE), 558 College of American Pathology/College of American Pathologists (CAP), 465, 522 Column chromatography, 555. See also Chromatography Commercialization, 58, 85, 125, 132, 188, 517, 522–523, 528–530, 532–533, 535, 538– 539, 571, 583, 627, 630 Common Procedural Terminology (CPT), 472 Companion diagnostics, 133, 216, 227–230, 232 Compendia Bioscience, 522, 525 Complement, 20–21, 68, 70, 105, 116, 131–132, 165, 219, 233, 295, 331, 341, 344, 366, 497–499, 654 Complement associated protein SP-40, 341 Complement cytolysis inhibitor protein, 341 Composition of matter, 569 Computational biology, 19, 627, 629 Computer model, 599 Confidence limit, 205, 217, 251 Confidentiality, 296, 627–628 Consortium, 20, 22, 85–86, 181–183, 210, 296, 338, 393, 631–632 Contract research organization(s) (CRO), 15, 67, 73, 267, 496, 583 Cooperate Research and Development Agreement (CRADA), 580, 586
Coronary artery disease, 55–56, 129, 700 Cost-benefit analysis, 226 Coumadin, 447–448, 450–451, 454, 457, 459 Council of Europe Committee of Ministers, 631 Council on Laboratory Accreditation (COLA), 465 Creatinine, 23, 181, 193, 270, 291, 293, 306, 310, 315, 335, 337, 542, 546, 700 Creatinine kinase (CK), 293 Critical path initiative 231, 244, 416, 421 Critical Path Institute, 23, 181, 296, 338. See also Critical Path Initiative Critical Path Initiative (C-Path), 22–23, 85, 340, 343, 347, 491, 577. See also Critical Path Institute Crohn disease, 12, 7, 129, 701 Cross reactivity, 200, 234, 340, 351 Cross-validation, 138, 140, 143, 145–146, 209, 280, 384, 387 C-telopepetide of collagen type II (CTX II), 410 C-terminal hemagglutinin (HA) tag, 167 Cyclic citrullinated peptides (CCP), 405, 407, 409–410 Cyclophosphamide, 146 Cystatin C (Cys-C), 23, 181, 339, 342–343, 542 Cysteine protease inhibitor(s), 342 Cytokine(s), 115, 121, 123, 125, 127–130, 188, 196, 402, 404, 410, 415, 479, 486, 488, 517, 544, 612–614 Cytokine storm, 131 Cytotoxicity, 438, 440–441, 712, 715, 722 Data Discovery Query Builder (DDQB), 583 Data management, 18, 501, 582, 587 mining, 126, 207, 255, 559 warehouse(s), 587, 590 dbSNP, 582, 587 Decision gates, 33, 35–36, 498–499, 578–579 Decision theory, 664, 689 Declaration of Helsinki, 632 Declaration on Ethical Considerations Regarding Health Databases, 631 DeCODE Genetics, 625, 630 Design around, 570–571 Detection limits, 217, 720 Development plan, 33, 35, 241, 243, 503 Diabetic nephropathy, 21 Diagnostic likelihood ratios (DLR), 219, 221–222 Diethylene glycol, 289
Differential diagnosis, 481, 660 Diffusion MRI, 76. See also Imaging Digital Imaging and Communications in Medicine DICOM), 580–581, 585–586, 588 Disease management, 15, 22, 24, 27, 53, 129, 377 Disease modifying anti-rheumatic drugs (DMARD), 410, 612–613, 616–617 DNA hybridization, 721 Documentation, 103, 188, 194, 196–197, 202, 208–209, 235, 450, 588 Dose selection, 188–189, 303, 306, 381, 450, 458, 510 Dose-response, 74, 272, 318, 427 Doxorubicin, 146–148 Drug concentration(s), 201, 413–414, 417–420, 426, 449–450, 453, 455–459, 542 induced toxicity, 335, 435 safety, 188, 208. 239–240, 319, 429, 543, 547 target interactions, 57, 602 Drug-Diagnostic Co-Development, 179, 241 Duodenum, 305 Duration of action, 362 Dynamic Contrast Enhanced-MRI (DCEMRI), 68–69, 70–80, 82–84. See also Imaging Dynamic range, 102, 104, 107 Echoencephalogram, 8 EIA, 329 Electrocardiogram (ECG), 10, 55, 436, 583 Electrocardiographic features, 10 Electrocardiographic QT interval, 543 Electrocardiography, 9, 543 Electrochemiluminescence, 124 Electroencephalograph (EEG), 10, 55, 446, 542–543 Electromagnets, 10 ELISA(s), 12, 108, 114, 123–124, 157, 164, 168, 171, 335, 339, 344, 349–351, 425, 554, 558 Endoscopy, 127, 701 Endothelial activation, 478 Endothelial cell(s), 347, 369, 597 Endothelial growth factor reception expression, 47 Endothelial marker(s), 478 Endothelial smooth muscle injury, 292 Endothelin, 130 Enrichment, 60, 103–104, 179, 238, 618 Entropy, 644, 647–648, 654–655
Fluorophore(s), 710, 715, 722 Fluoroscope, 7 Forensic biostatistics, 248 Forward pharmacology, 362 Foundation for the National Institutes of Health (FNIH), 85–86, 88, 210 Fourier transform, 9 Fractionation, 104, 218 Fragmentation, 106 Freedom-to-operate (FTO), 570 Frog test, 11 Functional assay(s), 131, 196, 449 Gadolinium, 73–74, 79, 714 Galectin(s), 115–116 Galvanometer, 9 Gastrointestinal stromal tumors (GIST), 63, 66 Gastrointestinal tract, 303, 312–315 Gel electrophoresis, 14, 101, 122, 341, 472 GenBank, 582, 587 Gene amplification, 557, 603 frequency, 616 profiling, 135–139, 142, 144–151 Genetic Information Non-discrimination Act, 22 Genetic testing, 22, 26, 234, 466–467, 469–472, 586–587, 628, 633, 636 Genome mapping, 19 Genome wide association studies (GWAS), 19–21, 25, 225, 403, 490 Genomics Association Information Network (GAIN), 21 Genotype, 21, 275, 277, 403, 477, 507, 577, 586, 599, 609–611, 614, 616–617, 643, 696 Genotyping blind, 276 error, 276–277 for enzymes, 507 of tumors, 364 process, 276, 480, 719 service(s) 589 technology, 608 Gentamicin, 342, 347 Gleevac/Gleevec, 17, 239, 369, 553. See also Imatinib Globotriaosylceramide (Gb3), 477 Glomerular filtration rate (GFR), 335, 342 Glomerulosclerosis. 342, Glomerulus, 340, 342, 346 Glucagon, 188, 415, 609
Glucose, 48, 65, 82, 187, 233, 344, 415, 434, 437, 489, 545, 605, 607, 609, 611, 665, 704, 715, 720–721 Glucose transporter(s), 65 Glucosidase, 477 Glutathione S-transferase(s) (GSTs), 339–340, 343–344, 350–351, 542 Glycolipid(s), 477 Glycoprotein, 341, 346–348, 350, 369 Glycoprotein III (GPIII), 341 Go/no go, 31–33, 35–36, 38, 507, 511 Goblet cell(s), 305 Gold standard, 171, 219, 221, 295, 387, 480, 516, 555, 610 Good laboratory practice(s) (GLP), 127–128, 150, 188, 197, 202, 208, 414, 446, 504, 509–510 Governance virtue, 637 Granulocyte(s), 131, 132 Granulocyte colony stimulating factor (G-CSF), 323 Growth factor(s), 121, 125, 188, 323 Guidance for Industry: Bioanalytical Method Validation, 380 Guideline on Non-Clinical Safety Studies for the Conduct of Human Clinical Trials for Pharmaceuticals, 290 Haemagglutination, see Hemagglutination Haematology, see Hematology Half-life, 34, 80–81, 326, 330, 417, 653 Haplotype, 20, 364, 614–615 HapMap, 20 Haptoglobin, 128, 130 Harmonization, 22, 75, 183–184, 230, 242, 290, 629, 631–632 HAVcr, 345 Hazard ratio (HR), 237 hCG, 11 Health and Environmental Sciences Institute, 338 Heart, 8, 10, 81, 87, 129, 308–310. 312, 315, 343, 370–371, 415, 436, 700, 705 Heat map, 168, 605 Hemagglutination, 11–12 Hemangiosarcoma, 545 Hematology, 10–11, 306, 434, 438, 465 Hematopoiesis, 114 Hematopoietic cell line, 438 Hematopoietic toxicity, 438 Heparin, 336, 447–449, 456–457, 459, 478, 509 Heparin cofactor II-thrombin (HCII-T), 478
Hepatic cell(s), 440 Hepatic enzyme level(s), 142 Hepatic failure, 439 Hepatic microsome(s), 439 Hepatic protein(s), 439 Hepatic transformation system(s), 18 Hepatitis, 130, 330, 345 Hepatocellular necrosis, 305, 317, 542 Hepatocyte(s). 36, 343, 439–440 Hepatocyte screening, 36 Hepatotoxicity, 142–144, 295, 439–440 Her2/Her2/neu or Her2Neu, 17, 38, 489, 507, 556–557, 713 HER2 receptor protein, 68, 507 Herceptin, 17, 38, 227, 239, 370, 507, 556. See also Trastuzumab hERG, 542, 544–545 Heterogeneity, 68, 73, 194, 393, 597, 612 Hexokinase, 65 High content screening, 36 High performance liquid chromatography (HPLC), 12. See also Chromatography High-complexity testing, 465, 471 High-density lipoprotein (HDL), 346, 415, 555–556 High-throughput bisulfite sequencing (HTBS), 21 Histochemical analysis, 489 Histology, 127, 305, 556, 558, 577, 612 Histopathology, 70, 128, 142, 290–291, 294– 295, 339, 542, 546, 605, 643 Home brew, 234 Homocysteine, 478 Human epidermal growth factor protein (HER2), 68, 489, 556, 713 Human Genes Research Act, 631 Human Genome Diversity Project, 626 Human Genome Organization (HUGO), 586, 628 Human Genome Project (HGP), 19, 157, 163, 626 Human Genome Research Law, 631, Human growth factor, 370 Human immunodeficiency virus (HIV), 17, 228, 414–415, 421, 428–429, 466, 554, 582, 700 Human international protein index (IPI), 106 Hypercholesterolemia, 292, 408, 657 Hyperphosphatemia, 306–307, 310–311, 313–314, 316 Hypersensitivity, 331 Hypertension, 115, 366, 371, 408, 415, 657
Induction, 253, 307, 314, 324–326, 328, 330, 332–333, 342, 440 Infectious disease, 26, 127, 162, 170, 172, 446, 466, 471, 616 Inflammatory disease(s), 127, 129, 399 Informatics, 18–19, 37, 86–87, 598, 602–603, 618 Informed consent, 109, 242, 390, 469, 625, 627–628, 631–634, 637 Infrastructure, 15–16, 18–20, 25–26, 36, 69, 77, 81, 87, 501, 535, 578, 590, 600, 630, 723 Infringement, 565, 568–573 Ingenuity, 389 Initial new drug (IND), 35, 180, 243, 250, 303, 489, 578 Inspection(s), 3–4, 66, 236, 464–465 Institutional review board(s) (IRBs), 32, 35, 114, 470, 508, 600 Instrument detection limit (IDL), 217–218 Integrating the Healthcare Enterprise (IHE), 84 Intellectual property (IP), 15, 26–27, 87, 297, 471, 520, 523, 534 Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), 293 Intercellular adhesion molecule-type 1, 127 Interdisciplinary Pharmacogenomic Review Group (IPRG), 208 Interference(s), 81, 200, 206, 217–218, 234, 369, 468, 711, 715 Interferon gamma inducible protein (IP-10), 127–128 Interferon gamma release assay(s) (IGRA), 66, 171 Interferon regulating factor 5 (IRF-5), 406 Internal standard, 102, 218, 255, 264, 270, 351 International Conference on Harmonization (ICH), 183, 230, 290 International Life Sciences Institute (ILSI HESI), 296, 338, 340, 343, 347 International normalization ratio (INR), 415, 449–457, 459 International Society for Biological and Environmental Repositories (ISBER), 629 Intersocietal Accreditation Commission (IAC), 84 Intima media thickness (IMT), 409 Invention, 5–6, 566–572 Investigational New Drug Application (IND), 35, 180, 243, 250, 303, 489
Investigator-Initiated NIH application (R 01), 532 Investment, 15, 19, 21, 26, 31–34, 69, 182, 376, 501, 506, 519, 521, 530, 534 Iron oxide, 714 Ischemia, 55, 314, 318, 342, 347–348 Isotope, 7–8, 13, 50, 58, 88, 101 JANUS Data Model, 580, 582 Jorden, 4 Juvenile diabetes, 24 Kidney disease, 4, 342–343 Kidney injury, 290–291, 335, 338–340, 342– 346, 348–351 Kidney injury molecule 1 (KIM-1), 23, 181, 339, 344–346, 348, 350 Kinetica, 419 Kininogen(s), 131–132 K-nearest-neighbour (K-NN), 138, 145 Knock-in, 506 Knock-out, 479, 506 Known valid biomarker(s), 54, 378, 381 Kymograph, 5 Lawsuit, 565, 568, 571–572 LC-MS/MS, 105–107, 111, 115, 557–558 Leave-one-out, 138, 140, 145–146 Left-censoring, 275 Leptin, 114, 188, 415 Leucocyte(s), 10 Leukemia, 17, 138–139, 302, 366, 368, 598, 715, 717 Leukotriene(s), 121, 131–132, 415 Library(s), 163–165, 344, 578 License(s), 58, 151, 468, 471, 523, 570–572, 625 Licensing, 471, 519, 523, 525, 569–570 Ligand(s), 122, 190, 195–196, 201–202, 209– 210, 365, 369, 418, 479, 599, 705, 710 Ligandin, 343 Limit of blank (LOB), 199–200 Limit of detection (LOD), 103, 116, 199–200, 217, 234, 274–275 Limit of quantitation (LOQ), 217–218 Linearity, 192, 195 Lipid(s), 37, 45, 115, 188, 341, 415, 434, 477–479, 555–556, 608, 715 Lipocalin(s), 115–116, 348 Liposarcoma, 546 Liquid chromatography-mass spectrometry (LC-MS), 13, 102, 104–107, 111, 114–115, 190, 557
Litigation, 571 Low-density lipoprotein (LDL), 37, 415, 555–556 Lower limit of detection (LLOD), 217–218, 253 Lower limit of quantification (LLOQ), 199– 200, 217–218, 275 Lowest observed effect levels (LOEL), 541, 543 Luminophore(s), 715 Lymphocyte(s), 17, 128, 131, 612–613, 617 Lymphopenia, 131 Lymphotactin, 127 Lysosomal disease(s), 475–476, 478 Lysosomal-associated membrane protein(s) (LAMP-1, LAMP-2), 477 Macrophage(s), 131, 331, 347, 406, 440–441, 614 Macrophage inflammatory cytokine (MIP1α), 128, 479 Macrophage inhibitory protein 1 alpha (MIP 1alpha), 128 Macroscopic examination, 290 Macroscopic measurement, 57, 643–647 Macular degeneration, 20, 115 Magnetic resonance imaging (MRI), 9, 18, 38, 43, 704, 710. See also Imaging Major histocompatibility class I, 340 Manometer, 5–6 MAP kinase kinase (MEK), 301–304, 307– 308, 315, 318 Maraviroc, 428–429, 554 Mass spectrometer, 13–14, 105 Mass spectroscopy (MS), 127, 507 Mass-to-charge ratio (m/z), 105–106 Matrix metalloproteinases (MMP), 129–130, 409, 556–557 Matrix-assisted laser desorption ionization (MALDI), 507 Maximum likelihood estimator (ML), 650, 670 Maximum tolerated dose (MTD), 368 Measurement error, 250–257, 260, 650–651, 666, 668, 670–671 Mechanism of action, 37–38, 77, 113, 131, 142, 188, 190, 302, 363–364, 366, 413, 418– 419, 440, 485, 504, 506, 606, 695–696, 698 Medical imaging, 16, 19, 43–44, 54–55, 84, 577, 581. See also Imaging Medicare, 84–85, 464–465, 472 Megakaryocyte, 330
Nanocantilever(s), 718–719, 721 Nanodevice(s), 709–710, 716, 721–723 Nanodiamond(s), 715, 722 Nanoelectrode(s). 720–721 Nanoparticle(s), 710, 714–716, 722 Nano-PEBBLE, 715, 722 Nanoscale, 709, 715–716, 721–722 Nanotechnology, 19, 709–710, 721–723 Nanotoxicology, 710, 721–723 Nanowire(s), 716–718, 721 National Cancer Institute (NCI), 3, 62, 66, 84, 86–87, 580, 632 National Cancer Institute of Canada, 62 National Center for Biotechnology Information, 21 National Committee for Clinical Laboratory Standards (CLSI), 199–200, 208, 234 National Heart Lung and Blood Institute, 27 National Institutes of Health (NIH), 18, 23, 85, 88, 161, 163, 172, 210, 231, 505, 516–517, 519–520, 527–529, 532, 628, 695 National Library of Medicine, 21 National Research Council, 529–530 National Science Foundation (NSF), 527–528 Negative predictive value (NPV), 107, 116, 181, 219–221, 224–225, 659 Neoplasia, 301, 308 Neopterin, 415, 425 Nephrectomy, 342 Nephrogenic systemic fibrosis (NSF), 73 Nephrotoxic agents, 142–143 Nephrotoxic compounds, 142–144 Nephrotoxic effects, 344 Nephrotoxic side effects, 351 Nephrotoxicity, 142–143, 180, 182, 205, 296, 338–340, 342–344, 346–347, 350–351 Nervous system, 23, 127, 193, 303, 341, 566, 370, 440, 479, 506, 545 Neurotoxicity, 317, 370 Neurotransmitter, 370–371 Neutralization, 129, 329 Neutralizing agent(s), 369 Neutralizing antibody(s), 115, 329 Neutrophil(s), 131, 310, 348 Neutrophil gelatinase-associated lipocalin (NGAL), 115, 339, 348–350 New Drug Application(s) (NDAs), 180, 231, 578 No-adverse effect level(s) (NOAEL), 34, 306, 541, 543 Nobel Prize, 7, 10, 13 No-effect level(s) (NOEL), 312, 541, 543
Nonequilibrium, 644, 648–649 NONMEM, 419 Non-obvious, 363, 567 Nonparametric test(s), 268–270 Non-responder(s), 17, 38, 138, 145–146, 227, 611, 705 Non steroidal anti-inflammatory drugs (NSAIDS), 336–337, 410 Norbuprenorphine, 423–424 Normal distribution, 206, 269, 273 Normalization, 103, 193, 264, 278, 311, 449, 455–456 Norwegian Act on Biobanks, 631 N-terminal poly-polyHistidine (polyHis) tag, 167 Nuclear imaging, 45–46, 58, 703. See also Imaging Nuclear magnetic resonance (NMR), 9, 44–45, 48, 75, 546, 555–556. See also Imaging Null hypothesis, 247, 266, 271, 280, 657 Nuremberg Code, 632 Odds ratio (OR), 220–222, 224–225, 237, 266– 267, 276–277, 283, 403, 614, 618 Office for Human Research Protection (OHRP), 631, 633 Office of Combination Products (OCP), 241– 243, 245 Office of In Vitro Diagnostic Devices and Safety (OIVD), 234–236 Oligonucleotide(s), 136, 720 Oligosaccharide(s), 477 Omics, 13–14, 559 Oncogene(s), 301, 364, 597 Oncogenicity, 301, 364, 368, 370, 599 Oncologics, 543, 547 Oncotype DX, 141, 150–151 Online Mendelian Inheritance in Man (OMIM) database, 612 Open Bioinformatics Foundation, 86 Open reading frames (ORFs), 161–166, 169, 172 Operational Data Mode(l) (ODM), 580 Optical imaging, 43, 45. See also Imaging Optimization, 50, 55, 58, 60, 75, 110, 179, 259, 438, 499, 660, 663–664, 689, 694, 722 Ordinary least-square(s) (OLS), 650 Osteoarthritis, 18, 38, 86, 557–558, 701, 704 Osteoarthritis Initiative (OAI), 86 Osteoblast(s), 308 Osteocalcin, 188, 308–310, 315–316, 703 Osteopontin (OPN), 339, 347
Osteoporosis, 205, 207, 427, 701–704 Outlier(s), 271, 274 Outsourcing, 496, 518–519 Ovarian cancer, 23, 25, 63, 110–111, 130, 161, 171 Overexpression, 171, 250, 607 Ovulation, 11 Pamidronate, 304, 311–312, 337 Pancreas, 343, 347–348 Parallelism, 194–195, 200, 209 Parametric test(s), 268–270, 272, 283 Parathyroid hormone (PTH), 306–308, 313 Patent, 10, 289, 390, 471, 520, 529, 565–573 application, 471, 566–568 attorney, 567, 570–571 Patent Cooperation Treaty (PCT), 568 Pathobiology, 36, 155, 510 Pathodynamics, 643, 645–648, 654, 688–689 Pathogenesis, 400–402, 408, 486–488, 594, 611 Patient progress, 188 selection, 38, 179, 241, 364, 372, 390, 392 stratification, 60, 188–189, 376, 489, 505–507 Pattern recognition, 25, 101, 127 Patterning, 121–122, 127, 132–133 Pegylation, 326 Peroxisome proliferator-activated receptor (PPAR), 188, 459, 545, 610 Personalized medicine(s), 25, 122, 250, 280–281, 489, 499, 523, 556, 590, 593–595, 598–599, 605, 613, 616–618 Pharmaceutical Research and Manufacturers of America (PhRMA), 85, 693 Pharmacogenetic(s), 416, 489, 554, 594, 609 Pharmacogenomic(s), 19, 232, 235, 239, 244, 250, 280, 362, 584, 628–629, 635 Pharmacokinetic(s), 80, 190, 319, 333, 413–414, 418, 420, 424, 496, 665 Pharmacotherapeutic decision, 24 Phenotype, 21, 25, 126, 224, 363, 477, 593, 595, 597–600, 605, 608–609, 611, 618, 696 Philadelphia chromosome, 17, 368, 489, 553 Phospholipid(s), 415, 440–441, 555 Phospholipidosis (PLD), 439–441, 452 Phosphorescence, 7 Phosphorus, 9, 301, 303, 305–319 Photobleaching, 715 Picture Archiving and Communications Systems (PACS), 582 Piezo-electric, 124, 159 Pioglitazone, 610
Pipeline(s), 16–17, 367, 491, 518 Pituitary, 343 Plasminogen activator inhibitor type 1, 130 Platelet(s), 10, 131, 366, 369–370, 449, 696 Polycystic kidney disease, 342 Polyethylene glycol (PEG), 326, 424–425, 712 Polymerase chain reaction (PCR), 18, 136–137, 145, 149, 156, 164–166, 171, 344, 378, 386, 472, 547, 717 Polymorphism(s), 20–21, 236, 275, 364, 403, 406, 471, 490, 568, 608, 610, 612, 614–615, 618, 721 Pompe’s disease, 24, 476–477 Population enrichment, 60 Positive predictive value (PPV), 219–221, 225, 650, 659 Positron emission tomography (PET), 8, 18, 43–46, 48, 58, 60–62, 66–68, 75, 80–81, 86, 425–426, 703–705, 712. See also Tomography Post-analytic factors, 471 Post-analytic testing, 468 Post-market applications, 180 commitments, 241 data, 491 decisions, 37 drug use, 422 patient monitoring, 188 reporting, 233 surveillance, 233, 382, 578 trials, 383, 451, 490, 702 Practical quantitation limit (PQL), 217–218 Pre-analytic circumstances, 192 Pre-analytic factors, 191, 202, 471 Pre-analytic testing, 466–467 Pre-analytic variables, 192 Pre-biomarker(s), 52 Predictability, 138, 327, 362, 438–439, 477 Predictive Safety Testing Consortium (PSTC), 181–182, 296 Pre-IND, 243 Pre-Investigational Device Exemption (preIDE), 235–236, 241, 243 Premarket approval (PMA), 233–236, 241 Prevalence, 220–221, 224, 228, 240, 383, 439, 654, 656 Primate(s), 327, 340 Prior art, 365, 566–568 Prioritization, 36, 602, 605–606 Probabilism, 645–646 Probable valid biomarker(s), 54, 296, 378, 381 Proficiency testing (PT), 465–466
Prognosis, 53, 128–129, 135, 139–141, 149–150, 171, 233, 282, 340, 401–402, 446, 464 Project management, 85, 495–496, 501–502, 510–511 Projection(s), 541, 543, 548 Prontosil, 289 Proof of concept, 24, 37, 65, 240, 364, 377, 410, 429, 491, 517, 521, 541, 579, 701 Proof-of-efficacy, 489 Proof-of-feasibility, 602 Proof-of-mechanism, 57, 60, 189 Proof of principle, 161, 477, 488 Proof-of-therapeutic-concept, 60 Proportional data, 266, 270 Prostate cancer, 114–115, 162–163, 206, 523–524, 705, 713 Prostate-specific antigen (PSA), 114, 188, 233, 395, 523, 718–719 Protein Identification and Peptide Expression Resolver (PIPER), 106 Protein phosphorylation, 302 Proteinuria, 4, 337, 340, 346 Proteasome, 510 Prothrombin time (PT), 447, 449, 451–454 Proximal tubular cell(s), 340, 343 P-selectin, 366–369 Psoriasis (PS), 21, 127, 129 Public Library of Science, 23 Public Population Project in Genomics consortium (P3G), 631–632 PubMed, 3, 582, 629 Pulmonary and activation-regulated chemokine (PARC), 479–480 Purified protein derivative (PPD), 171 QT prolongation, 545 Quality assurance (QA), 168, 378, 383, 391–392 Quality control (QC), 49, 55, 73, 84, 103, 124, 156, 198, 203–204, 235, 255, 278, 391–393, 602 Quantitation, 121–122, 124, 145, 217–218, 260, 274–275, 344, 413 Quantitative Imaging Biomarkers Alliance (QIBA), 84 Quantum dot(s) (qdots), 710–715, 722 Radiation, 6–8, 58–59, 76, 78–79, 81, 524 Radioactivity, 7, 12 Radioimmunoassay, 7, 12, 343–344 Radiological Society of North America (RSNA), 84 RANTES, 130
Rash(s), 317–318, 331–332, 337 Reactive intermediate(s), 439–440 Reagan-Udall Foundation, 23 Real time PCR (RT-PCR), 171, 190, 547 Receiver operator characteristic (ROC), 107, 111, 220, 222, 394, 660 Receptor signaling, 403, 713 Recombinant DNA, 16–17, 19 Red blood cell(s), 10, 131–132, 337 Reference laboratory(s), 466, 472, 520, 552, 556, 559 Reference range(s), 310, 315, 456, 468 Reference standard(s), 193–194, 196–198, 204, 208–209, 251, 255, 257 Registration, 53, 56, 60, 233, 364, 366, 464, 489–491, 496, 507, 693, 706 Regulatory approval, 38, 179, 229, 376, 395, 505, 509, 539, 565, 694 Regulatory submission(s), 63, 180, 230, 338, 349, 701 Reimbursement, 24, 84, 225, 228, 238, 472, 473 Relative risk, 31, 33, 224, 266, 406, 409, 700 Relative toxicity, 594 Relevance, 20, 57, 146–149, 196, 220, 250, 255, 259, 291, 348, 363, 413, 435, 439, 479, 504, 508, 548, 664, 698 Reliability, 49, 51, 59, 77, 144, 193, 224, 446, 448, 628, 698 Renagel, 304, 311 Renal clearance, 306 Renal cortical tubule(s), 305 Renal failure, 335, 342, 348, 371 Renal function, 74, 79, 338, 340, 342, 346, 351 Renal papillary antigen 1 (RPA-1), 339–340, 349–351 Renal transplantation, 340 Rennin, 371 Repeatability, 216, 234 Repository(s), 18, 197, 210, 582, 586–589, 629–630, 632 Reproducibility, 25, 70–71, 124, 126, 132, 148, 196, 198, 216, 234, 382, 387, 391, 393, 588–589, 698, 716, 722 Research plan, 530 Resolution, 9, 13, 19, 21, 45–46, 57–58, 80, 82, 104, 409, 506, 720 Responder(s), 17, 24, 38, 138, 145–146, 227, 489, 547, 578, 611, 616, 618, 677, 705 Response Evaluation Criteria in Solid Tumors (RECIST), 61–65, 78, 607 Return on investment (ROI), 31, 33, 67, 376, 704
Reverse pharmacology, 362 Rhabdomyolysis, 292, 337 Rheumatoid arthritis (RA), 127–129, 187, 192, 205–207, 309, 399, 400, 403, 406– 407, 415, 486, 544, 595, 611, 614–615, 618, 704 RIA, 329 Risk assessment, 24, 32–33, 38, 368, 406, 423, 441, 545 Risk-benefit, 33, 318, 700. See also Benefit/ Risk and Risk/Cost Benefit Risk/cost benefit, 501. See also Benefit/risk and Risk-benefit Rituximab, 331, 410, 612 RNA, 13, 128, 136, 138–140, 142–143, 145– 146, 148–149, 348, 421, 542, 554, 598, 613, 700 Robustness, 77, 196, 198, 209–210, 257, 259, 362, 386–388, 392, 506, 541, 558, 598, 698 Rosuvastatin, 556. See also Statins Royalty payment(s), 570 RPA-1, 184, 339–340, 349, 350–351 Rural Advancement Foundation International (RAFI), 626 S-ADAPT, 419 Safety assessment, 289–290, 346, 496 Safety margin(s), 371, 445, 541, 543–545 Safety pharmacology, 303–304 Safety profile, 52, 237, 240, 369, 448, 459, 616, 618 Sample size, 25, 51, 149, 192, 220, 260–261, 264, 275, 376, 384–387, 390, 394, 645, 650, 658, 664, 677, 709 Sandwich immunoassay, 123 Scleroderma, 128 Secretome, 113, 116 Self antigen(s), 324–325, 331, 405 Sematech, 296, Sepsis, 192, 355, 522 Sequencing, 19, 21, 104, 107, 157, 163, 172, 276, 488, 610, 629 Sertoli cell(s), 341 Serum amyloid A, 129 Serum sickness, 330–331 Service Oriented Architecture (SOA), 583, 588–590 Severe toxicity (or death) in 10% of the rodents (STD10), 316 Siderocalin, 348 Signal transduction, 301–302, 308, 315, 403, 504
Significance Analysis for Microarrays (SAM), 137 Simulation, 256, 364, 414, 416, 418, 420, 422–423, 429, 667, 671, 672 Simvastatin, 556. See also Statins Single nucleotide polymorphism (SNP), 20, 364, 403, 490, 568, 608, 612, 721 Single photon emission computed tomography (SPECT), 8, 38, 43–44, 55–56, 58, 67, 703–704. See also Tomography Sjögren’s syndrome, 129, 407 Small Business Innovative Research (SBIR), 523, 527–530, 532, 534–539 SNP consortium, 20 Solution for Compliance in a Regulated Environment (SCORE), 582–583, 588 Source of error, 251 SParse Linear Programming (SPLP), 143 Species difference(s), 446 Sphygmomanometer, 5–6 Sphygmomètre, see Sphygmomanometer Spiking, 194, 198, 201 Sprycel, 553 Stacking, 122 Standard curve(s), 194–195, 200, 202–203, 209, 275, 456–457 Standard deviation, 199–200, 205, 207, 216–219, 247, 260–262, 264, 268–270, 650, 660–661, 667 Standard for Exchange of Non-clinical Data (SEND), 579 Standard operating procedure (SOP), 103, 202, 383, 386, 391, 509, 585, 588, 600 Standardized operating protocol (SOP), see Standard operating procedure Standardization, 44, 53, 57, 61, 66–67, 70, 75–76, 82–85, 87–88, 193, 210, 632 Standard-of-care, 226–228, 295, 368, 602 Statin(s), 225, 292, 555–556 atorvastatin, 556 rosuvastatin, 556 simvastatin, 556 Statistical analysis, 18, 83, 162, 204, 248–249, 259, 263, 268, 274, 279, 282, 505 Statistical power, 207, 250, 260, 262, 264, 274–275, 279 Statistical significance, 37, 225, 250, 256, 271, 505 Stomach, 308, 312, 315, 319, 341, 348, 415 Stratification, 24, 26, 60, 179, 188–189, 215, 226, 238, 241, 276, 363, 376, 400, 489, 505–507, 554, 556
Stratified approach(s), 379, 389 Stratified medicine, 590 Stratified model, 577 Stratified patient care, 590 Stratified patient population(s), 578 Stratified therapy, 507 Stratified tools, 379 Study Data Tabulation Model (SDTM), 580, 582–586 Study design, 25, 34, 108, 116, 180, 195, 235, 260, 263, 290, 394, 417, 425, 501 Subcutaneous adipose tissue, 115 Subcutaneous administration, 326, 424, 447 Sulfanilamide, 289–290, 337 Sulfated glycoprotein-2 (SG-2), 341 Support vector machine (SVM), 137, 139, 145 Sweden’s Act on Biobanks, 557, 631 SwissProt, 582, 587 Synovial fluid, 401–402 Systemic lupus erythematosus (SLE), 127, 403, 615 Systems biology, 19, 36, 488, 545, 598–600, 606 T cell immunoglobulin and mucin domain-containing protein 1 (TIMD-1), 345 Tamm-Horsfall protein(s), 346 Tamoxifen, 141 Target product profile (TPP), 33, 35 Target selection, 362, 368 Target validation, 24, 363, 365–366, 369, 372, 416, 445, 694 Tasigna, 553 Testis, 341, 343, 506 Testosterone-repressed message-2 (TRPM-2), 341 TGN1412, 131 T-helper lymphocyte(s), 408, 554 Therapeutic agent(s), 22, 33, 292, 318, 485, 487–488, 535, 565, 596, 613 Therapeutic effect(s), 57, 488, 655, 689 Therapeutic index, 363, 368, 370–371, 442, 543, 547, 594, 596 Therapeutic intervention, 18, 68, 128, 130, 192, 280, 338, 350, 371, 376, 433, 446–447, 479, 490, 505, 610, 649, 695, 697, 705 Thiazolidinedione, 439, 607 Thrombin, 371, 447, 449, 478 Thrombocytopenia, 330, 336 Thromboplastin, 447, 450–456 Thrombosis, 366, 369, 449 Time-of-flight (TOF), 13, 104
TIMP-1, 130 Tissue typing, 463 Titer(s), 130, 329–330 Tocilizumab, 129, 612–613, 616 Tolerance, 189, 193, 323–324, 326, 328, 331–333, 363, 370, 437, 611 Tomography, 7–8, 18, 38, 43, 425, 703 positron emission (PET), 8, 18, 43–46, 48, 58, 60–62, 66–68, 75, 80–81, 86, 425–426, 703–705, 712 single photon emission computed (SPECT), 8, 38, 43–44, 55–56, 58, 67, 703–704 x-ray computed (CT), 38, 43 Total protein, 23, 104, 181, 305 Toxicant signature, 142 Toxicogenomics, 143, 304, 306, 312, 316, 547 Toxicokinetic, 304, 306, 312, 316 Tracers, 46–47, 67–68, 82, 84 Trachea, 348 Transcriptomics, 440, 544–545, 608 Transformed data, 269, 273 Transgenesis, 328 Transgenic mice, 47, 328, 332 Translational component, 517 Translational research, 16–17, 21, 25, 496–499 Trastuzumab, 239, 489, 507, 556, 596. See also Herceptin Trefoil factor, 23, 181 Triglyceride(s), 506, 555, 700 True negative(s) (TN), 220–222, 224, 658–659 True positive(s) (TP), 219–222, 224, 254, 259, 658 Tuberculosis (TB), 162–163, 170–172, 233 Tubular damage, 342, 344, 346–348, 350 Tubular necrosis, 337–338, 345, 347 Tumor necrosis factor (TNF), 128, 404, 415 Tumorigenesis, 302, 597 Tyrosine kinase(s), 17, 75, 250, 301, 314, 368, 718 Tyrosine kinase inhibitor(s), 75, 553 Tyrosine kinase receptor, 17 UK Human Tissue Act, 631 Ulcerative colitis, 127 Ultracentrifugation, 555 Ultrasonic characteristics, 9 Ultrasonic diagnosis, 8 Ultrasonic equipment, 8 Ultrasonic reflectoscope(s), 8 Ultrasound, 8–9, 43, 45–47, 62, 78–81, 337, 409, 701, 703, 715
Unicorn, 551–553 United States Patent and Trademark Office (USPTO), 566–568, 570 Univariate, 664–665 Universal Declaration of Human Rights, 632 University of Michigan, 515–516, 521–523, 525, 529 Unlinking, 634 Urinalysis, 306, 308, 310 Uroprontin, 347 Uroscopy, 3–4 Urticaria, 331 US National Bioethics Advisory Commission, 629, 631 Vaccine(s), 17, 162, 168–170, 253, 266, 323, 533 Valid biomarker(s), 54, 296, 378, 381, 383, 490 Vascular cell adhesion molecule (VCAM-1), 128 Vascular endothelial growth factor, 47 Vasculature, 128, 292, 303, 305, 312, 545, 602 Vasculitis, 407–408, 545 Very low-density lipoprotein (VLDL), 555 Viral load, 17, 414–415, 428–429, 696 Vitamin D, 306–307, 310–311, 313–316, 318 Voluntary data submission (VXDS), see Voluntary eXploratory Data Submissions Voluntary eXploratory Data Submissions (VXDS), 180–181, 208, 380 Voluntary genomic data submissions (VGDS), 243 Voluntary submission, 22, 180, 508, Von Willebrand factor, 128 Waived category, 465 Waived test, 465 Weight of evidence, 281, 346, 413, 416 Weighted voting (WV), 137–138, 145–146 WinNonlin, 419 Workplan, 532 World Health Organization (WHO), 61–63, 450 X-ray, 6–9, 28, 43–45, 524, 701, 703–704 X-ray computed tomography (CT), 38, 43. See also Tomography YKP1358, 425–426
Chapter 4, Figure 2 Two-step process for imaging biomarker validation/qualification. Assay validation establishes the link between the image-based measure and a specific biological function or disease process by way of a biological, biophysical, or molecular “model.” This includes characterizing the measurement’s intra- and interobserver and test–retest variability (measurement reliability), as well as correlating the measurement to an established (nonimaging or invasive) standard. The goal of the second step, clinical qualification, is to establish empirically, by clinical trials, the relationship between the image-based functional measure and a relevant clinical endpoint or outcome.
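One common way to summarize the test-retest variability mentioned in this caption is the within-subject coefficient of variation across repeat scans. The sketch below is a minimal illustration with made-up numbers; the array contents and the choice of a root-mean-square CV summary are assumptions for illustration, not taken from the chapter.

import numpy as np

# Hypothetical test-retest data: one row per subject, columns = scan 1 and scan 2
measure = np.array([[10.2, 10.8],
                    [ 8.9,  9.4],
                    [12.1, 11.6],
                    [ 9.7, 10.1]])

subject_means = measure.mean(axis=1)
subject_sds = measure.std(axis=1, ddof=1)

# Root-mean-square within-subject CV, a common summary of test-retest variability
wcv = np.sqrt(np.mean((subject_sds / subject_means) ** 2))
print(f"within-subject CV = {wcv:.1%}")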
[Three peptide ion maps labeled Human Plasma 1, 2, and 3; axes: retention time RT (min) versus m/z; color scale indicates increasing intensity of the selected ion.]
Chapter 5, Figure 1 Consistent peptide intensity across all samples allows the detection of differentially expressed peptide ions. Shown is a partial view of peptide ion maps (as measured by LC-MS) from the plasma of three individuals. The horizontal axis is chromatographic retention time, the vertical axis is mass-to-charge ratio (m/z), and the peptide ion intensity is denoted by the size and color of the spots. The peptide ion circled shows differential expression across patients and increases in abundance from sample 1 to sample 3.
Chapter 5, Figure 4 Global proteomics analysis on Alzheimer disease and normal patients. Multidimensional scaling of proteomics data demonstrates the separation of healthy individuals (green spheres) from Alzheimer patients (red spheres). Caprion has defined a disease axis that is used to quantify relative disease state. The axis is a line that passes through the disease and healthy centroids (yellow spheres). Each patient is then positioned on the axis according to its orthogonal projection onto the axis. Donepezil-treated Alzheimer patients (purple spheres), as a group, are shifted on the disease axis from the disease group toward the healthy group.
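The disease axis described in this caption amounts to projecting each sample's coordinates onto the line joining the healthy and disease centroids. A minimal sketch of that projection, assuming the scaling coordinates are already computed (the function and variable names are illustrative, not Caprion's implementation):

import numpy as np

def disease_axis_scores(coords, healthy_idx, disease_idx):
    # coords: (n_samples, n_dims) array of multidimensional-scaling coordinates
    healthy_c = coords[healthy_idx].mean(axis=0)
    disease_c = coords[disease_idx].mean(axis=0)
    delta = disease_c - healthy_c  # axis through the two centroids
    # Orthogonal projection of each sample onto the axis, scaled so the
    # healthy centroid maps to 0 and the disease centroid maps to 1
    return (coords - healthy_c) @ delta / (delta @ delta)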
[Multidimensional scaling (log transform off); axes MDS 1, MDS 2, and MDS 3; sample groups: Ovarian, Breast, Normal.]
Chapter 5, Figure 5 Sample groups distinguished by differentially expressed peptides. Multidimensional scaling analysis was performed using the intensity values for 4089 differentially expressed peptide ions from 24 samples. Separation along three axes of variance (MDS1 to MDS3) is shown, where each sphere represents a patient sample. The groups are identified by the colors indicated.
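For readers who want to generate a comparable embedding, multidimensional scaling of a peptide-intensity matrix can be computed in a few lines. This sketch assumes scikit-learn and Euclidean dissimilarities; the chapter does not specify the software used, and the random matrix simply stands in for real intensities.

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
# Hypothetical matrix: 24 samples by 4089 differentially expressed peptide ions
intensities = rng.normal(size=(24, 4089))

# Three-component embedding, analogous to the MDS1 to MDS3 axes in the figure
embedding = MDS(n_components=3, random_state=0).fit_transform(intensities)
print(embedding.shape)  # (24, 3): one coordinate triplet per sample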
[Histograms of area under the curve (AUC) versus frequency for a 1-peptide panel (1000 peptides), a 5-peptide panel (1000 random combinations), and a 10-peptide panel (1000 random combinations), with median AUC values of 0.78, 0.83*, and 0.94, respectively; ROC curves (true-positive versus false-positive ratios) shown at left.]
Chapter 5, Figure 6 Multiple peptide panels are effective at discriminating groups. ROC plots show improved performance when going from one to five to 10 peptides. Displayed are curves (left) representing true-positive (x-axis) and false-positive (y-axis) ratios and the area under the curve (AUC; right column) for single-peptide (top), five-peptide (middle), and 10-peptide panels (bottom). In the case of multiple-peptide analysis, peptides were chosen randomly from the population of differentially expressed peptides. The optimal AUC for discriminating groups is a value of 1. This was best achieved as the panel size increased. *Starting with random sets of 890 peptides, the median AUC for panels of size 5 is under 0.4.
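The panel-size comparison above can be emulated by drawing random peptide subsets, scoring each sample (here simply by the mean intensity over the panel), and recording the AUC for each draw. The following is a rough sketch under those assumptions, not the authors' actual pipeline.

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_peptides = 24, 890               # sizes are illustrative only
labels = np.array([0] * 12 + [1] * 12)        # hypothetical group membership
intensity = rng.normal(size=(n_samples, n_peptides)) + labels[:, None] * 0.3

def median_auc(panel_size, n_draws=1000):
    aucs = []
    for _ in range(n_draws):
        panel = rng.choice(n_peptides, size=panel_size, replace=False)
        score = intensity[:, panel].mean(axis=1)   # simple panel score
        aucs.append(roc_auc_score(labels, score))
    return np.median(aucs)

for size in (1, 5, 10):
    print(size, round(median_auc(size), 2))       # median AUC rises with panel size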
Chapter 8, Figure 3 Quality assessment for protein expression. Images of the whole Vv proteome microarray probed with mAbs against the polyhistidine (A) and hemagglutinin (B) tags. 99.0% of Vv proteins were reactive to the anti-polyhistidine antibody and 88.2% to the anti-hemagglutinin antibody. The images show four dilutions of human IgG (yellow box), positive controls (green box), and six negative controls (mock transcription/translation reactions) (red circles). The remaining spots are Vv proteins.
Chapter 8, Figure 4 Naive subject serum and VIG. Images of the whole vaccinia virus proteome microarray probed with vaccinia immunoglobulin (Cangene) (A) and serum from a naive individual (B). The IgG control spots (shown in yellow) show signal intensities for both samples; neither sample reacts with the no-DNA control spots (shown in red). The images show four dilutions of human IgG (yellow box), positive controls (green box), and six negative controls (mock transcription/translation reactions) (red circles). The remaining spots are Vv proteins.
[Array heat-map layout: panels (A) MVA and (B) Dryvax (DVX), with pre, week 4, and week 6 time points for primary (1°) and secondary (2°) responses; proteins grouped as 1. Structural (MV, EV, core, other), 2. Regulation (transcription, replication), 3. Virulence/host defense, and 4. Unknown; signal intensity scale 5,000 to 50,000; individual vaccinia ORF names and early/late (E/L) expression classes are listed along the array.]
Chapter 8, Figure 5 Profiling of human antibody responses before and after vaccination with MVA (A) and WR (B). For Dryvax responses, primary (n = 13) and secondary (n = 12) infections are shown.
[Levey-Jennings control charts for low (LQC), middle (MQC), and high (HQC) quality-control samples plotted against run number (runs 3 to 39), each showing the average (Avg), upper control limit (UCL), and lower control limit (LCL) with 1:2S flags; panel (A) overall parameters, panel (B) runs up to run 27.]
Chapter 10, Figure 4 Use of sample controls for trend analysis of variability. Low, middle, and high sample controls of a biomarker were monitored in Levey-Jennings control charts. There was a noticeable shift in trend at all sample control levels after run 27. The average and the upper and lower control limits of analytical runs up to run 27 (B) were compared with the overall parameters (A).
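Control limits of the kind plotted in this figure are conventionally derived from the running mean and standard deviation of the quality-control results; the 1:2S flags in the panels suggest Westgard-style rules. The sketch below computes plus-or-minus 2 SD warning and 3 SD control limits for one QC level, a common convention assumed here rather than quoted from the chapter.

import numpy as np

# Hypothetical low-QC results from successive analytical runs
lqc = np.array([2.7, 2.9, 2.8, 2.6, 3.0, 2.75, 2.85, 2.9, 2.65, 2.8])

avg = lqc.mean()
sd = lqc.std(ddof=1)
warning = (avg - 2 * sd, avg + 2 * sd)   # 1:2s warning limits
control = (avg - 3 * sd, avg + 3 * sd)   # 1:3s rejection limits
print(f"Avg = {avg:.3f}, 2SD limits = {warning}, 3SD limits = {control}")

# A sustained shift, like the one after run 27 in the figure, can be flagged when
# several consecutive points fall on the same side of the mean (e.g., a 10x rule).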
Chapter 15, Figure 2 Mineralization of the aorta in a male rat administered PD0325901 at 3 mg/kg in a dose-range-finding study. Arrows indicate mineral in the aorta wall. Hematoxylin and eosin–stained tissue section.
Chapter 18, Figure 3 Building translational medicine via biomarker research. [Diagram spans the Discovery, Experimental, Pre-Development, Development Track, and Phase 1 to 4 stages, with Lead and Candidate milestones; biomarker activities are labeled TV, CTI, PD, PS/AD, and DM.]
Chapter 20, Figure 2 Inflammatory cells in the RA joint. A number of immune cells have infiltrated the joint, and local production of inflammatory mediators, including cytokines and antibodies, occurs. The synovial fluid, which cushions the joint during movement, is acellular in a healthy joint. Illustrated on the left is the presentation of an antigen by a dendritic cell to a T cell; the yellow connector is an MHC class II molecule. Cytokines, released from the immune cells, function as signaling molecules between cells.
[Three panels (Scenarios 1 to 3), each plotting percent of patients (0 to 25%) against target expression level in patients (0 to 100, low to high), overlaid with the clinical response to drug curve.]
Chapter 34, Figure 1 Principles supporting personalized medicine strategies. Scenarios 1, 2, and 3 are three distributions representing the proportion of patients with different degrees of abnormal pathway expression for three different targets or pathways in the same disease population. The solid black curve shows the correlation between the degree of expression of the abnormal target or pathway and the potential therapeutic benefit of a drug targeting that pathway. The shaded areas represent the proportion of the disease population with increased probability of achieving the best clinical response.
Chapter 34, Figure 3 The preclinical predictive therapeutics protocol allows prioritization of methodologies based on ability to predict optimal combinational designs derived from the networks identified. These tumorgrafts are established directly from the patient’s tumor by implantation into immune-compromised mice, and are characterized by both molecular profiling and histopathology. (A) A heat map following unsupervised hierarchical clustering shows tumorgrafts in the mouse host closely resemble their donating human tumor at the genomic level even when the analysis was restricted to only known drug targets. Patient tumors and their derived mouse tumorgrafts are coded the same color and co-cluster based on their overall genomic similarity. Probes encoding EGFR are highlighted to show the distribution of expression of this target across the various tumors. (B) The mean correlation coefficient in a direct comparison of human tumors with mouse tumor grafts is approximately 0.93, demonstrating excellent overall similarity at the biomarker level.
Chapter 34, Figure 4 Topological network analysis of overexpressed genes from a non-small cell lung carcinoma identified a potential key input node at the level of EGFR. The significance of each node can be inferred after comparison with the global connectivity map and the drug–target knowledge base applied to select corresponding inhibitors. This approach, which does not depend on prerequisite empirical data sets, can be applied for discovery of new disease targets, prioritization and/or validation of existing targets, and/or identification of new indications. A key aspect is successful identification of convergence or divergence hubs or nodes.
[Plot of information gain (0 to 10) versus λ (0 to 4) for constant, linear, quadratic, cubic, and quartic cases.]
Chapter 36, Figure 1 Fisher information gain from dynamic measurements relative to independent measurements as a function of λ.
[Panels (A) to (D): probability density f(x) versus x (−4 to 8) for signals S1 and S2, with decision regions D1 and D2 and error probabilities α and β marked.]
Chapter 36, Figure 3 Construction of diagnostic rules for various probability structures. (A) Signal S1 has prior probability 0.5, mean 0, and standard deviation 1; S2 has prior probability 0.5, mean 4, and standard deviation 1. (B) S1 has prior probability 0.5, mean 0, and standard deviation 1; S2 has prior probability 0.5, mean 0.1, and standard deviation 1. (C) S1 has prior probability 0.5, mean 0, and standard deviation 1; S2 has prior probability 0.5, mean 1, and standard deviation 1. (D) S1 has prior probability 0.9, mean 0, and standard deviation 1; S2 has prior probability 0.1, mean 4, and standard deviation 1.
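The decision regions D1 and D2 in these panels follow from the usual Bayes rule: assign an observation to the signal with the larger prior-weighted density. For the equal-variance Gaussian case used in the panels, the boundary has a closed form. The sketch below is a generic illustration, with the error-rate labels defined in the comments rather than taken from the chapter's notation.

import numpy as np
from scipy.stats import norm

def bayes_threshold(p1, mu1, p2, mu2, sigma=1.0):
    # Decision boundary for two Gaussians with common sigma: choose S2 when x > x*.
    # Solves p1*N(x; mu1, sigma) = p2*N(x; mu2, sigma), assuming mu2 > mu1.
    return (mu1 + mu2) / 2 + sigma**2 * np.log(p1 / p2) / (mu2 - mu1)

p1, mu1, p2, mu2 = 0.9, 0.0, 0.1, 4.0            # priors and means as in panel (D)
x_star = bayes_threshold(p1, mu1, p2, mu2)
alpha = norm.sf(x_star, loc=mu1, scale=1.0)      # probability S1 is called S2
beta = norm.cdf(x_star, loc=mu2, scale=1.0)      # probability S2 is called S1
print(f"x* = {x_star:.2f}, alpha = {alpha:.4f}, beta = {beta:.4f}")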
[Plot of qdot diameter (1 to 10 nm) versus emission wavelength (400 to 1400 nm) for CdS, CdSe, CdTe, CdTe/CdSe, CdHgTe/ZnS, InP, InAs, and PbSe; inset: normalized emission spectra over 600 to 1400 nm.]
Chapter 38, Figure 1 Emission maxima and sizes of qdots of different composition. Qdots can be synthesized from various types of semiconductor materials characterized by different bulk bandgap energies. The curves represent experimental data from the literature on the dependence of peak emission wavelength on qdot diameter. The range of emission wavelengths is 400 to 1350 nm, with sizes varying from 2 to 9.5 nm. Emission spectra are typically 30 to 50 nm wide (full width at half maximum). Inset: Representative emission spectra for some materials.
Chapter 38, Figure 2 Qdot peptide toolkit. The light blue segment contains cysteines and hydrophobic amino acids ensuring binding to the qdot and is common to all peptides. S, solubilization sequence; P, PEG; B, biotin; R, peptide recognition sequence; Q, quencher; D, DOTA; X, any unspecified peptide-encoded function. Qdot solubilization is obtained by a mixture of S and P. Qdots can be targeted with B, R, or other chemical moieties. Qdot fluorescence can be turned on or off by attaching a Q via a cleavable peptide link. In the presence of the appropriate enzyme, the quencher is separated from the qdot, restoring the photoluminescence and reporting on the enzyme activity. For simultaneous PET and fluorescence imaging, qdots can be rendered radioactive by D chelation of radionuclides; for simultaneous MRI and fluorescence imaging, qdots can be rendered magnetically detectable by D chelation of nuclear spin labels.
[Panel (a) labels: tumors, injection site; panel (b) scale bar: 1 μm.]
Chapter 38, Figure 3 In vivo imaging of qdots. (a) Spectrally resolved image of a mouse bearing C4-2 human prostate tumors following injection of qdots functionalized with antibodies for prostate-specific membrane antigen. (b) Images on the right show qdots emitting green, yellow, or red light. The image on the left illustrates the in vivo imaging of the multicolor qdots at three injection sites.
[Three conductance-versus-time traces, with binding steps 1 and 2 indicated.]
Chapter 38, Figure 4 Nanowire detection of a single-target biomolecule using conductance-based measurements. A biomolecule is immobilized selectively by antibody interaction, causing a change in conductance, recorded on the right. Inset: SEM image of a single silicon nanowire (scale bar = 500 nm).
[Panel (a): schematic of nanowire devices 1 to 3; panel (b): conductance (nS, roughly 1,500 to 2,250) versus time for NW1, NW2, and NW3, with solution-delivery points 1 to 6 marked.]
Chapter 38, Figure 5 Multiplexed detection of cancer marker proteins. (a) Multiplexed protein detection by three silicon-nanowire devices in an array. Devices 1, 2, and 3 are fabricated from similar nanowires and then differentiated with distinct mouse antibody receptors specific to three different cancer markers. (b) Conductance versus time data recorded for the simultaneous detection of PSA, CEA, and mucin-1 on a p-type silicon-nanowire array in which NW1, NW2, and NW3 were functionalized with mouse antibodies for PSA, CEA, and mucin-1, respectively. The solutions were delivered to the nanowire array sequentially as follows: (1) 0.9 ng/mL PSA, (2) 1.4 pg/mL PSA, (3) 0.2 ng/mL CEA, (4) 2 pg/mL CEA, (5) 0.5 ng/mL mucin-1, (6) 5 pg/mL mucin-1. Buffer solutions were injected following each protein solution at points indicated by black arrows.
[Diagram labels: tumor biomarker proteins, antibody, bent cantilever.]
Chapter 38, Figure 6 Nanocantilever array. The biomarker proteins are affinity bound to the cantilevers and cause them to deflect. The deflections can be observed directly with lasers. Alternatively, the shift in resonant frequencies caused by the binding can be detected electronically.
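For the electronic readout mentioned in this caption, the added mass of bound protein lowers a cantilever's resonant frequency. The first-order relation Δf ≈ −(f0/2)(Δm/m) is a standard textbook approximation, not a formula given in this chapter; the sketch below simply evaluates it for illustrative numbers.

def frequency_shift(f0_hz, cantilever_mass_kg, added_mass_kg):
    # First-order resonant-frequency shift of a cantilever from added mass.
    return -0.5 * f0_hz * added_mass_kg / cantilever_mass_kg

# Illustrative numbers only: a 1 MHz cantilever of 1e-12 kg capturing 1e-18 kg of protein
print(frequency_shift(1.0e6, 1.0e-12, 1.0e-18))  # about -0.5 Hz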