Frontiers in Drug Design & Discovery Bentham Science Publishers Ltd. http://www.bentham.org/fddd
Volume 1, 2005
Contents
Editorial: Cutting Edge Tools in Drug Discovery
    G.W. Caldwell, Atta-ur-Rahman and B.A. Springer
Discovering Novel Antibacterial Agents by High Throughput Screening
    S. Donadio, L. Brandi, S. Serina, M. Sosio and S. Stinchi
Small Molecule Drug Targeting of RNA
    G.J.R. Zaman
Drug Discovery and Design via High Throughput Screening of Combination Phage-Display Protein-Peptide Libraries
    K. Gazarian
High Throughput Screening: Will The Past Meet The Future?
    P. Englebienne
Variety of the DNA Hybridization Rate and Its Relationship with High Order Structure of Single Stranded Nucleic Acids
    M. Tsuruoka
An Overview of High Throughput Screening at G Protein Coupled Receptors
    R.M. Eglen
Development in Hyphenated Spectroscopic Methods in Natural Products Profiling
    S. Urban and F. Separovic
The Role of Kinetics in High Throughput Screening for Drugs
    A. Gomez-Hens and M.P. Aguilar-Caballos
Assessment of Idea pkEXPRESS™ for the Prediction of Caco-2 Permeabilities
    C.-P. (Matt) Hsu, G.W. Caldwell, J.A. Masucci, Z. Yan and D.M. Ritchie
Exploring the Viability of Metabonomic Urinalysis as a Toxicity Screen Within a Pharmaceutical Drug Discovery Division
    G.C. Leo, G.W. Caldwell, W. Hageman, B. Hastings, B. Starosciak, K. Snyder, J. Scowcroft and A. Krikava
Partition of Solvents and Co-Solvents of Nanotubes: Proteins and Cyclopyranoses
    F. Torrens
Automating Literature-Based Lead Discovery
    J.D. Wren
Structural Biology in Early Phase Drug Discovery
    R. Alexander and J. Spurlino
Whole Gene Synthesis: A Gene-O-Matic Future
    L. Stewart and A.B. Burgin
Contributors
Subject Index
Editorial: “Cutting Edge Tools in Drug Discovery”
The discovery of novel, ethical therapeutics for the treatment of unmet medical needs has never been richer in opportunities, nor more beset by obstacles. The genomic revolution of the 1990s unleashed a plethora of information on potential new drug targets, and the drug industry has followed with a frenzied attempt to unscramble the possible correlation of these new “targets” with specific disease syndromes. The massive amount of genetic information now available has overwhelmed all current industry mainstay approaches to target validation, confirmation and disease correlation. New techniques, technologies and approaches are under development everywhere in a harried attempt to catch up. All of this is ongoing while financial markets have shifted their interest from developing technologies to delivery of drug products, stifling some of the technological creativity initiated in the 1990s.
Several decades ago it was common to have a wealth of background information available on a potential drug target thanks to years of basic, usually academic, research. In today’s world of drug discovery, targets are typically poorly understood, yet we forge ahead, working to discover drugs while we define the target at the molecular level, and its potential correlation to disease, in parallel. As a consequence, the drug industry has become less successful (on a target-by-target basis) in moving compounds into the clinic than ever before. In addition to these challenges, the drug industry finds itself under more scrutiny than ever to provide efficacious drugs with no adverse side effects. Price controls are ever present on the political landscape, adding pressure to an already overheated pot.
Despite these tremendous hurdles, consumers have an insatiable appetite for drugs to treat or cure not only the most obvious disease culprits, such as cardiovascular disease and cancers, but also lifestyle “conditions”, such as moderate obesity and the directly related type II diabetes. The desperate need to achieve success drives innovation of new approaches to discover and develop new medicines. This series is dedicated to those on the front lines of drug discovery who seek to find better ways to bring novel drugs to patients faster and cheaper. Technology has always had a tremendous impact on the lifestyles we lead and have come to expect. This first volume of Frontiers in Drug Design and Discovery presents some of the most up-to-date and exciting new technological approaches to speeding up the drug discovery process. Only a concerted effort to apply these new technologies to the discovery and development of new therapeutic drugs will succeed in modernizing the pharmaceutical industry. Although returns from technical achievements are often not realized for many years, it is our constant hope that better ways will be found to break down the barriers.
Many of the chapters deal with advantages and limitations of screening techniques used in the drug discovery process. S. Donadio and colleagues prepared a chapter highlighting the power and limitations of high throughput screening for discovering novel antibacterial agents. The chapter by G.J.R. Zaman gives the reader a feel for the importance of targeting RNA instead of proteins with small molecule drugs. The targeting of human RNAs with small molecules is a relatively new approach in dealing with diseases. K. Gazarian introduces the reader to the design and high throughput screening of phage-displayed combinatorial libraries of proteins and peptides. This chapter describes very clearly the methodology and its main achievements. P. Englebienne has prepared an excellent chapter dealing with the past and future of high
throughput screening in a drug discovery environment. The chapter by R.M. Eglen gives an excellent overview of high throughput screening of G protein coupled receptors. The chapter describes measuring signal intensity changes using a microtiter plate format and measuring cellular protein redistribution using imaging-based techniques. A. Gomez-Hens and M.P. Aguilar-Caballos have written a chapter describing the importance of understanding kinetics in high throughput screening for drug candidates. M. Tsuruoka describes the relationship between steric hindrance and slow hybridization of labeled oligomers mixed with the amplified DNA of genes. The chapter by S. Urban and F. Separovic gives an overview of hyphenated spectroscopy methods in natural product profiling. C.-P. (Matt) Hsu and colleagues describe the advantages and limitations of in silico prediction of Caco-2 permeabilities using a commercial software package. G.C. Leo and colleagues introduce the reader to the applicability of metabonomic urinalysis as a toxicological screen in drug discovery. F. Torrens presents an interesting chapter on the partitioning of solvents and co-solvents into nanotubes. The chapter by J.D. Wren gives an excellent overview of the use of literature-based sources of knowledge as a tool for discovering novel connections between, for example, diseases, drugs and genes. R. Alexander and J. Spurlino describe in their chapter the role of protein crystallography in drug discovery. The chapter illustrates the contribution of a structure-based drug design approach even at the early phase of the drug discovery process. L. Stewart and A.B. Burgin have prepared an excellent chapter on whole gene synthesis. They illustrate how this powerful technology can distill a growing body of genetic and structural information into improved nucleic acid sequences that are impossible to obtain by traditional cloning and mutagenesis methods.
Garry W. Caldwell Atta-ur-Rahman Barry A. Springer
Frontiers in Drug Design & Discovery, 2005, 1, 3-16
Discovering Novel Antibacterial Agents by High Throughput Screening
Stefano Donadio*, Letizia Brandi, Stefania Serina, Margherita Sosio, Sofia Stinchi
Vicuron Pharmaceuticals, Via R. Lepetit 34, 21040 Gerenzano, Italy
Abstract: The increasing frequency of nosocomial infections due to multiresistant bacterial pathogens represents a serious health concern and is continuously threatening the therapeutic effectiveness of many antibiotics. This medical need calls for the discovery and development of novel antibiotics and for the improvement of existing compounds. Searching for novel chemical classes of antibiotics requires the identification of validated targets for structure-based design or for their transformation into assays suitable for high throughput screening. The power of bacterial genetics and the genomic revolution have provided us with hundreds of targets, which represent components of a bacterial cell essential for viability, well conserved in the desired range of pathogens, and significantly different from mammalian counterparts. In addition, several technological advances in automation and detection systems now enable the transformation of most validated targets into high throughput screening assays. Are these new targets and assays leading to the discovery of promising novel antibiotics? We will review the recent literature for new chemical classes discovered by high throughput screening, also describing the different assays and screening approaches. In addition, we will provide our own considerations on the need to integrate targets and assays with the type and novelty of the chemical diversity, highlighting the power and limitations of high throughput screening for discovering valuable drug leads.
*Corresponding author: Tel: +39-0296474243; Fax: +39-0296474365; E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
INTRODUCTION
In the mid 1980s infectious diseases were considered virtually conquered, thanks to the introduction into clinical practice, during the previous decades, of several antibiotics with different mechanisms of action. However, we are now well aware that pathogenic bacteria have become a major health concern: there are examples of Gram-positive bacteria resistant to virtually every clinically available drug, while the threat posed by some drug-recalcitrant Gram-negatives is also increasing. This resurgence of morbidity and mortality by bacterial pathogens can be ascribed to different causes, including a large fraction of elderly and immunocompromised individuals and the spread of antibiotic resistance due to extensive use (and sometimes misuse) of antibiotics. These changes in the population, both human and bacterial, have been accompanied by the lack of novel antibiotic classes introduced into the clinics for many decades, which in turn is
exacerbating the resistance problem through the use of the same antibacterial agents. It is worth mentioning that only two novel classes of antibiotics have been introduced into the clinic over the last thirty years.
Two different approaches can be used to identify novel antibiotics: the modification of existing compounds, and the discovery of molecules not affected by the current mechanisms of resistance. Obviously, only the latter approach can eventually lead to new chemical classes and mechanisms of action. These new compounds can be identified either by screening a large collection of diverse chemical entities (defined as a library) in an approach referred to as high throughput screening (HTS); or by the design of putative inhibitors from structural data on a target receptor. There is a blurred distinction between these two approaches, especially since the introduction of in silico screening processes [e.g. 1].
In the anti-infective field, the same pathology may be caused by one or more pathogens, which may exhibit different sensitivities to antibacterial drugs. Ultimately, the effectiveness of an antibiotic will depend on a combination of factors, which include its spectrum, its efficacy and safety in the human host. However, the discovery, development and commercialization of a new antibiotic by the pharmaceutical industry will also depend on extrinsic factors, such as the cost of developing and producing the antibiotic, the number of patients for which the antibiotic will be prescribed, and the cost per prescription that can be charged. These latter factors often constitute a major concern in the industry, and novel antibiotics for life-threatening infections are generally considered niche products of limited market value.
TARGETS FOR ANTIBACTERIALS
An antibiotic, like any other drug, exerts its action by interacting specifically with a target, usually inhibiting its function, thereby blocking bacterial growth, or some other relevant function.
A target is therefore any component of the bacterial cell whose inhibition leads to the desired effect. In the anti-infective field, a target can be validated without first having a compound that inhibits it. Indeed, a valid antibacterial target is any component essential for the viability of a target pathogen, at least under desired circumstances. In addition to targets essential for cell growth, one may also consider targets essential for pathogenesis, for production of a toxin, or for conferring resistance to a particular drug. (Recently, it has been debated whether virulence factors represent an unexploited source of novel targets [e.g. 2]. In our view, virulence factors are too species- or strain-specific to represent good targets, unless the pathology to be cured is caused by a single, well-defined agent.) Apart from being essential, a good target must satisfy two additional criteria: it must be conserved, i.e. present and playing an equivalent role in the desired range of pathogens; and it must be specific, i.e. unique to the bacterial world, or at least significantly different in humans. These requirements are extremely important, since an infectious disease is usually caused by significantly different pathogens, which multiply within an animal host.
THE IMPACT OF THE GENOMIC REVOLUTION
During the last decade, rapid advances in DNA sequencing have made it possible to decipher whole bacterial genomes at unprecedented speeds. This has resulted in over a hundred fully sequenced bacterial genomes, covering the major human pathogens. In the
early days, bacterial genomes were sequenced under the assumption that, since the clinically available antibiotics act on a limited number of targets (mostly translation and cell wall formation), additional targets were necessary to enhance discovery programs [3-5]. In fact, there was no lack of antibacterial targets, and many essential genes were well documented in the pre-genomic literature [6-7]. Furthermore, while many of the current antibiotics act on bacterial translation or cell wall biogenesis, these pathways require the concerted participation of tens of different factors, each representing a valid molecular target. There are therefore many more molecular targets in these pathways than those acted upon by the clinically available antibiotics. Finally, each multi-domain protein or RNA is likely to contain more than one binding site for potential antibiotics.
It is now widely accepted that the products of essential genes represent a source of valid antibacterial targets. The combination of genome information with systematic mutagenesis of bacterial chromosomes has allowed the generation of inventories of essential bacterial genes. For example, Bacillus subtilis has been shown to contain 213 essential genes [8]. Through in silico approaches, each of these essential genes was queried against genomic databases to identify those genes that were conserved in other pathogens and absent from eukaryotic genomes, Fig. (1), leading to over a hundred essential, conserved and selective genes, whose products represent validated targets for novel antibacterials. This is an impressive number, and also an underestimate. Indeed, each of the genes encoding rRNA is individually dispensable [9], but rRNA is the target of useful antibiotics (e.g. aminoglycosides and macrolides). In addition, glycopeptide antibiotics interfere with intermediates in peptidoglycan formation, and not directly with a gene product [10].
Fig. (1). In silico approach to cataloguing bacterial genes for essentiality, conservation and specificity. See text for details.
The sequenced genomes have also spurred an interest in the identification of previously unknown essential genes, searching among the genes of unknown function [e.g. 11-12]. This approach was probably driven by the assumption that working on a proprietary target might increase the probability of finding a new antibiotic. While it is too early to evaluate whether essential genes of unknown function will actually provide novel antibiotics, this approach was also driven by the lack-of-targets hypothesis. In addition, it should be noted that establishing essentiality for a given gene in one strain does not necessarily imply essentiality in another bacterial species. Examples of this sort exist in the literature, even when the function of the gene product is known [e.g. 13]. Thus, the essentiality of any potential target should be established over a range of relevant pathogens.
Notwithstanding these cautionary notes, we should not forget that, prior to the genomic era, months of intensive lab work were required to obtain preliminary, and often inconclusive, data about conservation and selectivity of a chosen target. Nowadays, one can establish, within a day’s work of bioinformatic analysis, whether any target is present and well conserved in the pathogen(s) of interest, and whether it contains domains or sub-domains present in eukaryotic sequences. These rapid analyses enable the a priori definition of the potential spectrum exhibited by an inhibitor of the chosen target, and allow the design of effective counter-screens for detecting possible toxicity issues at early stages.
Significant advances in target validation have come through genomic analyses.
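The three-way triage described in this section (essential, conserved in the pathogens of interest, absent from eukaryotes) amounts to a simple filter over an annotated gene inventory. The following is a minimal sketch under stated assumptions, not the procedure used in the cited studies: the `GeneRecord` fields, the `triage_targets` function, and the gene and pathogen names are all hypothetical placeholders, and the annotations would in practice come from mutagenesis data and sequence-similarity searches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """One gene with the three annotations used for triage (all hypothetical)."""
    name: str
    essential: bool            # e.g. from systematic mutagenesis
    found_in: frozenset        # pathogens in which an ortholog was detected
    eukaryotic_homolog: bool   # True if a close eukaryotic match exists


def triage_targets(genes, pathogens_of_interest):
    """Return names of genes that are essential, conserved and selective."""
    required = frozenset(pathogens_of_interest)
    return [
        g.name
        for g in genes
        if g.essential                   # essentiality
        and required <= g.found_in       # conservation across the desired spectrum
        and not g.eukaryotic_homolog     # selectivity versus the host
    ]


# Toy inventory with placeholder names; this is not real annotation data.
genes = [
    GeneRecord("geneA", True, frozenset({"pathogen1", "pathogen2"}), False),
    GeneRecord("geneB", True, frozenset({"pathogen1"}), False),                # not conserved
    GeneRecord("geneC", True, frozenset({"pathogen1", "pathogen2"}), True),    # host homolog
    GeneRecord("geneD", False, frozenset({"pathogen1", "pathogen2"}), False),  # dispensable
]

candidates = triage_targets(genes, ["pathogen1", "pathogen2"])
print(candidates)
```

A gene-level filter of this kind would miss exactly the cases the text flags as exceptions: individually dispensable rRNA genes, and targets such as peptidoglycan intermediates that are not gene products. Narrowing `pathogens_of_interest` widens the candidate list, which mirrors the trade-off between spectrum and number of usable targets.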
One example concerns MurA: while its essential role in peptidoglycan biosynthesis was well documented [14], Gram-positive cocci have been found to contain a second murA gene [15]. Streptococcus pneumoniae can survive with either one of the murA genes, while phosphomycin inhibits both MurA enzymes [15]. Clearly, a potential MurA inhibitor must possess inhibitory activity against both proteins to be effective against Gram-positive cocci.
Two-component signal transduction systems (TCSTS), typically consisting of a histidine kinase and a response regulator, play important roles in bacterial cells. Genomic information has allowed the identification of entire sets of TCSTS in different pathogens. For example, Throup et al. [16] identified 13 S. pneumoniae response regulators, many of which proved important for adaptation and pathogenesis, while one was essential for growth.
In eukaryotes and archaea, synthesis of isoprenoids proceeds through the mevalonate pathway. In bacteria, two distinct pathways are present: Gram-positive cocci make isoprenoids through the mevalonate pathway [17-18], while the glyceraldehyde 3-phosphate-pyruvate pathway (called the nonmevalonate pathway) operates in most other bacteria [19-21]. Gene disruption experiments have identified five essential genes in S. pneumoniae, while sequence comparisons indicate a certain divergence between the corresponding five proteins and their mammalian counterparts [17-18]. Consistent with this divergence, the S. aureus hydroxymethyl-glutaryl-CoA reductase is 10⁴-fold less sensitive to fluvastatin than its human homolog [18]. On the other hand, since the enzymes of the
nonmevalonate pathway have no orthologs in mammals, they represent attractive targets for new antibacterials. However, inhibitors of these enzymes will necessarily be ineffective against Gram-positive cocci.
Although sequence comparison provides a powerful tool for target validation, it has limitations. For example, bacterial and archaeal Ile-tRNA synthetases (IleRS) are the targets for mupirocin, which does not inhibit the eukaryotic enzyme. Sassanfar and coworkers [22] established that the bacterial IleRS differs from eukaryotic and archaeal enzymes in the C-terminal domain sequence, while the catalytic domain, where mupirocin binds, is conserved in the enzymes from the three domains of life. However, in the Thermus thermophilus IleRS, mupirocin makes contact with two critical residues, which are not conserved in the eukaryotic enzymes [23].
In addition, several authors have used genomic information to identify genes that are essential and unique to particular bacterial pathogens (e.g. Chlamydia, Helicobacter), with the aim of developing selective antibiotics that would not affect the remaining bacterial flora. It remains to be established whether antibiotics of this sort, if they ever become available, will find favor with practicing physicians.
FROM TARGETS TO SCREENS
With a plethora of validated antibacterial targets, approaches to screening become extremely important for a successful drug discovery program. Three general types of assays can be used: a) cell-based assays, where a putative inhibitor exerts its action on a suitable cell; b) functional cell-free assays, where one measures an enzymatic activity; and c) binding assays, which depend on measuring the formation of a ligand-receptor complex. Enzymatic or binding assays offer the advantage of measuring the effect directly on the target receptor, thus also providing a measure of the inhibitor’s affinity for the target.
In addition, cell-free systems are usually very sensitive, since they are devoid of the barrier(s) that might hinder an inhibitor from accessing its target within a bacterial cell. Cell-based assays, on the other hand, provide a response in the presence of these same barriers, a situation that is closer to an antibiotic’s desired effect. However, since growth inhibition of a bacterial cell can result from interference with any target, cell-based assays have been designed that respond only to the inhibition of selected target(s). We describe some examples of recent advances in screening technologies for both cell-free and cell-based assays, emphasizing their application to antibacterial programs.
TECHNOLOGICAL ADVANCES
In recent years, HTS miniaturization has substantially increased screening capacity. High density microplates, associated with small volume liquid handling robotics and improved detection technologies, now allow HTS operations at low cost, with fast turnaround time, reduced space requirements and high quality data [24]. The increase in the number of wells per plate from 96 to 1536 allows a reduction of working reaction volume from 200 to 5 µL, with a 100-fold drop in reagent costs [25]. The 1536-well format seems well established, judging from several validated screening campaigns [26-28]. Limits to the increase and development of the well format miniaturization (from 96 to 384, 1536, 3456 or 20,000 wells/plate) are imposed by the advanced technological
requirements (for nanovolume handling and for detection that is both sensitive and rapid [29]). In addition, the intrinsic characteristics of some assays are such that they cannot be adapted to run in small volumes with good signal to noise ratios and with false positives and false negatives minimized [30].
Some alternative approaches to the well format are reported in the literature [31-32]. For example, the gel permeation technology allows the screening of chemical compounds using an “assay sandwich” made of three sheets: a first sheet of gel matrix containing the diffused target is brought into contact with a second sheet of polystyrene containing discrete compounds. A third gel matrix sheet containing additional reaction components and the detection reactants allows the identification of a single compound that interferes with the target activity [33]. The gel permeation technology uses the gel matrix microenvironment as a reaction vessel where all assay components are mixed, maintaining humidity and compound location and allowing reagent addition in a sheet format. For these reasons the well-less format overcomes many problems of the well format, such as evaporation, plate-edge effects and complex liquid-handling requirements. In theory, any 96-well plate assay can be adapted to the gel-permeation format [33]. The throughput of this format is very high, in the order of 200,000 tests per hour [34], with a 17-fold cost saving with respect to the well technology [35].
Although technological advances in HTS miniaturization are widespread, most antibacterial assays found in the literature use the traditional 96- or 384-well formats. An example of antibacterial miniaturized HTS is represented by a coupled Transcription/Translation (T/T) assay [36].
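The miniaturization figures quoted in this section can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: `fold_change` and `hours_to_screen` are hypothetical helper names, and the library size and replicate count in the throughput example are assumptions chosen for illustration, not values from the text. Note that the 40-fold volume reduction computed here is smaller than the 100-fold reagent cost drop reported in [25], so the cost figure is not derivable from volume alone and is not modeled.

```python
def fold_change(before, after):
    """Ratio of an initial value to a final value (e.g. 200 uL -> 5 uL)."""
    if after <= 0:
        raise ValueError("'after' must be positive")
    return before / after


def hours_to_screen(n_compounds, replicates, tests_per_hour):
    """Instrument time needed to test every compound 'replicates' times."""
    if tests_per_hour <= 0:
        raise ValueError("'tests_per_hour' must be positive")
    return n_compounds * replicates / tests_per_hour


# Figures quoted in the text.
density_gain = fold_change(1536, 96)      # wells per plate, 1536 versus 96
volume_reduction = fold_change(200, 5)    # reaction volume in microlitres

# Hypothetical campaign: library size and replicate count are assumptions;
# only the 200,000 tests per hour figure comes from the text [34].
gel_format_hours = hours_to_screen(
    n_compounds=500_000, replicates=2, tests_per_hour=200_000
)

print(density_gain, volume_reduction, gel_format_hours)
```

Estimates of this kind cover instrument time only; plate or sheet preparation, incubation and data analysis would add to the real duration of a campaign.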
A cell-free T/T bacterial system driven by the firefly luciferase reporter gene was adapted to run in 1536-well format, validated for the screening of inhibitors of transcription and/or translation and applied to a 183,000-compound chemical library. The high throughput of this screening, over 20,000 compounds tested per hour, allowed the analysis of the entire library in triplicate, yielding data of high confidence and quality with significant savings in reagents and time.
Technological advances are important not only for the test format but also for the design of innovative assays (for example, those based on molecular structure-function details of a target) or simply to improve the assay detection with a better signal to noise ratio. An example of a knowledge-based assay using a new detection technology is described by Bergendahl et al. [37], who used Luminescence Resonance Energy Transfer to screen for inhibitors of protein-protein interactions in bacterial RNA polymerase. The interaction between the β′ and σ70 subunits is essential for polymerase activity. Using a β′ fragment and σ70 labeled with different fluorophores, the authors could measure any perturbation of their interaction at very low protein concentrations (1 to 100 nM) and in a small reaction volume (10 µL). Under their experimental conditions, the assay was robust enough to be compatible with a library consisting of marine sponge extracts.
REPORTER GENE ASSAYS
In recent years, considerable attention has been dedicated to whole-cell assays for antibacterial research. This renewed interest in whole-cell assays for HTS probably derives from the observed limitations of cell-free assays and from a deeper understanding of the workings of bacterial cells. In fact, post-genomic technologies have shown that
antibiotics with different structures and modes of action affect transcription of several bacterial genes, usually at concentrations lower than those necessary for inhibiting bacterial growth [38]. Indeed, transcriptional profiling is a useful and powerful tool to identify promoters that respond to specific antibiotic stresses [39]. It appears that most antibiotics trigger a global transcriptional response within a bacterial cell, and that antibiotics acting on the same cellular pathway elicit similar responses. At the same time, many inhibitors discovered with sensitive enzymatic assays were found to be incapable of effectively inhibiting their cytoplasmic targets within a bacterial cell. Thus, the attractive possibility existed of engineering bacterial cells so that the expression of an easily measurable reporter gene was under the control of a promoter specifically induced by antibiotics of a defined class and/or mechanism of action. Reporter assays present in principle the advantages of both the enzymatic and the growth inhibition assays, since they are target-specific while measuring a response within a live bacterial cell. In addition, reporter assays usually require low concentrations of test compounds and relatively short incubation times, since transcriptional responses are sensitive and rapid in bacteria.
The scientific literature describes many gene fusions between a stress promoter and a reporter gene, which respond to one or more classes of antibiotics. Some examples are listed in Table 1. These include reporter assays for inhibitors of transcription, translation, cell wall or other metabolic reactions. A sensitive and highly selective assay for the detection of antimicrobial compounds affecting DNA replication has been described [40], which takes advantage of the ability of DNA-damaging compounds to induce SOS-response genes. It consists of an E. coli strain bearing a single-copy fusion between the lacZ gene and the SOS-inducible sulA promoter. A similar approach, developed for secretion inhibitors, uses a secA-lacZ fusion [41]. SecA regulates the activity of its own promoter in response to changes in secretion levels. Thus, the activity of β-galactosidase in the reporter assay will respond to inhibitors of the secretion process. In B. subtilis, the vanRS system appears to respond to many cell-wall inhibitors [42]. Recently, it was found that the liaIH genes are strongly induced by vancomycin and bacitracin [43]. Both systems, using appropriate lacZ fusions, represent potential reporter assays for detecting new cell-wall inhibitors in B. subtilis. Other examples include: the heat shock (ibpA and ibpB) or cold shock (cspA) genes, which have been investigated for their potential to be used as reporter systems for the detection of H- and C-type protein biosynthesis inhibitors, respectively [44]; the extracytoplasmic sigma factor σE as an indicator of compounds affecting the outer membrane or interfering with peptidoglycan biosynthesis [44-45]; and the fabX system, which responds to fatty acid biosynthesis inhibitors in B. subtilis [46]. A list of antibiotic-responsive promoters has been identified through whole-cell transcriptional profiling in B. subtilis [47].
Table 1. Some Examples of Reporter Assays
Strain(a)  Stress                 Pathway                Promoter  Reporter  Reference
Ec         protein misfolding     cell envelope          P3rpoH    lacZ      [44]
Ec         unknown                cell wall              bla       lacZ      [56]
Bs         cell-envelope damage   cell wall              liaIH     lacZ      [43]
Bs         unknown                cell wall              vanH      lacZ      [42]
Sc         unknown                cell wall              sigEp     neo       [45]
Ec         DNA damage             DNA replication        sulA      lacZ      [40]
Bs         unknown                fatty acid synthesis   fabHB     lacZ      [46]
Ec         unknown                secretion              secA      lacZ      [41]
Ec         heat shock             translation            ibp       lacZ      [44]
Ec         cold shock             translation            cspA      lacZ      [44]
Bs         many                   many                             lux       [47]
(a) Strain abbreviations: Bs, Bacillus subtilis; Ec, Escherichia coli; Sc, Streptomyces coelicolor.
A feature common to most reporter assays is that the cellular response is, within limits, independent of growth inhibition, and often detectable at sub-MIC levels. In the presence of very high antibiotic concentrations, the signal may be lost, resulting in a typical bell-shaped curve of signal versus antibiotic concentration. Another feature of reporter systems is that they usually respond to a whole pathway, so these assays can potentially detect inhibitors of many targets acting on the same pathway. It should be noted that reporter assays measure the transcriptional response of an entire bacterial cell to a particular stress.
Thus, they provide an indirect measure of the effect on the desired cellular pathway. Because a bacterial cell is actually a network of different signaling systems, extensive validation of a reporter assay must be performed to establish whether it responds only to the desired inhibitors, or whether it also senses other types of stress. In addition, stress responses can be strain-specific, so the same gene fusions may not serve the same purpose in different bacterial species.
FROM ASSAYS TO LEADS
It is reasonable to expect a certain lag between the introduction into HTS programs of the many targets available from genomics and of the new assay technologies, and the discovery of new drug candidates through their application to a sufficiently large chemical diversity. It should be noted that many genomes were sequenced by pharmaceutical companies well before they became available in the public databases, and it is safe to assume that most targets have been available to big pharma for almost a decade. Thus, the existing literature should reflect reasonably well the lead compounds identified with the technologies implemented up to the late 1990s.
Selected examples of lead compounds discovered through HTS in the last few years are reported in Table 2. The reader is referred to a previous review [48] for antibacterial compounds described up to the mid 1990s. It is worth emphasizing that Table 2 does not report those antibiotics that resulted from the modification of existing compounds (whether in clinical use or not), from a re-evaluation of old antibiotics, or from rational drug design. In compiling Table 2, we have considered the target, the type of assay, the nature of the library screened, the affinity of the inhibitor for the target, its antibacterial activity and selectivity. Most of the lead compounds identified act on well-established bacterial targets, such as transcription, protein synthesis and modification, cell division,
DNA replication, cell wall formation or fatty acid biosynthesis. A common characteristic of most of the lead compounds described in Table 2 is that they have been identified using enzymatic assays. In a few cases, cell-based assays were employed. (To our knowledge, there are no reports of novel antibacterial agents identified through binding assays.) The use of enzymatic assays as primary screening tools is probably the reason why some of the compounds are powerful inhibitors of their targets, but show limited antibacterial activity (Table 2). Chemical libraries, whether totally random or biased towards particular pharmacophores, have been the major source of chemical diversity employed in the screening programs (Table 2). It is likely that the few examples of lead compounds derived from natural sources reflect a diminished use of microbial and plant extracts by the pharmaceutical industry, in comparison with chemical or combinatorial libraries. Also noteworthy is the use of biased chemical libraries, which are generated from knowledge of the enzymatic class and/or the structure of the target enzyme. With peptide deformylase, this approach has led to the identification of compounds with the same core structure in three out of four inhibitors (Table 2), which also show antibacterial activity and good selectivity [49-51]. This core structure is actually related to that of actinonin, a known microbial metabolite. It remains to be determined whether biased libraries will be successful for other targets as well. Broad-spectrum compounds have rarely been identified: the saccharomicins, heptadecaglycoside antibiotics produced by Saccharothrix espanaensis, exhibit good activity against Gram-positive organisms and good to moderate activity against Gram-negative organisms [52]. In some cases, the inhibitory activity on a target enzyme did not translate into a meaningful antibacterial potency.
It should be noted that an HTS program is expected to deliver a lead structure, which should be optimized through iterative steps of medicinal chemistry. In the case of one FabI inhibitor [53], chemical programs afforded a compound with good activity against Staphylococcus aureus and other Gram-positive pathogens, starting from a lead structure devoid of antimicrobial activity. This case, however, might represent an exception, and in many cases no further progress on the lead compounds has been reported. This suggests that many lead modification programs were either unsuccessful, or not undertaken at all. Thus, it should not be assumed that medicinal chemistry programs will eventually confer the desired antibiotic properties on any lead structure.
HTS AND CHEMICAL DIVERSITY
The recent literature shows a larger number of targets and assays for antibacterial discovery than of new chemical entities identified through HTS. In addition, there appears to be a recent bias towards cell-based assays, while most of the new chemical classes were discovered through the use of cell-free assays. While some lead compounds identified by HTS may not yet have surfaced in the literature, the overall impression is that the ability to identify targets, design assays and implement them in HTS has dramatically outpaced the productivity of HTS programs. The survey of new antibacterial leads, an analysis of recently published overviews and commentaries, and our personal feeling as scientists directly involved in the design of assays for discovering novel antibiotics by HTS, seem to converge on a common
Table 2. Some Examples of Antibacterial Leads

Compound          Target          Assay (d)  Library (size) (e)  IC50 (nM)     MIC (µg/ml) (f)         Reference
cpd 20            AcpS            enz        N.I.                15,000        12 (Sp)                 [58]
cpd 1             DNA gyrase      rep        RC                  372           64 (Sa); 64 (Ec*)       [59]
cpd 3             DNA ligase      enz        RC                  0.14          1 (Sa)                  [60]
SB418011          FabH            enz        N.I.                16            N.R.                    [61]
cpd 4a            FabI            enz        RC (305k)           47            0.008 (Sa); 8 (Ec)      [53, 62]
DHDPE             FabI            g.in.      RC                  2,500         1 (Ec)                  [63]
viriditoxin       FtsZ            enz        MPE (>100k)         12            4 (Sa); 25 (Ec*)        [64]
SB236049          β-lactamases    enz        E                   ≤2,000        N.R.                    [65]
BB-78485          LpxC            hypers     BC                  160           1 (Ec); 32 (Sa)         [66]
saccharomicin     membrane        N.R.       ME                  -             0.1 (Sa); 16 (Ec)       [52]
α-pyrone I (b)    membrane        rep        ME (46k)            -             1 (Sa); 128 (Ec*)       [67]
CHIR29498         membrane        N.R.       RC                  -             10 (Sa); 20 (Ec)        [68]
RWJ cpds          MurA            enz        RC                  -             4 (Sa); 8 (Ec)          [69]
INF               NorA            hypers     RC (10k)            -             g                       [70]
VRC3375           PDF             enz        BC                  4             1 (Sa); 0.25 (Ec*)      [51]
BB-3497           PDF             enz        BC                  7             4 (Sa); 8 (Ec)          [49]
cpd 1             PDF             enz        RC                  <5            128 (Hi)                [71]
41e               PDF             enz        BC                  15            N.R.                    [50]
cpd 1-7a          Phe-RS          enz        RC                  5             0.2 (Ec); 0.4 (Sa)      [72]
A-692345 (a, c)   ribosome        enz        BC (300k)           200; 14,000   16 (Sa)                 [73]
CBR703            RNA polymerase  enz        RC (220k)           10,000        4.5 (Ec*)               [74]
GE23077           RNA polymerase  enz        ME                  20            32 (Ec)                 [75]
Sch419560 (b)     RpoE            N.R.       N.R.                N.R.          2.5 (Sa)                [76]
lipoglycopeptide  SPase I         enz        E (50k)             110           8 (Sp); 4 (Ec)          [77]
cpd 30            -               g.in.      BC                  -             0.66 (Ec*); 4 (Sa)      [78]
kalimantacin      -               N.R.       E                   -             25 (Ec); 0.2 (Sa)       [79]
cpd 1             -               g.in.      RC                  -             0.4 (Hp only)           [80]

a derived from optimization of lead compound
b same chemical class
c identified with FAB screen
d abbreviations: enz, enzymatic assay; g.in., growth inhibition; N.R., not reported; hypers, growth inhibition of hypersensitive strain; rep, reporter assay
e abbreviations: BC, biased chemical library; E, extracts, microbial (M) or plant (P), when specified; RC, random chemical library. Sizes are in thousands.
f the lowest MICs against one Gram-positive and/or one Gram-negative organism are reported, where available. Abbreviations: (Ec*), Escherichia coli hypersensitive strain; (Ec), Escherichia coli; (Sa), Staphylococcus aureus; (Hp), Helicobacter pylori; (Sp), Streptococcus pneumoniae; (Hi), Haemophilus influenzae.
g the compound does not have an MIC, but exhibits synergism with ciprofloxacin.
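The statement that most leads in Table 2 were found with enzymatic assays, and only a few with cell-based assays, can be checked with a simple tally. The Python sketch below is illustrative only: the list of assay codes is transcribed from the Assay column of Table 2 in row order, and grouping the reporter, growth-inhibition and hypersensitive-strain assays together as "cell-based" is an assumption made for this example.

```python
from collections import Counter

# Assay codes transcribed from the Assay column of Table 2, in row order.
# enz = enzymatic, rep = reporter, g.in. = growth inhibition,
# hypers = growth inhibition of hypersensitive strain, N.R. = not reported.
TABLE2_ASSAYS = [
    "enz", "rep", "enz", "enz", "enz", "g.in.", "enz", "enz", "hypers",
    "N.R.", "rep", "N.R.", "enz", "hypers", "enz", "enz", "enz", "enz",
    "enz", "enz", "enz", "enz", "N.R.", "enz", "g.in.", "N.R.", "g.in.",
]

# Assumption for this example: these codes are counted as cell-based assays.
CELL_BASED = {"rep", "g.in.", "hypers"}


def summarize(assays):
    """Count table rows per broad assay category."""
    counts = Counter(assays)
    cell_based = sum(n for code, n in counts.items() if code in CELL_BASED)
    return {
        "total": len(assays),
        "enzymatic": counts["enz"],
        "cell_based": cell_based,
        "not_reported": counts["N.R."],
    }


summary = summarize(TABLE2_ASSAYS)
for category, n in summary.items():
    print(f"{category}: {n}")
```

The same structure could be extended with the Library column to compare chemical libraries with extracts, again using only values present in the table.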
conclusion. Reading the literature of the last decade with hindsight, it appears that the limiting factor in discovering new lead compounds was not the lack of adequate targets or of effective screening technologies. It seems that the discovery efforts in the pharmaceutical industry were mostly driven by the assumption that what was important was how (in terms of targets and assay technologies) to screen, and little attention was paid to what (in terms of chemical diversity) to screen. This was probably due to the impression that an infinite variety of chemical compounds was available, consisting in part of the impressive number of chemical entities synthesized by medicinal chemists during the previous decades and in part supplied by combinatorial chemistry. Since these libraries could be screened quickly with robotics, the driving force has thus far been the identification of any lead compound in a relatively short time, under the assumption that these leads could be transformed into drug candidates by chemical intervention. In fact, the existing synthetic libraries cover only a very small fraction of the theoretical chemical space [54], so the existing screening technologies can only search a small and often biased chemical diversity. As any scientist who has worked with frustration on attempts to transform a poor lead compound into a potential drug candidate can testify, lead optimization can only go so far. In this framework, two possible extreme scenarios can be envisioned. One is based on the assumption that the existing chemical diversity cannot be expanded at reasonable cost. Since the discovery of a valid lead compound is a rare event, the chances of success can be increased by expanding the number of data-points. This requires the introduction of as many HTS programs as possible, very efficient HTS operations and the use of strict selection criteria.
Advancement and optimization of novel screening technologies will be essential for coping with automation demands and for reducing reagent costs through volume miniaturization. Thus, this scenario will rely mostly on technological advances and on project management to increase output per cost unit. This approach might be successful in the short term. However, it will inevitably lead to an exhaustion of the existing chemical diversity, and still relies on the how-to-screen paradigm. In our opinion, the successful search for novel antibacterial agents by HTS can only derive from a substantial renovation of the chemical diversity available for screening. This renovation cannot keep ignoring natural products just because they do not allow HTS programs as rapid as those based on chemical libraries. However, we must also be aware of the fact that both plant and microbial products have been intensively screened for antibacterial activities. Thus, the chemical diversity produced by the previously screened plants and microbes cannot be expected to be infinite either. New valid natural sources are needed to increase our chances of success, as they did in the past [55-56]. Fortunately, we are now aware that just a fraction of the existing biodiversity is known and has been sampled. If these unexploited sources can be identified and introduced rapidly into screening programs, then we might be lifting the lid on a Pandora’s box of hitherto unknown chemical compounds.
ABBREVIATIONS
HTS = High Throughput Screening
RS = tRNA synthetase
T/T = Transcription/Translation
TCSTS = Two Component Signal Transduction System
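The argument made in the HTS AND CHEMICAL DIVERSITY section, that the chance of a rare event grows with the number of data-points, can be made concrete with a small probability sketch. It assumes, purely for illustration, that every screened compound is an independent trial with the same small probability p_hit of being a valid lead. The function names and the example value of p_hit are hypothetical and are not taken from any screening campaign.

```python
import math


def prob_at_least_one_hit(p_hit: float, n_screened: int) -> float:
    """Return 1 - (1 - p_hit)**n_screened for independent, identical trials."""
    # log1p and expm1 keep numerical precision when p_hit is very small.
    return -math.expm1(n_screened * math.log1p(-p_hit))


def compounds_needed(p_hit: float, target_prob: float) -> int:
    """Smallest number of compounds giving at least target_prob of one hit."""
    return math.ceil(math.log1p(-target_prob) / math.log1p(-p_hit))


# Hypothetical per-compound probability of a valid lead (an assumption).
P_HIT = 1e-5

for library_size in (10_000, 100_000, 1_000_000):
    p = prob_at_least_one_hit(P_HIT, library_size)
    print(f"{library_size:>9} compounds: P(at least one lead) = {p:.3f}")

print("compounds needed for 90% chance:", compounds_needed(P_HIT, 0.9))
```

Under this simple model, enlarging a library helps only as long as the added compounds have a non-negligible p_hit, which is one way of stating the point about what to screen rather than how to screen.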
REFERENCES
[1] Abagyan, R.; Totrov, M. Curr. Opin. Chem. Biol., 2001, 5, 375-382.
[2] Alksne, L.E.; Projan, S.J. Curr. Opin. Biotechnol., 2000, 11, 625-636.
[3] Allsop, A.E. Curr. Opin. Microbiol., 1998, 1, 530-534.
[4] Schmid, M.B. Curr. Opin. Chem. Biol., 1998, 2, 529-534.
[5] Moir, D.T.; Shaw, K.J.; Hare, R.S.; Vovis, G.F. Antimicrob. Agents Chemother., 1999, 43, 439-446.
[6] Sonenshein, A.L. Bacillus subtilis and Other Gram-Positive Bacteria, ASM Press, Washington DC, 1993.
[7] Neidhardt, F.C. E. coli and Salmonella, 2nd Edition, ASM Press, Washington DC, 1997.
[8] Kobayashi, K.; Ehrlich, S.D.; Albertini, A.; Amati, G.; Andersen, K.K.; Arnaud, M.; Asai, K.; Ashikaga, S.; Aymerich, S.; Bessieres, P.; Boland, F.; Brignell, S.C.; Bron, S.; Bunai, K.; Chapuis, J.; Christiansen, L.C.; Danchin, A.; Debarbouille, M.; Dervyn, E.; Deuerling, E.; Devine, K.; Devine, S.K.; Dreesen, O.; Errington, J.; Fillinger, S.; Foster, S.J.; Fujita, Y.; Galizzi, A.; Gardan, R.; Eschevins, C.; Fukushima, T.; Haga, K.; Harwood, C.R.; Hecker, M.; Hosoya, D.; Hullo, M.F.; Kakeshita, H.; Karamata, D.; Kasahara, Y.; Kawamura, F.; Koga, K.; Koski, P.; Kuwana, R.; Imamura, D.; Ishimaru, M.; Ishikawa, S.; Ishio, I.; Le Coq, D.; Masson, A.; Mauel, C.; Meima, R.; Mellado, R.P.; Moir, A.; Moriya, S.; Nagakawa, E.; Nanamiya, H.; Nakai, S.; Nygaard, P.; Ogura, M.; Ohanan, T.; O'Reilly, M.; O'Rourke, M.; Pragai, Z.; Pooley, H.M.; Rapoport, G.; Rawlins, J.P.; Rivas, L.A.; Rivolta, C.; Sadaie, A.; Sadaie, Y.; Sarvas, M.; Sato, T.; Saxild, H.H.; Scanlan, E.; Schumann, W.; Seegers, J.F.; Sekiguchi, J.; Sekowska, A.; Seror, S.J.; Simon, M.; Stragier, P.; Studer, R.; Takamatsu, H.; Tanaka, T.; Takeuchi, M.; Thomaides, H.B.; Vagner, V.; van Dijl, J.M.; Watabe, K.; Wipat, A.; Yamamoto, H.; Yamamoto, M.; Yamamoto, Y.; Yamane, K.; Yata, K.; Yoshida, K.; Yoshikawa, H.; Zuber, U.; Ogasawara, N. Proc. Natl. Acad. Sci. USA, 2003, 100, 4678-4683.
[9] Asai, T.; Condon, C.; Voulgaris, J.; Zaporojets, D.; Shen, B.; Al-Omar, M.; Squires, C.; Squires, C.L. J. Bacteriol., 1999, 181, 3803-3809.
[10] Walsh, C. Antibiotics: actions, origins, resistance. ASM Press, Washington, DC, 2003.
[11] Arigoni, F.; Talabot, F.; Peitsch, M.; Edgerton, M.D.; Meldrum, E.; Allet, E.; Fish, R.; Jamotte, T.; Curchod, M.L.; Loferer, H. Nat. Biotechnol., 1998, 16, 851-856.
[12] Ji, Y.; Zhang, B.; Van Horn, S.F.; Warren, P.; Woodnutt, G.; Burnham, M.K.R.; Rosenberg, M. Science, 2001, 293, 2266-2269.
[13] Washburn, R.S.; Marra, A.; Bryant, A.P.; Rosenberg, M.; Gentry, D.R. Antimicrob. Agents Chemother., 2001, 45, 1099-1103.
[14] Brown, E.D.; Vivas, E.I.; Walsh, C.T.; Kolter, R. J. Bacteriol., 1995, 177, 4194-4197.
[15] Du, W.; Brown, J.R.; Sylvester, D.R.; Huang, J.; Chalker, A.F.; So, C.Y.; Holmes, D.J.; Payne, D.J.; Wallis, N.G. J. Bacteriol., 2000, 182, 4146-4152.
[16] Throup, J.P.; Koretke, K.K.; Bryant, A.P.; Ingraham, K.A.; Chalker, A.F.; Ge, Y.; Marra, A.; Wallis, N.G.; Brown, J.R.; Holmes, D.J.; Rosenberg, M.; Burnham, M.K. Mol. Microbiol., 2000, 35, 566-576.
[17] Wilding, E.I.; Brown, J.R.; Bryant, A.P.; Chalker, A.F.; Holmes, D.J.; Ingraham, K.A.; Iordanescu, S.; So, C.Y.; Rosenberg, M.; Gwynn, M.N. J. Bacteriol., 2000, 182, 4319-4327.
[18] Wilding, E.I.; Kim, D.Y.; Bryant, A.P.; Gwynn, M.N.; Lunsford, R.D.; McDevitt, D.; Myers, J.E., Jr.; Rosenberg, M.; Sylvester, D.; Stauffacher, C.V.; Rodwell, V.W. J. Bacteriol., 2000, 182, 5147-5152.
[19] Boucher, Y.; Doolittle, W.F. Mol. Microbiol., 2000, 37, 703-716.
[20] Eisenreich, W.; Bacher, A.; Arigoni, D.; Rohdich, F. Cell. Mol. Life Sci., 2004, 61, 1401-1426.
[21] Rohdich, F.; Kis, K.; Bacher, A.; Eisenreich, W. Curr. Opin. Chem. Biol., 2001, 5, 535-540.
[22] Sassanfar, M.; Kranz, J.E.; Gallant, P.; Schimmel, P.; Shiba, K. Biochemistry, 1996, 35, 9995-10003.
[23] Nakama, T.; Nureki, O.; Yokoyama, S. J. Biol. Chem., 2001, 276, 47387-47393.
[24] Battersby, B.J.; Trau, M. Trends Biotechnol., 2002, 20, 167-173.
[25] Kell, D. Trends Biotechnol., 1999, 17, 89-91.
[26] Maffia, A.M. 3rd; Kariv, I.I.; Oldenburg, K.R. J. Biomol. Screen., 1999, 4, 137-142.
[27] Li, Z.; Mehdi, S.; Patel, I.; Kawooya, J.; Judkins, M.; Zhang, W.; Diener, K.; Lozada, A.; Dunnington, D. J. Biomol. Screen., 2000, 5, 31-38.
[28] Berg, M.; Undisz, K.; Thiericke, R.; Moore, T.; Posten, C. J. Biomol. Screen., 2000, 5, 71-76.
[29] Silverman, L.; Campbell, R.; Broach, J.R. Curr. Opin. Chem. Biol., 1998, 2, 397-403.
[30] Mere, L.; Bennett, T.; Coassin, P.; England, P.; Hamman, B.; Rink, T.; Zimmerman, S.; Negulescu, P. Drug Discov. Today, 1999, 4, 363-369.
[31] Walt, D.R. Science, 2000, 287, 451-452.
[32] Lenz, G.R.; Nash, H.M.; Jindal, S. Drug Discov. Today, 2000, 5, 145-156.
[33] Burns, D.J.; Kofron, J.L.; Warrior, U.; Beutel, B.A. Drug Discov. Today, 2001, 6, S40-S47.
[34] Groebe, D.R.; Maus, M.L.; Pederson, T.; Clampit, J.; Djuric, S.; Trevillyan, J.; Lin, C.W.; Burns, D.J.; Warrior, U. J. Biomol. Screen., 2003, 8, 668-675.
[35] Anderson, S.N.; Cool, B.L.; Kifle, L.; Chiou, W.; Egan, D.A.; Barrett, L.W.; Richardson, P.L.; Frevert, E.U.; Warrior, U.; Kofron, J.L.; Burns, D.J. J. Biomol. Screen., 2004, 9, 112-121.
[36] Kariv, I.; Cao, H.; Marvil, P.D.; Bobkova, E.V.; Bukhtiyarov, Y.E.; Yan, Y.P.; Patel, U.; Coudurier, L.; Chung, T.D.; Oldenburg, K.R. J. Biomol. Screen., 2001, 6, 233-243.
[37] Bergendahl, V.; Heyduk, T.; Burgess, R.R. Appl. Environ. Microbiol., 2003, 69, 1492-1498.
[38] Goh, E.B.; Yim, G.; Tsui, W.; McClure, J.; Surette, M.G.; Davies, J. Proc. Natl. Acad. Sci. USA, 2002, 99, 17025-17030.
[39] Fischer, H.P.; Brunner, N.A.; Wieland, B.; Paquette, J.; Macko, L.; Ziegelbauer, K.; Freiberg, C. Genome Res., 2004, 14, 90-98.
[40] Shapiro, E.; Baneyx, F. Antimicrob. Agents Chemother., 2002, 46, 2490-2497.
[41] Alksne, L.E.; Burgio, P.; Hu, W.; Feld, B.; Singh, M.P.; Tuckman, M.; Petersen, P.J.; Labthavikul, P.; McGlynn, M.; Barbieri, L.; McDonald, L.; Bradford, P.; Dushin, R.G.; Rothstein, D.; Projan, S.J. Antimicrob. Agents Chemother., 2000, 44, 1418-1427.
[42] Ulijasz, A.T.; Grenader, A.; Weisblum, B. J. Bacteriol., 1996, 178, 6305-6309.
[43] Mascher, T.; Zimmer, S.L.; Smith, T.A.; Helmann, J.D. Antimicrob. Agents Chemother., 2004, 48, 2888-2896.
[44] Bianchi, A.A.; Baneyx, F. Appl. Environ. Microbiol., 1999, 65, 5023-5027.
[45] Hong, H.J.; Hutchings, M.I.; Neu, J.M.; Wright, G.D.; Paget, M.S.; Buttner, M.J. Mol. Microbiol., 2004, 52, 1107-1121.
[46] Schujman, G.E.; Paoletti, L.; Grossman, A.D.; de Mendoza, D. Dev. Cell, 2003, 4, 663-672.
[47] Hutter, B.; Fischer, C.; Jacobi, A.; Schaab, C.; Loferer, H. Antimicrob. Agents Chemother., 2004, 48, 2588-2594.
[48] Chu, D.T.; Plattner, J.J.; Katz, L. J. Med. Chem., 1996, 39, 3853-3874.
[49] Clements, J.M.; Beckett, R.P.; Brown, A.; Catlin, G.; Lobell, M.; Palan, S.; Thomas, W.; Whittaker, M.; Wood, S.; Salama, S.; Baker, P.J.; Rodgers, H.F.; Barynin, V.; Rice, D.W.; Hunter, M.G. Antimicrob. Agents Chemother., 2001, 45, 563-570.
[50] Wei, J.; Yi, T.; Huntington, K.M.; Chaudhury, C.; Pei, D. J. Comb. Chem., 2000, 2, 650-657.
[51] Chen, D.; Hackbarth, C.; Ni, Z.J.; Wu, C.; Wang, W.; Jain, R.; He, Y.; Bracken, K.; Weidmann, B.; Patel, D.V.; Trias, J.; White, R.J.; Yuan, Z. Antimicrob. Agents Chemother., 2004, 48, 250-261.
[52] Singh, M.P.; Petersen, P.J.; Weiss, W.J.; Kong, F.; Greenstein, M. Antimicrob. Agents Chemother., 2000, 44, 2154-2159.
[53] Payne, D.J.; Miller, W.H.; Berry, V.; Brosky, J.; Burgess, W.J.; Chen, E.; DeWolf, W.E.; Fosberry, A.P. Jr.; Greenwood, R.; Head, M.S.; Heerding, D.A.; Janson, C.A.; Jaworski, D.D.; Keller, P.M.; Manley, P.J.; Moore, T.D.; Newlander, K.A.; Pearson, S.; Polizzi, B.J.; Qiu, X.; Rittenhouse, S.F.; Slater-Radosti, C.; Salyers, K.L.; Seefeld, M.A.; Smyth, M.G.; Takata, D.T.; Uzinskas, I.N.; Vaidya, K.; Wallis, N.G.; Winram, S.B.; Yuan, C.C.K.; Huffman, W.F. Antimicrob. Agents Chemother., 2002, 46, 3118-3124.
[54] Bohacek, R.S.; McMartin, C.; Guida, W.C. Med. Res. Rev., 1996, 16, 3-50.
[55] Parenti, F.; Coronelli, C. Annu. Rev. Microbiol., 1979, 33, 389-411.
[56] Wagman, G.H.; Weinstein, M.J. Annu. Rev. Microbiol., 1980, 34, 537-557.
[57] Sun, D.; Cohen, S.; Mani, C.; Murphy, C.; Rothstein, D.M. J. Antibiotics, 2002, 55, 279-287.
[58] Gilbert, A.M.; Kirisits, M.; Toy, P.; Nunn, D.S.; Failli, A.; Dushin, E.G.; Novikova, E.; Petersen, P.J.; McCarthy, D.J.; McFadyen, I.; Fritz, C.C. Bioorg. Med. Chem. Lett., 2004, 14, 37-41.
[59] Akihiko, T.; Oyamada, Y.; Ofuji, K.; Fujimoto, M.; Iwai, N.; Hiyama, Y.; Suzuki, K.; Ito, H.; Terauchi, H.; Kawasaki, M.; Nagai, K.; Wachi, M.; Yamagishi, J. J. Med. Chem., 2004, 47, 3693-3696.
[60] Oesterhelt, H.B.; Knezevic, I.; Bartel, S.; Lampe, T.; Warnecke-Eberz, U.; Ziegelbauer, K.; Haebich, D.; Labischinski, H. J. Biol. Chem., 2003, 278, 39435-39442.
[61] Khandekar, S.S.; Gentry, D.R.; Van Aller, G.S.; Warren, P.; Xiang, H.; Silverman, C.; Doyle, M.L.; Chambers, P.A.; Konstantinidis, A.K.; Brandt, M.; Daines, R.A.; Lonsdale, J.T. J. Biol. Chem., 2001, 276, 30024-30030.
[62] Miller, W.H.; Seefeld, M.A.; Newlander, K.A.; Uzinskas, I.N.; Burgess, W.J.; Heerding, D.A.; Yuan, C.C.; Head, M.S.; Payne, D.J.; Rittenhouse, S.F.; Moore, T.D.; Pearson, S.C.; Berry, V.; DeWolf, W.E. Jr.; Keller, P.M.; Polizzi, B.J.; Qiu, X.; Janson, C.A.; Huffman, W.F. J. Med. Chem., 2002, 45, 3246-3256.
[63] Ling, L.L.; Xian, J.; Ali, S.; Geng, B.; Fan, J.; Mills, M.D.; Arvanites, A.C.; Orgueira, H.; Ashwell, M.A.; Carmel, G.; Xiang, Y.; Moir, D.T. Antimicrob. Agents Chemother., 2004, 48, 1541-1547.
[64] Wang, J.; Galgoci, A.; Kodali, S.; Herath, K.B.; Jayasuriya, H.; Dorso, K.; Vicente, F.; Gonzalez, A.; Cully, D.; Bramhill, D.; Singh, S. J. Biol. Chem., 2003, 278, 44424-44428.
[65] Payne, D.J.; Hueso-Rodriguez, J.A.; Boyd, H.; Concha, N.O.; Janson, C.A.; Gilpin, M.; Bateson, J.H.; Cheever, C.; Niconovich, N.L.; Pearson, S.; Rittenhouse, S.; Tew, D.; Diez, E.; Perez, P.; de la Fluente, J.; Rees, M.; Rivera-Sagredo, A. Antimicrob. Agents Chemother., 2002, 46, 1880-1886.
[66] Clements, J.M.; Coignard, F.; Johnson, I.; Chandler, S.; Palan, S.; Waller, A.; Wijkmans, J.; Hunter, M.G. Antimicrob. Agents Chemother., 2002, 46, 1793-1799.
[67] Singh, M.P.; Kong, F.; Janso, J.E.; Arias, D.A.; Suarez, P.A.; Bernan, V.S.; Petersen, P.J.; Weiss, W.J.; Carter, G.; Greenstein, M. J. Antibiotics, 2003, 56, 1033-1044.
[68] Goodson, B.; Ehrhardt, A.; Ng, S.; Nuss, J.; Johnson, K.; Giedlin, M.; Yamamoto, R.; Moos, W.H.; Krebber, A.; Ladner, M.; Giacona, M.B.; Vitt, C.; Winter, J. Antimicrob. Agents Chemother., 1999, 43, 1429-1434.
[69] Baum, E.Z.; Montenegro, D.A.; Licata, L.; Turchi, I.; Webb, G.C.; Foelno, B.D.; Bush, K. Antimicrob. Agents Chemother., 2001, 45, 3182-3188.
[70] Markham, P.N.; Westhaus, E.; Klyachko, K.; Johnson, M.E.; Neyfakh, A. Antimicrob. Agents Chemother., 1999, 43, 2404-2408.
[71] Molteni, V.; He, X.; Nabakka, J.; Yang, K.; Kreusch, A.; Gordon, P.; Bursulaya, B.; Warner, I.; Shin, T.; Biorac, T.; Ryder, N.S.; Goldberg, R.; Doughty, J.; He, Y. Bioorg. Med. Chem. Lett., 2004, 14, 1477-1481.
[72] Beyer, D.; Kroll, H.P.; Endermann, R.; Schiffer, G.; Siegel, S.; Bauser, M.; Pohlmann, J.; Brands, M.; Ziegelbauer, K.; Haebich, D.; Eymann, C.; Broetz-Oesterhelt, H. Antimicrob. Agents Chemother., 2004, 48, 525-532.
[73] Dandliker, P.J.; Pratt, S.D.; Nilius, A.M.; Black-Schafer, C.; Ruan, X.; Towne, D.L.; Clark, R.F.; Englund, E.E.; Wagner, R.; Weitzberg, M.; Chovan, L.E.; Hickman, R.K.; Daly, M.M.; Kakavas, S.; Zhong, P.; Cao, Z.; David, C.A.; Xuei, X.; Lerner, C.G.; Soni, N.B.; Bui, M.; Shen, L.L.; Cai, Y.; Merta, P.J.; Saiki, A.Y.C.; Beutel, B.A. Antimicrob. Agents Chemother., 2003, 47, 3831-3839.
[74] Artsimovitch, I.; Chu, C.; Lynch, A.S.; Landick, R. Science, 2003, 302, 650-654.
[75] Ciciliato, I.; Corti, E.; Sarubbi, E.; Stefanelli, S.; Gastaldo, L.; Montanini, N.; Kurz, M.; Losi, D.; Marinelli, F.; Selva, E. J. Antibiotics, 2004, 57, 210-217.
[76] Chu, M.; Mierzwa, R.; Xu, L.; He, L.; Terracciano, J.; Patel, M.; Zhao, W.; Black, T.A.; Chan, T.-M. J. Antibiotics, 2002, 55, 215-218.
[77] Kulanthaivel, P.; Kreuzman, A.J.; Strege, M.A.; Belvo, M.D.; Smitka, T.A.; Clemens, M.; Swartling, J.R.; Minton, K.L.; Zheng, F.; Angleton, E.L.; Mullen, D.; Jungheim, L.N.; Klimkowski, V.J.; Nicas, T.I.; Thompson, R.C.; Peng, S. J. Biol. Chem., 2004, 279, 36250-36258.
[78] Haoyun, A.; Becky, D.; Cook, P.D. J. Med. Chem., 1998, 41, 706-716.
[79] Kamigiri, K.; Suzuki, Y.; Shibazaki, M.; Morioka, M.; Suzuki, K. J. Antibiotics, 1996, 49, 136-139.
[80] Ando, R.; Kawamura, M.; Chiba, N. J. Med. Chem., 2001, 44, 4468-4474.
Frontiers in Drug Design & Discovery, 2005, 1, 17-28
Small Molecule Drug Targeting of RNA
Guido J.R. Zaman*
N.V. Organon, Molecular Pharmacology Unit, P.O. Box 20, 5340 BH Oss, The Netherlands
Abstract: Targeting at the RNA level is considered an alternative approach to traditional drug discovery focusing on proteins. The targeting of bacterial ribosomal RNA with aminoglycoside antibiotics has provided a clear precedent for the targeting of RNA with small molecule drugs. Aminoglycosides can also bind to human cytoplasmic ribosomal RNA and suppress premature termination codons in human messenger RNAs. This suppression activity is being explored for the development of novel therapies for genetic disorders caused by premature stop codon mutations, such as cystic fibrosis and Duchenne muscular dystrophy. While aminoglycosides act on ribosomal RNA, certain small molecule metabolites, including vitamins, lysine and purines, can regulate gene expression by binding to messenger RNA at so-called ‘riboswitches’. Riboswitches consist of complex three-dimensional structures, located in the 5’-untranslated region of bacterial and fungal messenger RNAs, coding for proteins involved in the uptake, biosynthesis and export of these metabolites. It is unknown whether riboswitches also occur in human cells. Riboswitches provide yet another proof that small molecules can influence biological events by binding directly to RNA. The targeting of human messenger RNAs with small synthetic chemical molecules is a relatively new approach, but may create new and unique opportunities for drug discovery. By using messenger RNA as a target, all genes in the human genome may be considered, including those that encode proteins that are not amenable to high-throughput screening, but for which clear associations with disease processes follow from (molecular) genetic or pharmacological investigations.
In this area, progress has been made recently in the targeting of the Alzheimer’s β-amyloid precursor protein messenger RNA and the tumor necrosis factor-α messenger RNA.
*Corresponding author: Tel: +31-412-661043; E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
INTRODUCTION
Modern drug discovery is a highly industrialized process in which the chance of finding good drug leads is enhanced by screening large numbers of protein targets against large collections of synthetic chemical molecules. To further reduce the risk of failure of this high-throughput screening (HTS) operation, many pharmaceutical companies increasingly confine themselves to certain families of targets that have proven to be particularly successful, as evidenced by the number of marketed medicines interacting with these protein classes. These ‘druggable’ protein families include G-protein coupled receptors (GPCRs), nuclear receptors, ion channels, transporters and enzymes [1]. Generic technologies to assay these druggable proteins are available, and greatly speed up the process from gene to screen [2-4]. Screening assays can be either biochemical or cell-based. Biochemical assays are based on (partly) purified proteins, or membrane preparations, and are used to measure receptor binding or enzyme activity [4]. Cell-based assays use functional or phenotypic read-outs [2,3]. In the case of receptor targets, these assays measure ligand ‘efficacy’ [5]. Since the start of HTS in the early 1990s, this factory-like approach to drug discovery has yielded a limited number of new drugs and several tens of new drug candidates [6]. Sequencing of the human genome has revealed the DNA code of many new members of the druggable protein classes [1]. However, the human genome project has also identified many proteins that are not amenable to HTS, because assay technologies are not available, or because the biochemical functions of these proteins are not yet known [7]. In fact, the major drug target classes cover only 17% of the human genome [7]. Furthermore, in more than half of all instances, the screening of druggable proteins does not deliver a good lead compound [6]. It is therefore not surprising that targeting at the RNA level is gaining interest in the pharmaceutical industry and in biotech as an alternative approach to traditional drug discovery focusing on proteins. RNA has been targeted with antisense oligonucleotides, small inhibitory RNAs (siRNA), or RNA enzymes (ribozymes). Until now, the success of these sequence-specific targeting approaches has been limited to one approved therapeutic, Vitravene™ (fomivirsen) (http://www.isispharm.com). Vitravene™ is an antisense drug for treatment of cytomegalovirus retinal infections in people with AIDS.
The development of nucleic acid-based therapies is significantly hampered by the problems of delivering these large molecules. The targeting of RNA with small organic molecules holds great promise. A characteristic feature of RNA is that it can fold into complex three-dimensional structures comprising loops, bulges, pseudoknots and turns [7]. These structures are responsible for the diverse actions of RNA molecules within cells. In this respect, RNA resembles a protein more than it resembles DNA, which is less flexible and has a less diverse tertiary structure. The unique shapes in various target RNAs create potential binding sites for small molecules [7]. A property of messenger RNA (mRNA) that makes it an attractive target for drug discovery is that mRNA molecules are usually present in cells at relatively low copy numbers, as compared to proteins. mRNA is therefore highly sensitive to small changes in the concentrations of regulatory signals, such as proteins and small molecules.
PROOF-OF-CONCEPT PROVIDED BY NATURE
The aminoglycoside antibiotics have provided a clear precedent for the targeting of RNA with small molecule drugs. Aminoglycosides bind to the RNA component of the small ribosomal subunit, the 16S ribosomal RNA (rRNA) [8]. They exert their antimicrobial activity by inducing misreading of the genetic code. This results in the loss of functional proteins and the accumulation of abnormal proteins in the bacterial membrane, which compromises its barrier function and finally results in cell lysis [9]. In 1943, streptomycin was the first member of the aminoglycoside family discovered to have antimicrobial activity against Gram-negative bacteria and Mycobacterium tuberculosis, the causative agent of tuberculosis [10]. In contrast to the discovery of penicillin by Alexander Fleming in 1929 [11], which was largely a matter of
chance, the isolation of streptomycin was the result of a long-term, systematic effort by Selman A. Waksman and co-workers, who tested 10,000 different soil microbes for antibacterial activity [10]. In fact, streptomycin was the first drug that was discovered as the result of ‘screening’, as opposed to ‘rational design’. Since the breakthrough discovery of streptomycin, for which Dr Waksman was awarded the Nobel Prize in 1952, more than 150 aminoglycosides with highly diverse chemical structures have been isolated from the culture filtrates of bacteria. In addition, many semi-synthetic variants have been derived from these natural products, with the aim of developing more selective RNA-targeting drugs or of combating drug resistance.
MECHANISM OF ACTION OF AMINOGLYCOSIDE ANTIBIOTICS
The study of the binding of aminoglycoside antibiotics to rRNA has revealed the basic principles of RNA-drug interactions. Small RNA oligonucleotides derived from the decoding A-site bind aminoglycosides similarly to the whole ribosome [12]. Binding is primarily determined by electrostatic interactions [13-15] and is relatively non-specific [16]. Affinities for 16S rRNA fragments range from 300 nM for neomycin to 5 µM for other aminoglycosides, such as paromomycin and kanamycin B [15]. Aminoglycosides also bind to human 18S rRNA, with affinities that are only two- to six-fold lower than for the binding to bacterial 16S rRNA [15]. In view of the success of the aminoglycosides as antibacterials for over half a century, this may come as a surprise. It should be noted, however, that aminoglycoside therapy can cause severe side effects. Of these, renal toxicity is the most common. Despite changes in treatment regimens and monitoring procedures, 5 to 25% of patients still suffer from this side effect, as has been reported in clinical studies [17]. The second main side effect is ear toxicity, which can result in reversible or irreversible deafness [18].
Both side effects are explained by the specific accumulation of aminoglycosides in renal tubular epithelial cells or in hair cells of the inner ear, which express a specific membrane receptor for aminoglycosides, megalin [19]. There are two main factors that determine the selective cytotoxic action of aminoglycosides on bacterial cells. The first is the sequence difference between bacterial and human rRNA. This is illustrated by the fact that a ‘pseudo-eukaryotic’ rRNA can be created by mutation of Escherichia coli rRNA at one specific residue in the A-site [20]. Expression of this ‘pseudo-eukaryotic’ rRNA renders bacteria 100 times more resistant to neomycin. The second important factor is the occurrence in prokaryotes of transporter proteins that actively take up and concentrate aminoglycosides in the cytoplasm [21]. The mechanism by which aminoglycosides, through binding to rRNA, interfere with the fidelity of the translation machinery is understood in great detail thanks to the co-crystal structure of a characteristic example, paromomycin, with the 30S ribosomal subunit [14]. Aminoglycoside binding shifts the equilibrium between two conformations of the rRNA towards a conformation that is more prone to the binding of non-cognate (incorrect) transfer RNAs (tRNA), thus resulting in an increased error rate of the ribosome. Thus, like compounds that act on protein receptors, aminoglycosides exert their action by directing the conformation of an RNA target.
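The affinities quoted in this section can be related to target occupancy with the standard single-site binding isotherm, occupancy = [L] / ([L] + Kd). The Python sketch below is a simplified illustration rather than a model of the ribosome: it assumes 1:1 binding, a hypothetical free drug concentration of 1 µM, and it applies the reported two- to six-fold lower affinity for human rRNA to the neomycin value, which is an assumption made only for this example.

```python
def fractional_occupancy(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of RNA sites occupied, assuming simple 1:1 binding."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand_nM must be >= 0 and kd_nM must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


KD_BACTERIAL_NM = 300.0          # neomycin on a 16S rRNA fragment (from the text)
FOLD_WEAKER_HUMAN = (2.0, 6.0)   # range reported for human 18S rRNA (from the text)
LIGAND_NM = 1000.0               # hypothetical free drug concentration (assumption)

bacterial = fractional_occupancy(LIGAND_NM, KD_BACTERIAL_NM)
human = [
    fractional_occupancy(LIGAND_NM, KD_BACTERIAL_NM * fold)
    for fold in FOLD_WEAKER_HUMAN
]

print(f"bacterial 16S site: {bacterial:.2f}")
for fold, occupancy in zip(FOLD_WEAKER_HUMAN, human):
    print(f"human 18S site ({fold:g}-fold weaker): {occupancy:.2f}")
```

Under these assumptions the occupancy of the human site is lower but not negligible, in line with the modest affinity difference noted above and with the additional roles of rRNA sequence and drug uptake in selectivity.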
RIBOSWITCHES

While aminoglycosides act on ribosomal RNA, certain small molecule metabolites can regulate gene expression by binding to messenger RNA (mRNA). Binding occurs at defined target sites within so-called ‘riboswitches’, Fig. (1). Riboswitches consist of complex three-dimensional structures, located in the 5’-untranslated region (UTR) of bacterial and fungal mRNAs that code for proteins involved in the uptake, biosynthesis and export of these metabolites. Riboswitches have been found that bind flavin mononucleotide (FMN) [22,23], thiamine pyrophosphate (TPP) [22,24], coenzyme B12 [25], S-adenosylmethionine (SAM) [26,27], lysine [28,29], guanine [30], adenine [31], and glucosamine-6-phosphate (GlcN6P) [32].
Fig. (1). Structure of riboswitch aptamers with their corresponding metabolites, from reference [66].
Riboswitches are composed of two parts: an ‘aptamer’, which is a high affinity binding site for the metabolite, and an ‘expression platform’, which translates ligand binding into regulation of gene expression. The mechanism of riboswitches is very
similar to that of aminoglycosides acting on rRNA. Binding of the metabolite ligand induces a switch between two conformational states of the RNA target. This switch results either in termination of transcription or in inhibition of translation initiation. In the first case, the riboswitch forms an intrinsic transcription terminator in the 5’-UTR, which causes transcription to terminate before the coding region is reached. In the second mechanism, a stable alternative stem-loop structure is formed in which the Shine-Dalgarno sequence is sequestered, thus preventing translation initiation. In a very recently described example, the GlcN6P riboswitch, the binding site in the glmS mRNA consists of a ribozyme [32]. This RNA performs self-cleavage, which is enhanced 1000-fold in the presence of GlcN6P [32].

The affinities of several ligands that bind to riboswitches are in the same range as those of small molecule drugs binding to proteins, Table (1). For the few cases studied, binding was also specific. For instance, for FMN and TPP almost 1000-fold differences in binding constants were observed compared to riboflavin and thiamine, which lack the phosphate groups [23,24]. For SAM, a 100-fold discrimination against S-adenosylhomocysteine, which lacks a single methyl group and one positive charge, and a 10,000-fold discrimination against S-adenosylcysteine were observed [27]. The guanine and adenine aptamers are identical, except for one nucleotide to which they owe their specificity [30,31]. The GlcN6P riboswitch also shows specificity, as analogs such as glucose-6-phosphate and glucosamine do not stimulate cleavage [32].

Table 1.
Affinities of Metabolites for their Cognate Riboswitches

Metabolite                   Kd        Reference
Flavin mononucleotide        5 nM      [22,23]
Thiamine pyrophosphate       100 nM    [24]
Coenzyme B12                 300 nM    [25]
S-adenosylmethionine         4 nM      [26,27]
L-lysine                     1 µM      [28,29]
Guanine                      5 nM      [30]
Adenine                      300 nM    [31]
Glucosamine-6-phosphate      200 µM    [32]
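To put the dissociation constants of Table 1 and the quoted fold-discriminations on a common scale, they can be converted into binding free energies and fractional occupancies. The following is a minimal sketch assuming simple 1:1 binding with ligand in excess and a temperature of 25 °C; neither assumption is stated in the text, and the function names are illustrative only.

```python
import math

R = 8.314       # gas constant, J/(mol*K)
T = 298.15      # assumed temperature (25 degrees C); not given in the text

# Kd values from Table 1, converted to mol/L
KD_MOLAR = {
    "Flavin mononucleotide": 5e-9,
    "Thiamine pyrophosphate": 100e-9,
    "Coenzyme B12": 300e-9,
    "S-adenosylmethionine": 4e-9,
    "L-lysine": 1e-6,
    "Guanine": 5e-9,
    "Adenine": 300e-9,
    "Glucosamine-6-phosphate": 200e-6,
}


def binding_free_energy_kj(kd_molar, temperature=T):
    """Standard binding free energy, dG = RT ln(Kd / 1 M), in kJ/mol."""
    return R * temperature * math.log(kd_molar) / 1000.0


def fraction_bound(ligand_molar, kd_molar):
    """Fraction of RNA occupied for 1:1 binding with ligand in excess."""
    return ligand_molar / (ligand_molar + kd_molar)


def discrimination_energy_kj(fold, temperature=T):
    """Free-energy difference equivalent to a fold difference in Kd."""
    return R * temperature * math.log(fold) / 1000.0


for name, kd in KD_MOLAR.items():
    print(f"{name:26s} dG = {binding_free_energy_kj(kd):6.1f} kJ/mol")
```

Under these assumptions, the almost 1000-fold preference for FMN over riboflavin corresponds to roughly 17 kJ/mol, and a ligand present at a concentration equal to its Kd occupies half of the riboswitch molecules.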
Riboswitches are widespread among bacterial species. In Bacillus subtilis, at least 69 genes are predicted to be under the control of riboswitches [30]. This comprises approximately 2 % of the total genome of this bacterium. Riboswitches also occur in fungal mRNAs [33], and have been predicted to occur in the mRNA of plants [34]. However, most of the genes in which riboswitches have been found in prokaryotes are absent from the genome of ‘higher’ species, such as man. It is therefore unknown whether riboswitches, or similar mechanisms, occur in human cells.
Riboswitches create entirely new opportunities for the development of novel antimicrobials [7,35]. Since many of the genes in bacteria that are regulated by riboswitches are involved in fundamental biochemical processes, metabolite analogs that interfere with riboswitch function can be expected to impair vital metabolic functions and cause cessation of growth. If the phylogenetic distribution of riboswitches indeed is limited to ‘lower’ species, riboswitches are attractive targets for the development of specific bacterial anti-metabolites.

TREATMENT OF HUMAN GENETIC DISORDERS WITH AMINOGLYCOSIDES
Based on their ability to bind not only to bacterial but also to human rRNA, aminoglycosides are being explored for the development of novel treatments for human genetic disorders. The antibiotics G-418 and gentamicin have been shown to cause read-through of a premature stop codon in the gene for the cystic fibrosis transmembrane conductance regulator (CFTR), a transmembrane chloride transporter [36,37]. Such premature stop codons in the CFTR gene occur in 5 % of cystic fibrosis patients. Read-through resulted in increased levels of full-length CFTR and restored CFTR-mediated chloride transport activity in a CF bronchial epithelial cell line [37]. There were no immediate detrimental effects on other cellular functions. Topical administration of gentamicin to the nasal epithelium of patients with cystic fibrosis resulted in the expression of functional CFTR [38]. The suppression of stop codons with aminoglycosides is also being explored as a potential treatment for Duchenne muscular dystrophy (DMD), which is caused by the absence of dystrophin protein in striated muscle [39]. In DMD, 6 % of all cases arise from premature stop codons in the dystrophin gene [39]. Promising data were obtained with a mouse model of DMD: aminoglycosides increased the expression of full-length protein from a dystrophin allele carrying a stop codon mutation [39]. However, these results could not be reproduced in patients with DMD [40]. Aminoglycosides have also been shown to cause read-through of premature stop codons in genes involved in other human genetic disorders, such as Hurler syndrome, which is a lysosomal storage disease [41], cystinosis [42], late infantile neuronal ceroid lipofuscinosis [43], and disorders involving the p53 gene [44]. Read-through resulted in increased levels of full-length proteins [41-44].
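The effect of stop codon read-through on the protein product can be sketched with a toy translation routine. The mRNA sequences below are invented for illustration and are not CFTR or dystrophin sequences; the placeholder "X" stands for whichever amino acid a near-cognate tRNA inserts at the suppressed stop codon, which the simplified model does not attempt to predict.

```python
# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(mrna, suppress_stop_at=None):
    """Translate an mRNA from its first base until a stop codon.

    suppress_stop_at -- index of a codon whose stop signal is read through
                        (hypothetical parameter modelling drug action).
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            if suppress_stop_at is not None and i // 3 == suppress_stop_at:
                protein.append("X")  # unknown residue inserted by read-through
                continue
            break                    # normal termination
        protein.append(amino_acid)
    return "".join(protein)


normal = "AUGGCUCAGAAAUUGUAA"   # toy wild-type mRNA
mutant = "AUGGCUUAGAAAUUGUAA"   # CAG -> UAG premature stop at codon index 2

print(translate(normal))                       # full-length product
print(translate(mutant))                       # truncated product
print(translate(mutant, suppress_stop_at=2))   # read-through product
```

In this toy case the mutant yields only a two-residue fragment, whereas read-through restores a full-length chain that differs from the wild type at a single position, which is why the restored protein may retain function.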
TARGETING OF HIV-1 VIRAL RNAs

Besides rRNA, aminoglycosides can bind in a saturable fashion and with similar micromolar affinities to a variety of other RNA structures containing non-duplex elements. Based on this property, aminoglycosides have been used as lead compounds for the development of more selective RNA-targeting drugs. In human immunodeficiency virus-1 (HIV-1) RNA, there are two structures that are bound by aminoglycosides: the Rev protein response element (RRE) and the transactivation-responsive element (TAR), Fig. (2). The RRE is the binding site of the viral Rev protein in the envelope gene of HIV-1 and plays a role in the transport of unspliced viral RNA from the nucleus to the cytoplasm. The TAR RNA hairpin is present in all HIV-1 mRNAs and plays an important role in the viral replication cycle. TAR is bound by the viral Tat protein, which is essential for transcriptional activation and elongation of HIV-1 mRNA synthesis. This protein – RNA interaction has been recognized as an important target for the development of new therapies against AIDS and has been studied
extensively [45]. A number of molecules, ranging from heterocyclic compounds to peptides and peptidomimetics, have been reported to interfere with the Tat – TAR interaction [46,47]. Some of these have been shown to inhibit HIV-1 replication in cells [48], but none has led to a new drug.
Fig. (2). Target sites of aminoglycosides in HIV-1 RNA.
TARGETING OF ALZHEIMER’S β-AMYLOID PRECURSOR PROTEIN mRNA

Targeting human mRNAs with small synthetic chemical molecules is a relatively new approach. In general, tertiary structures in mRNA, such as hairpins or pseudoknots, are expected to provide more selective binding sites than single-stranded regions or stable double-stranded domains [7]. Although the structure of mRNA is considered to be less complex than that of rRNA, a few well-defined secondary structure elements and drug target sites in mRNA have been described in the literature. One example of a well-defined secondary structure in mRNA is the iron response element (IRE), Fig. (3), which is contained in several mRNAs involved in iron homeostasis [49]. At low cytosolic concentrations of iron, the IRE is bound by the iron sensor protein aconitase. When the concentration of iron in the cell increases, the protein dissociates from the IRE. In some mRNAs, such as the mRNA that codes for the iron-storage protein ferritin, the IRE is located in the 5’ leader, where it functions as a translational silencer. In other mRNAs, such as the mRNA that codes for the transferrin receptor, the IRE is located in the 3’-UTR. Here the function of the IRE is to increase the stability of the mRNA. At low cytosolic concentrations of iron, aconitase binds to the IRE in both the ferritin and the transferrin receptor mRNA. This results in inhibition of ferritin translation. At the same time, the stability, and consequently the translation, of the transferrin receptor mRNA is increased. In response to increased levels of iron, aconitase dissociates from both mRNAs. This results in increased synthesis of ferritin and increased iron storage, while the synthesis of the transferrin receptor and the cellular uptake of iron are decreased. Recently, an IRE has been identified in the 5’-UTR of the mRNA for the Alzheimer’s β-amyloid precursor protein (APP) [50]. APP is the precursor protein of the main component of the
neurological amyloid plaques that are formed during the progression of Alzheimer’s disease. Reduction of the intracellular concentration of iron using chelators resulted in a decrease of APP 5’-UTR-controlled translation [50], Fig. (3). A cell-based assay, consisting of a luciferase reporter whose translation is driven by the 5’-UTR of APP mRNA, has been used to identify APP inhibitors by screening [51]. Several compounds with activities in the (sub)millimolar range have been identified [51].
Fig. (3). Iron response element in the 5’ leader of ferritin mRNA, the 3’-UTR of transferrin receptor mRNA, and the 5’-UTR of the mRNA coding for Alzheimer’s β-amyloid precursor protein (APP).
TARGETING OF TUMOR NECROSIS FACTOR-α mRNA

Tumor necrosis factor-α (TNFα) is a pro-inflammatory cytokine involved in various immune diseases. Like other cytokines and proto-oncogenes, TNFα is regulated at the post-transcriptional level through adenylate/uridylate-rich elements (AREs) in the 3’-UTR of its mRNA [52]. Several proteins have been identified that bind AREs to mediate RNA decay, and hence affect translational efficiency [53,54]. It has been shown that the synthesis of TNFα can be inhibited by thalidomide [55]. Thalidomide is a glutamic acid derivative that was introduced as a sedative hypnotic in 1956, but was withdrawn in 1961 because of grave congenital abnormalities in babies born to mothers who had used it to treat morning sickness. The compound was re-introduced as a therapeutic for leprosy and more recently has demonstrated potency in the treatment of a variety of cancers [56]. The initial mechanism underpinning the action of this compound was shown to be inhibition of TNFα protein expression, and it was further demonstrated to act at the post-transcriptional level to facilitate turnover of the mRNA [57]. The action
of thalidomide to lower TNFα levels is not particularly potent, but analogs have been synthesized that inhibit TNFα production in peripheral blood monocytes in the low micromolar range [56]. By using luciferase reporter assays, it was shown that the thalidomide analogs act through the ARE-containing sequences in the 3’-UTR of the TNFα mRNA [56].

CONCLUSION

The targeting of human mRNAs with small synthetic chemical molecules is a relatively new approach, but may create new and unique opportunities for drug development. Proof-of-concept that small molecules can influence biological events directly, without the intervention of protein receptors, has been provided by the mechanism of action of aminoglycoside antibiotics and by the riboswitches. By using mRNA as a target, it will be possible to target any gene, including those that encode proteins that are not amenable to HTS, but for which clear associations with disease processes follow from (molecular) genetic or pharmacological investigations.

PERSPECTIVE

While targeting RNA creates unique opportunities for drug discovery in many therapeutic areas, it may be particularly attractive for the development of new male contraceptives [7]. Spermatogenesis is a strictly regulated cellular development process, in which several RNA binding proteins play a critical role [58]. This is evident from genetic deficiencies that cause male sterility. For instance, deletions in the Y chromosome encompassing the genes coding for the RNA binding proteins RBM [59] or DAZ [60,61] correlate with severe impairment of spermatogenesis. In addition, studies with transgenic mouse models have identified sperm cell-specific RNAs that could be used as targets for the development of new male contraceptives [7].

What makes drug discovery for RNA different from drug discovery for proteins? In many cases, the binding of compounds to protein receptors or enzymes has been shown to result in a change of the conformation of the target.
This conformational change is essential and determines the nature of the biological effect [5]. For example, different drugs that act through the nuclear estrogen receptor induce distinct conformations of the receptor, which have been visualized by X-ray crystallography [62]. The different receptor conformations result in the recruitment of different associating co-factors, and exert different biological effects [63]. For GPCRs, too, different receptor conformations have been proposed to explain the different behavior of ligands of these receptors [64]. Aminoglycosides and metabolites that bind to riboswitches act through a similar mechanism, as they exert their action by directing the conformation of the RNA. Methodologies to discover small molecules that bind to RNA targets by HTS are based on the same principles and read-out technologies that are applied for drug discovery on proteins [7]. Usually, the RNA molecules are radiolabeled or labeled with fluorescent probes. RNA binding proteins [7] or allosteric ribozymes [65] can be used to monitor the binding of small molecules. The screening of large chemical libraries for compounds that fortuitously bind to RNA has, however, not delivered new drug leads that have entered clinical trials. It may be that this approach is not optimal. Recently, some successes have been reported in the identification of compounds that inhibit the synthesis of APP or TNFα at the post-transcriptional level by applying cell-based reporter assays [51,56]. However, whether these compounds actually work by binding to RNA remains to be determined.

ABBREVIATIONS

APP = β-amyloid precursor protein
ARE = Adenylate/uridylate-rich element
CFTR = Cystic fibrosis transmembrane conductance regulator
DMD = Duchenne muscular dystrophy
FMN = Flavin mononucleotide
GlcN6P = Glucosamine-6-phosphate
HIV-1 = Human immunodeficiency virus-1
HTS = High-throughput screening
IRE = Iron response element
mRNA = Messenger RNA
RRE = Rev protein response element
rRNA = Ribosomal RNA
SAM = S-adenosylmethionine
siRNA = Small inhibitory RNA
TAR = Transactivation-responsive element
TNFα = Tumor necrosis factor α
TPP = Thiamine pyrophosphate
UTR = Untranslated region
REFERENCES

[1] Hopkins, A.L.; Groom, C.R. Nature Reviews Drug Discov., 2002, 1, 727-730.
[2] González, J.E.; Oades, K.; Leychkis, Y.; Harootunian, A.; Negulescu, P.A. Drug Discov. Today, 1999, 4, 431-439.
[3] Cacace, A.; Banks, M.; Spicer, T.; Civoli, F.; Watson, J. Drug Discov. Today, 2003, 8, 785-792.
[4] Zaman, G.J.R.; Garritsen, A.; de Boer, Th.; van Boeckel, C.A.A. Comb. Chem. & HTS, 2003, 6, 313-320.
[5] Kenakin, T. Nature Reviews Drug Discov., 2002, 1, 103-110.
[6] Fox, S.; Farr-Jones, S.J.; Sopchak, L.; Boggs, A.; Comley, J. J. Biomol. Screen., 2004, 9, 354-358.
[7] Zaman, G.J.R.; Michiels, P.J.A.; van Boeckel, C.A.A. Drug Discov. Today, 2003, 8, 297-306.
[8] Moazed, D.; Noller, H.F. Nature, 1987, 327, 389-394.
[9] Davis, B.D.; Chen, L.; Tai, P.C. Proc. Natl. Acad. Sci. USA, 1986, 96, 10129-10133.
[10] Schatz, A.; Bugie, E.; Waksman, S.A. Proc. Soc. Exp. Biol. Med., 1944, 55, 66-69.
[11] Fleming, A. J. Exp. Pathol., 1929, 10, 226.
[12] Purohit, P.; Stern, S. Nature, 1994, 370, 659-662.
[13] Wong, C.-H.; Hendrix, M.; Manning, D.D.; Rosenbohm, C.; Greenberg, W.A. J. Am. Chem. Soc., 1998, 120, 8319-8327.
[14] Carter, A.P.; Clemons, W.M.; Brodersen, D.E.; Morgan-Warren, R.J.; Wimberly, B.T.; Ramakrishnan, V. Nature, 2000, 407, 340-348.
[15] Zaman, G.J.R.; Michiels, P.J.A. In Progress in RNA research; Columbus, F. Ed. NovaScience Publishers: Hauppauge, NY, U.S.A., in press.
[16] Verhelst, S.H.; Michiels, P.J.A.; van der Marel, G.A.; van Boom, J.H.; van Boeckel, C.A.A. ChemBioChem, 2004, 5, 937-942.
[17] Gilbert, D.N.; Lee, B.L.; Dworkin, R.J.; Leggett, J.L.; Chambers, H.F.; Modin, G.; Täuber, M.G.; Sande, M.A. Am. J. Med., 1998, 105, 182-191.
[18] Hutchin, T.; Cortopassi, G. Antimicrob. Agents Chemother., 1994, 38, 2517-2520.
[19] Schmitz, C.; Hilpert, J.; Jacobsen, C.; Boensch, C.; Christensen, E.I.; Luft, F.C.; Willnow, T.E. J. Biol. Chem., 2002, 277, 618-622.
[20] Recht, M.I.; Douthwaite, S.; Puglisi, J.D. EMBO J., 1999, 18, 3133-3138.
[21] Leviton, I.M.; Fraimow, H.S.; Carrasco, N.; Dougherty, T.J.; Miller, M.H. Antimicrob. Agents Chemother., 1995, 467-475.
[22] Mironov, A.S.; Gusarov, I.; Rafikov, R.; Lopez, L.E.; Shatalin, K.; Kreneva, R.A.; Perumov, D.A.; Nudler, E. Cell, 2002, 111, 747-756.
[23] Winkler, W.C.; Cohen-Chalamish, S.; Breaker, R.R. Proc. Natl. Acad. Sci. USA, 2002, 99, 15908-15913.
[24] Winkler, W.; Nahvi, A.; Breaker, R.R. Nature, 2002, 419, 952-956.
[25] Nahvi, A.; Sudarsan, N.; Ebert, M.S.; Zou, X.; Brown, K.L.; Breaker, R.R. Chem. Biol., 2002, 9, 1043-1049.
[26] Epshtein, V.; Mironov, A.S.; Nudler, E. Proc. Natl. Acad. Sci. USA, 2003, 100, 5052-5056.
[27] Winkler, W.C.; Nahvi, A.; Sudarsan, N.; Barrick, J.E.; Breaker, R.R. Nature Struct. Biol., 2003, 10, 701-707.
[28] Sudarsan, N.; Wickiser, J.K.; Nakamura, S.; Ebert, M.S.; Breaker, R.R. Genes Dev., 2003, 17, 2688-2697.
[29] Rodionov, D.A.; Vitreschak, A.G.; Mironov, A.A.; Gelfand, M.S. Nucl. Acids Res., 2003, 31, 6748-6757.
[30] Mandal, M.; Boese, B.; Barrick, J.E.; Winkler, W.C.; Breaker, R.R. Cell, 2003, 113, 577-586.
[31] Mandal, M.; Breaker, R.R. Nature Struct. Mol. Biol., 2004, 11, 29-35.
[32] Winkler, W.C.; Nahvi, A.; Roth, A.; Collins, J.A.; Breaker, R.R. Nature, 2004, 428, 281-286.
[33] Kubodera, T.; Watanabe, M.; Yoshiuchi, K.; Yamashita, N.; Nishimura, A.; Nakai, S.; Gomi, K.; Hanamoto, H. FEBS Lett., 2003, 555, 516-520.
[34] Sudarsan, N.; Barrick, J.E.; Breaker, R.R. RNA, 2003, 9, 644-647.
[35] Winkler, W.C.; Breaker, R.R. ChemBioChem, 2003, 4, 1024-1032.
[36] Howard, M.; Frizzell, R.A.; Bedwell, D.M. Nature Med., 1996, 2, 467-469.
[37] Bedwell, D.M.; Kaenjak, A.; Benos, D.J.; Bebok, Z.; Bubien, J.K.; Hong, J.; Tousson, A.; Clancy, J.P.; Sorscher, E.J. Nature Med., 1997, 2, 467-469.
[38] Wilschanski, M.; Yahav, Y.; Yaacov, Y.; Blau, H.; Bentur, L.; Rivlin, J.; Aviram, M.; Bdolah-Abram, T.; Bebok, Z.; Shushi, L.; Kerem, B.; Kerem, E. New Engl. J. Med., 2003, 349, 1433-1441.
[39] Barton-Davis, E.R.; Cordier, L.; Shoturma, D.I.; Leland, S.E.; Sweeney, H.L. J. Clin. Invest., 1999, 104, 375-381.
[40] Wagner, K.R.; Hamed, S.; Hadley, D.W.; Gropman, A.L.; Burstein, A.H.; Escolar, D.M.; Hoffman, E.P.; Fischbeck, K.H. Ann. Neurol., 2001, 49, 706-711.
[41] Keeling, K.M.; Brooks, D.A.; Hopwood, J.J.; Li, P.; Thompson, J.N.; Bedwell, D.M. Hum. Mol. Genet., 2001, 10, 291-299.
[42] Helip-Wooley, A.; Park, M.A.; Lemons, R.M.; Thoene, J.G. Mol. Genet. Metab., 2002, 75, 128-133.
[43] Sleat, D.E.; Sohar, I.; Gin, R.M.; Lobel, P. Eur. J. Paediatr. Neurol., 2001, 5: Suppl. A, 57-62.
[44] Keeling, K.M.; Bedwell, D.M. J. Mol. Med., 2002, 80, 367-376.
[45] Krebs, A.; Ludwig, V.; Boden, O.; Göbel, M.W. ChemBioChem, 2003, 4, 972-978.
[46] Wilson, W.D.; Li, K. Curr. Med. Chem., 2000, 7, 73-98.
[47] Du, Z.; Lind, K.E.; James, T.L. Chem. & Biol., 2002, 9, 707-712.
[48] Mei, H.-Y.; Cui, M.; Heldsinger, A.; Lemrow, S.M.; Loo, J.A.; Sannes-Lowery, K.A.; Sharmeen, L.; Czarnik, A.W. Biochemistry, 1998, 37, 14204-14212.
[49] Theil, E.C. Biochem. Pharmacol., 2000, 59, 87-93.
[50] Rogers, J.T.; Randall, J.D.; Cahill, C.M.; Eder, P.S.; Huang, X.; Gunshin, H.; Leiter, L.; McPhee, J.; Sarang, S.S.; Utsuki, T.; Greig, H.H.; Lahiri, D.K.; Tanzi, R.E.; Bush, A.I.; Giordano, T.; Gullans, S.R. J. Biol. Chem., 2002, 277, 45518-45528.
[51] Rogers, J.T.; Randall, J.D.; Eder, P.S.; Huang, X.; Bush, A.I.; Tanzi, R.E.; Venti, A.; Payton, S.M.; Giordano, T.; Nagano, S.; Cahill, C.M.; Moir, R.; Lahiri, D.K.; Greig, N.; Sarang, S.S.; Gullans, S.R. J. Mol. Neurosci., 2002, 19, 77-82.
[52] Kruys, V.; Marinx, O.; Shaw, G.; Deschamps, J.; Huez, G. Science, 1989, 245, 852-855.
[53] Kruys, V.; Huez, G. Biochimie, 1994, 76, 862-866.
[54] Chen, C.-Y.; Shyu, A.-B. Trends Biochem. Sci., 1995, 20, 465-470.
[55] Sampaio, E.P.; Sarno, E.N.; Gallily, R.; Cohn, Z.A.; Kaplan, G. J. Exp. Med., 1991, 173, 699-703.
[56] Zhu, X.; Giordano, T.; Yu, Q.-S.; Holloway, H.W.; Perry, T.A.; Lahiri, D.K.; Brossi, A.; Greig, N.H. J. Med. Chem., 2003, 46, 5222-5229.
[57] Moreira, A.L.; Sampaio, E.P.; Zmuidzinas, A.; Frindt, P.; Smith, K.A.; Kaplan, G. J. Exp. Med., 1993, 177, 1675-1680.
[58] Venables, J.P.; Eperon, I.C. Curr. Opinion Genet. & Dev., 1999, 9, 346-354.
[59] Elliott, D.J.; Millar, M.R.; Oghene, K.; Ross, A.; Kiesewetter, F.; Pryor, J.; McIntyre, M.; Hargreave, T.B.; Saunders, P.; Vogt, P.H.; Chandley, A.C.; Cooke, H. Proc. Natl. Acad. Sci. USA, 1997, 94, 3848-3853.
[60] Reijo, R.; Lee, T.Y.; Salo, P.; Alagappan, R.; Brown, L.G.; Rosenberg, M.; Rozen, S.; Jaffe, T.; Straus, D.; Hovatta, O. et al. Nat. Genet., 1995, 10, 383-393.
[61] Reijo, R.; Alagappan, R.K.; Patrizio, P.; Page, D.C. Lancet, 1996, 347, 1290-1293.
[62] Brzozowski, A.M.; Pike, A.C.; Dauter, Z.; Hubbard, R.E.; Bonn, T.; Engstrom, O.; Ohman, L.; Greene, G.L.; Gustafsson, J.A.; Carlquist, M. Nature, 1997, 389, 753-758.
[63] Shang, Y.; Brown, M. Science, 2002, 295, 2465-2468.
[64] Kenakin, T. Trends Pharmacol. Sci., 2003, 24, 346-354.
[65] Soukup, G.A.; Breaker, R.R. Proc. Natl. Acad. Sci. USA, 1999, 6, 3584-3589.
[66] Soukup, J.K.; Soukup, G.A. Curr. Opin. Struct. Biol., 2004, 14, 344-349.
Frontiers in Drug Design & Discovery, 2005, 1, 29-67
Drug Discovery and Design Via High Throughput Screening of Combinatorial Phage-Display Protein-Peptide Libraries

Karlen Gazarian*

Department of Molecular Biology and Biotechnology of the Institute of Biomedical Research, Mexican National University, Mexico D.F., Mexico

Abstract: The advent, in the mid to late 1980s, of the concept and methods for the construction and high throughput screening of phage-displayed combinatorial libraries of proteins and peptides was a revolutionary step in the modern history of molecular biology, in particular for the field dedicated to understanding the principles of intermolecular recognition, the cornerstone of all mechanisms and processes in living systems. High throughput screening of combinatorial libraries is a miniature model of the selection of the “best among many candidates” that occurs every day in nature. At laboratory scale it yields new substances, peptides and proteins, capable of recognizing partner molecules of great physiological significance; such substances are required for basic studies and for the resolution of chronically persisting and newly emerging biomedical concerns. In this review I summarize the emergence of the methodology and its main achievements, focusing on the prospects it offers for drug design and discovery through its highly effective protein-peptide engineering and selection techniques. The analysis is directed towards the most troubling infectious and autoimmune diseases (AIDS, hepatitis, cancer, diabetes, and parasitic and toxicological conditions). The objective of this review is to sketch the frontiers in drug design and discovery based on the current state of research, the trends that can be seen, and the limitations that must be overcome to direct the potential of high-throughput screening technology towards the resolution of major medical problems.
INTRODUCTION

During their evolution, animals and plants have developed systems and mechanisms that stabilize the normal, “physiological” homeostasis necessary for each individual to realize its genetically determined ontogenetic program. The cumulative evidence that the program is realized lies in a life span with gradual aging and death after exhaustion of the programmed resources. Microorganisms and parasitic uni- and multicellular species, which use the human organism as their milieu for reproduction and evolution, cause permanent diseases and premature aging. Hosts and pathogens evolved together in an equilibrium that was maintained throughout evolution by natural selection but has
*Corresponding address: Tel: 5255 5622 3821; Fax: 5255 5550 0048; E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
been destabilized in the evolution of humans, as non-natural social factors gradually replace the natural ones. Medicine is faced with a long list of anomalies of the human organism, most of which also exist in animals (cancer, pandemic infections, etc.) but are extended and deepened by the specifically human way of life. In the last century, the globalization of human activity has promoted extensive evolution of pathogenic viruses and bacteria, both those that already existed and those that emerge through zoonotic events. These changes raise the crucial question of whether modern medicine is equipped to meet this reality. In the context of these challenges, medicine seeks materials, drugs, and methods to address the major problems: diagnosis, treatment and, ideally, prevention.

When one speaks about the discovery and design of drugs, one becomes aware of how unlimited the notion of a drug is. It includes any kind of material that can cure without causing damage, which is the principal dilemma of medicine. Actually, there is no need for a precise definition of the notion, because everyone knows what it means from personal experience. For simplicity’s sake, drugs may be defined as substances that are introduced into an organism with a particular anomaly to stop the pathological process and, if possible, to restore the norm. Initially, materials mostly of plant origin were used as drugs in empirical folk medicine. Nowadays, plants continue to provide a long list of pharmacological materials. The discovery of antibiotics was the first step in the industrial production of naturally evolved biologically active substances. Venoms of snakes and scorpions, rich in specific channel blockers and in enzymes, were another source for the pharmacological industry. In recent times, bioorganic compounds from animals and humans have become more and more widely used in therapeutic practice.
As animal and human tissues can be a source of infectious agents such as HIV or new viruses like SARS-CoV, a large group of medicines of this origin should be produced in cultured cells under controlled conditions. The advent of recombinant methodologies opened new perspectives for the use of genetically modified microorganisms, somatic cells, plants and animals as producers; however, this route can satisfy only a minor part of the demand for drugs. The organic chemistry industry remains the major drug producer. Its products resolve current problems but have adverse effects, owing to their inability to distinguish between normal and pathological cells and to the low specificity of their interaction with biological macromolecules. Nevertheless, they are widely used because a new generation of effective, highly specific and safe substitutes is not yet available to replace them. It is expected that the newly developed concepts and techniques, which use large libraries of biomolecules and methods for their high throughput screening with reagent molecules of human origin, will form the frontiers of future drug design and discovery, to which this review is dedicated.

Emergence of the Principle of Phage Display and Its Realization

The advent of phage display was one of several lines of progress in the biological sciences, which, after having accumulated information and techniques in the areas of biochemistry, genetics, and molecular biology over decades, entered in the mid-seventies a period marked by a burst of fundamental discoveries: nucleosomes, the intron-exon organization of eukaryotic genes, mobile elements in animals (originally found in plants by McClintock in the early fifties), restriction enzymes and DNA recombination
techniques which permitted the cloning and expression of foreign genes in transformed bacteria, somatic cells, and transgenic animals and plants, and the manipulation of mammalian embryos, with the first reports on the cloning of mice and the culturing of embryonic stem cells. Gene cloning in plasmids and phagemids, and the generation of genome-scale libraries in the lambda cloning system, provided structurally defined genetic material for expression in in vitro and in vivo transcription and transcription-translation systems, in order to elucidate the coding sequences and their surrounding regulatory sequences in cloned DNA fragments. Expressed proteins had to be purified using time- and material-consuming chromatographic procedures, followed by assessment of their properties, modification, and new rounds of expression of the genes. The demonstration in 1985 by George Smith [1] that a foreign protein fused to a filamentous phage coat protein can be expressed and displayed on the phage surface proved to be a key step towards an efficient combination of these procedures (expression, assessment, selection, modification, and amplification of proteins) in a single and relatively simple system. The work became the cornerstone of the phage display technology that revolutionized protein engineering. The subsequent demonstration that recombinant phage can be enriched by screening the phage population (biopanning [2]) with binders that recognize the displayed foreign protein, and the idea that libraries of random peptides can be expressed and screened in this way, were promptly realized in his and other laboratories for peptide, antibody and non-antibody protein display [5-8]. De la Cruz et al. [9] and Ilyichev et al. [10] constructed recombinant phage displaying proteins and showed that they are immunogenic. Felici and coworkers [11] developed their vector for gpVIII-fusion random peptides and focused on the selection of mimotopes with monoclonal [11] and polyclonal serum antibodies [12, 13].
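The enrichment achieved by repeated rounds of biopanning can be sketched with a simple deterministic model. The recovery probabilities and the starting frequency below are hypothetical values, not measurements from the cited work; the model assumes that every round recovers binders and non-binders with fixed probabilities and that amplification between rounds preserves their proportions.

```python
def biopanning_fractions(initial_fraction, recovery_binder,
                         recovery_background, rounds):
    """Fraction of binding phage in the pool before and after each round.

    initial_fraction    -- starting fraction of binders in the library
    recovery_binder     -- probability that a binder is recovered per round
    recovery_background -- probability that a non-binder is carried over
    All three values are illustrative assumptions.
    """
    fractions = [initial_fraction]
    f = initial_fraction
    for _ in range(rounds):
        kept_binders = f * recovery_binder
        kept_background = (1.0 - f) * recovery_background
        f = kept_binders / (kept_binders + kept_background)
        fractions.append(f)
    return fractions


# Hypothetical library: one binder per million clones, 10 % recovery of
# binders and 0.01 % non-specific carry-over per round.
for round_number, f in enumerate(biopanning_fractions(1e-6, 0.1, 1e-4, 3)):
    print(f"after round {round_number}: binder fraction = {f:.6f}")
```

With these assumed numbers each round improves the binder-to-background ratio 1000-fold, so a clone that starts at one in a million dominates the pool after three rounds, which is why a small number of panning rounds usually suffices in practice.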
Antibody fragments selected from combinatorial libraries served for mapping epitopes and, through affinity maturation mutagenesis, for generating HIV-1 neutralizing Fabs. Cunningham and Wells [14, 15] used phage display of growth hormone to advance and extend their Ala-scanning mutagenesis strategy for identifying the amino acids involved in protein function and, in particular, in epitopes. Once phage display was implemented, it drastically accelerated mutagenesis and screening procedures using randomized amino acids. This opened the possibility of engineering proteins to reduce (“minimize”) the protein structure, leaving only the amino acid groups required for function [16]. Another “minimization” strategy was developed through the selection of hormone substitutes from random peptide libraries screened with hormone receptors. The selected peptides were modified by randomization of particular amino acids, finally yielding peptides capable of inducing erythropoiesis [17,18]. The display of enzymes made it possible to discover substrate analogs and to elucidate the structure of catalytic amino acid groups. Projects on the generation of vaccines based on peptide mimotopes selected by mAbs [19] or patient serum [20] have been proposed and are currently underway [21]. Thus, within a few years, many lines of experimentation with the phage display system gave representative examples demonstrating the power of the methodology and establishing the general potential of peptide and protein engineering-fusion-expression-display-selection-amplification in a single multi-step procedure. In subsequent years, these
lines have been exploited, extending these results but at the same time revealing the limitations of the methodology as practiced. This created the impression that the methodology had exhausted the initial resources of its “infancy” [22] and required new ideas and/or techniques to reach maturity and to resolve the complicated problems of modern biotechnology and medicine. In recent years, some branches of the methodology have been gathering momentum for a new burst of discoveries. One of these is the intensified cell-surface-directed screening [23], with the recent intravascular library screening in a patient [24] offering the prospect of an organism-level search for ligands in their most natural state. The landscape library system proposed earlier [25] is now actively exploited [26-28] for the discovery of tumor-specific mammalian cell surface receptors, for large-scale probing of bacterial surface markers for diagnostic purposes, and for highly specific and safe detection of dangerous bacteria, including species with potential for use in bioterrorism [27,28]. These and other publications document that the methodology continues to move towards its main goals: diagnosis and treatment.
Phage Display Libraries
After the description by G. Smith of the filamentous phage display-expression-selection principle, the first libraries of proteins fused to the coat proteins, the major gpVIII and the minor gpIII, were constructed and screened (reviewed in [29-31]). Vectors based on the phage genome alone give libraries in which every copy of the coat protein, gpIII or gpVIII, displays the fusion product (“full display”), whereas vectors derived from recombinant phage-bacterial genomes (phagemids) display the foreign molecule on only a part of the gpIII or gpVIII proteins (“hybrid display”). A collection of many proteins or peptides differing in amino acid sequence, each displayed on a separate phage, is a library that can be screened (panned) on a selector molecule, e.g.
an antibody; after the phage unbound to the selector are washed away, the bound phage carrying the protein or peptide recognized by the selector is released (eluted). This basic principle is exploited in many procedures to yield proteins or peptides containing specific recognition sites formed by contiguous or, mostly, discontinuous amino acid groups. Full display compromises the functional activity of the gpIII (infectivity) and gpVIII (phage particle assembly) proteins, which limits the size of the displayed molecule and does not permit the display of amino acid sequences exceeding a certain length. In the case of gpVIII-type libraries, in which the 50aa-long capsid protein molecules interact with each other (and possibly with the DNA inside the particle), the size of the displayed peptide is limited to 6-8 residues. In the case of the gpIII type, which involves no more than five host protein molecules, each consisting of three domains, the size of the fused protein is less restricted because the foreign molecule is located at the N-terminus, whereas the gpIII function (interaction with the E. coli pili) is accomplished by the internal domain forming a knob. Larger displays are also made possible by the strategy of fusing only one of the gpIII molecules (monovalent libraries), using superinfection with a helper phage that provides wild-type gpIII for the recombinant phage. gpIII-fusion peptide libraries (peptides being sequences of fewer than 50 amino acids) displaying peptides from 6 to 43 aa long have been constructed so far.
Linear and Constrained Libraries
The first libraries contained linear peptides, which are flexible and adopt a large number of non-stabilized configurations. It was then demonstrated that by inserting cysteine
residues at an appropriate distance (typically at the termini of the random peptide sequence) the flexibility can be constrained [32,33]. Peptides containing two cysteines usually form loops [33], and in many instances such disulfide-constrained libraries yield higher-affinity peptides than linear libraries [34]. Protein folding in the periplasm of E. coli favours the interaction of neighbouring cysteines within the guest peptide or protein, but the possibility of their interaction with the cysteines of phage gpIII (8 residues) is not completely excluded. To minimize the formation of such cys-cys bonds, random libraries can be designed with a lower cysteine content (e.g. the libraries provided by New England Biolabs, USA). Despite the reduced cysteine content, selectors with affinity for loop structures can still find peptides with two cysteines in such cysteine-deficient libraries [35].
Landscape Libraries
Using vectors derived from gene VIII [29, 36], a landscape phage library displaying peptides on every copy of the protein has been constructed [25,26]. An alpha-landscape library has also been constructed in which sub-terminal helical positions of gpVIII are randomized [37]. Since this is a novel construction, we give a brief description of its characteristics. The library carries six degenerate codons in gene VIII, specifying amino acid residues 12, 13, 15, 16, 17, and 19, and is characterized by a great diversity of amino acids at these randomized positions, which is maintained during repeated subculture. It is suggested (and corroborated by circular dichroism spectroscopy) that, because the variegated segments lie in an extended alpha-helical portion of the protein, the library sequences should be strongly conformationally constrained and stabilized by numerous inter- and intra-subunit contacts. The authors indicate that the conformational homogeneity of the diverse elements in this library allows the selection of ligands with α-helical organization.
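To give a feeling for the sequence space opened by six randomized positions, the following minimal sketch computes the theoretical diversity of such a library. It is an illustration only: the use of NNK degenerate codons, the hypothetical library size, and the assumption that clones are drawn uniformly at random are ours and are not taken from the cited work.

```python
from math import expm1

AMINO_ACIDS = 20   # natural amino acids
NNK_CODONS = 32    # assumption: NNK degenerate codons (N = A/C/G/T, K = G/T)


def theoretical_diversity(randomized_positions, alphabet=AMINO_ACIDS):
    """Number of distinct sequences when each position varies independently."""
    return alphabet ** len(randomized_positions)


def expected_coverage(library_size, diversity):
    """Expected fraction of all variants present at least once.

    Assumes clones are sampled uniformly at random (Poisson approximation),
    which is a simplification for real codon-biased libraries.
    """
    return -expm1(-library_size / diversity)


positions = [12, 13, 15, 16, 17, 19]  # randomized gpVIII residues named in the text
peptide_variants = theoretical_diversity(positions)          # 20**6
dna_variants = theoretical_diversity(positions, NNK_CODONS)  # 32**6
print(peptide_variants, dna_variants)

# Hypothetical library of 1e8 clones, chosen only for illustration.
print(expected_coverage(10**8, peptide_variants))
```

With 20 amino acids at each of six positions, the peptide space contains 64 million variants, so a library must contain considerably more clones than that to sample most of them.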
Also, owing to their alpha-helical constraint, peptides from this library are rigid structures, which may be of great importance for particular goals in phage display studies, for example (a) the selection of mimotopes of fibrous proteins forming coiled-coil alpha-helical structures, and (b) their use as a source of new fiber materials for nanotechnology with highly ordered physical and chemical properties. Such materials are expected to contribute to the development of diagnostics, bio-detectors, biosensors and vaccines.
Primary (Random) and Secondary-Generation (Biased) Libraries
Two types of screening strategy, single-phase and two-phase, have been developed [36]. In the first, a fully random (primary, naïve) gpIII or gpVIII library is subjected to three (sometimes more) consecutive rounds of selection (Fig. (1)). In the two-phase strategy, a random gpVIII library is first screened to collect peptides of relatively low affinity (due to the polyvalent display); then a second-generation gpIII library is constructed in which the selected amino acids are conserved and the remaining positions are randomized. The new library is screened to select high-affinity peptides.
Combinatorial Oligonucleotide Libraries
Tuerk and Gold [38] and Ellington and Szostak [39] published in 1990 their pioneering work on oligonucleotide library construction and selection. The methodology [40] is known as SELEX (Systematic Evolution of Ligands by Exponential Enrichment).
Although these combinatorial libraries are beyond the scope of this review, they are briefly mentioned here because they are designed for the same purpose as phage display: the selection of ligands, in this case aptamers. Vast libraries of 10¹⁴-10¹⁵ single-stranded nucleic acid sequences (RNA, DNA or modified nucleic acids) are subjected to iterative screening with various molecules, including proteins, followed by PCR amplification of the selected subsets, so that after many cycles (up to 14) populations of molecules binding tightly to the selecting molecule are isolated. The outstanding feature of this selection strategy is the immense number of variant sequences that are screened. Another important property is that the nucleic acid sequences (RNA and single-stranded DNA) work at all three structural levels, primary, secondary and three-dimensional, which is not the case in amino acid libraries. Owing to this property, the nucleic acid sequences comprising SELEX libraries fold into a huge variety of shapes formed by secondary-structure elements, providing a wide spectrum of binding possibilities for the molecular surfaces presented by the selecting molecules. Ligands were selected both by the natural binding sites of proteins
Fig. (1). Two peptide selection strategies (Yu and Smith, 1996).
and by regions of those proteins that are not destined for this kind of interaction. This distinguishes SELEX from phage display, where natural intermolecular contact regions normally act as selectors. Libraries of natural RNAs are screened to define their sites of interaction with proteins [41-45], for example, the binding of mRNA loops to T4 DNA polymerase and to the rho transcription-termination factor of E. coli, and the tRNA sites of interaction with synthetases and elongation factor EF-Tu. A number of aptamers have been selected using the HIV-1 proteins Rev, Tat and RT. In particular, the known region of viral RNA interacting with Rev was randomized, and the resulting library was subjected to selection to find the Rev-binding sequence and to design its three-dimensional model. RNA ligands for the HIV-1 Tat and RT proteins were also isolated from a library containing 32 random nucleotides. High-affinity aptamers to RT were isolated that did not interact with the RT of other retroviruses. Notably, aptamers selected for the RT of avian myeloblastosis virus and Moloney murine leukemia virus had biological activity: they inhibited the polymerase and RNase H enzymatic activities. Of interest in the context of drug discovery are aptamers selected to proteins not normally known to interact with nucleic acids, which can be potent inhibitors or promoters of their biological activity; an example is the binders selected for thrombin, which are able to inhibit its clotting function. In this experiment, a 96-base oligonucleotide that bound to thrombin was selected and then “minimized” to derive a small molecule with 100-fold higher affinity (see below for a similar “minimization” strategy for phage-displayed proteins). bFGF (basic fibroblast growth factor, involved in angiogenesis) was also used to select aptamers. In SELEX, powerful computer-assisted analytical programs are used for processing the selected aptamers and identifying consensus patterns and predicted secondary structure in them [45].
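The effect of repeating selection and amplification over many cycles, as described for SELEX above, can be illustrated with a deliberately simple two-population model. This is a sketch under stated assumptions and not a model from the cited papers: the starting binder fraction, the per-round retention probabilities, and the idea that amplification preserves proportions are all hypothetical.

```python
def enrich(binder_fraction, retain_binder, retain_background, rounds):
    """Return the binder fraction after each selection/amplification round.

    retain_binder and retain_background are the (hypothetical) probabilities
    that a binder or a non-binder survives one round of selection.
    """
    history = []
    fraction = binder_fraction
    for _ in range(rounds):
        kept_binders = fraction * retain_binder
        kept_background = (1.0 - fraction) * retain_background
        # Amplification is assumed to preserve the proportions of survivors.
        fraction = kept_binders / (kept_binders + kept_background)
        history.append(fraction)
    return history


# Hypothetical numbers: one binder per 1e12 sequences and a 100-fold
# preference for binders in every round; 14 rounds as the upper bound in the text.
trajectory = enrich(1e-12, retain_binder=0.5, retain_background=0.005, rounds=14)
for round_number, fraction in enumerate(trajectory, start=1):
    print(f"round {round_number:2d}: binder fraction = {fraction:.3e}")
```

In this toy model the odds of finding a binder are multiplied by the same factor in every round, which is why an initially negligible subpopulation can come to dominate the pool within a modest number of cycles.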
RESULTS OF THE SCREENING OF PHAGE DISPLAY COMBINATORIAL LIBRARIES WITH VARIOUS SELECTORS
Since 1990, the general principles, methods, applications and results of phage display have been described in many manuals and reviews, to which the reader is referred for detailed and comprehensive knowledge of the methodology [29, 30, 46, 47, 48]. Below, we describe representative results of these studies that are most relevant to the aim of this review: drug design and discovery.
Protein Display and Engineering to Minimize the Structure
Immunoglobulins
The hybridoma technique permits the generation of mAbs of murine origin, which must be “humanized” to be used as drugs. mAbs of human origin became available after the implementation of procedures permitting the immortalization of human lymphocytes by Epstein-Barr virus [49]. Many well-known antibodies of therapeutic significance, such as 2F5 to the gp41 ELDKWA epitope and 447-52 to the gp120 V3 principal neutralizing epitope, were of this origin [50]. Phage libraries of immunoglobulins are an independent source of antibodies. These libraries display the variable domains, VH and VL, of the Ig molecule, which are either associated non-covalently (Fab) or connected covalently (scFv). The first libraries exploiting PCR copies of the variable genes encoding the sequences of Fab fragments were from mice. A large series of Fabs to various antigens have been selected for laboratory and clinical use. The fragments are further engineered through the generation of minibodies and diabodies. The resulting recombinant antibody fragments can show a lower affinity than the parental Ab, the same affinity, as in the case of the scFv
against p53 [51], or may even exceed the original antibody in affinity [52]. Enhanced affinity can be achieved using the CDR walking procedure of directed mutagenesis, as has been done for the anti-HIV-1 Fab b12 [53, 54] and for an scFv against the oncogene product erbB-2 [55], which made it possible to reach picomolar affinity.
Paratope-Derivative Peptides Displaying the Specificity of Their Parental Antibody
Continuing the “minimal antibody” strategy, researchers began to dissect the variable parts of IgG further by using short peptides carrying their paratope sequences. Peptides derived from paratopes have been assayed with the antigens and shown to retain the antibody specificity. One of the first systems was an scFv to angiotensin II, an octapeptide hormone that is the primary active component of the renin-angiotensin system, which plays a central role in the regulation of blood pressure. Cohen et al. [56] used the scFv generated from the high-affinity (10¹¹ M⁻¹) mAb 4D8 to angiotensin II to investigate the antigen-binding capacity of the paratope sequences. This scFv had the same specificity profile and affinity constant as the intact antibody and was investigated further to determine whether paratope-derived peptides retain these functional activities. Several overlapping peptides were found to be still able to bind angiotensin II [57, 58]. The binding capacity of peptides derived from the V(H) and V(L) sequences of two other antibodies, anti-thyroglobulin and anti-lysozyme, has also been investigated. Of great interest are the data obtained with the anti-lysozyme antibody HyHEL-5, as the results of the peptide analysis could be compared with the information from X-ray crystallography of the antibody-antigen complex. Small peptides were found to display the whole antibody binding specificity.
Of the 38 residues identified as critical for paratope binding to lysozyme, 22 corresponded to residues found in the interface by X-ray crystallography. In other papers from this group [59], synthetic peptides derived from the variable regions of an anti-CD4 mAb were shown to bind to CD4 and to inhibit HIV-1 promoter activation in virus-infected cells. The combination of the paratope-derived peptide assay with site-directed mutagenesis also made it possible to map functional amino acids in the paratopes of neutralizing antibodies to HIV-1 [60]. More recently, the crystal structures of complexes of several paratope-derived peptides of the broadly neutralizing IgG 2F5 were solved by X-ray crystallography, and the information obtained was used to elucidate the conformation of the ELDKWA epitope [61]. Thus, the peptide approach makes it possible to map short sequences in paratopes that carry the antibody binding function and provides information complementing data from other structural studies. Its advantage over traditional structural methods is that it allows physical dissection of the structure-function relationships in antibody paratopes and the use of short sequences from binding domains for designing miniaturized, mono-specific therapeutic molecules. The next step is seen in the generation of phage libraries of paratope sequences with randomized amino acid positions and the selection of sequences with the properties of both antibodies and epitopes.
Scaffolds for Engineering Novel Binding Sites in Proteins
The Emergence and Principle of Scaffold Strategy
The use of combinatorial protein chemistry together with powerful selection techniques has led to the development of novel ligands based on non-antibody proteins using the strategy
of randomization of surface residues of a parental protein, which is used as a scaffold. Given the 3D structure of a suitable starting protein, variants with affinity for a desired target can be selected from combinatorial libraries constructed by random yet targeted amino acid substitutions directed to the molecular surface. Such new proteins, based on protein frameworks of various topologies and selected from libraries by specific binding to a given target, have the potential to complement, substitute for or modify natural proteins, antibodies, hormones, etc. in several biomedical applications relevant to the diagnosis and therapy of diseases. This section of the review deals with the results of combinatorial approaches used to obtain proteinaceous ligands derived from an original protein with beneficial features. Many groups have recruited naturally occurring proteins or domains for engineering via display, randomization and optimization (reviewed in [62]). A prerequisite for success is that the protein should comprise regions at the molecular surface that can be manipulated by amino acid replacement or insertion while maintaining the characteristic scaffold parameters (the specific secondary and 3D structure), and that it can be expressed and purified. Structural information on the scaffold candidate is of great importance in defining the target regions for randomization. In cases in which the native template protein itself has a binding function, the residues involved in that interaction were the first candidates for randomization. Successful randomizations to improve on a native binding function began with the strategy of substituting amino acids with alanine (“alanine scanning”), which indicates the importance of each residue for the protein function and is described in protein engineering papers [63-66]. The information obtained pointed the way towards the engineering of new binding specificities.
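The bookkeeping behind an alanine scan is simple enough to sketch in a few lines. The example below only enumerates the single-substitution variants that would then be expressed and assayed; it is an illustration, not the procedure of the cited papers, and the input string ELDKWA is borrowed from the epitope mentioned earlier in this review purely as a short, convenient sequence.

```python
def alanine_scan(sequence):
    """Return (label, variant) pairs for every single alanine substitution.

    Labels follow the usual convention: original residue, 1-based position,
    then the substituting residue (for example D3A).
    """
    variants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue  # already alanine, so the substitution is uninformative
        label = f"{residue}{index + 1}A"
        variant = sequence[:index] + "A" + sequence[index + 1:]
        variants.append((label, variant))
    return variants


for label, variant in alanine_scan("ELDKWA"):
    print(label, variant)
```

Each variant differs from the parent at exactly one position, so a loss of binding in the assay can be attributed to the side chain that was removed.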
This is a further challenge because success requires, apart from the individual amino acid data, preliminary structure-function knowledge of proteins of various topologies obtained from dedicated investigations. This knowledge is needed to enhance the probability of obtaining a complementary surface, protrusion or crevice as the interface for binding. Several groups have taken the first step towards the selection of such 'artificial binding molecules', employing native or engineered presentation scaffolds of different overall topologies for combinatorial randomization of surface residues. Using such approaches, sequences capable of, for example, serving as substrates [67-69] or inhibitors [70], or as binding sites for antibodies [71] or receptors [17], have been identified.
Immunoglobulin-Like Scaffolds
The first engineering of the hypervariable region forming the paratope interface was done to enhance the binding function and resulted in picomolar affinity, which does not exist in natural antibodies (cited above). The immunoglobulin scaffold present in Fab fragments also offers excellent possibilities for other types of engineering, but may lead to complications because two interacting molecules have to be engineered and assessed. The use of single-chain antibody fragments containing only three antigen-binding loops (like the antibodies of camels, which are devoid of light chains [72]) eliminates this difficulty. Minibody scaffolds have been extensively used for partial randomization [73, 74]. In the first of these works, a minibody scaffold of 61 residues derived from a mAb was used for randomization of the parts corresponding to the exposed hypervariable loops H1 and H2 [73]. Martin et al. [74] used a phage library of 5 × 10⁷ variants with random sequences in the loops for screening against human IL-6 and
isolated a minibody (MB02) that inhibited the IL-6 receptor through binding via the H2 loop with micromolar affinity and contained a sequence closely matching a stretch of seven consecutive amino acids in the IL-6 receptor. Randomization of the H1 loop made it possible to develop a higher affinity for IL-6. Loop libraries based on the tendamistat scaffold (a 74-amino-acid Ig-like protein [75] that inhibits α-amylase) have been constructed and screened. A phagemid library of variants obtained by randomization of loops was panned against an endothelin-specific mAb and yielded, among many different sequences, one found in about 50% of the analyzed clones that possessed α-amylase binding activity. This activity was mediated by the non-variegated “loop one”, which implies that a native-like fold is involved in the binding. The loop insert sequence in this variant had no obvious similarity to endothelin. This supported the results of alanine substitution performed on endothelin, which indicated that the sequence recognized by the mAb is a nonlinear α-helical epitope.
Staphylococcal Protein A
Staphylococcal protein A, a bacterial Ig-binding protein composed of five homologous domains, has been an attractive source of shorter single-domain scaffolds for engineering owing to particular features such as several ordered alpha-helical structures that fold without cysteine bridges (so that folding can also occur in the cytoplasm) and the large contribution of hydrophobic surface areas to Fc binding. The 58aa domain B, composed of three helices of which two have binding interfaces, has been the source of small scaffold derivatives used extensively for the generation of gpIII libraries of Fc-binding helical structures by randomization of the amino acids (see [62]). A synthetic “Z” domain of 38 amino acids containing the two α-helices with the Fc-binding sites was produced and used for engineering.
Fc-binding variants structurally equivalent to the original domain (Z38, Z34C) were selected and investigated by various methods including NMR [76]. In another series, the synthetic gene for the whole 58aa domain was variegated at 13 amino-acid-encoding sites, of which seven are on the Fc-binding interface, to produce a library of 4.5 × 10⁷ variants with conserved hydrophobic residues important for protein stability. The library was a source of proteins binding Taq DNA polymerase, human insulin and human apolipoprotein A-1. Circular dichroism showed that a significant proportion of them still maintain a high helical content, suggesting an overall fold similar to that of the native domain. The important advantages of the system as a scaffold are the small size of the proteins, which permits the use of synthetic versions and the introduction of non-natural amino acids, and its relatively flat binding surface, suited to the selection of ligands for relatively large molecules. The elimination of the third α-helix, which is devoid of the Fc-binding site but important for stability, necessitated the inclusion of an inter-helical cysteine bond in the case of Z34C.
Libraries of Zinc-Fingers
The zinc finger [77], first found in transcription factor TFIIIA as its DNA-binding domain [78], is also a scaffold of great interest for randomization engineering. Zinc fingers of the Cys2His2 (or TFIIIA) motif are abundant in eukaryotic transcription factors. They are domains of around 30 amino acids structured as a pair of anti-parallel β strands followed by an α-helix, in which two cysteines and two histidines coordinate a zinc atom that stabilizes the protein structure. The amino acids in the N-terminal part of the helix are responsible for the DNA sequence-specific binding. Each finger specifically recognizes three base pairs of the DNA sequence [79]. The use of phage
libraries for manipulating zinc fingers has been successful in generating fingers with novel specificity. Libraries of fingers with randomized DNA-contacting residues were constructed and screened with hairpin-like DNA binders containing different three-base sequences (codons), and the clones recognized by each DNA binder were isolated as novel zinc-finger variants [80-83]. A large number of zinc fingers capable of binding new, defined DNA targets were obtained from libraries with a variegated DNA-binding region, which makes it possible to decipher the code by which amino acids recognize three-nucleotide sequences. The information gained in these studies is useful for designing novel transcription regulators, as has been done, for example, for the exogenous erbB-2 promoter [84]. Furthermore, by designing transcription factors, Barbas’ group reported the successful modulation of the activity of endogenous genes [85,86]. Combining this potential of zinc fingers with the combinatorial phage display machinery opens a new avenue in the programmed regulation of genomic functions through newly designed and engineered zinc-finger transcription regulators. The zinc-finger scaffold also provides a framework useful for elucidating protein-protein interactions. Employing a parental Cys2His2 zinc finger (CP-1) as a scaffold, Bianchi et al. [87] generated libraries of variants with five variegated discontinuous positions in the α-helical subdomain. Screening of the libraries with a mAb to Shigella flexneri yielded variants bearing a common consensus epitope, His-Fen-Val-Gln-His/Arg. This study demonstrates that zinc fingers make it possible to identify epitopes in the context of a folded, well-stabilized protein, which cannot be achieved with the conventional phage display epitope mapping approach. One limitation of this approach is the low diversity that can be achieved by randomizing a small area on the protein surface.
Speaking of the potential of this scaffold, worth mentioning is the report of the smallest, 23-residue, zinc-finger-like peptide, which is able to fold in the absence of the metal [88].
Phage-Display Engineering of Loops and Helices in Proteins
Cytochrome
E. coli cytochrome b562, a 106aa four-helix bundle devoid of cysteines [89], was used to construct a library of about 10⁸ phage clones expressing variants with mutations in two loops, produced by randomization at five and four positions; the library was panned on a conjugate between an N-methyl-p-nitrobenzylamine derivative and BSA. Sequencing of the variants revealed two conserved amino acids in each of the loops. In a second library, the non-conserved positions were fully randomized and the conserved positions slightly randomized. When this library was panned on the same conjugate, an additional consensus amino acid was identified, complementing the previous four. Despite the long procedure, the newly obtained cytochromes failed to bind the free N-methyl-p-nitrobenzylamine ligand, the ligand conjugated to ovalbumin, or BSA alone, suggesting that BSA was part of the putative epitope. In a series of studies, α-helical regions have been used as targets for randomization. For example, Houston et al. [90] used synthetically produced 24aa-long helices forming a coiled coil stabilized by the insertion of two lactam bridges for grafting a discontinuous five-residue epitope recognized by an IgA mAb; the grafted epitope retained its specificity for the antibody. The disadvantage of this otherwise interesting small scaffold was the absence of a stabilizing inter-helix loop, which had to be compensated for by the introduced lactam bridges. In another work [91], a 56-residue coiled-coil
stem-loop miniprotein was constructed, avoiding the need for additional stabilizers and offering better possibilities: the use of both the α-helical and the loop positions as targets for engineering via randomization. A conformationally constrained synthetic library represented by an 18-residue α-helical peptide consisting of leucines and lysines was used for the selection of functional proteins [92]; it was found that some of the replaced amino acids increased the helical content of the original peptide. The introduction of randomized positions along α-helically constrained frameworks is expected to yield biased helical libraries from which proteins with desired functions can be isolated. The recent development [37] of an alpha-helical-type library advances these prospects, making it possible to carry out basic studies of α-helical structure and to design helix-containing proteins as scaffolds.
Protease Inhibitors
The characteristic feature of protease inhibitors is a highly constrained structure stabilized by intramolecular disulfide bonds. Inhibition is achieved by binding the cognate protease with high affinity via exposed loops that mimic the substrate but resist cleavage by the protease. Loops in general (as shown above) are suitable targets for engineering, and in the case of protease inhibitors, where they are the recognition regions of the protein, the randomization strategy is productive: it generates a spectrum of modified loops differing in their binding and protease-resisting properties for the selection of variants, which may also be used for the discovery of non-protein protease inhibitors. Wang et al. [93] constructed a phage library of 4 × 10² clones expressing an E. coli serine protease inhibitor, ecotin, in which the positions of the reactive residues, Met84 and Met85, were randomized.
Screening with human urokinase-type plasminogen activator yielded ecotin variants enriched in basic amino acids at the random positions; one variant, with arginine in both randomized positions, had a 2800-fold higher urokinase-binding ability than the original ecotin. Röttgen and Collins [94] used libraries of the 56-residue human pancreatic secretory trypsin inhibitor (PSTI) with exposed loops for the selection of proteins with chymotrypsin-binding specificity. Sequences were selected that could also be predicted by computer-aided design but whose binding properties were not reported. Using a Kunitz inhibitor domain of human lipoprotein-associated coagulation inhibitor (LACI-DI) as a phage-displayed scaffold for iterative optimization of two loops, and plasmin as the selector, a variant was found with 12,500-fold higher inhibitory potency than the parental LACI-DI [95]. The authors also reported the selection from this library of a high-affinity plasma kallikrein inhibitor. In this experiment, proteins selected against human thrombin bound to a non-catalytic region.
Phage Display Engineering of Proteins of Signal Transduction
Cytokines and Growth Factors
Another group of extensively engineered functional proteins comprises the cytokines and growth factors, which interact with their receptors and initiate signal transduction. The importance of the physiological functions in which cytokines participate, including the regulation and coordination of immune responses and inflammatory processes, as well as the hope that a detailed understanding of such protein–protein interactions and the generation of proteins with new properties will lead to
new therapeutic possibilities, motivated extensive engineering projects. The success of structural modifications of ligands that maintain their function depends directly on the elucidation of the structures (epitopes) that interact with the receptor. This is why a major focus of these studies has been the identification of the epitopes in ligands and their cognate receptors that are responsible for interaction and for eliciting biological effects. One of the first targets was human growth hormone, hGH, a cytokine which binds to two identical receptor subunits via two different epitopes [96]. The complete hGH receptor complex is assembled sequentially: once the complex of hGH with the first receptor subunit is formed via site I, the binding site for the second receptor subunit is formed by residues of hGH and of the bound receptor subunit. The contribution of alanine scanning to the elucidation of the structural determinants of hGH binding is well recognized [97]. In the experiments of Wells, Cunningham and their coworkers [65, 96, 97, 98, 99], the amino acids important for the ligand–receptor interactions were identified by alanine scanning of the hGH region involved in binding. The development of multiple amino acid testing using phage display, together with information on the 3D structure of the hGH-receptor complex [100] and of the IL-receptor complex [101, 102], allowed more efficient and rapid structure-function characterization and modification of the determinants, giving access to the regulation of the activity of the ligands and of their cognate receptors. Elucidating the ligand–receptor interactions was critical for designing cytokine (or receptor) antagonists in which the primary receptor-binding site of the protein is intact and the region interacting with the second receptor subunit is modified; such ligands are unable to initiate signal transduction and become receptor antagonists that block the activity of cytokines [103,104].
The gp130 cytokine receptor was a focus because of its central role in the formation of high-affinity receptor complexes for a family of cytokines involved in the regulation of hematopoiesis, the immune response and inflammation. When IL-6 or IL-11 is the effector, gp130 homodimerizes; when LIF, oncostatin M, ciliary neurotrophic factor or cardiotrophin-1 is the effector, it associates with the receptor LIFRβ. Cytokine binding triggers dimerization of the receptors, which initiates the signaling cascade by inducing phosphorylation and activation of tyrosine kinases. These kinases phosphorylate tyrosine residues on the cytoplasmic domains of the cytokine receptors, initiating the intracellular signal transduction pathway in which transcription factors migrate into the nucleus and activate transcription of the target genes of the IL-6 family. One of the aims pursued in engineering this complex physiological system is to generate antagonists that block the activity of the cytokines. A ligand-receptor fusion protein was produced that binds directly to gp130 and blocks the biological activity of all gp130 family cytokines (“hyper-antagonists”) [105, 106]. Phage display has much to offer for the large-scale manipulation of these systems of cell regulation and induction [107]. First of all, peptide selection with these factors can be used to identify critical amino acids in the epitopes and to engineer them via randomization. An example is the work reported by Oliver et al. [108], in which screening a phage display library of small cyclic peptides with anti-gp130 antibodies that neutralize the function of oncostatin M permitted isolation of a consensus motif that is also found in the elongation factor (EF) loop of the D2 domain of gp130. Alanine scanning then identified gp130 residues that are involved in binding oncostatin M but not leukemia inhibitory factor (LIF).
Other examples of successful engineering are described below. A fibroblast growth factor (FGF) receptor agonist was produced by fusing a leucine-zipper motif (which mimics the
heparin-binding segment of the factor) to small cyclic peptides from phage display that induced dimerization of the FGF receptor [109]. Here, structural considerations combined with a combinatorial approach permitted the generation of molecules that are structurally unrelated to fibroblast growth factor but have similar specificities. New cytokines and growth factors with altered specificities were produced on the basis of molecular modeling of the protein regions involved in the interaction, which made it possible to switch the enzyme activity [110].
Gene Shuffling for Generating Novel Proteins
Gene shuffling is a combinatorial gene-recombination approach to modifying proteins [111-113]. Homologous genes are recombined by cutting the individual genes into fragments of 50-100 base pairs, which are then reassembled by a self-priming process similar to PCR. The recombinants are subsequently amplified by conventional PCR, and the products of interest are isolated under stringent selection. Chang et al. [114] used eight human interferon-alpha genes for multigene DNA shuffling and generated a chimeric interferon-α with 285,000-fold higher affinity than human interferon-α2. The improved clone was derived from only five parental genes and contained no point mutations, in contrast to earlier DNA shuffling procedures, in which point mutations introduced by the PCR reactions were also found in the selected clones.
PEPTIDES SELECTED FROM COMBINATORIAL PHAGE LIBRARIES WITH PROPERTIES OF LIGANDS OF PROTEIN BINDING SITES
Numerous Examples of Isolation of Interesting Peptide Sequences have been Reported Using Different Types of Libraries
Peptides Selected by Receptors
Phage-presented peptide libraries have been screened for binding to receptors. This has led to the identification of small peptides that mimic the large protein interfaces of cytokines and function as either antagonists or agonists.
Erythropoietin.
Outstanding results with this hormone were obtained by researchers at Affymax - R.W. Johnson Research Laboratories (Palo Alto) and the Scripps Research Institute (San Diego). The Affymax group screened random libraries with the erythropoietin (EPO) receptor and isolated a disulfide-bonded cyclic peptide, with no similarity to the EPO sequence, that induced erythropoiesis in mice by binding to and activating the EPO receptor [17, 115]. This was the first cytokine agonist selected from combinatorial libraries. To isolate peptides with affinity for the erythropoietin receptor, a sequential strategy of selection from different libraries [36] was employed, Fig. (2). First, a highly polyvalent gpVIII library was screened with purified receptor to select relatively low-affinity peptides sharing particular amino acids (a motif). Then, monovalent gpIII secondary libraries based on these sequences were constructed, in which the non-motif residues were randomized. Selection from this type of library permitted the isolation of high-affinity binders that acted as peptide agonists, binding to the EPO receptor to form an activated, dimeric peptide-receptor complex. Livnah et al. [115] resolved the EPO-receptor complex at 2.8 Å, describing structural details of the interaction of the
peptide with the receptor, which promotes receptor dimerization. Thus, the cyclic peptide obtained mimicked the function of the natural hormone.
Fig. (2). The two-stage strategy used for selection of erythropoietin (EPO) mimics binding to the receptor (EBP) (Wells, 1996).
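The second stage of the strategy in Fig. (2), in which the motif residues are kept and the non-motif residues are randomized, can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the first-round sequences are hypothetical placeholders (not the published EPO-mimetic peptides), the hits are assumed to be pre-aligned and of equal length, and the 80% conservation threshold and the function names are arbitrary choices made for this example.

```python
from collections import Counter

def secondary_library_template(hits, min_fraction=0.8):
    """Keep positions conserved in >= min_fraction of the hits; mark the rest 'X'."""
    length = len(hits[0])
    if any(len(h) != length for h in hits):
        raise ValueError("hits must be pre-aligned and of equal length")
    template = []
    for column in zip(*hits):
        # Most frequent residue in this aligned column and how often it occurs
        residue, count = Counter(column).most_common(1)[0]
        template.append(residue if count / len(hits) >= min_fraction else "X")
    return "".join(template)

def theoretical_diversity(template, alphabet_size=20):
    """Number of distinct peptides if every 'X' position is fully randomized."""
    return alphabet_size ** template.count("X")

# Hypothetical first-round hits (placeholders, not data from the cited studies)
hits = ["CAYPWGTC", "CSYPWRLC", "CQYPWDMC", "CTYPWKVC"]
template = secondary_library_template(hits)
print(template, theoretical_diversity(template))
```

For these placeholder hits the template keeps the flanking cysteines and the shared core and randomizes three positions, so a secondary library built on it could contain at most 20^3 = 8,000 different peptide sequences.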
Other peptides selected for cytokines and displaying agonistic or antagonistic activities are: (1) peptide mimics of the 332-amino-acid thrombopoietin, with agonist activity leading to platelet production [116]; (2) IL-1 receptor antagonist peptides that bind the extracellular domain of the human type 1 IL-1 receptor and prevent activation of IL-1-driven responses in human and monkey cells [101, 102].
Thrombin receptor. The thrombin receptor of platelets is an integral membrane protein that is cleaved by thrombin to expose a "tethered ligand", which binds to and triggers the receptor. Doorbar and Winter [117] isolated a peptide antagonist of this receptor by panning the library directly on platelets and eluting bound phage from the complex with the peptide agonist of the thrombin receptor. The peptides obtained had the consensus sequence MSRPACPNDKYE, displayed some features in common with the tethered ligand (in particular, an arginine residue followed by a proline) and showed about ten-fold higher anti-aggregating activity than previously described peptide antagonists of the thrombin receptor.
N-methyl-D-aspartate receptor. Li et al. [118] used phage display to investigate the oligomeric brain N-methyl-D-aspartate receptor (NMDAR), a ligand-gated ion channel that becomes selectively permeable to ions upon binding its ligands. Seeking peptide modulators of NR1, one of the five subunits implicated in the channel, the authors selected cyclic peptides (Mag-1), with the consensus sequence CDGLRHMWFC, that associate with the part of the receptor containing the putative ligand-binding domain; the synthetic form of the peptide inhibited the channel activity of the receptor in a non-competitive manner.
Estrogen receptor. Paige et al. [119] selected peptides that act as sensitive probes of estrogen receptor conformation and demonstrated that their ability to produce distinct biological effects is due to the conformational changes they induce in the receptors.
The biological effects of estrogen receptor agonists and antagonists acting through these receptors are suggested to be different.
Tumor Necrosis Factor-alpha (TNF-α). TNF-α is one of the cytokines among the humoral factors with immunoregulatory function produced by the immune system. Its elevated plasma concentrations in various infectious and inflammatory diseases (rheumatoid arthritis, bacterial sepsis) exert deleterious effects [120]. TNF-α is a trimer that interacts with two receptors displayed on the membranes of many cell types: R-I and R-II, of 55 kDa and 75 kDa, respectively. The TNF-α sites that interact with the receptors [121] are potential targets for selecting peptide substitutes of the receptor with antagonistic potential. Partidos et al. [21] screened a 15-mer gpIII phage display library with recombinant TNF-α and isolated peptides presumably mimicking certain TNF-α receptor binding site(s). Despite showing no structural similarity to the receptors, the peptides inhibited TNF-α-induced cytotoxicity (apoptosis). A synthetic peptide version of one of the mimotopes retained this ability to inhibit the cytotoxicity of mouse and human TNF-α. In other selections, mimotopes blocking the binding of TNF-α to its receptor were obtained [122, 123]. Mimotopes with agonistic and antagonistic activities for various other cytokines or hormones have been generated. Some of them are described below.
Streptavidin. Low-affinity peptide ligands with the common sequence His-Pro-Gln were isolated from linear peptide libraries screened with streptavidin [124], a model receptor system. Giebel et al. [125] screened cyclic peptide gpIII libraries with streptavidin and isolated sequences with the HPQ motif. Analysis of binding kinetics and
affinities showed that the conformationally constrained cyclic peptides bound streptavidin with affinities up to three orders of magnitude higher than those of their linear variants. These results demonstrated the potential of screening conformationally constrained peptide libraries for novel high-affinity receptor ligands or enzyme substrates, a finding repeatedly confirmed by others.
Ribonuclease S. Smith et al. [126] isolated, from a random hexapeptide phage library, clones displaying peptides that bind S-protein, a 104-amino-acid fragment of bovine pancreatic ribonuclease (RNase). The selected peptides showed a consensus sequence motif, (F/Y)NF(E/V)(I/V)(L/V), bearing little resemblance to S-peptide, a 20-aa fragment of RNase that is the natural ligand of S-protein. The chemically synthesized peptide with the sequence YNFEVL bound S-protein and behaved as an antagonist of S-peptide, and was regarded as a new RNase-specific 'drug'.
Concanavalin A. Oldenburg et al. [127] selected peptides with the consensus sequence Tyr-Pro-Tyr as ligands for the carbohydrate-binding protein Concanavalin A. The peptides had an affinity comparable to that of the natural ligand, methyl α-D-mannopyranoside, and were able to inhibit precipitation of the α-glucan dextran 1355 by Concanavalin A. Scott et al. [128] isolated from a hexapeptide library peptides mimicking the binding specificity of methyl α-D-mannopyranoside (Me Man) for Concanavalin A. The peptides shared the motif MYWYPY, competed with Me Man for binding to Concanavalin A and showed only weak cross-reactivity with a closely related D-mannose-binding lectin. This work was the first to demonstrate that phage display may be used to obtain highly selective sugar mimics for lectins.
Neutrophil elastase inhibitors. Roberts et al.
[129] engineered inhibitors of human neutrophil elastase by designing and producing a library of phage-displayed protease inhibitory domains derived from wild-type bovine pancreatic trypsin inhibitor. Screening with the target protease selected an inhibitor of human neutrophil elastase with an affinity (Kd = 1.0 pM) that was 3.6 x 10^6-fold higher than that of the parental protein and exceeded by a factor of 50 the highest affinity reported up to that time for any reversible human neutrophil elastase inhibitor.
Gal80 repressor. Phage display was employed to isolate peptides that bind Gal80 protein and mimic the site in the activation domain of the yeast Gal4 protein that binds specifically to the Gal80 repressor (Gal4 supposedly associates with co-activators in the RNA polymerase II holoenzyme). The selected Gal80-binding peptides competed with the native Gal4 activation domain for the repressor, suggesting that they bind to the same site [130].
Calmodulin. The potential of phage library screening for obtaining peptide antagonists of the calcium-binding protein calmodulin was explored. Dedman et al. [131] identified peptides with a Trp-Pro motif in a 15-mer library. Then, a library displaying cyclized random octamer peptides was screened with immobilized bovine calmodulin [132], which yielded peptides with better specificity for calmodulin and with antagonist activity. The peptides contained the sequence Trp-Gly-Lys and required Ca2+ to bind calmodulin. One of the peptides, SCLRWGKWSNCGS, was shown to bind calmodulin better than its reduced form or an analogue in which the cysteine residues were replaced by serine. A functional test showed that the peptide inhibits calmodulin-dependent kinase activity. Systematic alanine substitution of residues in this peptide indicated that tryptophan is the most critical residue for binding. The study has demonstrated
the importance of conformational constraint for the activity of these phage-library-derived peptides as specific, Ca2+-dependent calmodulin ligands. Adey and Kay [133] screened a longer (26-mer) peptide library with calmodulin and isolated peptides with three consensus sequence motifs, +W-OlambdaR, WRAAV and WRXXAAAL (where +, -, O, lambda and X are positively charged, negatively charged, hydrophobic, leucine or valine, and any residue, respectively). These selections demonstrated that the type of calmodulin-binding motif identified can vary between peptides selected from different types of combinatorial libraries. Nevalainen et al. [134] reported the isolation of two peptides, AWDTVRISFG and AWPSLQAIRG, that bound to supposedly distinct regions of calmodulin involved in the activation of different calmodulin-dependent enzymes. In more recent work, Demartis et al. [135] developed a general strategy for the isolation of enzymatic activities from large phage-enzyme repertoires using specific anti-product affinity reagents. By displaying enzyme-calmodulin chimeric proteins on phage as gene III fusions, they achieved the conditional anchoring of reaction substrates and products on phage.
Insulin-like growth factor 1 (IGF-1)-binding proteins. IGF-1 is a hormone with diverse mitogenic and metabolic functions, which it accomplishes via high-affinity binding to its cell-surface receptor. The binding is influenced by a set of IGF-1-binding proteins, co-factors that regulate its activity. Using these factors, high-affinity peptides were selected that block their interaction with IGF-1. An important result of these selections is that human IGF-1, apart from the site for its receptor, contains six overlapping epitopes for binding to IGF-1-binding proteins, whereas each of the selected peptides was specific for only one of them [97].
Their fine specificity demonstrated the potential of mimotopes for dissecting the overlapping epitopes of crossreacting ligands into individual epitopes (a property also displayed by mimotopes selected with crossreacting antibodies [136, 137]). Isolating peptides for each of the binding proteins and for the receptor is a way to generate a multiple-epitope map of IGF-1 and to use the peptides for site-specific intervention in the complex functions of the hormone.
Caveolin. Couet et al. [138] used the caveolin-scaffolding domain as a receptor to select tryptophan-rich peptides from phage display libraries, also using a known caveolin-interacting protein, Gi2α, as a native ligand. The authors suggest that caveolin (a principal component of caveolae membranes) functions as a scaffolding protein that organizes and concentrates certain caveolin-interacting signaling molecules, including G-proteins, Src-like kinases and others, within caveolae membranes. The study, aimed at elucidating how the caveolin-scaffolding domain recognizes these molecules, showed that the selected peptides separate the multi-component interacting system into individual interactions.
Enzyme-Substrate Engineering
Substrate-Phage. To define the substrate sites of enzymes, Matthews and Wells [139] proposed the use of the phage display system for the selection of protease substrates. Smith et al. [140] described the efficient discovery, in a hexamer library, of highly active and selective substrates for the matrix metalloproteinases stromelysin and matrilysin. The library, which contained a “tether” recognized by mAbs, was treated in solution with the protease, and cleaved phage were separated from uncleaved phage using a mixture of tether-binding mAbs and Protein A. The
procedure identified phage encoding peptide sequences susceptible to cleavage by the enzymes and led to the selection of stromelysin and matrilysin substrates.
Enzyme-Phage. Phage-displayed enzymes have been shown to retain their enzymatic function, which permits their engineering. The first report of an “enzyme-phage” concerned E. coli alkaline phosphatase. McCafferty et al. [141] described the expression on the phage surface, as gpIII fusions, of the wild-type enzyme and of a mutant with a mutation in the active site, and demonstrated that they display catalytic and kinetic properties comparable with those of the free enzyme. Siemers et al. [142] constructed a library of Enterobacter cloacae P99 beta-lactamase mutants and investigated the effects on catalytic function of substituting residues in a putative active site of the enzyme. After random mutagenesis, the phage library could be enriched for active beta-lactamase by incubating infected bacteria with beta-lactam antibiotics. The authors point out the significance of the data obtained for therapy. Corey et al. [143] generated gpIII and gpVIII M13 libraries expressing the serine protease trypsin, which possessed kinetic parameters approximating those of the wild-type enzyme. Ecotin, an endogenous E. coli protease inhibitor, was found to co-purify with the trypsin-phage and was proposed as a selective binder of the phage. Phage display of glutathione transferase was used to engineer novel binding specificities onto the pre-existing protein framework of the enzyme. Novel glutathione S-transferases with varying affinities were obtained. A library of mutant glutathione S-transferases with differences in the active-site region was generated by random mutagenesis of 10 amino acid residues involved in the binding of electrophilic substrates. Novel glutathione transferases with altered specificity for active-site ligands were isolated by adsorption of the phage-displayed fusion protein to analogs of an electrophilic substrate [144].
Protein phosphatase-1 (PP1) regulates a variety of eukaryotic signal transduction pathways. Peptides with the motif VX(F/W) (X = H, R, S, T), which occurs in PP1-binding proteins, were selected; synthetic peptides with this motif inhibited PP1 in vitro (IC50 3-10 µM). The peptide from the muscle glycogen-binding subunit that contains this motif bound to a region of PP1 opposite the active site; that is, the inhibition was not due to competition with its ligand but acted at a distance. The work demonstrates how the selection of ligands uncovers protein-protein interactions in a complex system and provides peptides for regulating these interactions [145].
Dihydrofolate reductase (DHFR) converts dihydrofolate to tetrahydrofolate and is a target for cancer, antimalarial and antibacterial chemotherapy [146]. Peptides sharing the motif (K/R/F/W)(D/E)XWLXXY bound to DHFR, and the binding was blocked by the DHFR inhibitors methotrexate and trimethoprim, suggesting that the peptides bind at or near the active site [147].
Troponin C. Pierse et al. [148] selected from a gpIII library 10 distinct novel peptides that bind specifically to troponin C. The peptides shared the consensus sequence (V/L)(D/E)XLKXXLXXLA, which could be involved in the binding. The clone with the highest activity had 62.5% homology to troponin C and to the N-terminal region of troponin I isoforms. In the presence of calcium, the peptide formed a stable complex with troponin C and was able to inhibit the maximal calcium-activated tension of rabbit psoas muscle fibers.
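Consensus motifs such as those quoted above for the DHFR- and troponin C-binding peptides are written in a compact notation in which parentheses list alternative residues and X stands for any residue. As a minimal sketch, assuming this reading of the notation, the following Python helper (a hypothetical name, not taken from the cited studies) converts such a motif into a regular expression that can be used to scan the sequences of selected clones. Apart from YNFEVL, which is quoted earlier in this chapter for the S-protein-binding motif, the test sequences are invented for illustration.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter code

def consensus_to_regex(motif):
    """Convert notation like '(K/R/F/W)(D/E)XWLXXY' into a compiled regex."""
    motif = motif.replace(" ", "")
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)  # raises ValueError if unbalanced
            options = motif[i + 1:close].split("/")
            parts.append("[" + "".join(options) + "]")
            i = close + 1
        elif ch == "X":
            parts.append("[" + AMINO_ACIDS + "]")  # X = any residue (assumed)
            i += 1
        else:
            parts.append(ch)  # a fixed residue
            i += 1
    return re.compile("".join(parts))

dhfr_motif = consensus_to_regex("(K/R/F/W)(D/E)XWLXXY")
s_protein_motif = consensus_to_regex("(F/Y)NF(E/V)(I/V)(L/V)")

print(bool(s_protein_motif.fullmatch("YNFEVL")))  # peptide quoted in the text
print(bool(dhfr_motif.search("GSKDAWLGSYTT")))    # invented sequence
print(bool(dhfr_motif.search("GSADAWLGSYTT")))    # invented sequence
```

Using a character class for each parenthesized group keeps the translation one-to-one with the printed motif, so a mismatch between a clone and the consensus can be traced to a single position.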
Peptides with Anti-Tumor Activity
Angiogenesis Inhibitors
Peptide antagonists of angiogenin that inhibit its interaction with actin have been isolated [149]. Combining such a peptide with the previously generated neutralizing mAb to angiogenin [150] is one route to inhibiting the growth of solid tumors. VEGF (vascular endothelial growth factor) is involved in the development and permeability of blood vessels [151]. VEGF binds to the kinase domain receptor and mediates vascularization and tumor-induced angiogenesis. To control the function of VEGF in the angiogenesis of solid tumors [152], the selection of high-affinity peptides that block its binding to its receptors is very important. Several peptides were selected by screening against VEGF and shown to interact specifically with it and to replace the natural ligand, KDR (see [97]). Further selections from secondary libraries in which the sequences obtained were partially randomized yielded better peptide clones. Fairbrother et al. [153] produced peptides that inhibit the binding of VEGF to its receptors, KDR and Flt-1. Using phage display libraries of short disulfide-constrained peptides, they obtained three distinct classes of peptides that bind to the receptor-binding domain of VEGF with micromolar affinities. The highest-affinity peptide was able to antagonize the VEGF-induced proliferation of primary human umbilical vascular endothelial cells. The peptides bind to a region of VEGF known to contain the contact surface for Flt-1 and the functional determinants for KDR binding. In subsequent work with this system, Binetruy-Tournaire et al. [154] sought to identify, in phage libraries, peptides able to block the VEGF-KDR interaction and isolated peptides that bind to membrane-expressed KDR and to an anti-VEGF neutralizing mAb.
One of the clones, displaying the sequence ATWLPPR, completely abolished VEGF binding to cell-displayed KDR and, when tested at the cellular level, inhibited the VEGF-mediated proliferation of human vascular endothelial cells in a dose-dependent and endothelial cell type-specific manner. In vivo, ATWLPPR totally abolished VEGF-induced angiogenesis in a rabbit corneal model. These results demonstrate that ATWLPPR is an effective antagonist of VEGF binding and may be a potent inhibitor of tumor angiogenesis and metastasis.
Oncogene MDM2. Bottger and colleagues [155] isolated from phage libraries 12-mer and 15-mer peptide ligands for hdm2 that would interfere with its binding to p53 and act as anti-tumor agents. The peptide sequences showed striking homology with the previously established mdm2-binding site on p53, confirming that the 18-TFSDLW-23 site is crucial for the interaction. Free synthetic peptides with the selected sequences inhibited the interaction of p53 with mdm2 100 times more strongly than the p53-derived peptide.
Taxol. Rodi et al. [156] selected peptides that bind to the anticancer drug paclitaxel (Taxol), a small non-protein molecule that binds to beta-tubulin. The selected peptides had no significant similarity to tubulin but included a subset with similarity to a non-conserved region of the anti-apoptotic human protein Bcl-2. Peptide binding was accompanied by a conformational change in tubulin. In vivo, treatment with paclitaxel leads to inactivation of Bcl-2 and concomitant phosphorylation of residues in a disordered regulatory loop region of the protein. The similarity between the paclitaxel-selected peptides and this loop region suggests that the apoptotic action of paclitaxel may involve the binding of paclitaxel to Bcl-2. The results demonstrated that phage-displayed peptides can mimic the ligand-binding properties of disordered regions.
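Statements such as "striking homology with the mdm2 binding site on p53" rest on comparing short selected peptides with a region of a protein sequence. A minimal way to make such a comparison explicit is an ungapped sliding-window identity count, sketched below in Python. This is not the method used in the cited studies; the function name is hypothetical, the query peptide TFADYW is invented, and the 30-residue N-terminal stretch of human p53 is an assumption taken from the public sequence record (UniProt P04637) that should be verified before reuse.

```python
def best_identity_window(peptide, protein):
    """Return (1-based start, number of identities) of the best ungapped window."""
    if len(peptide) > len(protein):
        raise ValueError("peptide is longer than the protein sequence")
    best_start, best_score = 1, -1
    for start in range(len(protein) - len(peptide) + 1):
        window = protein[start:start + len(peptide)]
        # Count positions where the peptide and the window carry the same residue
        score = sum(a == b for a, b in zip(peptide, window))
        if score > best_score:  # ties keep the first (most N-terminal) window
            best_start, best_score = start + 1, score
    return best_start, best_score

# Assumed residues 1-30 of human p53 (check against the database entry)
P53_NTERM = "MEEPQSDPSVEPPLSQETFSDLWKLLPENN"

exact = best_identity_window("TFSDLW", P53_NTERM)    # motif quoted in the text
variant = best_identity_window("TFADYW", P53_NTERM)  # invented peptide
print(exact, variant)
```

With the assumed sequence, the motif quoted in the text is located at residues 18-23, which agrees with the numbering 18-TFSDLW-23. An identity count ignores conservative substitutions and gaps, so it is a first screen rather than a substitute for a proper alignment.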
Cell Adhesion
Integrin Ligands
A large number of phage display studies have aimed at selecting peptides recognized by integrins, and these led to the discovery that RGD is a critical sequence for binding. The integrin family of heterodimeric transmembrane cell-surface receptors consists of alpha (120-180 kDa) and beta (90-110 kDa) subunits [157]. In a series of phage display studies, purified integrin [32, 158], alpha v beta 3 [159] and alpha v beta 5 [34] have been used to select their ligands, and a large group of peptides sharing the RGD motif has been isolated from linear or constrained libraries; the selected sequences displayed considerable amino acid variation in the regions flanking RGD. It is supposed that the diverse composition around RGD reflects the role of integrins in vivo as versatile receptors for RGD-containing extracellular matrix proteins. Peptides recognized by the cell-surface receptors for fibronectin (alpha 5 beta 1 integrin), vitronectin (alpha v beta 3 and alpha v beta 5 integrins) and fibrinogen (alpha IIb beta 3 integrin) could be isolated using phage libraries expressing cyclic peptides of different sizes, CX5C, CX6C and CX7C, as well as a library with only one cysteine (CX9). The importance of the cyclic conformation for ligand affinity was clearly demonstrated in the selection from the latter library: the integrin-binding sequences derived from it contained another cysteine. A cyclic synthetic peptide, ACRGDGWCG, effectively inhibited cell attachment to fibronectin. The most interesting structure appeared to contain two disulphide bonds. One such peptide, ACDCRGDCFCG, was synthesized chemically and shown to be an at least 20-fold more potent inhibitor of alpha v beta 5- and alpha v beta 3-mediated cell attachment to vitronectin than similar peptides with a single disulphide bond, and 200-fold more potent than commonly used linear RGD peptides.
All these results emphasized the importance of conformational restriction for the affinity of integrin-binding RGD-motif peptides.
E-selectin Ligands
E-selectin, an inducible cell adhesion molecule, mediates the rolling of neutrophils on the endothelium, an early event in the development of an inflammatory response. Inhibition of selectin-mediated rolling is a possible way of controlling inflammation-induced diseases. Martens et al. [160] described phage library-derived ligands that bind to E-selectin and were able to block E-selectin-mediated adhesion of neutrophils and to reduce neutrophil transmigration to the site of inflammation in mice. The peptides are considered candidates for anti-inflammatory therapeutic agents.
Internalization Via Phage-Displayed Ligands
The phage display methodology has stimulated research into the mechanisms of internalization and their use for targeted drug delivery, including gene therapy [161, 162]. Among the distinct subgroups of cell-surface receptors responsible for cell-type-specific intracellular penetration, integrins were the first to be exploited to target the intracellular translocation of a filamentous bacteriophage particle [163]. RGD-containing ligands of integrins are used by viruses whose internalization is mediated by integrins via specific recognition of an RGD (or related) motif expressed on the surface of the pathogen. This natural mechanism of intracellular translocation used by pathogens has been exploited in experimental internalization. Sequences with
RGD were utilized for the integrin-mediated internalization of a filamentous bacteriophage particle into cultured mammalian cells [163, 164]. These studies encouraged the use of the phage display methodology for the identification of cell-type-specific integrin-binding ligands and stimulated research into the internalization mechanism and the intracellular fate of the integrin-phage complex (reviewed in [165]). Along with this approach, a second strategy was developed based on the mechanisms used by growth factors [166]. It consists of displaying specific growth factors on the surface of the bacteriophage to target cancer cells that overexpress their receptors. One possibility that has also been exploited is the display of scFv with specificity for growth factor receptors, which permits the intracellular penetration of certain phage constructs as vectors. A special phage vector expressing an scFv to ErbB2 was constructed for internalization [167, 168], and procedures were designed for the direct selection of phage displaying peptides that recognize and bind to motifs on the surface of mammalian cells [161, 162].
Panning on Live Cell Surface Markers
Cultured Cells
Phage library selections are routinely carried out against purified molecules, which raises concerns as to how well these molecules reflect their properties in cells; in recent years, experiments on the selection of peptides recognized by cell-surface proteins have therefore been started. These were primarily directed at cell-surface-specific receptor molecules in various normal and pathological tissues. One aim of selecting cell-specific peptides is to extend the list of cell-specific internalizing vehicles for gene therapy [161]. Spear et al. [169] described the successful use of peptide phage display to target viable malignant glioma cells. Samoylova et al.
[170] panned a landscape f8-1/8-mer type phage display library on viable glial cells and isolated, apart from the phage that bound to the cell surface, a subset of internalized phage. Sequencing of the peptides displayed by these phage showed that they contain the RGD-containing motif E(L,V,S)RGDS. This latter group apparently bound to integrin, which resulted in internalization of the phage. Remarkably, RG2 glioma cells took them up 63-fold more efficiently than astrocytes. The authors note that these properties of the selected peptides could be beneficial in the design of effective drug combinations for anti-glioma treatments. This opens the possibility of selecting from peptide libraries clones with this and other “internalizing” motifs and using them as vehicles for drug delivery. Information on cell-surface markers is commonly obtained in experiments with cell-surface-specific antibodies [171]. Peptides from phage display libraries provide new possibilities for the detection of cell-surface markers that are not sites for antibodies [172]. In the work cited above, Samoylova and coworkers [170] selected ligands for unknown glioma cell-surface binders and described two families of surface-bound peptides: one, containing the motif V(T,S,L)P(E,T)H, recognized by a marker common to glioma cells, normal brain cells and cells of non-brain origin, and the other, D(T,S,L)TK, with pronounced glioma-selective properties.
Intra-Vascular Biopanning of Phage-Peptide Libraries
In these experiments, two-month-old BALB/c or C57Bl/6J mice were perfused with gpIII-fusion random peptide libraries of 10^8-10^9 different sequences to target the markers
expressed by endothelial cells of the vasculature (reviewed in [173]). The authors report that the endothelial markers in the vasculature of each organ are tissue-specific and that, after the library is introduced, the peptides recognized by these markers bind and home to the corresponding tissue. In each tissue analysed, the authors recovered peptides that are supposed to bind specific receptors representing markers of angiogenic vessels [174-176]. It is expected that peptides can be found that home to tumors and may be used as vehicles for delivering anticancer drugs directly to tumors. Arap and coworkers [24] extended these in vivo screenings to humans. A C7C library with a diversity of approximately 2 x 10^8 different sequences (dose: 10^14 transducing units in 100 ml of saline) was infused intravenously into a patient with macroglobulinemia, and tissue biopsies were taken after 15 min. Phage recovered from the samples were used to determine the sequences they display. The sequences were analyzed using high-throughput pattern-recognition software. In total, 4,716 sequences were processed to find the tripeptide motifs frequent in each of the tissues. Some of these motifs were then found to appear within known human proteins, and a panel of candidate human proteins that could be mimicked by the motifs was finally compiled.
Diagnostic Peptides
Peptide Detectors of Pathogenic Bacteria
The selection of peptides able to detect infectious viruses and bacteria is a relatively recent application of phage display, but it has already established itself as one of the most promising branches of the methodology, able to bypass traditional methods [177-180]. Diagnostic phage-displayed peptide probes selected from among billions of variants are more specific and easier to manage. The peptides bind highly pathogenic bacteria specifically, distinguishing between spores and vegetative cells in a highly selective manner.
Petrenko and coworkers initiated an extensive exploration of this issue using landscape-type gpVIII libraries they had developed, and selected high-affinity peptides binding to Salmonella typhimurium and other bacteria, including B. anthracis Sterne [27, 28]. Apart from their utility for routine hospital sanitary and food monitoring, the probes are of high importance for the rapid detection of bacteria recognized as biological weapons.
Peptides Mimicking Immunodominant Epitopes of Pathogens
Hepatitis. Hepatitis B was the first disease to which phage display mimotope selection was applied. Motti et al. [181], using a murine mAb raised to HBsAg, selected mimotopes from a random peptide library and found a peptide that was recognized by the serum of 30% of the investigated individuals infected with the virus. This indicated that the epitope mimicked by this peptide was immunodominant. Folgori et al. [12] developed a multi-step procedure of biopanning on sera of hepatitis B patients and demonstrated the possibility of selecting disease-specific mimotopes of HBsAg. Based on this possibility, the authors propose mimotope selection as a general strategy in studies of diseases. The procedure described in this and a subsequent publication from this group [182] includes a series of counter-screenings with positive and negative sera to remove peptides bound by disease-irrelevant antibody species and disease-related but weak binders, so that the final selected library contains tightly binding peptides recognized by sera of around 60% of the tested patients.
Epitopes of hepatitis C virus were identified by phage display with mAbs, on the basis of the sequence similarity between the selected mimotopes and viral proteins [183]. Using sera from hepatitis C virus (HCV)-infected patients and non-infected subjects to screen random peptide libraries displayed on phage, the authors selected peptides reacting specifically with sera from infected patients; this indicated that the peptides mimicked distinct HCV determinants and can serve as diagnostic markers. In addition, these phage-displayed HCV mimics were able to induce a specific response against HCV when used as immunogens in mice. These results support the search for HCV mimics with the potential to elicit a protective immune response as leads for the development of a mimotope-based vaccine against the viral infection.
Human immunodeficiency virus type 1 (HIV-1). Soon after the mimotope selection approach became available it was applied to HIV-1 epitopes. Fabs to gp120 [52,53] and gp41 [184] were isolated and used for mapping their epitopes on viral proteins and for neutralization. Fab “b12”, which binds the site on gp120 recognized by CD4 with picomolar affinity, is of high therapeutic value because the gp120-CD4 interaction is sensitive to neutralization. Peptides selected from libraries by b12 had no structural similarity to gp120 [185]. By contrast, mAb 447-52D, elicited by the principal neutralizing determinant located at the apex of the gp120 V3 loop, efficiently selected the epitope-similar GPGR motif [186]. Another broadly neutralizing antibody, IgG 2F5, which recognizes the ELDKWAS sequence on the gp41 transmembrane domain, has also been used and retrieved epitope-similar sequences from libraries [33,187]. Sera of individuals infected with HIV-1 were also used for peptide selection.
Two types of sequences were found among the selected peptides. The first showed no structural similarity to any HIV-1 antigen but was recognized by sera of many infected individuals, suggesting that these peptides mimic conformational epitopes [188]. The second type of selected mimotopes is similar to the immunodominant and conserved loop epitope on gp41 [189, 190]. The antibody elicited by this epitope is unable to neutralize HIV-1 and can even enhance viral infectivity [191].
Respiratory syncytial virus. A neutralizing and protective antibody, mAb19, which recognizes a conformation-dependent epitope on the fusion protein of this virus, was used to select reactive peptides; the peptides were then subjected to amino acid substitutions, which resulted in a higher affinity (4.93 × 10^9 for one of them) as determined by surface plasmon resonance. The authors noted that a Tyr to Ser substitution in one peptide was responsible for the enhanced affinity. Antibodies induced in mice by these mimotopes competed with mAb19 for binding to the virus and were able to neutralize the virus [192].
Lyme disease. Control of this disease, caused by the spirochete Borrelia burgdorferi (about 15,000 infections in the USA each year), requires early diagnosis. Kouzmitcheva et al. [193] panned phage libraries on sera from patients and chose from the recovered population 17 peptides with a diagnostically useful binding pattern, taking as a highly stringent criterion the ability to recognize at least three positive sera and no reactivity with any of the negative sera. Despite their apparent relevance to the infection, these peptides show no sequence similarity to the pathogen, B. burgdorferi. The peptides selected in this work represent highly sensitive reagents for the diagnosis of Lyme disease.
Multi-cellular parasites. The cestode worm Taenia solium infects pigs and humans and lodges in the brain of a large part of the population in southern regions, causing severe
neurological pathology, neurocysticercosis, which is also the etiological cause of epilepsy in these regions. Although the parasite is a multicellular organism, particular antibodies to its antigens dominate in serum and especially in cerebrospinal fluid. Using this fluid, several families of peptides with consensus motifs were isolated [194,195]. Schistosomiasis is a widely distributed disease caused by Schistosoma mansoni. Peptides isolated with a highly protective mAb to S. mansoni antigens, when conjugated to BSA and used for immunization of mice, induced a complement-mediated anti-parasite effect that led to a 40% reduction of the parasite burden [196]. These mimotopes are considered candidates for a synthetic anti-parasite vaccine.
Autoimmune Diseases
Rheumatoid Arthritis
Dybwad et al. [197] screened a gpVIII library with Ig purified from individuals with rheumatoid arthritis (RA) and described peptides that were recognized by sera from other individuals with RA better than by control sera. One peptide carried GGA, resembling the repetitive sequence AGGGA of autoantigens (collagen, cytokeratins). In the following publication [198], the group reported that sera of patients with rheumatoid arthritis containing antibodies against type II collagen recognize mimotopes selected by a mAb to this protein. It was suggested that the same epitope in type II collagen induced both this mAb and the antibody in patients, i.e. that it might be related to the disease. Mimotopes were also selected by means of TNF-specific autoantibodies [199].
Multiple Sclerosis
Cortese et al. [200] reported the selection of peptides by screening with cerebrospinal fluid of patients with multiple sclerosis.
Although the peptides could not distinguish between sera of patients and healthy individuals, this lack of specificity does not contradict the authors' conclusion that the selected sequences are disease-relevant, because the antibodies related to the disease are predominantly produced locally and those in the sera could be irrelevant antibodies of broad specificity.
Diabetes Mellitus
Two disease-related peptides were selected from a gpVIII library upon screening with Ig purified from patient sera; they were recognized by sera of 20-26% of patients with insulin-dependent diabetes mellitus and by less than 5% of normal serum IgG [201].
Thrombocytopenic Purpura
Using affinity-purified autoantibodies to platelets to screen a gpIII-type library, mimotopes were selected for two epitopes on the autoantigen GPIIb/IIIa. One of the epitopes is linear and maps to the cytoplasmic tail of GPIIIa. The second epitope was conformational and could not be mapped [202].
Antiphospholipid Syndrome (APS)
APS leads to recurrent fetal loss, thromboembolic phenomena and thrombocytopenia. By selecting peptides from a hexapeptide library and lengthening them to correspond with the site of β2GPI involved in the autoimmune syndrome, Blank et al. [203] described the peptides NTLKTPRVGGC and KDKATFGCHDGC that bound
to ILA-3 mAb, and the peptide CATLRVYKGG mimicking the epitopes recognized by the mAbs to β2GPI and to ILA-1, ILA-1 3 and H-3, respectively. The peptides were able to inhibit the biological functions of these anti-β2GPI mAbs and prevented mice from developing experimental APS.
The Epitope-Mimicking Potentials of Peptides Selected from Combinatorial Libraries
Antigenic Mimicry
Antibody epitopes are protein sites that elicit antibody and cytotoxic responses in higher vertebrates; these responses are destined for recognition and neutralization of pathogens but can sometimes also be induced by self proteins under pathological circumstances. The peptides selected by the induced molecules, antibodies or receptors on the surface of cells, are sources of information on the potential and limits of structural and functional mimicry of epitopes. The epitope-paratope interface resolved by X-ray crystallography shows that 14 to 20 or more residues are involved in the interaction [204, 205]. Short peptides are capable of reproducing the antigenic specificity of epitopes because a small group of the interface residues is critical for binding, while the remaining residues only contribute to the affinity of the interaction. This small group of critical residues is known as the “mini-epitope” [206] and can be formed by various protein regions: continuous stretches, loops, alpha-helices, turns and sometimes more complex three-dimensional conformations [207,208]. The latter are conformational epitopes, whose structure peptides are unable to reproduce [209], except for some epitopes containing short contiguous stretches. The majority of epitopes are conformational. They are usually composed of amino acids from different parts of the polypeptide that are brought together by its folding.
In most selections, peptides bound and retrieved from libraries by antibodies to these regions lack amino acid similarity to the epitope, even when they are able to bind the antibody and compete with the epitope for binding [210,211]. Peptides with these characteristics were first described and termed mimotopes by Geysen et al. [212]. Mimotopes are peptides able to mimic the specificity of the epitope using an amino acid combination different from that of the epitope. Locating conformational epitopes on a protein from the sequences of antibody-selected peptides alone is generally not possible. Mimotopes can, however, sometimes contain individual discontinuous amino acids similar to those of the epitope. If the 3D structure of the protein is known and the amino acids can be replaced to test their importance for binding, the epitope can be identified [13]. Mimotopes selected by anti-carbohydrate antibodies are perhaps the best example of mimicry without structural similarity [213]. As carbohydrates are poor immunogens and can induce strong T-cell specific responses only when coupled to carriers, mimotopes selected from phage libraries by anti-carbohydrate antibodies are of interest because they can be used for inducing anti-carbohydrate responses [214]. Peptides selected by some antibodies reflect only a part of the epitope. Phalipon et al. [215] isolated peptides recognized by a mAb to Shigella flexneri and concluded that the antibody selected mimotopes representing only part of the complex epitope. The neutralizing antibody BCF2 to the scorpion toxin Cn2 selected three motifs, each presumed to mimic a part of this complex epitope [35].
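The distinction drawn in this section between epitope-similar peptides and mimotopes without sequence similarity can be illustrated with a short script that looks for the longest contiguous stretch a selected peptide shares with a linear epitope. This is a minimal sketch under stated assumptions: the function name, the two example peptides and the idea of using a contiguous shared stretch as the measure of similarity are illustrative choices, not a method taken from the cited studies. Only the epitope string ELDKWAS comes from the text.

```python
def longest_shared_stretch(peptide, epitope):
    """Return the longest contiguous fragment of `peptide` found in `epitope`.

    An empty string means that the two sequences share no residue at all.
    """
    best = ""
    for start in range(len(peptide)):
        for end in range(start + 1, len(peptide) + 1):
            fragment = peptide[start:end]
            if len(fragment) > len(best) and fragment in epitope:
                best = fragment
    return best

# Linear gp41 sequence recognized by IgG 2F5, as given in the text.
EPITOPE = "ELDKWAS"

# Hypothetical selected peptides, invented for this illustration.
epitope_like = "ARLDKWAGT"
no_similarity = "HTFPQMNY"

print(longest_shared_stretch(epitope_like, EPITOPE))   # LDKWA
print(longest_shared_stretch(no_similarity, EPITOPE))  # prints an empty line
```

A peptide of the second kind may still bind the selecting antibody. The script measures sequence similarity only and says nothing about binding or about conformational mimicry.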
Thus, peptides selected from random peptide libraries displayed on phage include both sequences that show similarity to the epitope sequence and sequences that show none. An example of perfect epitope mimicry by peptides is shown in Fig. (3).
Arnaud, Gazarian, Palacios Rodríguez, Gazarian and Sakanyan (2004), modified.
Fig. (3). Peptides mimicking the HIV-1 gp41 immunodominant epitope that were selected by serum of HIV-1-infected patients from a 12mer gpIII non-constrained library ( = marks disulfide bridge). The upper part of the scheme presents the sequence of the HIV-1 gp41 region (“cluster 1”) where the immunodominant loop epitope is located. Below are 20 sequences selected from a 12mer non-constrained library by serum antibody of two patients with AIDS progression. From the library the antibody exclusively selected rare peptides containing two cysteines that enclose five epitope positions, of which one, lysine, is conserved at the same position and the remaining ones are variable.
Immunogenic Mimicry
The first experiments with phage-displayed peptides showed that these peptides are immunogenic [9, 10, 216]; furthermore, the experiment indicated that the
display on phage improves the peptide immunogenicity [9]. Apart from being better presented to the immune system, phage-displayed immunogens do not require adjuvant [217] for efficient immunization and have been considered vaccine candidates [20]. Keller and coworkers [186] immunized rabbits with a mimotope of the HIV-1 V3 loop principal neutralizing epitope in the form of a synthetic peptide and observed a strong anti-peptide response able to neutralize the virus in vitro. Chen et al. [218] immunized rhesus macaques with mimotopes selected by serum of a long-term non-progressor. The mimotopes were immunogenic and induced HIV-1 envelope-specific antibody responses. Protective responses against human respiratory syncytial virus were observed upon immunization with mimotopes followed by challenge with the virus [192]. Nevertheless, until now mimotopes have failed to demonstrate that they can protect against pathogens, especially against such diversifying pathogens as RNA viruses. The principal deficiency of mimotopes selected from phage libraries is that, although they are able to induce antibodies that recognize the original epitope on the antigen, these antibodies do not neutralize the pathogen. In recent work by Leslie Matthews, Robert Davies and George Smith [219] the issue was examined in a dedicated experimental study, which led to the conclusion that peptides selected from libraries of random sequences lack what is defined as “immunogenic fitness”, in contrast to peptides selected from natural antigen fragment libraries. The question that stands at present is whether, after application of the two-phase selection strategy and of all possible engineering strategies, mimotopes selected from random libraries by anti-pathogen antibodies can acquire the fitness to become real vaccine candidates.
PROSPECTS
Practical Significance of Peptides and Proteins Generated Using Combinatorial Approaches
Fifteen years of exploration of the high throughput screening methodology have demonstrated its outstanding possibilities in many fields of biotechnology and biomedicine. In each of the applications, protein engineering experiments have produced results that either cannot be obtained by other methods or are obtained more easily. The results selectively presented in this review show that, starting from a suitable protein domain, combinatorial approaches coupled with a powerful selection strategy make it possible to obtain novel substances capable of binding a desired target molecule. Using structural data from X-ray crystallography, NMR or computer modeling, one can identify residues of the functional sites for variegation, generate libraries of variants that maintain the overall structure of the original protein, and select variants with desired binding properties. Highly diverse scaffolds with different binding and structural characteristics (α-helical, beta-sheet, turn, loops, 3D folds stabilized or not by cysteine bridges) have been utilized. A major interest in protein engineering is generating minimal versions of ligands by two strategies: “minimization” through elimination of parts not involved in the binding function, and selection from combinatorial libraries of peptides recognized by the receptors of these ligands. The success in the selection and affinity evolution of short peptide mimics of the binding function of these ligands is demonstrated by their three
properties: specificity (and affinity) of binding to the receptor, the ability to dimerize it, and the ability to induce a ligand-specific biological effect, erythropoiesis in the case of EPO, Fig. (4). A large spectrum of peptides with antagonistic and agonistic properties specific to biologically active proteins involved in physiological and pathological processes has been selected by screening phage libraries. The principal properties of these peptides are binding specificity and the ability to produce biological effects similar to those of their natural counterparts or, sometimes, exceeding them in strength. Indeed, some of these peptides surpass their natural parental ligands in binding affinity.
Fig. (4). Mimetic peptide. The dimeric peptide with hairpin-like structure (blue) flanked by two molecules of the extracellular domain of the EPO receptor. F93, F205, M150 are hydrophobic contact residues [230].
In most experiments each of these proteins has been probed only once, just to test the potential of the engineering procedures. For practical use of the obtained binders, features such as proteolytic stability, folding efficiency, productivity and solubility are of great importance, and these should be considered in the choice of the starting scaffold. Some scaffolds represented by large families of homologous proteins of similar structure, such as scorpion toxins [220], are of interest for future experiments. Scorpion toxins are small (29-76 aa), compact molecules that evolved to bind, via their active sites, receptors on the gates of membrane channels and block conductivity. They are envisaged as excellent scaffolds stabilized by three or four cysteine cross-links, containing loop, turn, alpha-helical and beta-sheet
elements and an overall similar 3D structure, known for some of them from X-ray and NMR studies [221]. They are promising scaffolds for designing and producing ligands with new specificities without requiring “minimization” (already done by nature). In addition, engineered molecules generated from these and other scaffolds can be genetically fused to other domains of desired regulatory function (with sites for interaction with effectors, inducers, enzymes, anticancer agents) to yield novel multifunctional reagents of great potential in various biomedical applications. The phage display method has also contributed to the high throughput selection and amplification of engineered protein derivatives that exhibit greatly increased affinity for pre-determined targets. Despite these properties, the minimized proteins and peptides selected from combinatorial libraries do not in themselves make good drugs. Present and future collections of peptide mimics represent primary raw material for further refinement of structure and activity. Their particular advantages, such as small size and a known structure suitable for engineering, can be exploited for the rational design of drug candidates and for conversion to peptidomimetics via elongation, modification of the peptide backbone, substitution of amino acids by analogues or chemical mimics, addition of D-amino acids to stabilize against proteases, or inclusion of carbohydrate moieties and conformational constraints (through cyclization, etc.) to enhance immunogenicity. The main requirement in the modification of many proteins is to maintain the initially determined specificity for the target [222].
A serious problem that can arise with minimized ligands is that, although small size offers many benefits for the clinical application of these engineered drugs (permitting the oral route of administration, better stabilization, immunological safety, tissue-penetrating ability), they will also be devoid of the evolutionarily acquired ability to interact with co-factors and to be integrally involved in complex physiological mechanisms. In multi-component processes equilibrated by various factors, these foreign participants may shift the equilibrium unless their dosage and the time-course of their action are regulated. One problem is unnecessary interactions with surfaces and molecules. The majority of the surface of a protein is protected from “undesirable” interactions with other proteins or substrates within a cell, where proteins exist in high concentrations, compartmentalized and protected by special chaperones. For example, the active site of an enzyme is characterized by a large, deep cleft shaped to establish favorable interactions with the substrate [223]. The clefts in active sites contain solvent-exposed hydrophobic residues that order water molecules [224], which are displaced during substrate or ligand binding. Although some peptides can be endowed with these properties, other formats for their use in drug discovery are being considered. One of them is to attach the peptide to protein carriers developed specially to meet all these requirements. In addition, filamentous phage displaying the ligand peptide is not only a source of this peptide: since the peptides are surface-displayed and reactive, the phage itself can be used [225] as a new giant macromolecular “carrier” in which the peptide is one active site among many other active sites of the phage. Further engineering of the phage can convert it into a multi-functional macromolecule.
Owing to the multivalent potential of the phage coat proteins, phage can carry additional engineered sites that program its passage through the organism, permitting penetration into the target cell at a pre-determined site of the organism. This passage is directed by sequences recognized by internalizing receptors and by factors of intracellular translocation into different compartments, including organelles: mitochondria, nuclei or even nucleoli. These factors are well known for many such
intracellular movements: specific internalizing scFv [167,168] or RGD-carrying peptides [164]. Such peptides exist in random libraries, and when a library is panned on live cells they are recognized and mediate internalization of the phage [170]. Another application that does not require these special characteristics of peptides is their use as probes in proteomic and genomic studies. Further progress in the cloning and expression of human genes requires, for their characterization by high throughput DNA and protein array techniques, an arsenal of probes with binding functions of known specificity. Peptides selected from libraries by natural binders (antibodies, receptors) can be such probes. The use of peptides selected for known proteins, and the further selection of new peptides for unknown protein binding sites, can be particularly effective in the functional characterization of these proteins and the definition of their functional profiles [226,227]. The post-selection work with collections of peptides as leads for designing and developing drugs is a separate chapter in medical biotechnology that is beyond the scope of this review. It should nevertheless be kept in view when planning selections, so as to facilitate the future conversion of peptides to drugs. Apart from their own use as material, peptides can serve as lead structures for the development of small non-peptidyl molecules that mimic or block specific ligand-receptor interactions [228]. By using peptides and various organic compounds competing with them for the binding sites, new small non-protein carriers of their specificity but with better drug characteristics can be produced. A recently reported example of this strategy is the identification by Zhang et al. [229] of a small nonpeptidyl fungal metabolite that exerted anti-diabetic activity in mice, mediated specifically via the insulin receptor.
Combinatorial Processes in Nature and in the Laboratory Tube
In his commentary on the work from the Affymax and Scripps institutes that resulted in the generation of the EPO peptide substitute, James Wells noted: “Deciphering the “rules” for how proteins bind to small molecules or other proteins is largely an empirical science” [230, p.449]. The goal of an empirical science is to describe natural processes and to learn how to use them. Hence, technical progress is the major indicator of scientific advances in biology, which is demonstrated in concentrated form by this and similar successful generation of peptides capable of reproducing the effect of a hormone that nature created over millions of years. A generalization from the results in the area of combinatorial manipulation of biomolecules is that the principles and methods of molecular evolution that occurred in nature are being realized and reproduced in the laboratory. Once this is done, evolution in the tube can exceed the rate of natural molecular evolution. This is because in nature the evolving systems are species of organisms, of which molecules are only elements, whereas the in-tube evolution of the same molecules is an independent system able to realize its full potential, one of the most outstanding aspects of which is the power of combinatorial rearrangements of genes and their expressed proteins. From the beginning of biological evolution, this potential has been the driving force of the adaptation of species. Its simplified manifestation may be seen in the generation of diversity via mutations coupled with the unequal recombination mechanism to give homologous genes and proteins. In bacteria homologous proteins do not form within-cell families, and each cell typically contains one protein, whereas multicellular organisms generated families of homologous proteins within each individual. The multitude of homologous proteins in an individual forms the arsenal that accomplishes physiological functions under different conditions. In higher organisms, whose complex ontogenesis and diversified tissues require many protein variants for functioning at different developmental stages and in differentiated tissues, the number of homologous proteins in the families has grown much larger. The immune system is especially advanced in this respect because its defense function requires more homologous proteins. Genes of non-immune systems encode a relatively limited number of proteins with predetermined structure. In invertebrates, the genes of the immune system also encode a fixed number of proteins, but the subsequent evolution of vertebrates generated repertoires of precursors for engineering genes whose final structure, and the structure of the protein they encode, is formed on demand via a combinatorial shuffling mechanism. Recombination of genes in sexual reproduction is a manifestation of the combinatorial mechanism but is limited to a small number of genes and, most importantly, creates new proteins only very rarely and slowly. The somatic gene recombination developed in the immune system is the natural mechanism that generates new genes. Once genetic engineering in the tube had become possible and, at its first stage, reproduced individual gene recombination similar to that in sexual reproduction, it was logical to expect that the next step would be to reproduce the natural combinatorial mechanism: not simply to recombine existing genes but to generate new genes from large repertoires of precursor sequences, either of the same diversity as in nature [231] or of even higher diversity. The next point, which is an integral part of the combinatorial principle in molecular biology, is the concept of mimicry.
This phenomenon also derives from the natural processes that generate homologous proteins. Every protein evolutionarily destined to accomplish a function has a multitude of other proteins, homologous and analogous, that are its mimics, i.e. able to do what it does, albeit with a difference. The natural phenomenon of mimicry was first reproduced in the tube by Geysen [212], who introduced the term into biological practice. Geysen worked with structures able to substitute for antigenic determinants (epitopes), and the term he proposed is mimotope. The most remarkable property of mimicry is that it designates inter-protein relationships that do not require amino-acid-level similarity. It is now well recognized that this type of relationship is the rule, whereas mimicry with structural resemblance is rare. This is amply demonstrated by the peptide substitutes isolated so far. The phenomenon of mimicry in nature and in laboratory experiments is a universe of structures from which, using high throughput screening procedures, one can draw substitutes of biomolecules with drug potential.
ACKNOWLEDGEMENTS
The author gratefully acknowledges grants from Dirección General de Asuntos del Personal Académico, Mexican National Autonomous University, Mexico (# IN210402); and Consejo Nacional de Ciencia y Tecnología, Mexico (#251166N). The author is grateful to Valery Petrenko for discussions on the subjects of the article, and to R. Hernández and G. Gazarian for assistance in the preparation of the manuscript.
ABBREVIATIONS
mAb = monoclonal antibody
EPO = Erythropoietin
EBP = Erythropoietin-binding protein
HIV-1 = Human Immunodeficiency Virus Type 1
REFERENCES
[1] Smith, G.P. Science, 1985, 228, 1315-1317.
[2] Parmley, S.F.; Smith, G.P. Gene, 1988, 73, 305-318.
[3] Scott, J.K.; Smith, G.P. Science, 1990, 249, 386-390.
[4] Cwirla, S.E.; Peters, E.A.; Barrett, R.W.; Dower, W.J. Proc. Natl. Acad. Sci. USA, 1990, 87, 6378-6382.
[5] Barbas, C.F. III; Lerner, R.A. Methods. Companion Methods Enzymol., 1991, 2, 119-124.
[6] Hoogenboom, H.R.; Griffiths, A.D.; Johnson, K.S.; Chiswell, D.J.; Hudson, P.; Winter, G. Nucl. Acids Res., 1991, 19, 4133-4137.
[7] Cunningham, B.C.; Wells, J.A. Proc. Natl. Acad. Sci. USA, 1991, 88, 3407-3411.
[8] Devlin, J.J.; Panganiban, L.C.; Devlin, P.E. Science, 1990, 249, 404-406.
[9] de la Cruz, V.F.; Lal, A.A.; McCutchan, T.F. J. Biol. Chem., 1988, 263, 4318-4322.
[10] Ilyichev, A.A.; Minenkova, O.O.; Talkov, S.I.; Karpyshev, N.N.; Eroshkin, A.M.; Petrenko, V.A.; Sandakhchiev, L.S. Proceed. Nat. Acad. USSR (English transl), 1989, 307, 196-198.
[11] Felici, F.; Castagnoli, L.; Musacchio, A.; Jappelli, R.; Cesareni, G. J. Mol. Biol., 1991, 222, 301-310.
[12] Folgori, A.; Tafi, R.; Meola, A.; Felici, F.; Galfre, G.; Cortese, R.; Monaci, P.; Nicosia, A. EMBO J., 1994, 13, 2236-2243.
[13] Luzzago, A.; Felici, F.; Tramontano, A.; Pessi, A.; Cortese, R. Gene, 1993, 128, 51-57.
[14] Cunningham, B.C.; Wells, J.A. Science, 1989, 244, 1081-1085.
[15] Wells, J.A.; Lowman, H.B. Curr. Opin. Biotechnol., 1992, 3, 355-362.
[16] Li, B.; Tom, J.Y.K.; Oare, D.; Yen, R.; Fairbrother, W.J.; Wells, J.A.; Cunningham, B.C. Science, 1995, 270, 1657-1660.
[17] Wrighton, N.C.; Farrell, F.X.; Chang, R.; Kashyap, A.K.; Barbone, F.P.; Mulcahy, L.S.; Johnson, D.L.; Barrett, R.W.; Jolliffe, L.K.; Dower, W.J. Science, 1996, 273, 458-464.
[18] Cwirla, S.E.; Balasubramanian, P.; Duffin, D.J.; Wagstrom, C.R.; Gates, C.M.; Singer, S.C.; Davies, A.M.; Tansic, R.L.; Mattheakis, L.C.; Boytos, C.M.; Schatz, P.J.; Baccanari, D.P.; Wrighton, N.C.; Barrett, R.W.; Dower, W.J. Science, 1997, 276, 1696-1699.
[19] Keller, P.M.; Arnold, B.A.; Shaw, A.R.; Tolman, R.L.; Middlesworth, F.V.; Bondy, S.; Rusiecki, V.K.; Koenig, S.; Zolla-Pazner, S.; Conard, P.; Emini, E.A.; Conley, A.J. Virology, 1993, 193, 709-716.
[20] Meola, A.; Delmastro, P.; Monaci, P.; Luzzago, A.; Nicosia, A.; Felici, F.; Cortese, R.; Galfré, G. J. Immunol., 1995, 154, 3162-3172.
[21] Partidos, C.D.; Steward, M.W. Comb. Chem. High Throughput Screen., 2002, 5, 15-27.
[22] Wilson, D.R.; Finlay, B.B. Canad. J. Microbiol., 1998, 44, 313-329.
[23] Fong, S.; Doyle, L.; Devlin, J.; Doyle, M. Drug Develop. Res., 1994, 33, 64-70.
[24] Arap, W.; Kolonin, M.G.; Trepel, M.; Lahdenranta, J.; Cardo-Vila, M.; Giordano, R.J.; Mintz, P.J.; Ardelt, P.U.; Yao, V.J.; Vidal, C.I.; Chen, L.; Flamm, A.; Valtanen, H.; Weavind, L.M.; Hicks, M.E.; Pollock, R.E.; Botz, G.H.; Bucana, C.D.; Koivunen, E.; Cahill, D.; Troncoso, P.; Baggerly, K.A.; Pentz, R.D.; Do, K.A.; Logothetis, C.J.; Pasqualini, R. Nat. Med., 2002, 8, 121-127.
[25] Petrenko, V.A.; Smith, G.P.; Gong, X.; Quinn, T. Prot. Eng., 1996, 9, 797-801.
[26] Petrenko, V.A.; Smith, G.P. Prot. Eng., 2000, 13, 589-592.
[27] Brigati, J.; Williams, D.D.; Sorokulova, I.B.; Nanduri, V.; Chen, I-H.; Turnbough, Jr. C.L.; Petrenko, V.A. Clin. Chem., 2004, 50, 1899-1906.
[28] Petrenko, V.A.; Vodyanoy, V.J. J. Microbiol. Meth., 2003, 53, 253-262.
[29] Smith, G.P.; Petrenko, V.A. Chem. Rev., 1997, 97, 391-410.
[30] Barbas, C.F. III; Burton, D.R.; Scott, J.K.; Silverman, G.J. Phage Display. A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2001.
[31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] [68]
Marks, J.D.; Hoogenboom, H.R.; Bonnert, T.P.; McCafferty, J.; Griffiths, A.D.; Winter, G. J. Mol. Biol., 1991, 222, 581-597. O’Neil, K.T.; Hoess, R.H.; Jackson, S.A.; Ramachandran, N.S.; Mousa, S.A.; De Grado, W.F. Proteins, 1992, 14, 509-515. Menendez, A.; Chow, K.S.; Pan, O.C.C.; Scott, J.K. J. Mol. Biol., 2004, 338, 311-327. Koivunen, E.; Wang, B.; Ruoslahti, E. Biotechnology (NY), 1995, 13, 265-70. Gazarian, T.; Selisko, B.; Gurrola, G.; Hernández, R.; Possani, L.; Gazarian, K. Comb. Chem. High Throup. Screen., 2003, 6, 119-132. Yu, J.; Smith G.P. Methods Enzymol., 1996, 267, 3-27. Petrenko, V.A.; Smith, G.P.; Mazooji, M.M.; Quinn, T. Prot. Eng., 2002,15, 943-950. Tuerk, C.; Gold, L. Science, 1990, 249, 505-510. Ellington, A.D.; Szostak, J.W. Nature, 1990, 346, 818-822. Szostak, J.W. Harvey Lectures, 1999, 93, 95-118. Tuerk, C.; MacDougal-Waugh. Gene, 1993, 137, 33-39. Tuerk, C.; MacDougal, S.; Gold, L. Proc. Natl. Acad. Sci. USA, 1992, 89, 6988-6992. Bock, L.C.; Griffin, L.C.; Latham, J.A.; Vermass, E.H.; Toole, J.J. Nature (Lond) , 1992, 355, 564566. Kubik, M.F.; Stephens, A.W.; Schneider, D.; Marlar, R.A.; Tasset, D. NAR, 1994, 22, 2619-2626. Davis, J.P.; Janjic, N.; Javornik, B.E.; Zichi, D.A. Methods Enzymol., 1996, 267, 302-314. Kay, B.K.; Adey, N.B.; Stemmer, W.P.C. In: Phage Display of peptides and proteins. A Laboratory Manual (Kay B.K; Winter J; McCarthy J, Eds) Academ. Press, San Diego, 1996. Rodi, D.J.; Makowski, L. Curr. Opin. Biotechnol., 1999, 10, 87-93. Cortese, R.; Monaci, P.; Luzzago, A.; Santini, C.; Bartoli, F.; Cortese, I.; Fortugno, P.; Galfré, G. Curr. Opin. Biotechnol., 1995, 6, 73-80. Gorny, M.K.; Gianakakos, V.; Shrape, S.; Zolla-Pazner, S. Proc. Natl. Acad. Sci. USA, 1989, 86, 1624-1628. Buchacher, A.; Predl, R.; Strutzenberger, K.; Steinfellner, W.; Trkola, A.; Purtscher, M.; Gruber, G.; Tauer, C.; Steindl, F.; Jungbauer, A.; Katinger, H. AIDS Res. Hum. Ret., 1994, 10, 359-369. Cohen, P.A.; Mani, J.C.; Lane, D.P. 
Oncogene, 1998, 17, 2445-2456. Barbas, C.F. III.; Collet, T.A.; Amberg, W.; Roben, P.; Binley, J.M.; Hoekstra, D.; Cababa, D.; Jones, T.M.; Williamson, R.A.; Pilkington, G.R. J. Mol. Biol., 1993, 230, 812-823. Barbas, CF III.; Hu, D.; Dunlop, N.; Sawyer, L.; Cababa, D.; Hendry, R.M.; Nara, P.L.; Burton, D.R. Proc. Natl. Acad. Sci. USA, 1994, 91, 3809-3813. Yang, W.-P.; Green, K.; Pinz-Sweeney, S.; Briones, A.T.; Burton, D.R.; Barbas, C.F. III. J. Mol. Biol., 1995, 254, 392-403. Schier, R.; McCall, A.; Adams, G.P.; Marshal, K.W.; Merrit, H.; Yim, M.; Crawford, R.S.; Weiner, L.M.; Marks, C.; Marks, J.D. J. Mol. Biol., 1996, 263, 551-567. Cohen, P.A.; Laune, D.P.; Teulon. I.; Combes, T.; Pugniére, M.; Badouaille, G.; Granier, C.; Mani, J.C.; Simon, D. J. Immunol. Methods, 2001, 254, 147-160. Laune, D.; Molina, F.; Ferrieres, G.; Mani, J.C.; Cohen, P.; Simon, D.; Bernardi, T.; Piechaczyk, M.; Pau, B.; Granier, C. J. Biol. Chem., 1997, 272, 30937-30944. Laune, D.; Pau, B.; Granier, C. Clin. Chem. Lab. Med., 1998, 36, 367-371. Monnet, C.; Laune, D.; Laroche-Traineau, J.; Biard-Piechaczyk, M.; Briant, L.; Bes, C.; Pugniere, M.; Mani, J.C.; Pau, B.; Cerutti, M.; Devauchelle, G.; Devaux, C.; Granier, C.; Chardes, T. J. Biol. Chem., 1999, 274, 3789-3796. Bes, C.; Briant-Longuet, L.; Cerutti, M.; Heitz, F.; Troadec, S.; Pugniere, M.; Roquet, F.; Molina, F.; Casset, F.; Bresson, D.; Peraldi-Roux, S.; Devauchelle, G.; Devaux, C.; Granier, C.; Chardes, T. J. Biol. Chem., 2003, 278, 14265-14273. Ofec, G.; Tang, M.; Sambo,r A.; Katinger, H.; Mascola, J.R.; Wyatt, R.; Kwong, P.D. J. Virol., 2004, 78, 10724-10737. Silverman G.J. In Phage Display. A Laboratory Manual; Barbas, Burton, Scott, Silverman, Eds.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2001, pp. 5.1-5.24. Cunningham, B.C.; Jhurani. P.; Ng. P.; Wells, J.A. Science, 1989, 243, 1330-1336. Lowman, H.B.; Bass, S.H.; Simpson, N.; Wells, J.A. Biochemistry, 1991, 30, 10832-10838. 
Cunningham, B.C.; Lowe, D.G.; Li, B.; Bennett, B.D.; Wells, J.A. EMBO J., 1994, 13, 2508-2515. Schier, R.; Bye, J.; Apell, G.; McCall, A.; Adams, G.P.; Malmqvist, M.; Weiner, L.M.; Marks, J.D. J. Mol. Biol., 1996, 255, 28-43. Schatz, P.J. Biotechnology (NY), 1993, 11, 1138-1143. Ohkubo, S.; Miyadera, K.; Sugimoto, Y. Comb. Chem. High Throughput Screen, 2001, 4, 573-583.
[69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] [110] [111] [112]
Wu, J.; Ma, Q.N.; Lam, K.S. Biochemistry, 1994, 33, 14825-14833. McBride, J.D.; Freeman, N.; Domingo, G.J.; Leatherbarrow, R.J. J. Mol. Biol., 1996, 259, 819-827. Lu, Z.; Murray, K.S.; Van Cleave, V.; LaVallie, E.R.; Stahl, M.L.; McCoy, J.M. Biotechnology (NY), 1995, 13, 366-372. Desmyter, A.; Transue, T.R.; Ghahroudi, M.A.; Thi, M.H.; Poortmans, F.; Hamers, R.; Muyldermans, S.; Wyns, L. Nat. Struct. Biol., 1996, 3, 803-811. Tramontano, A.; Bianchi, E.; Venturini, S.; Martin, F.; Pessi, A.; Sollazzo, M. J. Mol. Recognit., 1994, 7, 9-24. Martin, F.; Toniatti, C.; Salvatti, A.L.; Ciliberto, G.; Cortese, R.; Sollazzo, M. J. Mol. Biol., 1996, 255, 86-97. McConnell, S.; Hoess, R.H. J. Mol. Bio1., 1995, 250, 460-470. Starovasnic, M.A.; Braisted, A.C.; Wells, J.A. Proc. Natl. Acad. Sci. USA, 1997, 94, 10080-10085. Pellegrino, G.R.; Berg, J.M. Proc. Natl. Acad. Sci. USA, 1991, 88, 671-675. Miller, J.; McLachlan, A.D.; Klug, A. EMBO J., 1985, 4, 1609-1614. Pablovich, N.P.; Pabo, C.O. Science, 1991, 252, 809-817. Choo,Y.; Sanchez-Garcia, I.; Klug, A. Nature (Lond), 1994, 372, 642-645. Jamieson, A.C.; Kim, S.H.; Wells, J.A. Biochemistry, 1994, 33, 5689-5695. Rebar, E.J.; Pabo, C.O. Science, 1994, 263, 671-673. Wu, H.; Yang, W.P.; Barbas, C.F. III. Proc. Natl. Acad. Sci. USA, 1995, 92, 344-348. Beerli, R.R.; Dreier, R.; Barbas, C.F. III. Proc. Natl. Acad. Sci. USA, 2000, 97, 1495-1500. Segal, D.J.; Dreyer, B.; Barbas. C.F. III. Proc. Natl. Acad. Sci., 1999, 96, 2758-2763. Segal, D.J.; Barbas, C.F. III. Curr. Opin. Chem. Biol., 2000, 4, 34-39. Bianchi, E.; Folgori, A.; Wallace, A.; Nicotra, M.; Acali, S.; Phalipon, A.; Barbato, G.; Bazzo, R.; Cortese, R.; Felici, F.; Pessi, A. J. Mol. Biol., 1995, 247, 154-160. Struthers, M.D.; Cheng, R.P.; Impreiali, B. J. Am. Chem. Soc., 1996, 118, 3073-3081. Brunet, A.P.; Huang, E.S.; Huffine, M.E.; Joeb, J.E.; Weltman, R.J.; Hecht, M.H. Nature, 1993, 364, 355-358. Houston, M.E. Jr.; Wallace, A.; Bianchi, E.; Pessi, A.; Hodges, R.S. J. Mol. 
Biol., 1996, 262, 270-282. Miceli, R.; Myszka, D.; Mao, J.; Sathe, G.; Chaiken, I. Drug Des. Discov., 1996, 13, 95-105. Perez-Paya, E.; Houghten, R.A.; Blondelle, S.E. J. Biol. Chem., 1996, 271, 4120-4126. Wang, C.I.; Yang, Q.; Craik, C.S. J. Biol. Chem., 1995, 270, 12250-12256. Röttgen, P.; Collins, J. Gene, 1995, 164, 243-250. Markland, W.; Ley, A.C.; Ladner, R.C. Biochemistry, 1996, 35, 8058-8067. Wells, J.A. Annu. Rev. Biochem., 1996, 65, 609–634. Sidhu, S.S.; Lowman, H.B.; Cunningham, B.C.; Wells, J.A. Meth. Enzymol., 2000, 328, 333-363. Wells, J.A. Proc. Natl. Acad. Sci. USA, 1996, 93, 1–6. Clackson, T.; Wells, J.A. Science, 1995, 267, 383–386. De Vos, A.M.; Ultsch, M.; Kossiakoff, A.A. Science, 1992, 255, 306–312. Yanofsky, S.D.; Baldwin, D.N.; Butler, J.H.; Holden, F.R.; Jacobs, J.W.; Balasubramanian, P.; Chinn, J.P.; Cwirla, S.E.; Peters-Bhatt, E.; Whitehorn, E.A.; Tate, E.H.; Akeson, A.; Bowlin, T.L.; Dower, W.J.; Barrett, R.W. Proc. Natl. Acad. Sci. USA, 1996, 93, 7381-7386. Hage, T.; Sebald, W.; Reinemer, P. Cell, 1999, 97, 271–281. Altmann, S.W.; Kastelein, R.A. J. Biol. Chem., 1995, 270, 2233–2240. Grunewald, S.M.; Werthmann, A.; Schnarr, B.; Klein, C.E.; Brocker, E.B.; Mohrs, M.; Brombacher, F.; Sebald, W.; Duschl, A. J. Immunol., 1998, 160, 4004–4009. Renne, C.; Kallen, K.J.; Mullberg, J.; Jostock, T.; Grotzinger, J.; Rose-John, S. J. Biol Chem, 1998, 273, 27213-27219. Fischer, M.; Goldschmitt, J.; Peschel, C.; Brakenhoff, J.P.G.; Kallen, K.J.; Wollmer, A.; Grötzinger, J.; Rose-John, S. Nat. Biotechnol., 1997, 15, 142-145. Sporeno, E.; Savino, R.; Ciapponi, L.; Paonessa, G.; Cabbibo, A.; Lahm, A.; Pulkki, K.; Sun, R.X.; Toniatti, C.; Klein, B. Blood, 1996, 87, 4510-4519. Olivier, C.; Auguste, P.; Chabbert, M.; Lelièvre, E.; Chevalier, S.; Gascan, H. J. Biol. Chem., 2000, 275, 5648–5656. 
Ballinger, M.D.; Shyamala, V.; Forrest, L.D.; Deuter-Reinhard, M.; Doyle, L.V.; Wang, J.X.; Panganiban-Lustan, L.; Stratton, J.R.; Apell, G.; Winter, J.A.; Doyle, M.V.; Rosenberg, S.; Kavanaugh, W.M. Nat. Biotechnol., 1999, 17, 1199–1204. Altamirano, M.M.; Blackburn, J.M.; Aguayo, C.; Fersht, A.R. Nature, 2000, 403, 617–622. Stemmer, W.P. Nature, 1994, 370, 389–391. Crameri, A.; Raillard, S.A.; Bermudez, E.; Stemmer, W.P. Nature, 1998, 391, 288–291.
[113] [114] [115] [116] [117] [118] [119] [120] [121] [122] [123] [124] [125] [126] [127] [128] [129] [130] [131] [132] [133] [134] [135] [136] [137] [138] [139] [140] [141] [142] [143] [144] [145] [146] [147] [148] [149] [150] [151] [152]
Zhao, H.; Arnold, F.H. Nucleic Acids Res., 1997, 25, 1307–1308. Chang, C.C.; Chen, T.T.; Cox, B.W.; Dawes, G.N.; Stemmer, W.P.C.; Punnonen, J.; Patten, P.A. Nat. Biotechnol., 1999, 17, 793–797. Livnah, O.; Stura, E.A.; Johnson, D.L.; Middleton, S.A.; Mulcahy, L.S.; Wrighton, N.C.; Dower, W.J.; Jolliffe, L.K.; Wilson, I.A. Science, 1996, 273, 464–471. Yayon, A.; Aviezer, D.; Safran, M.; Gross, J.L.; Heldman, Y.; Cabilly, S.; Givol, D.; KatchalskiKatzir. Proc. Natl. Acad. Sci. USA, 1996, 93, 7381-7386. Doorbar, J.; Winter, G. J. Mol. Biol., 1994, 244, 361-369. Li, M.; Yu, W.; Chen, C.H.; Cwirla, S.; Whitehorn, E.; Tate, E.; Raab, R.; Bremer, M.; Dower, B. Nat. Biotechnol., 1996, 14, 986-991. Paige, L.A.; Christensen, D.J.; Gron, H.; Norris, J.D.; Gottlin, E.B.; Padilla, K.M.; Chang, C.Y.; Ballas, L.M.; Hamilton, P.T.; McDonnel, D.P.; Fowlkes, D.M. Proc. Natl. Acad. Sci. USA, 1999, 96, 3999-4004. Tracey, K.J.; Cerami, A. Annu. Rev. Cell Biol., 1993, 9, 317-43. Banner, D.W.; D’Arcy, A.; Jones, W.; Gentz, R.; Schoenfeld, H.-J.; Broger, C. Cell, 1993, 73, 431441. Kruszynski, M.; Shealy, D.J.; Leone, A.O.; Heavner, G.A. Cytokine, 1999, 11, 37-44. Chirinos-Rojas, C.L.; Steward, M.W.; Partidos, C.D. Cytokine, 1997, 9, 226-232. Lie, B.L.; Tunemoto, D.; Hemmi, H.; Mizukami, Y.; Fukuda, H.; Kikuchi, H.; Kato, S.; Numao, N. Biochem. Biophys. Res. Commun., 1992, 188, 503-9. Giebel, L.B.; Cass, R.T.; Milligan, D.L.; Young, D.C.; Arze, R.; Johnson, C.R. Biochemistry, 1995, 34, 15430-15425. Smith, G.P.; Schultz, D.A.; Ladbury, J.E. Gene, 1993, 128, 37-42. Oldenburg, K.R.; Loganathan, D.; Goldstein, I.J.; Schultz, P.G.; Gallop, M.A. Proc. Natl. Acad. Sci. USA, 1992, 89, 5393-5397. Scott, J.K.; Loganathan, D.; Easley, R.B.; Gong, X.; Goldstein, I.L. Proc. Natl. Acad. Sci. USA, 1992, 89, 5398-5402 Roberts, B.I.; Markland, W.; Ley A.C.; Kent, R.B.; White, D.W.; Guterman, S.K.; Ladner, R.C. Proc. Natl. Acad. Sci. USA, 1992, 89, 2429-2433. Han, Y.; Kodadek, T. J. Biol. 
Chem., 2000, 275, 14979-14984. Dedman, J.R.; Kaetzel, M.A.; Chan, H.C.; Nelson, D.J.; Jamieson, G.A. Jr. J. Biol. Chem., 1993, 268, 23025-23030. Pierse, H.H.; Adey, N.; Kay, B.K. Mol. Divers., 1996, 1, 259-265. Adey, N.B.; Kay, B.K. Gene, 1996, 169, 133-134. Nevalainen, L.T.; Aoyama, T.; Ikura, M.; Crivici, A.; Yan, H.; Chua, N.H.; Nairn, A.C. Biochem. J., 1997, 321, 107-115. Demartis, S.; Huber, A.; Viti, F.; Lozzi, L.; Giovannoni, L.; Neri, P.; Winter, G.; Neri, D. J. Mol. Biol., 1999, 286, 617-633. Gazarian, T.; Selisko, B.; Hérion, P.; Gazarian, K. Molecular Immunology, 2000, 37, 755-766. Hernández, R.; Gazarian, T.G.; Hérion, P.S.; Gazarian. K.G. Immunol. Lett., 2002, 80, 97-103. Couet, J.; Li, S.; Okamoto, T.; Ikezu, T.; Lisanti, M.P. J. Biol Chem., 1997, 272, 6525-6533. Matthews, D.J.; Wells, J.A. Science, 1993, 260, 1113-1117. Smith, M.M.; Shi, L.; Navre, M. J. Biol. Chem., 1995, 270,6440-6449. McCafferty, J.; Jackson, R.H.; Chiswell, D.J. Prot. Eng., 1991, 4, 955-961. Siemers, N.O.; Yelton, D.E.; Bajorath, J.; Senter, P.D. Biochemistry, 1996, 35, 2104-2111. Corey, D.R.; Shiau, A.K.; Yang, Q.; Janowski, B.A.; Craik, C.S. Gene, 1993, 128, 129-134. Widersten, M.; Mannervik, B. J. Mol. Biol., 1995, 250, 115-122. Egloff, M.P.; Johnson, D.F.; Moorhead, G.; Cohen, P.T.; Cohen, P.; Barford, D. EMBO J., 1997, 16, 1876-1887. Burgen, A.S. Annu. Rev. Pharmacol. Toxicol., 2000, 40, 1-16. Kay, B.K.; Hamilton, P.T. Comb. Chem. High Throughput Screen., 2001, 4, 535-543. Pierce, H.H.; Schachat, F.; Brandt, P.W.; Lombardo, C.R.; Kay, B.K. J. Biol. Chem., 1998, 273, 23448-23453. Gho, Y.S.; Lee, J.E.; Oh, K.S.; Bae, D.G.; Chae, C-B. Cancer Res. 1997, 57, 3733-3740. Olson, K.A.; Fett, J.W.; French, T.C.; Key, M.E.; Vallee, B.L. Proc. Natl. Acad. Sci. USA, 1995, 92, 442-446. Dvorak, H.F.; Brown, L.F.; Detmar, M.; Dvorak, A.M. Am. J. Pathol., 1995, 146, 1029-1039. Folkman, J. Nature Med., 1995, 1, 27-33.
[153] [154] [155] [156] [157] [158] [159] [160] [161] [162] [163] [164] [165] [166] [167] [168] [169] [170] [171] [172] [173] [174] [175] [176] [177] [178] [179] [180] [181] [182] [183] [184] [185] [186] [187] [188]
Fairbrother, W.J.; Christenger, H.W.; Cochran, A.G.; Fuh, G.; Keenan, C.J.; Quan, C.; Shriver, S.K.; Tom, J.Y.; Wells, J.A.; Cunningham, B.C. Biochemistry, 1998, 37, 17754-17764. Binetruy-Tournaire, R.; Demangel, C.; Malavaud, B.; Vassy, R.; Rouyre, S.; Kraemer, M.; Plouet, J.; Derbin, C.; Perret, G.; Mazie, J.C. EMBO J., 2000, 19, 1525-1533. Bottger, V.; Bottger, A.; Howard, S.F.; Picksley, S.M.; Chene, P.; Garcia-Echeverria, C.; Hochkeppel, H.K.; Lane, D.P. Oncogene, 1996, 13, 2141-2147. Rodi, D.J.; Janes, R.W.; Sanganee, H.J.; Holton, R.A.; Wallace, B.A.; Makowski, L. J. Mol. Biol., 1999, 285, 197-203. Hynes, R.O. Cell, 1992, 69, 11-25. Koivunen, E.; Gay, D.A.; Ruoslahti, E. J. Biol. Chem., 1993, 268, 20205-20210. Healy, J.M.; Murayama, O.; Maeda, T.; Yoshino, K.; Sekiguchi, K.; Kikuchi, M. Biochemistry, 1995, 34, 3948-3955. Martens, C.L.; Cwirla, S.E.; Lee, R.Y.; Whitehorn, E.; Chen, E.Y.; Bakker, A.; Martin E.L.; Wagstrom, C.; Gopalan, P.; Smith, C.W. J. Biol. Chem., 1995, 270, 21129-21136. Barry, M.A.; Dower, W.J.; Johnston, S.A. Nat. Med., 1996, 2, 299-305. Mazzucchelli, L.; Burritt, J.B.; Jesaitis, A.J.; Nusrat, A.; Liang, T.W.; Gewirtz, A.T.; Schnell, F.J.; Parkos, C.A. Blood, 1999, 93, 1738-1748. Hart, S.L.; Knight, A.M.; Harbottle, R.P.; Mistry, A.; Hunger, H.D.; Cutler, D.F.; Williamson, R.; Coutelle, C. J. Biol. Chem., 1994, 269, 12468-12474. Ivanenkov, V.; Felici, F.; Menon, A.G. Biochim. Biophys. Acta, 1999, 1448, 450-462. Uppala, A.; Koivunen, E. Comb. Chem. High Throughput Screen, 2000, 3, 373-392. Larocca, D.; Witte, A.; Johnson, W.; Pierce, G.F.; Baird, A. Hum. Gene Ther., 1998, 9, 2393-2399. Becerril, B.; Poul, M.A.; Marks, J.D. Biochem. Biophys. Res. Commun., 1999, 255, 386-399. Poul, M.A.; Marks, J.D. J. Mol. Biol., 1999, 288, 203-211. Spear, M.A.; Breakefield, X.O.; Beltzer, J.; Schuback, D.; Weissleder R.; Pardo, F.S.; Ladner, R. Cancer Gene Ther., 2001, 8, 506-511. 
Samoylova, T.I.; Petrenko, V.A.; Morrison, N.E.; Globa, L.P.; Baker, H.J.; Cox, N.R. Mol. Cancer Ther., 2003, 2,1129-1137. Siegel, D.L. In Phage Display. A Laboratory Manual; Barbas, Burton, Scott, Silverman, Eds.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2001, pp. 23.1-23.12. Aina, O.H.; Sroka, T.C.; Chen, M.L.; Lam, K.S. Biopolymers, 2002, 66, 184-199. Pasqualini, R.; Arap, W.; Rajotte, D.; Rouslahti, E. In Phage Display. A Laboratory Manual; Barbas, Burton, Scott, Silverman, Eds.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2001, pp.221-224. Pasqualini, R.; Ruoslahti, E. Nature, 1996, 380, 364-366. Pasqualini, R.; Koivunen, E.; Ruoslahti, E. Nat. Biotechnol., 1997, 15, 542-546. Arap, W.; Pasqualini, R.; Ruoslahti, E. Science, 1998, 279, 377-380. Petrenko, V.A.; Sarakulova, I.B. J. Microbiol. Methods, 2004, 58, 147-68 Rev. Ivnitski, D.; O’Neil, D.; Gattuso, A.; Schlicht, R.; Calidonna, M.; Fisher, R. Biotechniques, 2003, 35, 862-869. Peruski, A.H.; Peruski, L.F. Jr. Clin. Diagn, Lab. Immunol., 2003, 10, 506-513. Bruno, J.G.; Yu, H. Appl. Environ. Microbiol., 1996, 62, 3474-3476. Motti, C.; Unzo, M.; Meola, A.; Galfre, G.; Felici, F.; Cortese, R.; Nicosia, A.; Monaci.; P. Gene, 1994, 46, 191-198. Felici, F.; Galfré, G.; Luzzago, A.; Monaci, P.; Nicosia, A.; Cortese, R.; Methods Enzymol., 1996, 267, 116-129. Prezzi, C.; Nuzzo, M.; Meola, A.; Delmastro, P.; Galfre, G.; Cortese, R.; Nicosia, A.; Monaci, P. J. Immunol., 1996, 156, 4504-4513. Binley, J.; Ditzel, H.; Barbas, C.F. III.; Sullivan, N.; Sodroski, J.; Parren, P.; Burton, D. AIDS Res. Hum. Retrovir., 1996, 12, 911-924. Boots, L.J.; McKenna, P.M.; Arnold, B.A.; Keller, A.P.; Gorny, M.K.; Zolla-Pazner, S.; Robinson, J.E.; Conley, A.J. AIDS Res. Hum. Retrov., 1997, 13, 1549-1559. Keller, P.M.; Arnold, B.A.; Shaw, A.R.; Tolman, R.L.; Middlesworth, F.V.; Bondy, S.; Rusiecki, V.K.; Koenig, S.; Zolla-Pazner, S.; Conard, P.; Emini, A.E.; Conley, A.J. 
Virology, 1993, 193, 709716. Muster, T.; Steindl, M.; Purtscher, M.; Trkola, A.; Klima, A.; Himmler, G.; Ruker, F.; Katinger, H. J. Virol., 1993, 67,6642- 6647. Scala, G.; Chen, X.; Liu, W.; Telle, J.N.; Cohen, O.; Vaccarezza, M.; Igarashi, T.; Fauci A. J. Immunol., 1999, 162, 6155-6161.
[189] [190] [191] [192] [193] [194] [195] [196] [197] [198] [199] [200] [201] [202] [203] [204] [205] [206] [207] [208] [209] [210] [211] [212] [213] [214] [215] [216] [217] [218] [219] [220] [221] [222] [223] [224] [225]
Enshell-Seijffers, D.; Smelyanski, L.; Vardinon, N.; Yust, I.; Gershoni. FASEB J., 2001, 15, 21122120. Arnaud, M.-C.; Gazarian, T.; Palacios Rodríguez, Y.; Gazarian, K.; Sakanyan, V. Proteomics, 2004, 4, 1959-1964. Robinson, W.E.; Gorny, M.K.; Xu, J.-Y.; Mitchell, W.M.; Zolla –Pazner, S. J. Virol., 1991, 65(8), 4169-4176. Chargelegue, D.; Obeid, O.E.; Hsu, S.C.; Shaw, M.D.; Benbury, A.N.; Taylor, G.; Steward, M.W. J. Virol., 1998,72, 2040-2046. Kouzmitcheva, G.A.; Petrenko, V.A.; Smith, G.P. Clin. Diagn. Lab. Immunol., 2001, 8, 150-160. Manoutcharian, K.; Sotelo, J.; Garcia, E.; Cano, A.; Gevorkian, G. Clin. Immunol., 1999,91,117-121. Gazarian, K.; Rowley, M.; Gazarian, T.; Sotelo, J.; Garcia-Mendoza, E.; Hernández, R. Combin. Chem. High Troughput Screen, 2001, 4,165-179. Arnon, R.; Tarrab-Hazdai, R.; Steward, M. Immunology, 2000, 101, 555-562. Dybwad, A.; Bogen, B.; Natwig, J.B.; Forre, O.; Sioud, M. Clin. Exper. Immunol., 1995, 102, 438442. Dybwad, A.; Forre, O.; Natwig, J.B.; Sioud, M. Clin. Immunol. Immunopathol, 1995, 75, 45-50. Sioud, M.; Dybwad, A.; Jesperson, L.; Suleyman, S.; Forre, O. Clin. Exper. Immunol., 1994, 98, 520525. Cortese, L.; Tafi, R.; Grimaldi, L.M.E.; Martino, G.; Nicosia, A.; Cortese, R. Proc. Natl. Acad. Sci., USA, 1996, 93, 11063-11067. Mennuni, C.; Santini, C.; Lázaro, D.; Dotta, F.; Farilla, L.; Firabrazzi, A.; Bottazzo, G.; Di Mario, U.; Cortese, R.; Luzzago, A. J. Mol. Biol., 1997, 268,599-606. Bowditch, R.D.; Tani, P.; Fong, K.C.; McMillan, R. Blood, 1996, 88,4 579-4584. Blank, M.; Shoenfeld, Y.; Cabilly, S.; Heldman, Y.; Fridkin, M.; Katchalski-Katzir, E. Proc. Natl. Acad. Sci USA, 1999, 96, 5164-5168. Davies, D.R.; Cohen, G.H. Proc. Natl. Acad. Sci. USA, 1996, 93, 7-12. Van Regenmortel, M.H.V.; Choulier, L. Comb. Chem. High Throughput Screen, 2001, 4, 385-395. Tighe, P.J.; Pawell-Richards, A.; Sewell, H. F.; Fisher, D.; Donoso, L.; Dua, H.S. Exper. Eye Res., 1998, 68(6), 679-684. Cesareni, G.; Castagnoli, L.; Cestra, G. Comb. Chem. 
High Throughput Screen, 1999, 2, 1-17. Deroo, S.; Muller C.P. Comb. Chem. High Throughput Screen, 2001, 4, 75-110. Bonnycastle, L.L.C.; Mehroke J.S.;Rashed, M.; Gong,X.; Scott, J.K. J. Mol. Biol., 1996, 258(5), 747762. Adda, C.G.; Anders, R.F.; Tilley, L.; Foley, M. Comb. Chem. High Throughput Screen, 2002, 5, 1-14. Rudolf, M.P.; Vogel, M.; Kricek, F.; Ruf, C.; Zurcher, A.W.; Reuschel, R.; Auer, M.; Miescher, S.; Stadler, B.M. J. Immunol., 1998, 160 (7), 3315-3321. Geysen, H. M.; Rodda, S.Y.; Mason, T.J. Mol. Immunol., 1986, 23, 709-715. Valadon, P.; Nusssboum, G.;Boyd, L.F.; Margulies, D.H.; ScharffM, D. J. Mol. Biol ., 1996, 261, 1122. Nussbaum, G.; Cleare, V.W.; Casadewal, A.; Scharff, M.D.; Valadon, P. J. Exper. Med., 1997, 185, 685-694. Phalipon, A.; Folgori, A.; Arondel, J.; Sgaramella, G.; Fortugno, P.; Cortese, R.; Sansonetti, P.J.; Felici, F. Eur. J. Immunol., 1997, 27, 2620-2625. Minenkova, O.O.; Ilychev, A.A.; Kishchenko, G.P.; Petrenko, V.A. Gene, 1993, 128, 85-88. Galfré, G.; Monaci, P.; Nicosia, A.;Luzzago, A.; Felici, F.; Cortese, R. Methods Enzymol., 1996, 267,109-115. Chen, X.; Scala, G.; Quinto, I.; Liu, W.; Chun, T.W.; Justement, J.S.; Cohen, O.J.; vanCott, T.C.; Iwanicki, M.; Lewis, M.G.; Greenhouse, J.; Barry, T.; Venzon, D.; Fauci, A.S. Nature Med ., 2001, 7(11), 1225-1231. Matthews, L-J.; Davies, R.; Smith, G.P. J. Immunol., 2002, 169, 837-846. Vita, C.; Roumestand, C.; Toma, F.; Menez, A. Proc. Natl. Acad. Sci. USA, 1995, 92, 6404-6408. Possani, L.D.; Selisko, B.; Gurrola, G.B. In Darbon, H.and Sabatier, J.M. In Perspectives in Drug Discovery and Design; Animal toxins and potassium channels; Darbon, Sabatier, Eds.; Kluwer Academic Publishers: Holland, 1999, 15/16. pp. 15-22. Kieber-Emmons, T.; Murali, R.; Green, M.I. Curr. Opin. Biotechnol., 1997, 8, 435-441. Zavodszky, P.; Kardos, J.; Svingor, Petsko, G.A. Proc. Natl. Acad. Sci. USA, 1998, 95, 7406-7411. Ringe, D.; Mattos, C. Med. Res. Rev., 1999, 19, 321-331. 
Mount, J.; Samoylova, T.I.; Morrison, N.E.; Cox, N.R.; Baker, H.J.; Petrenko, V.A. Gene, 2004, 341, 59-65.
[226] [227] [228] [229] [230] [231]
Tao, J.; Wendler, P.; Conelly, G.; Lim, A.; Zhang, J.; King, M.; Li, T.; Silverman, J.A.; Schimmel, P.R.; Tally, F.P. Proc. Natl. Acad. Sci. USA, 2000, 97, 783-786. Blum, J.H.; Dove, S.L.; Hochschild, A.; Mekalanos, J.J. Proc. Natl. Acad. Sci. USA, 2000, 97, 22412246. Cunningham, B.C.; Wells, J.A. Curr. Opin. Struct. Biol., 1997, 7, 457–462. Zhang, B.; Salituro, G.; Szalkowski, D.; Li, Z.; Zhang, Y.; Royo, I.; Vilella, D.; Diez, M.T.; Pelaez, F.; Ruby, C.; Kendall, R.L.; Mao, X.; Griffin, P.; Calaycay, J.; Zierath, J.R.; Heck, J.V.; Smith, R.G.; Moller, D.E. Science, 1999, 284, 974–977. Wells, J.A. Science, 1996, 273, 449-450. Nezlin, R. Comb. Chem. High Throughput Screen, 2001, 4, 377-383.
Frontiers in Drug Design & Discovery, 2005, 1, 69-86
High Throughput Screening: Will The Past Meet The Future?
Patrick Englebienne*
Biomedical Consultant, Englebienne & Associates, Strijpstraat 21, B-9750 Zingem, Belgium, and Biocybernetics Unit, Laboratory of Experimental Medicine, Free University of Brussels, Place Van Gehuchten 4, B-1020 Brussels, Belgium

Abstract: High throughput screening (HTS) was developed over the last decades with the aim of selecting promising drug candidate leads from the huge compound libraries produced by combinatorial chemistry. From the outset, the technique consisted of detecting molecules capable of interacting with specifically selected target receptors, enzymes or antibodies, so as to sort out those suitable for further development. In the early days, radioactive, enzymatic or fluorescent detection techniques were commonly applied in plate reader formats. Throughput was progressively enhanced by increasing the number of wells per plate and by automating the procedures. Throughput capacity increased further with the switch from heterogeneous to homogeneous technologies, in which the detection signal is generated directly by the biomolecular interaction, without the need to separate bound from free fractions within the reaction mixture. This advance reduced the number of manipulation steps and the costs, and further improved screening throughput, which culminates today in ultra-high-throughput methodologies. The increase in data generated by these technological improvements required the parallel development of software capable of handling and analysing the information, sometimes at the expense of missing false negatives. The drug discovery process now faces a new bottleneck resulting directly from HTS developments. Although HTS allows the rapid identification of new drug candidates, the technique does not address the applicability of the hits to a biological system. The technology gives no insight into the possible cellular toxicity of a hit, an insight that is a minimal requirement for further development. Thus, the bottleneck is currently shifting from hit identification to the evaluation of toxicity and bioavailability. This new challenge will require a further effort of inventiveness on the part of pharmaceutical scientists.
*Corresponding author: Tel/Fax: +32-495-773-006; E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.

INTRODUCTION
During the last half-century, the pharmaceutical industry has faced several new challenges, including the increasing cost of bringing new drugs to market and the need for more efficient therapies for high-incidence lethal or debilitating chronic diseases such as malignancies and autoimmune diseases, or for emergent new plagues such as infection by the human immunodeficiency virus [1, 2]. These challenges required cost reductions and an increased output of new drug candidates, which were met during the last decades with the advent of combinatorial chemistry technologies [3]. The combinatorial chemistry revolution provided the industry with compound libraries of a size that could not have been imagined with the synthetic chemistry technologies previously available. This situation created a bottleneck at the level of compound evaluation, which was resolved by a progressive increase in screening speed through the advent of high throughput screening (HTS) technologies [4]. HTS consists of rapidly identifying compounds exhibiting a given biological activity, such as affinity, toward a given pharmaceutical target (enzyme, receptor), from libraries containing hundreds of thousands to millions of compounds [5]. Recent advances in genomics and molecular biology have produced an increasing number of molecular drug targets, so that demand is now shifting from single-target to multiple-target screens of a given library [6]. The technological advances made in the field during the last few years have made it possible to test more and more compounds at faster and faster rates. Unfortunately, this has not been as productive as expected, because of the new challenges presented by the abundance of data that need to be analyzed and the difficulty of distinguishing specific from non-specific hits within assay results [7]. Failure to recognize such problems early results in the vast majority of early hits failing to progress to later stages of the drug development process [8].
Moreover, a biological hit compound does not necessarily fulfill all the requirements for a successful drug in terms of absorption, distribution, metabolism, elimination, and toxicity (ADMET). Consequently, the current trend demands a departure from hit identification as the sole process toward fast, cost-effective and more integrated parallel strategies guaranteeing improved clinical prospects and successes [9]. This review will address the progressive evolution of HTS, with a particular emphasis on the technological developments and on the possible application of the latest discoveries in the field to the emerging challenges faced by this early phase of the drug discovery process.

HTS, TARGETS AND LEADS
HTS is an early step in the drug discovery process. As presented in Fig. (1), HTS is at the crossroads of the pharmaceutical target identification process and the synthesis of compound libraries by combinatorial chemistry. The whole process (Fig. (1)) usually starts with the identification of a suitable and validated biological target, which plays a role in the etiology or pathogenesis of a disease and the activity of which can be modulated by a drug [2]. During HTS campaigns, the target is then confronted with libraries (collections) of small molecules, obtained by combinatorial synthetic chemistry or of natural origin, in order to identify hits. The subsequent hit-to-lead development process is intended to identify a lead compound which is active in the cellular context and displays drug-like properties in terms of ADMET characteristics. Finally, the lead development process involves in vivo testing and the further optimization of the lead compound in terms of pharmacokinetic and pharmacodynamic properties to produce a series of clinical candidates. Combinatorial chemistry is outside the scope of this chapter and will not be addressed. The reader may refer to recent excellent reviews covering the major advances in the field [3, 10-15].
However, it is important to say a word about the drug targets because not only their nature, but also their cellular function, directs the technology to be used in an HTS campaign.
Fig. (1). Algorithm presenting the central place of HTS in the drug discovery process.
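To give a rough sense of the screening scale at the HTS step of Fig. (1), the following sketch counts the microplates needed to test a library once against one or several targets. It is only an illustration under stated assumptions: one compound per well, the common 96-, 384- and 1536-well plate formats, and a hypothetical number of wells per plate reserved for controls. The function name `plates_needed` and its parameters are introduced here for illustration and do not come from the chapter.

```python
import math

# Common microplate formats (wells per plate).
PLATE_FORMATS = (96, 384, 1536)


def plates_needed(n_compounds, wells_per_plate, control_wells=0, n_targets=1):
    """Plates required to test every compound once against each target.

    Assumptions (illustrative only): one compound per well, and
    `control_wells` wells per plate set aside for controls.
    """
    usable_wells = wells_per_plate - control_wells
    if usable_wells <= 0:
        raise ValueError("no wells left for library compounds")
    # Round up: a partially filled plate still has to be run.
    plates_per_target = math.ceil(n_compounds / usable_wells)
    return plates_per_target * n_targets


if __name__ == "__main__":
    # Library sizes spanning the range quoted in the Introduction.
    for library_size in (100_000, 1_000_000):
        for wells in PLATE_FORMATS:
            count = plates_needed(library_size, wells)
            print(f"{library_size} compounds, {wells}-well plates: {count} plates")
```

Moving from 96 to 1536 wells per plate divides the plate count by about sixteen for the same library, which is one way to read the throughput gains described in the abstract. Screening the same library against several targets multiplies the count accordingly.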
Pharmaceutical targets are usually proteins which are differentially expressed or activated in a diseased versus normal tissue [1]. The information gathered by sequencing the human genome (genomics) and the ability to identify every expressed gene will lead to the identification of many new protein targets relevant to the pathogenesis and/or persistence of diseases [1]. The identification of a prospective protein target (proteomics) is currently effected by three means, namely: by evaluating the differential profile of a given protein expression [16] or activation [17] between diseased and normal tissues (profiling proteomics); by understanding protein interactions and functions (functional proteomics) individually [16, 18, 19] and between each another in cellular pathways [17, 20]; by elucidating the tertiary structure of proteins and domains so as to understand the mechanisms of their interactions within cells (structural proteomics) [21]. The cellular target proteins might be enzymes such as kinases and in most cases, the HTS campaign will attempt at identifying inhibitors of the enzymatic activity [22]. However, cellular kinases such as protein kinase C interact also with other cellular proteins within signal transduction pathways, in which case the HTS campaign will attempt to identify small molecules capable of inhibiting such interactions [23]. The target protein might also be a nuclear or membrane receptor, such as a chemokine receptor [24], in which case the HTS campaign will hunt for small molecules modulating
Patrick Englebienne
the binding of the natural ligand. But a membrane receptor can also display an associated enzymatic activity, such as the G-protein-coupled-receptors (GPCR), and in such a case, the screen will evaluate the drug-receptor interactions in terms of effects on the downstream enzymatic activity [25]. These downstream cellular events lead to the differential expression of specific gene products [26, 27]. Finally, the protein target might be an ion channel and the HTS run will address the capacity of the library compounds to modify ion fluxes [28]. Besides proteins, RNA molecules are also emerging pharmaceutical targets. RNA possesses the double property of being a repository of genetic information and of displaying either catalytic activity like enzymes [29], or protein interaction capacity [29, 30]. Therefore, RNA molecules intervene in many cellular activities, from gene expression [29] to signal transduction [30]. The identified target involved will consequently govern the HTS methodology and hence the technology to be applied, depending on the results and type of information expected from the assay. These can be either mechanistic, such as in enzymatic or binding assays [31], or functional, such as in protein interaction or ion flux assays [32]. Table 1 presents a summary of the information that can be expected from mechanistic and functional assays, depending on the type of target considered. Table 1.
Information Obtained from Mechanistic and Functional Assay Systems
The Mechanistic Assay and Functional Assay columns list the information expected from the compounds tested.
| Target | Examples | Mechanistic Assay | Functional Assay |
| --- | --- | --- | --- |
| Serum enzyme | Cyclooxygenase, thrombin, proteases. | Kcat, Km, Vmax, inhibition of natural substrate binding (KI), selectivity. | Production of mediators. Capacity to interact with other blood components. |
| Cellular enzyme | Kinases, phosphatases, caspases, proteases. | Kcat, Km, Vmax, inhibition of natural substrate binding (KI), selectivity. | Capacity to interact with other cell proteins or peptides, capacity to hetero- or homodimerize, apoptotic induction. |
| Membrane receptor | Growth factor receptors, chemokine receptors, G-protein-coupled receptors. | Affinity (KD), inhibition of natural ligand binding (KI), selectivity. | Downstream effects (c-AMP, inositol-3-phosphate, Ca++, GTP-binding, oligomerization, localization); distinction between agonist, partial agonist and antagonist ligands. |
| Nuclear receptor | Thyroid, steroid, retinoid receptors. | Affinity (KD), inhibition of natural ligand binding (KI), selectivity. | Interaction with coactivators, gene expression regulation, distinction between agonist, partial agonist and antagonist ligands. |
| Cellular binding proteins | p53, Bcl-2, Bcl-XL | Affinity (KD) | Protein-protein interactions, apoptotic induction. |
| Ion channels | K+, Ca++ channels, ryanodine receptors. | Affinity (KD) | Ion flux through cellular or cell compartment membrane, downstream effects. |
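The kinetic parameters listed for mechanistic enzyme assays in Table 1 can be related through the standard Michaelis-Menten rate law for a competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/KI) + [S]). The sketch below only illustrates how Km, Vmax and KI combine into a percent-inhibition readout. The function names and numerical values are hypothetical and are not taken from any assay described in this chapter, and other inhibition mechanisms would require a different rate law.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Initial rate for a competitive inhibitor (Michaelis-Menten form).

    s and i are substrate and inhibitor concentrations; km and ki must be
    expressed in the same concentration unit as s and i, respectively.
    """
    if s < 0 or i < 0:
        raise ValueError("concentrations must be non-negative")
    if vmax <= 0 or km <= 0 or ki <= 0:
        raise ValueError("vmax, km and ki must be positive")
    # A competitive inhibitor raises the apparent Km by the factor (1 + i/ki).
    return vmax * s / (km * (1.0 + i / ki) + s)


def percent_inhibition(s, i, vmax, km, ki):
    """Percent loss of rate caused by the inhibitor at a fixed substrate level."""
    uninhibited = competitive_rate(s, 0.0, vmax, km, ki)
    inhibited = competitive_rate(s, i, vmax, km, ki)
    return 100.0 * (1.0 - inhibited / uninhibited)


if __name__ == "__main__":
    # Hypothetical values: substrate at Km, inhibitor at KI.
    print(round(percent_inhibition(s=10.0, i=5.0, vmax=100.0, km=10.0, ki=5.0), 1))
```

With the substrate at its Km and the inhibitor at its KI, the rate falls from Vmax/2 to Vmax/3, that is, about one third inhibition, which shows why the substrate concentration chosen for a screen affects the apparent potency of competitive inhibitors.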
STATE OF THE ART IN HTS METHODOLOGY AND TECHNOLOGY
Assay Type and Format
Before deciding on a specific technology, the type of assay must be selected: either a cell-free or a cell-based assay system [33]. Cell-free assays are simple and amenable to ultra-HTS (u-HTS) but, unlike cell-based assays, they do not present the target in the cellular context and consequently do not allow measurement of the downstream signal transduction effects of a compound. In contrast, cell-based assays yield more useful information, but usually at the expense of throughput [34]. Both cell-free and cell-based assays are usually adapted to plate formats. The continuous trend toward higher throughput has seen the number of wells per plate increase progressively from 96 to 384 and then 1536 [35]. Currently, new systems allow for handling plates of up to 3456 wells [36]. This trend has been continuously associated with the development of robotics and high-performance liquid-handling devices that rapidly dispense volumes reduced to several microliters or even nanoliters per well [37-41]. Besides increasing assay speed and efficiency, the main objective of this trend was to reduce the cost of reagent usage [35]. However, moving to higher-density plate formats also involves a substantial capital investment in equipment, which includes not only liquid-handling devices but also specific high-throughput readers [35, 42]. Another difficulty presented by u-HTS is that, owing to their labor intensiveness, most cell-based assays are hardly applicable to formats beyond 96- or 384-well plates [43]. Some of these difficulties have been overcome with the development of protein biochips and microarrays, which involve the automated production of small reagent spots on hydrophobic membranes by inkjetting technologies [44]. This new advance in u-HTS was made possible by the parallel development of large-capacity scanners and readers [45].
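The arithmetic behind the move to denser plates can be sketched as follows. The plate formats are those cited above, while the library size, the number of control wells and the per-well volumes are purely hypothetical figures chosen for illustration.

```python
# Plate densities mentioned in the text.
PLATE_FORMATS = (96, 384, 1536, 3456)


def plates_needed(n_compounds, wells_per_plate, control_wells=0):
    """Number of plates required to test each compound once.

    control_wells is the (assumed) number of wells per plate reserved for
    controls and therefore unavailable for library compounds.
    """
    usable = wells_per_plate - control_wells
    if n_compounds < 0 or usable <= 0:
        raise ValueError("need a non-negative library and at least one usable well")
    # Ceiling division using integers only.
    return -(-n_compounds // usable)


def reagent_volume_ml(n_compounds, volume_per_well_ul):
    """Total assay volume in millilitres for one well per compound."""
    return n_compounds * volume_per_well_ul / 1000.0


if __name__ == "__main__":
    library = 100_000  # hypothetical library size
    for wells in PLATE_FORMATS:
        print(wells, plates_needed(library, wells))
    # Hypothetical volumes: 100 microlitres versus 2 microlitres per well.
    print(reagent_volume_ml(library, 100.0), reagent_volume_ml(library, 2.0))
```

For the assumed library of 100,000 compounds, the plate count drops from 1042 plates at 96 wells to 29 plates at 3456 wells, and the total assay volume scales directly with the volume dispensed per well.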
Recently, cell microarrays have also been developed that increase the throughput capacity of cell-based assays [46, 47]. Cell-free assays are in most cases mechanistic. They are particularly suited to the measurement of enzymatic activities and their inhibition by the test compounds. Assays of kinase [48], phosphatase [49], and caspase [49] activities, as well as of many other serum or cellular enzymes for which a catalytic substrate-product transformation can be monitored at end-point or in kinetic mode in the absence or presence of the test compounds [50-52], have been adapted to HTS and u-HTS. Although the assay format usually requires enzyme immobilization, which changes substrate turnover relative to what happens in solution, this is not likely to have a major impact on the determination of relative inhibitory activities [51]. Cell-free assays also provide useful mechanistic information on ligand-receptor interactions and have been applied to both membrane receptors such as GPCRs [53] and nuclear receptors such as the estrogen receptors [54]. These assays use the recombinant receptor proteins and evaluate the binding characteristics and the inhibition of peptide or small-molecule ligand binding [53, 54]. HTS cell-free assays are similarly applied to the evaluation of DNA and RNA hybridization [55, 56] as well as the modulation of RNA-protein assembly by library compounds [57]. Recent advances in cell-free assay methodology have allowed the evaluation of functional assay end points in the presence or absence of drug candidates, such as transcriptional activation [58, 59] and, more recently, even translation and protein expression [60]. These recent improvements were made possible by the commercial availability of cell lysate systems that perform
automated coupled transcription-translation reactions in high yield from a given gene introduced as a polymerase chain reaction (PCR) product [61]. Cell-based assay systems are more complex to implement and more labor-intensive than cell-free assays [34]. However, they have the advantage over cell-free assays of permitting the investigation of a wide array of functional properties exerted by library compounds, including effects on intracellular signalling pathways, protein-protein interactions and subcellular localization of the targets [62]. Moreover, since the test compound is exposed to a complete and fully functional cell, it is subjected to cellular entry barriers and catabolic processes [62]. Although the cell entry barrier is not critical for small organic molecules, it might be an essential determinant of library effectiveness, particularly with larger molecules such as peptides, nucleotides and oligosaccharides. The critical differences between cell-free and cell-based assays, in terms of design and type of information provided, are schematically presented in Fig. (2). Among the cells used in HTS, the yeast Saccharomyces cerevisiae has become a prominent model organism, particularly with the emergence of the two-hybrid system screen for protein-protein interactions [63]. Recently, however, two-hybrid screening systems using mammalian cells have been developed in order to overcome the limiting differences in drug permeability between yeast and mammalian cells [64].
Fig. (2). Schematic presentation of the critical differences between cell-free and cell-based assay systems.
Gene-reporter assays are also in wide use in the drug discovery process. They monitor the transcriptional activation of specific genes resulting from ligand binding to, and activation of, membrane [65] and nuclear receptors [66] alike. The most recent technological developments of this assay system even make it possible to discriminate receptor agonists from antagonists [67]. Specific functional cell-based assays are also being developed that, by monitoring the concentration of second messengers such as inositol 1,4,5-trisphosphate in living cells, follow in real time the activation of given signalling pathways by compounds [68]. The evaluation of libraries against ion channel targets is most probably the field that benefits most from cell-based HTS assays. The conventional patch clamp technique is slow and labor-intensive [69]. It is therefore being progressively replaced by rapid cell-based assays that monitor ion transport within living cells using electrophysiological techniques [69] or voltage- or ion-sensitive dyes [70-72].
Signal Detection
Whether the assay is cell-free or cell-based, the low labor, reduced cost and short time requirements of any HTS assay call for a “mix and read” capacity. Various strategies have been designed over the last decades to gain such an advantage in signal detection, from scintillation proximity to smart materials and biosensors [73]. The different signal detection technologies are based on various physico-chemical principles leading to the generation of a physical signal upon interaction between the molecules of interest. They are best classified according to whether or not they use a label, a distinction which is illustrated in Fig. (3). Label-based assays use a tracer molecule or atom with specific physical emission characteristics (e.g. radioactivity, fluorescence), which is coupled to one reactant and can be detected (traced) and quantitated by its emission during complex formation with the cognate molecule.
Conversely, label-free assays directly detect the changes occurring in a physical characteristic either of one reactant (Fig. (3), bottom part A, e.g. mass spectrum) or of a supporting layer to which it is attached (Fig. (3), bottom part B, e.g. sensor), which makes it possible to monitor and quantitate the biomolecular interaction of this reactant with a cognate molecule.
Label-Based Detection
Among the labels, radioactive isotopes have been in use since the early days of binding assays, and assay homogeneity was made possible by the development of the scintillation proximity assay design. Currently available homogeneous assay designs and labels are extensively described and reviewed in ref. [73]. Although radioactive labels are still applied to HTS [74], the associated security requirements and cost of waste disposal lead many laboratories to prefer optical labels [75]. Enzymatic assays usually rely on the subsequent reaction of the product generated by the catalytic activity with a label present in the mixture or within the cell and capable of generating a fluorescent [76], chemiluminescent [77], or colorimetric [78] signal upon interaction. Nowadays, many cell-free binding assays use fluorescent labels. Protein-protein [79] and nucleotide-nucleotide [80] interactions are usually monitored by fluorescence resonance energy transfer (FRET). FRET also makes it possible to monitor conformational changes within a given protein and has been adapted to microscopic examination in living cells [79]. Even with the FRET technology, which relies on the specific energy-dependence of a pair of fluorophores, the key problems associated with the detection of the
fluorescence signal are twofold. Quenchers possibly present in the reaction can inhibit signal generation, and autofluorescence from contaminants or compounds in the mixture can mask the signal [81]. In FRET, these limitations have been overcome mainly by time-resolved (TR) fluorescence detection, which eliminates the short-lived non-specific fluorescence signals, and by the use of rare earth cryptates, which display the prolonged emission times, spanning milliseconds, required for proper TR detection [81-83]. In single-fluorophore assay designs, fluorescence polarization (FP) is probably the most popular technology in use for HTS, particularly because of the ease of assay miniaturization with this signal detection technology [84]. Quenching and autofluorescence can be reduced by fluorescence lifetime discrimination [85]. Besides its use in classical competitive ligand-binding assays [86], FP has also recently been applied in HTS format to the detection of inhibitors of protein interactions in cell signaling pathways such as apoptosis (Smac/DIABLO, Bcl-XL) [87-89] or signal transducers and activators of transcription (STAT) [90].
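The principle of time-resolved detection can be illustrated with a single-exponential decay, in which the fraction of emission still present after a delay t is exp(-t/tau). The sketch below assumes, for illustration only, a nanosecond-scale lifetime for background autofluorescence and a millisecond-scale lifetime for a rare earth label. The delay and both lifetimes are hypothetical values, and real decays may not be mono-exponential.

```python
import math


def fraction_remaining(delay_s, lifetime_s):
    """Fraction of a mono-exponential emission still present after a delay."""
    if delay_s < 0 or lifetime_s <= 0:
        raise ValueError("delay must be >= 0 and lifetime must be > 0")
    return math.exp(-delay_s / lifetime_s)


if __name__ == "__main__":
    delay = 50e-6               # hypothetical 50 microsecond gate delay
    background_lifetime = 10e-9  # assumed short-lived autofluorescence
    label_lifetime = 1e-3        # assumed long-lived rare earth emission
    print(fraction_remaining(delay, background_lifetime))
    print(fraction_remaining(delay, label_lifetime))
```

Under these assumptions the background has decayed to essentially nothing when the measurement window opens, while about 95% of the long-lived label emission is still available, which is the basis of the signal-to-background gain obtained with TR detection.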
Fig. (3). Cartoon presenting the conceptual differences between label-based and label-free assay designs.
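For readers unfamiliar with the FP readout, the quantity reported by a polarization reader is computed from the emission intensities measured parallel and perpendicular to the polarized excitation, P = (I_par - I_perp) / (I_par + I_perp), usually expressed in millipolarization units (mP = 1000 P). The following is a minimal sketch of that calculation. It assumes an instrument correction factor (G-factor) of 1, and the intensity values are hypothetical.

```python
def polarization(i_parallel, i_perpendicular):
    """Fluorescence polarization P from the two polarized intensities."""
    total = i_parallel + i_perpendicular
    if i_parallel < 0 or i_perpendicular < 0 or total <= 0:
        raise ValueError("intensities must be non-negative and not both zero")
    return (i_parallel - i_perpendicular) / total


def anisotropy(i_parallel, i_perpendicular):
    """Fluorescence anisotropy r, the closely related alternative readout."""
    total = i_parallel + 2.0 * i_perpendicular
    if i_parallel < 0 or i_perpendicular < 0 or total <= 0:
        raise ValueError("intensities must be non-negative and not both zero")
    return (i_parallel - i_perpendicular) / total


def millipolarization(i_parallel, i_perpendicular):
    """Polarization expressed in mP units."""
    return 1000.0 * polarization(i_parallel, i_perpendicular)


if __name__ == "__main__":
    # Hypothetical raw intensities for a tracer bound to its target.
    print(millipolarization(300.0, 200.0))
```

A small fluorescent tracer that tumbles freely gives a low polarization value, whereas binding to a larger partner slows its rotation and raises the value, so a compound that displaces the tracer is detected as a decrease in mP.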
Fluorophores are also increasingly used in HTS cell-based assays. A fluorometric imaging assay has recently been developed for the detection of calcium mobilization by cells stably expressing various subtypes of GPCR, with the aim of identifying and
characterizing agonist ligands activating distinct intracellular second messenger pathways [91]. Flow cytometry with fluorescent probes is becoming an increasingly attractive platform for homogeneous HTS of cells, one that can be further enhanced by multiplexing on color-coded beads or cell-suspension arrays [92]. In the area of ion channels, although attempts to automate electrophysiological measurements and patch-clamping are progressing [93], the FRET technology using membrane potential-sensitive dyes allows for higher-capacity screening assays [94, 95]. Other technologies are emerging, however, such as the automated detection by atomic absorption spectroscopy of rubidium flux across the cell membrane [96].
Label-Free Detection
Among the label-free screening technologies, nuclear magnetic resonance (NMR) spectroscopy has been in use for close to a decade for hit-to-lead development (see Fig. (1)) [97]. NMR is particularly suitable for cell-free assay systems. The main interest of NMR lies in its ability to distinguish binding sites of different affinities on the same protein or nucleic acid target receptor that are involved in binding different ligands. This advantage is particularly useful in structure-activity relationship studies, which in turn allow the lead structure to be optimized for enhanced activity [98, 99]. However, recent efforts have been made to develop new instrumentation and methodologies to increase throughput and also apply NMR in cell-based HTS [99, 100]. Besides NMR, mass spectrometry (MS) has also been developed as a method for quantitative studies of biomolecular interactions between low-molecular-weight ligands and proteins in cell-free assays. Electrospray ionization (ESI)-MS makes it possible to infer the affinity of ligand-protein complexes in direct as well as competitive experiments. In some cases, however, the interactions occurring in the gas phase may differ from those found in solution.
The technique has now matured and several means have been designed to counter this problem, such as nanoflow electrospray formation on microchips, which were recently fully automated for HTS [101, 102]. Matrix-assisted laser-desorption ionization (MALDI) MS, generally complemented by time-of-flight (TOF) analysis, has also become an outstanding technique for the study of biomolecular interactions and has been proposed for HTS [103]. The capacity of MS in HTS is further enhanced by its combination with high-performance liquid chromatography (HPLC). Several methods have been proposed to screen libraries, involving such combinations as ESI-MS with size exclusion and reverse phase (RP)-HPLC [104], or evaporative light scattering-TOF-MS with RP-HPLC, optionally coupled with NMR for compound identification [105]. Such methods can be applied in 96- or 384-well plates and currently compete for screening throughput with the most advanced label-based assay methodologies [104]. Optical biosensors are currently emerging as a further label-free alternative for the study of biomolecular interactions [106]. Several technologies have recently been proposed for HTS in cell-free systems, such as reflectometric interference spectroscopy [107] and colorimetric resonant grating [108]. Currently, however, the most popular optical sensing technology applied to screening for biomolecular interactions is undoubtedly surface plasmon resonance (SPR) [109]. SPR biosensors have been applied to the quantitative real-time analysis of various biomolecular interactions, such as peptide-antibody [110] and receptor-ligand [111] interactions, including imaging of peptide and nucleic acid arrays [112, 113]. Quite early, we showed that the real-time detection of the SPR phenomenon occurring during biomolecular interaction could be implemented,
not only on sensor chips, but also on colloidal gold nanoparticles in solution, in which case the SPR signal is read by a common UV-vis spectrophotometer [114]. We eventually extended this system to cell-free HTS [115]. Because of their similar optical properties [116], SPR detection also applies to nanoparticles made of other noble metals such as silver, in liquid [117] as well as on solid phase [118, 119]. The surface of such nanoparticles can be functionalized with various ligands or chemical groups to enable their recognition by cognate biomolecules [120]. Gold and silver nanoparticles have been progressively emerging as particularly interesting sensors for microscopy since the discovery by the Yguerabides of their capacity to scatter colored light when illuminated with white light, and to behave like highly fluorescent analogs [121, 122]. The frequency of the scattered light results directly from the SPR and is thus a function of the size, shape, and composition of the nanoparticle. These nanoparticles have a tremendous application potential in HTS as in vivo extra- and intracellular sensors for the detection of protein recognition by light microscopy [123].
THE WHEREABOUTS OF TODAY’S HTS
The advances made over the past decade in drug discovery technologies have not yet delivered the increase in productivity that was expected from HTS. Overall, fewer than one in fifty projects succeeds through discovery and development to produce a marketable drug and, in the discovery phase, the cumulative attrition is around 80% [124]. Such a high level of attrition can be attributed to several factors [124], which span the whole discovery process (Fig. (1)). As far as HTS is concerned, the trend seeking a continuous increase in the throughput of mechanistic assays has generated new bottlenecks that still reduce the speed of the whole process.
First, the deluge of screening data generated requires adequate logistics for its interpretation and analysis [125, 126]. The throughput obtained by using computerized data analysis is, however, hampered by the need to still use manual curve fitting in some complex cases suspected of resulting from false negative interpretation. Second, the functional assays which come later in the process have usually remained at a lower throughput [34, 125]. Third, the relatively high rate of false positive hits compromises the efficiency of HTS and constitutes an important reason for attrition at a later stage. These false-positive results (promiscuous agonists or inhibitors) can be generated by mechanisms that are independent of the HTS assay design, such as non-specific compound aggregation [127]. Notwithstanding these limitations, some advocate a further evolution in capacity involving the simultaneous screening of different targets in single assays [128]. Conversely, others advocate the progressive emergence of a lower-throughput pharmacogenomic approach, which involves the elucidation of interindividual differences in drug effects and mechanism-based approaches to the development of new drugs [129]. In between are the proponents of a combination of mechanistic and functional assays in cell-based systems, which make it possible to gather complex information on the physiological effects of hits without unduly compromising throughput [130-132]. This latter trend is likely to gain more and more supporters. For instance, an adaptation to molecular pathology with human tissue has been proposed to have higher potential in distinguishing the effects of drug candidates between normal and pathological cells, enabling informed decisions to be made with respect to success potential [133]. These different trends respond to some, but not all, of the current challenges of present-day HTS.
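Why even a modest false-positive rate weighs so heavily on a primary screen can be seen from a simple expected-count calculation. The sketch below is an illustration under stated assumptions: the library size, the proportion of genuinely active compounds, the assay sensitivity and the false-positive rate are all hypothetical figures, not data from the studies cited above.

```python
def expected_hits(n_compounds, true_active_rate, sensitivity, false_positive_rate):
    """Expected numbers of true-positive and false-positive hits in a screen.

    true_active_rate: assumed fraction of the library that is genuinely active.
    sensitivity: assumed fraction of true actives that the assay flags.
    false_positive_rate: assumed fraction of inactives flagged anyway.
    """
    for value in (true_active_rate, sensitivity, false_positive_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must lie between 0 and 1")
    actives = n_compounds * true_active_rate
    inactives = n_compounds - actives
    return actives * sensitivity, inactives * false_positive_rate


def confirmed_fraction(n_compounds, true_active_rate, sensitivity, false_positive_rate):
    """Fraction of primary hits expected to be genuine (positive predictive value)."""
    true_pos, false_pos = expected_hits(
        n_compounds, true_active_rate, sensitivity, false_positive_rate
    )
    hits = true_pos + false_pos
    return true_pos / hits if hits > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical screen: 100,000 compounds, 0.1% true actives,
    # 90% sensitivity, 1% false-positive rate.
    print(expected_hits(100_000, 0.001, 0.9, 0.01))
    print(round(confirmed_fraction(100_000, 0.001, 0.9, 0.01), 3))
```

With these assumed figures, roughly 90 genuine hits are accompanied by about a thousand false positives, so fewer than one primary hit in ten would be confirmed, which illustrates the burden placed on the lower-throughput assays that follow.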
These current challenges also include the need for a more integrated approach to HTS, which involves not only the evaluation of activity, but also
of preliminary pharmacological (ADMET) properties so as to avoid attrition of unsuitable compounds at a later stage and increase overall productivity [134]. The integration of the various assay concepts and designs, along with the challenges they currently face, is schematically summarized in Fig. (4). The general drive toward higher throughput that prevailed until recently in HTS laboratories is therefore currently tempered by more exhaustive and thoughtful strategies. In the past, the technologies available have been successful in meeting the demand for higher throughput assay design and for cost containment, by delivering sensitive reagents, multiple platforms, automation and miniaturization. Today, the question remains as to whether the past will meet the future in addressing the new HTS challenges in the drug discovery process.
WILL THE PAST MEET THE FUTURE?
Deciphering the human genome has provided a huge amount of genetic information in health and disease [34]. However, genes code for proteins which carry out their specific tasks in the signal transduction pathways that make cells the prime movers of life. Genes can be targets for genetic therapies but, in classical pharmacotherapy, proteins are the targets, and the genomic information available must therefore be translated into proteomic information for full exploitation. Consequently, the first-line challenge facing HTS nowadays is the identification of targets, the characterization of their physiological effects, and the elucidation of their respective mechanisms of action (see Fig. (4)). Several advances have been made in this respect, among which RNA interference [135] and high-throughput western blot screening technologies [136, 137] constitute important breakthroughs enabling new targets to be rapidly identified and studied. These important advances are, however, applicable to isolated cells and do not necessarily reflect the in vivo situation.
New means are therefore needed to better address this gap. Mouse knockout technology is an appealing approach in this context but, to date, knockouts exist for only about 10% of mouse genes. Hence, the knockout mouse project, which recently proposed to mount an international effort to produce a genome-wide collection of mouse knockouts, is particularly welcome [138]. The second-line challenges (Fig. (4)) are directly concerned with HTS assay strategy. There is a need to reduce the high false-positive rates of the classical mechanistic cell-free assays, which can be limited by an affinity- rather than activity-based approach [115, 139]. New directions tend, however, toward the integration of bioinformatic and experimental technologies, allowing virtual screening results to be compared with experimental ones [140-142]. Such integration has been successfully applied in areas as diverse as the characterization of enzyme inhibitors [142, 143] and of GPCR or nuclear receptor agonists and antagonists [142, 144]. The pharmaceutical industry is also very eager to gain pharmacological and pharmacodynamic information early in the drug discovery process, possibly during an HTS campaign, so as to counter the attrition rate at later stages. This is another domain where in silico methods are particularly useful, as they make it possible to integrate the characterization of hits in terms of ADMET properties with in vitro screening [145-147]. Such methods are based on correlations between a given physical characteristic of the drug candidate and an ADMET property, as established using training sets of known drugs. These mathematical models are then used to deduce the pharmacological and pharmacodynamic characteristics of libraries. Such models have been established to
infer absorption [148, 149] and oral bioavailability [150, 151], distribution and pharmacokinetics [152-154], metabolic stability [155-158], aqueous solubility [159], drug-drug interaction potential [160], and binding promiscuity [161]. Currently, except when HPLC/MS is selected as the HTS technology [153, 154, 162], ADMET characterization occurs in parallel with HTS, using different techniques. For example, metabolic profiling can be deduced from relationships based on the inhibition of cytochrome P450 by known drugs [155, 156], oral bioavailability and absorption can be deduced from drug permeability across cells in culture [148-150], and pharmacokinetic data from lipophilicity [152]. Although this improves the quantity and quality of information gathered during a campaign, the use of parallel technologies decreases the throughput, which still remains the focus of any laboratory, as was recalled during the 2nd Conference on Advances in HTS, held in London last January [163].
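The general idea behind such correlation-based models, namely fitting a relationship between a physical descriptor and an ADMET property on a training set of known drugs and then applying it to library compounds, can be sketched with an ordinary least-squares line. This is a deliberately simplified illustration and not one of the published models cited above: the descriptor, the property and all numerical values are invented for the example, and real models typically use several descriptors and non-linear relationships.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept):
    """Apply the fitted model to the descriptor value of a new compound."""
    return slope * x + intercept


if __name__ == "__main__":
    # Invented training set: a lipophilicity-like descriptor and an
    # arbitrary permeability-like property for four "known drugs".
    descriptor = [0.0, 1.0, 2.0, 3.0]
    measured = [1.0, 3.0, 5.0, 7.0]
    slope, intercept = fit_line(descriptor, measured)
    # Prediction for a hypothetical library compound with descriptor 4.0.
    print(predict(4.0, slope, intercept))
```

Any such prediction is only as reliable as the training set is representative, which is one reason why these models still need to be checked against experimental data.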
Fig. (4). Schematic presentation of the current challenges faced by integrated HTS assay concepts and designs.
Although new sophisticated in silico quantitative structure-activity relationship models are being developed for ADMET prediction of novel chemical compounds [164], they still present weaknesses. New imaginative reagents are therefore needed that provide such information in a multiplexed format. In this respect, sensors or labels based on
conducting polymers are, in our opinion, an interesting avenue. A few years ago, we and others adapted such labels to homogeneous biomolecular recognition [165]. Such reagents are, however, also sensitive to various physico-chemical characteristics of the molecules with which they interact [73, 165], a feature that could be considered in early ADMET analysis, alone or in combination with other sensitive reporter materials such as colloidal metals. The combination of the latter with other reporter reagents such as fluoroprobes has indeed been shown to be highly beneficial [166]. The combined sensitivities of noble metal colloids to refractive index changes [109] and of conducting polymers to oxidation-reduction [73] probably deserve further exploitation in such a context. Colloidal metal nanoparticles are also becoming reagents of choice in the currently developing trend to drive HTS toward living cell-based functional assays [167, 168]. Fluorescent probes were exploited quite early in this context [169], although their further optimization has required important research investments in associated technologies such as microarraying and imaging [170-172]. Nuclear magnetic resonance imaging of conjugated iron oxide nanoparticles has recently emerged as a non-invasive cell screening technique for HTS [173]. However, despite the tremendous screening potential that they offer, multiplexing remains difficult to conceive using these technologies. Colloidal noble metal nanoparticles made of silver or gold are more likely to deliver breakthrough advances in the field. Their exquisitely sensitive reactivity to changes in their dielectric environment alters their light scattering and surface plasmon resonance spectra, a property which has very recently been exploited for imaging biomolecular recognition [174] and membrane transport [175] in living cells by light microscopy.
The application of such nanoparticles to imaging membrane transport is remarkable, since the current technologies used in this HTS field, such as dyes or fluorescence, present disadvantages including false positivity, slow response, assay complexity and high cost [176]. Moreover, it is possible to synthesize nanosized cylindrical particles with different Ag-Au striping patterns [177], which will probably allow, in the very near future, for multiplex imaging in a single cell. Finally, these emerging technologies are non-invasive and are adaptable to screening in intact living animal organisms [178], which probably constitutes the next challenge facing HTS technology [179]. The potential of these emerging technologies for living cell-based and whole-organism functional screening is compared with that of FRET in Table 2. Table 2.
Comparative Analysis of Emerging Technologies for Living Cell-Based Functional HTS with FRET
| Technology | Imaging Technique | Adaptation to microarrays | Multiplexing capacity | Membrane transport study | Adaptation to living organisms |
| --- | --- | --- | --- | --- | --- |
| FRET | Fluorescence microscopy | Yes | No | Yes | No |
| Colloidal iron oxide | Light microscopy | Yes | No | Yes | Yes |
| Colloidal gold or silver | Light microscopy | Yes | Yes | Yes | Yes |
Past advances in technological capacity have successfully brought drug screening to a level of throughput that progressively meets the demands of the pharmaceutical industry. The present demand is driving toward cell-based and functional assays incorporating a parallel evaluation of ADMET. The future will call for early screening in the drug discovery process using whole living organisms, while maintaining a high level of throughput. The technological advances made in recent years, along with those currently emerging, suggest that the past will successfully meet the future.
ABBREVIATIONS
ADMET = Absorption, Distribution, Metabolism, Excretion, Toxicity
ESI-MS = Electrospray Ionization Mass Spectrometry
FRET = Fluorescence Resonance Energy Transfer
GPCR = G-Protein-Coupled Receptors
HPLC = High-Performance Liquid Chromatography
HTS = High Throughput Screening
MALDI = Matrix-Assisted Laser-Desorption Ionization
MS = Mass Spectrometry
NMR = Nuclear Magnetic Resonance
RP = Reverse Phase
TOF = Time-Of-Flight
u-HTS = Ultra-HTS
Frontiers in Drug Design & Discovery, 2005, 1, 87-96
Variety of the DNA Hybridization Rate and Its Relationship with High Order Structure of Single Stranded Nucleic Acids
Makoto Tsuruoka*
School of Bionics & School of Engineering, Tokyo University of Technology, Katakura 1404-1, Hachiouji, Tokyo, 192-0981 Japan
Abstract: The hybridization of oligo-DNAs complementary to sequences of the genes for verotoxins (Shiga toxins) type 1 and 2 of enterohemorrhagic Escherichia coli (EHEC) was monitored using fluorescence polarization at a high salt concentration (0.8 M NaCl), a condition that had been optimized to obtain a high rate of hybridization. Time courses of fluorescence polarization were recorded for the fluorescently labeled oligomers (probe DNAs) mixed with the amplified DNA of the genes. The secondary structures of the amplified single-stranded DNAs were predicted by calculating the minimum free energy at a specified minimum stacking length. Five probe DNA sequences were designed, some of which hybridized extremely rapidly with an amplified product of the Shiga toxin type 1 gene. With two of the probe DNAs, hybridization was 90% complete in less than 1 min, considerably faster than the 3 min reported previously, whereas with another probe it was not complete after more than 14 min. The variation in hybridization rate could not be explained by the melting temperature or G + C content of the probe sequences. Comparison of the hybridization rates with the shapes around the probe binding sites in the secondary structures suggested that the slow hybridization results from steric hindrance.
1. INTRODUCTION
This paper describes the development of a rapid assay in which a fluorescently labeled probe is detected by fluorescence polarization after it hybridizes with a target sequence. The assay aims to identify specific gene sequences; the examples used are those for Shiga toxin type 1 and 2 of enterohemorrhagic Escherichia coli. The system has potential for rapid screening of clinical samples for the presence of infectious agents. The method may equally benefit those screening environmental or food samples for contaminating microorganisms, and may also find use as a rapid process monitoring tool, for example in the production of plasmid DNA for gene therapy under consensus.
*Corresponding address: E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
Many methods have been reported for detecting specific nucleic acid sequences or following nucleic acid hybridization. Fluorescence polarization analysis (FPA) allows almost real-time monitoring of the association between nucleic acids in solution. Hybridization between fluorescent-labeled oligonucleotide probes and complementary single-stranded nucleic acids has already been observed using fluorescence polarization in several reports [1-3]. These studies employed a quite simple detection architecture, in which a fluorescent-labeled probe DNA was mixed with a target DNA and fluorescence polarization or anisotropy was monitored. In this type of FPA, the reaction conditions for hybridization can be set over wide ranges, so conditions such as NaCl concentration and reaction temperature could be optimized to maximize the rate of hybridization [4]. Under the optimized conditions of high salt concentration (0.8 M NaCl) at 46 °C, hybridization was 90% complete within 5 minutes of mixing the probe DNAs with synthesized complementary oligo-DNAs or with complementary gene DNAs of microorganisms amplified by asymmetric PCR [5].
The aims of this report are to describe several DNA probes that give extremely high rates of nucleic acid hybridization under these conditions, to consider the reasons for the rapidity, and to comment on the possibility of high throughput screening for nucleic acids. The focus is therefore on the time courses of fluorescence polarization, the rates of hybridization and the secondary structures of the DNA segments of the genes.
2. EXPERIMENTAL
2.1. Instrumentation
Fluorescence polarization values were determined using a Beacon 2000 (Panvera Corporation, WI, U.S.A.) to observe the time courses for the genes of Shiga toxin type 1 (stx1) and type 2 (stx2). The samples were measured in disposable test tubes. The measured polarization values were multiplied by 1000 and plotted in the arbitrary unit mP, although polarization itself is dimensionless. DNA amplification by PCR was performed using a PCR thermal cycler (model 9600, Perkin-Elmer Co. Ltd., CA, U.S.A.).
2.2. Reagents and Samples
Two original samples containing the template gene DNAs of Shiga toxin type 1 (stx1) and type 2 (stx2) were prepared from EHEC strains O26 and O157, respectively, at the Hiroshima City Institute of Public Health, Japan. The negative control samples for FPA were salmon sperm DNA (Funakoshi Co. Ltd., Tokyo, Japan) at a concentration of 800 ng/ml and PCR samples prepared with DEPC-treated water instead of the positive templates. In addition to FPA, the positive template samples subjected to the asymmetric PCR amplification protocol were analyzed by electrophoresis, and the expected bands were observed for each sample (data not shown). TE buffer (10 mM Tris-HCl (pH 8.0) and 1 mM EDTA disodium salt) was used to dilute the solutions. The concentrations of the DNAs were estimated from their absorbance at 260 nm.
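The mP unit mentioned in Section 2.1 can be illustrated with a minimal sketch. It assumes the standard textbook definition of polarization from the emission intensities parallel and perpendicular to the excitation plane, and it ignores any instrument correction factor; the function names and the intensity values are hypothetical and are not taken from the instrument software or from this study.

```python
def polarization(i_parallel: float, i_perpendicular: float) -> float:
    """Dimensionless polarization P = (I_par - I_perp) / (I_par + I_perp)."""
    total = i_parallel + i_perpendicular
    if total <= 0:
        raise ValueError("total emission intensity must be positive")
    return (i_parallel - i_perpendicular) / total


def to_mp(p: float) -> float:
    """Express a dimensionless polarization value in mP (P multiplied by 1000)."""
    return 1000.0 * p


# Hypothetical intensities in arbitrary units, for illustration only.
example_mp = to_mp(polarization(120.0, 80.0))
print(example_mp)
```

A free, rapidly tumbling probe gives a low mP value, and the value rises when the probe is bound in a larger, slower complex, which is why a time course of mP can follow hybridization.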
2.3. Sequences of the Primers and the Probes
The chemically synthesized primers (a) and (b) and probes (c) - (g), whose sequences are shown below, were obtained from Sawadee technology (Tokyo, Japan). The probe DNAs were purified by HPLC. F in (c) - (g) represents a fluorescein label. The primer pair (a) & (b) was used to amplify the stx1 and stx2 genes universally [6] by asymmetric PCR. Probes (c), (d) and (e) were independently designed in the laboratory to be complementary to the amplified gene DNA of stx1, and probes (f) and (g) were designed likewise for stx2.
(a) MK1; 5' TTT.ACG.ATA.GAC.TTC.TCG.AC 3'
(b) MK2; 5' CAC.ATA.TAA.ATT.ATT.TCG.CTC 3'
(c) PBST10; 5' F- GAT.AGT.GGC.ACA.GGG.GAT.AAT 3'
(d) PBST11; 5' F- ATA.GAT.CCA.GAG.GAA.GGG.CGG 3'
(e) PBST12; 5' F- ATT.CGC.TGA.ATG.TCA.TTC.GCT 3'
(f) PBST20; 5' F- CAG.GCG.CGT.TTT.GAC.CAT.CTT 3'
(g) PBST21; 5' F- AAC.CAC.ACC.CCA.CCG.GGC.AGT 3'
2.4. Amplification of the Genes of Shiga Toxins
To amplify the segments of the stx1 and stx2 genes and produce an excess of single-stranded DNA, primers MK1 and MK2 were used at 10 pmol and 100 pmol, respectively. In asymmetric PCR [7], the single-stranded DNAs with which probes (c) - (g) could hybridize were amplified much more than the other ss-DNAs, which would compete with the probe DNAs during hybridization [5]. Each template volume was 10 µl and the total volume of the reaction mixture was 50 µl. The mixtures were initially denatured for 2 min at 94 °C and then subjected to 35 temperature cycles, each consisting of denaturation for 30 sec at 94 °C, primer annealing for 60 sec at 48 °C and extension for 30 sec at 72 °C. The last cycle consisted of denaturation followed by primer annealing for 60 sec at 48 °C. The DNA polymerase used was Ex (Takara Shuzo Co., Ltd., Kyoto, Japan). Amplification of the stx1 and stx2 genes by asymmetric PCR was confirmed by agarose gel electrophoresis (data not shown).
2.5. Detection of the Amplified Products
The probe DNAs were diluted to 0.6 nM with TE buffer containing 0.8 M NaCl. A 100 µl volume of each probe reagent was mixed with 10 µl of each PCR product, and fluorescence polarization was measured every minute for 15 min at 46 °C.
2.6. Secondary Structures of the Amplified Products
The secondary structures of the amplified ss-DNAs were predicted using Genetyx-Mac Ver.10 (Genetyx, Tokyo, Japan). Because of a restriction in the software, the calculation for ss-RNA, based on the minimum free energy at a specified minimum stacking length, was used in place of a calculation for ss-DNA. The minimum stacking length is the minimum number of continuous base pairs in a stem formed by complementary binding within a single-stranded nucleic acid.
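The idea of a minimum stacking length in Section 2.6 can be made concrete with a toy sketch. It only enumerates runs of consecutive Watson-Crick pairs within one strand and keeps those at least as long as the chosen minimum; it performs no free-energy minimization and is not the algorithm used by Genetyx-Mac. The function name, the minimum loop size of 3 bases and the example hairpin sequence are all assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_stems(seq: str, msl: int, min_loop: int = 3):
    """List maximal stems as (i, j, length): seq[i+k] pairs with seq[j-k] for k < length.

    Only stems with at least `msl` consecutive base pairs are returned, and at
    least `min_loop` unpaired bases must remain between the two stem arms.
    The sequence must contain only A, C, G and T.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs that merely continue a stem starting one step further out.
            if i > 0 and j < n - 1 and COMPLEMENT[seq[i - 1]] == seq[j + 1]:
                continue
            length = 0
            # Extend inward while bases pair and the loop stays large enough.
            while ((j - length) - (i + length) > min_loop
                   and COMPLEMENT[seq[i + length]] == seq[j - length]):
                length += 1
            if length >= msl:
                stems.append((i, j, length))
    return stems


# Hypothetical hairpin: a 7-base arm, a TTTT loop, and the complementary arm.
hairpin = "ACGTGCCTTTTGGCACGT"
print(find_stems(hairpin, msl=6))
```

For this hypothetical hairpin the sketch reports one stem of 7 base pairs at a minimum stacking length of 6 and none at 8, which shows how raising the minimum stacking length removes shorter stacking parts from a predicted structure.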
3. RESULTS AND DISCUSSION
3.1. The Time Courses of Polarization Using the Probes for Shiga Toxin Genes
Figure (1) shows the time courses of polarization for the stx1 probe DNAs immediately after they were mixed with the asymmetric PCR product of the stx1 gene of EHEC. The time courses for probes PBST10 and PBST11 with the positive sample differed clearly from those with the negative sample: the polarization values with the positive sample were much higher. Hybridization between these probes and the positive sample was extremely rapid, so positive and negative samples could be distinguished within one minute of mixing the probes with the samples. In contrast, the polarization value of PBST12 with the positive sample was not as high as those of PBST10 or PBST11, and its rate of hybridization was rather low.
Fig. (1). Time courses of polarization of the probe DNAs for stx1 gene. Probes: PBST10, PBST11 and PBST12 complementary to the segment of stx1 gene. Samples: stx1-gene positive and negative samples. To distinguish a group of data clearly, they are connected by simple straight lines.
Figure (2) shows the time courses of polarization for the stx2 probes mixed with the asymmetric PCR product of the stx2 gene. The time courses for probes PBST20 and PBST21 with the positive sample also differed clearly from those with the negative sample. The polarization values of the former were higher than those of the latter at 1-4 min and reached saturation faster. Nevertheless, hybridization of these probes was not completed as fast as that of PBST10 or PBST11 in Fig. (1).
Fig. (2). Time courses of polarization of the probe DNAs for stx2 gene. Probes: PBST20 and PBST21 complementary to stx2 gene. Samples: stx2-gene positive and negative samples. To distinguish a group of data clearly, they are connected by simple straight lines.
3.2. Tm, G + C Content and the Rates of Nucleic Acid Hybridization
Table 1 shows the melting temperature Tm, the G + C content and the time for hybridization to reach 90% completion, denoted t90, for all the probe DNAs in this report. The values of t90 were read directly from the time course graphs and effectively indicate the rate of hybridization; the rate constant kh was defined more precisely in the previous report by curve fitting of the time course data [4].
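Reading t90 from a time course can be sketched as a linear interpolation between successive readings. This is only an illustration under stated assumptions: the baseline is taken as the polarization of the unhybridized probe and the plateau as the saturation value, both supplied by the user; the function name and all numerical readings are hypothetical and are not data from this study.

```python
def estimate_t90(times_s, values_mp, baseline_mp, plateau_mp, fraction=0.9):
    """Estimate when the signal reaches `fraction` of the baseline-to-plateau rise.

    Returns ("at_most", t) if the first reading is already past the threshold,
    ("interpolated", t) if the threshold is crossed between two readings, or
    ("not_reached", t_last) if it is never crossed.
    """
    threshold = baseline_mp + fraction * (plateau_mp - baseline_mp)
    if values_mp[0] >= threshold:
        return ("at_most", times_s[0])
    for k in range(1, len(times_s)):
        v0, v1 = values_mp[k - 1], values_mp[k]
        if v1 >= threshold:
            t0, t1 = times_s[k - 1], times_s[k]
            # Linear interpolation between the two bracketing readings.
            return ("interpolated", t0 + (threshold - v0) / (v1 - v0) * (t1 - t0))
    return ("not_reached", times_s[-1])


# Hypothetical readings taken once per minute (values in mP).
times = [60, 120, 180, 240]
print(estimate_t90(times, [145, 149, 150, 150], baseline_mp=50, plateau_mp=150))
print(estimate_t90(times, [100, 130, 150, 150], baseline_mp=50, plateau_mp=150))
print(estimate_t90(times, [60, 70, 80, 90], baseline_mp=50, plateau_mp=150))
```

The "at_most" and "not_reached" outcomes correspond to entries of the form "60 or less" and "840 or more": with one reading per minute, a reaction that is already past 90% at the first reading, or still below it at the last, can only be bounded.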
Table 1. Tm, G + C Content and t90 for the DNA Probes
Probes    Tm /°C    G+C /%    t90 /sec       N /mers
PBST10    70.8      47.6      60 or less     21
PBST11    74.7      57.1      60 or less     21
PBST12    68.9      42.9      840 or more    21
PBST20    72.8      52.4      180            21
PBST21    78.7      66.7      480            21
Tm = 81.5 + 16.6(log10[Na+]) + 0.41(%G+C) - (600/N). [8] t90: the time for the hybridization to reach 90% of completion. N: chain length.
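The Tm formula in the table footnote can be checked against the probe sequences of Section 2.3 with a short sketch. It assumes that [Na+] is the 0.8 M NaCl of the probe reagent, ignoring the small dilution by the PCR product and any sodium from the buffer; the function and variable names are illustrative.

```python
import math

# Probe sequences from Section 2.3 (fluorescein label and dots omitted).
PROBES = {
    "PBST10": "GATAGTGGCACAGGGGATAAT",
    "PBST11": "ATAGATCCAGAGGAAGGGCGG",
    "PBST12": "ATTCGCTGAATGTCATTCGCT",
    "PBST20": "CAGGCGCGTTTTGACCATCTT",
    "PBST21": "AACCACACCCCACCGGGCAGT",
}


def gc_percent(seq: str) -> float:
    """G + C content of a sequence, in percent."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temperature(seq: str, na_molar: float) -> float:
    """Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%G+C) - 600/N, in degrees Celsius."""
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(seq) - 600.0 / len(seq))


for name, seq in PROBES.items():
    # Assumed sodium concentration: 0.8 M, as in the probe reagent.
    print(name, len(seq), round(gc_percent(seq), 1),
          round(melting_temperature(seq, 0.8), 1))
```

Under this assumption the sketch reproduces the Tm and G + C values listed in Table 1 to one decimal place for all five 21-mer probes.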
The table indicates that t90 was short, typically 60 seconds or less, for the two probes PBST10 and PBST11, which hybridized extremely rapidly with the amplified product DNA for stx1 of EHEC (Fig. 1). For these two probes t90 could not be determined precisely because the time resolution of the polarization measurement was not high enough, and for the same reason kh could not be calculated. To the best of my knowledge, such a high rate of hybridization at high salt concentration (0.8 M NaCl; about five times higher than mammalian physiological concentrations) was first reported in [9], while the temperature dependence of fluorescence anisotropy has been observed at high time resolution using a self-complementary DNA decamer duplex [10].
Interestingly, even under the optimized conditions of salt concentration and temperature (0.8 M NaCl, 46 °C), the rate of hybridization clearly varied widely when probes with different sequences were hybridized with the same segment of stx1 or stx2 (Fig. 1 or 2). Moreover, the variation in t90 appeared to be independent of Tm or G + C content (Table 1). Therefore, t90 was considered to depend strongly on the nucleic acid sequence rather than on the binding energy of the double helix; that is, t90 would vary according to the site or region of the individual amplified gene product.
3.3. Secondary Structures of the DNA Segments of Shiga Toxin Genes and the Rapidity of Probe Hybridization
Figures (3) and (4) show examples of predicted secondary structures for the single-stranded DNA segments of stx1 (228 base section) and stx2 (228 base section), calculated at specified minimum stacking lengths (MSL) of 6 and 7, respectively. In the figures, the binding sites of the five probes, PBST10, PBST11, PBST12, PBST20 and PBST21, are shown by halftone screen. At the binding sites of both PBST12 and PBST21, the 3' terminus of the probe overlapped with an end of a stacking part of 10 continuous base pairs in the secondary structures of stx1 and stx2 (Fig. 3 and Fig. 4). In contrast, the binding sites for PBST10, PBST11 and PBST20 were free, with no stacking part within 3 bases of their termini (Figs. 3 and 4).
Fig. (3). A secondary structure model for the segment of stx1. Minimum stacking length: 6.
Fig. (4). A secondary structure model for the segment of stx2. Minimum stacking length: 7.
The slower hybridization of PBST12 and PBST21 compared with the other probes was thought to be related to the shapes of the secondary structures. The stacking parts, especially those with 10 continuous base pairs seen in both structures, would not open smoothly into a single strand because they are energetically stable. In addition, the structural distortion in the vicinity of the stacking parts would be strong. It was therefore considered that PBST12 and PBST21 would take longer to hybridize and that their hybridization rates would be lower (Table 1 in Section 3.2).
4. CONCLUSION
Under the condition of high salt concentration (0.8 M NaCl), two probe DNAs were obtained that hybridized rapidly (in less than 1 min) with the segment of the Shiga toxin type 1 gene (stx1) of enterohemorrhagic Escherichia coli (EHEC). The variation in the rate of hybridization was observed to be independent of Tm or G + C content, which could represent the binding energy of the double helix; the rate would thus vary according to the site or region of the stx1 or stx2 segment. One of the main reasons for the variation was considered to be a kind of steric hindrance arising from the three-dimensional structure of the single-stranded gene segment, and designing probe oligo-DNAs would become easier if the prediction of high-order structure became more accurate. It should be noted, however, that at present there is no method for clearly predicting the three-dimensional structure of a segment of single-stranded nucleic acid under various solution conditions, including salt concentration, pH and temperature.
The fluorescence polarization method described here allows two or more different reactions, such as nucleic acid hybridization and a ligand-receptor reaction, to be monitored almost simultaneously in one series of measurements. In the hybridization reaction, nucleic acid samples prepared by different amplification methods, and those without amplification, can be measured at the same time [9]. Thus, as an application of this research, high throughput detection or screening of nucleic acids may be proposed, using a combination of probes with high hybridization rates. Recently, soluble receptors that maintain sufficient activity in solution have been developed [11]. Screening of nucleic acid or peptide ligands for receptors in vitro, in solution, may also be proposed, because the ligand-receptor reactions could be evaluated in solution, using these receptors, under almost the same conditions of salt concentration and pH as those of biological fluids.

ACKNOWLEDGEMENTS

This study was supported by ASTL of Hiroshima and the Hiroshima City Institute of Public Health, Japan.

ABBREVIATIONS

EHEC = Enterohemorrhagic Escherichia coli
FPA = Fluorescence polarization analysis
kh = The rate constant for hybridization
MSL = Minimum stacking length
PCR = Polymerase chain reaction
ss-DNA = Single-stranded DNA
ss-RNA = Single-stranded RNA
stx1 = The gene of Shiga (Vero) toxin type 1
stx2 = The gene of Shiga (Vero) toxin type 2
TE = 10 mM Tris-HCl/1 mM EDTA
t90 = The time for hybridization to reach 90% of completion
REFERENCES

[1] Kobayashi, S.; Tamiya, E.; Karube, I. MRS Int’l. Mtg. Adv. Mats., 1989, 14, 95-101.
[2] Herning, T.; Kobayashi, S.; Tamiya, E.; Karube, I. Anal. Chim. Acta, 1991, 244, 207-213.
[3] Murakami, A.; Nakaura, M.; Nakatsuji, Y.; Nagahara, S.; Qui, T-C.; Makino, K. Nucleic Acids Res., 1991, 19, 4097-4102.
[4] Tsuruoka, M.; Yano, K.; Ikebukuro, K.; Nakayama, H.; Masuda, Y.; Karube, I. J. Biotechnol., 1996, 48, 201-208.
[5] Tsuruoka, M.; Fukuhara, K.; Murano, S.; Okada, M. Nucleic Acids Symposium Series, 1998, 39, 115-116.
[6] Karch, H.; Mayer, T. J. Clin. Microbiol., 1989, 27, 2751-2757.
[7] Gyllensten, U. B.; Erlich, H. A. Proc. Natl. Acad. Sci. USA, 1988, 85, 7652-7656.
[8] Bolton, E. T.; McCarthy, B. J. Proc. Natl. Acad. Sci. USA, 1962, 48, 1390.
[9] Tsuruoka, M.; Murano, S.; Okada, M.; Ohiso, I.; Fujii, T. Biosens. Bioelectron., 2001, 16, 695-699.
[10] Nordlund, T. M.; Andersson, S.; Nilsson, L.; Rigler, R. Biochemistry, 1989, 28, 9095-9103.
[11] Sato, A.; Yamamoto, S.; Kajimura, N.; Oda, M.; Usukura, J.; Jingami, H. Eur. J. Biochem., 1999, 264, 439-445.
Frontiers in Drug Design & Discovery, 2005, 1, 97-111
An Overview of High Throughput Screening at G Protein Coupled Receptors

Richard M. Eglen*

DiscoveRx Corp., 42501 Albrae St., Fremont, CA 94358, USA

Abstract: Technologies used for high throughput screening (HTS) at G protein coupled receptors (GPCRs) comprise two major approaches: those measuring signal intensity changes, generally in a microtiter plate format, and those measuring cellular protein redistribution via imaging-based analysis systems. Several homogeneous assays, i.e. those without wash and fluid phase separation steps, measure changes in second messenger signaling molecules, including cAMP, Ins P3 and calcium. Imaging based assays determine the translocation of GPCR associated proteins such as β arrestin, or internalization of the receptor labeled with fusion tags. Generally, the former assays are used in a primary screening campaign, whilst the latter are used in secondary screening and lead optimization. However, increasing use of automated confocal imaging systems and the prevalence of modified cell lines have expanded the use of protein redistribution assays. Finally, radiometric techniques are widely used, frequently to measure GPCR ligand binding, using a scintillation proximity assay format. In this paper, the various assay methods used for HTS at G protein coupled receptors are compared and contrasted.
G PROTEIN COUPLED RECEPTORS AND HIGH THROUGHPUT SCREENING
G protein coupled receptors (GPCRs) are a proven class of targets for drug discovery and frequently enter high throughput screening (HTS) laboratories. Modern HTS is a highly automated approach to compound identification using robotic fluid dispensing systems and sensitive signal detection instruments [1]. Several assay techniques for GPCR screening employ non-radiometric assay platforms, whereas radiometric techniques dominate in assays in which ligand binding to the GPCR per se is measured [2]. There is increased adoption of functional assays, in which the effects of the activated GPCR on cell function are determined, with concomitant development and implementation of cell based assay technologies and of automated cell culture and dispensing [3]. Compounds active at GPCRs have therapeutic benefit in many diseases, ranging from central nervous system disorders, including pain, schizophrenia and depression, to cancer and metabolic disorders such as obesity or diabetes [5]. GPCRs are considered a highly ‘druggable’ class of proteins, with over 40% of marketed drugs (such as Zyprexa, Clarinex, Zantac and Zelnorm) acting to modulate their function. Interestingly, approximately 9% of global pharmaceutical sales are realized from drugs targeted against only 40-50 well-characterized GPCRs. As there are between 800 and 1000 genes in the human genome belonging to the GPCR superfamily, it is likely that many more GPCRs remain to be validated as drug targets. Furthermore, endogenous ligands have been identified for only 200 GPCRs, even though the human genome contains many more GPCR genes. When the sensory classes of GPCRs are excluded, many of the remaining GPCRs are ‘orphan’ in nature, in that no ligand or function is presently known. There is, therefore, intense interest in identifying novel ligands for orphan GPCRs, both as potential therapeutics and as pharmacological probes to refine physiological function [6].

*Corresponding author: Tel: 510 979 1415 x 103; Fax: 510 979 1650; E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.

The primary function of GPCRs is to transduce extracellular stimuli into intracellular signals. GPCRs are a large, diverse and highly conserved class of membrane-bound proteins. On the basis of homology with rhodopsin, they possess a single, serpentine polypeptide chain with seven transmembrane helices connected by three extracellular loops and three intracellular loops. The amino terminal is located extracellularly, and the carboxy terminal intracellularly. GPCRs are divided into three broad classes, based upon the similarity of the transmembrane sequences and the nature of their ligands. Class 1 includes rhodopsin-like receptors, which are activated by ligands such as biogenic amines, chemokines, prostanoids and neuropeptides. Class 2 includes secretin-like receptors, which are activated by ligands including secretin, parathyroid hormone, glucagon, calcitonin gene related peptide, adrenomedullin and calcitonin. Class 3 includes metabotropic-glutamate-receptor-like and calcium sensing receptors [7]. Agonist binding to the GPCR promotes a conformational change in the protein, specifically an ionic interchange between the 3rd and 4th transmembrane domains. This induces coupling of the GPCR to the G protein, initiating signaling to the cell interior.
Although GPCRs are a diverse protein class, the diversity of GPCR pathways activated in the cell is relatively small, and often involves modulation of two membrane bound enzymes, adenylyl cyclase and phospholipase C. GPCR signaling induces coupling of the liganded receptor to a heterotrimeric G protein. G proteins are composed of α, β and γ subunits and are themselves a diverse protein group, comprising 18 Gα, 5 Gβ and 11 Gγ subunits. The GPCR/G protein interaction accelerates exchange of guanosine diphosphate (GDP) for guanosine triphosphate (GTP) on the α subunit, leading to dissociation of the α subunit from the βγ subunits. The free α or βγ subunits then interact with second messenger systems, the precise nature of which depends upon the GPCR subtype and the G protein subunits mobilized [8]. GPCR coupling to Gαs and Gαi/o proteins activates or inhibits, respectively, adenylyl cyclase, the enzyme responsible for converting adenosine triphosphate (ATP) to 3′,5′-cAMP and inorganic pyrophosphate. cAMP then acts at several downstream targets, including ion channels and kinases that modulate gene transcription and cell metabolism. GPCRs coupling to Gαq/o proteins, alternatively, activate phosphoinositide phospholipase Cβ, which hydrolyzes phosphatidylinositol 4,5-bisphosphate (PIP2), forming sn-1,2-diacylglycerol and inositol 1,4,5-trisphosphate (Ins P3). Ins P3 binds to and opens endoplasmic Ins P3 gated calcium channels, causing release of bound calcium into the cytosol. Metabolic products of Ins P3 also modulate cellular function, including inositol 1,3,4,5-tetrakisphosphate (Ins P4), which acts synergistically to facilitate Ins P3-mediated calcium release [8].
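The coupling rules described above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names `Coupling`, `COUPLING` and `max_heterotrimers` are hypothetical, the family labels are simplified to Gs, Gi/o and Gq, and the subunit product is a theoretical upper bound that assumes every α, β and γ combination can form, which the text does not claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coupling:
    effector: str             # membrane bound enzyme regulated by the G protein
    effect: str               # "activated" or "inhibited"
    second_messengers: tuple  # messengers whose levels change as a result


# Simplified summary of the pathways described in the text.
COUPLING = {
    "Gs": Coupling("adenylyl cyclase", "activated", ("cAMP",)),
    "Gi/o": Coupling("adenylyl cyclase", "inhibited", ("cAMP",)),
    "Gq": Coupling("phospholipase C beta", "activated",
                   ("Ins P3", "diacylglycerol", "calcium")),
}


def max_heterotrimers(n_alpha: int, n_beta: int, n_gamma: int) -> int:
    """Upper bound on alpha/beta/gamma combinations (assumes all can form)."""
    return n_alpha * n_beta * n_gamma


if __name__ == "__main__":
    for family, c in COUPLING.items():
        messengers = ", ".join(c.second_messengers)
        print(f"{family}: {c.effector} {c.effect} -> {messengers}")
    # 18 G-alpha x 5 G-beta x 11 G-gamma subunits, as counted in the text
    print(max_heterotrimers(18, 5, 11))  # 18 * 5 * 11 = 990
```

A table of this kind makes explicit why the choice of second messenger readout (cAMP, Ins P3 or calcium) follows from the G protein family to which the receptor couples.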
In the presence of continued agonist activation of the GPCR, signaling is attenuated by a coordinated process of desensitization, inactivation and internalization. This involves GPCR phosphorylation by specific GPCR kinases (GRKs) and subsequent association with the adapter molecule, β arrestin. This complex then initiates the internalization and recycling of the receptor:ligand complex [9]. β arrestin also facilitates formation of multimolecular complexes, and provides the means by which liganded GPCRs influence numerous cell pathways. By this means, GPCRs activate interconnected pathways, including those involving MAP kinases, nonreceptor tyrosine kinases, receptor tyrosine kinases, phosphatidylinositol 3-kinases and JNKs [10]. In the context of HTS, several aspects of the GPCR signaling described above have been utilized in various assay formats. Thus, ligand binding, guanine nucleotide exchange, second messenger mobilization, ancillary protein recruitment and receptor internalization have all been used as the basis for HTS assays. In the present review, various high throughput approaches for GPCR screening are assessed; they are summarized in Table (1). Several reviews on this subject have been published, and the reader is referred to these for additional references [1-4].

Cellular Expression of GPCRs

The cell background imposes phenotypic selectivity on GPCR ligand pharmacology [11], which implies that the cell type used to express the receptor for screening is critical. Phenotype-specific pharmacology arises from several causes. Firstly, several GPCRs, particularly those from classes 1 and 3, form dimers or higher order oligomers at the plasma membrane. Although GPCRs dimerize early in the synthetic pathway, the addition of agonists or antagonists influences formation of higher order oligomers [12]. It is now evident that GPCR oligomer formation depends on both the cell type and the ligand [13]. Secondly, ancillary proteins expressed in the cell also modulate receptor function and, therefore, ligand pharmacology. Specifically, GPCRs interact with several transmembrane and soluble proteins, collectively termed GPCR-interacting proteins or GIPs. These GIP complexes couple to the carboxy terminal of the receptor, linking GPCR activation to a diverse network of pathways termed ‘receptorsomes’, the co-localization of which provides defined spatial and temporal motifs that are highly cell dependent [14]. One example of such ancillary proteins is the RAMP (receptor activity modifying protein) class, the nature of which determines the level and pharmacology of GPCR activity. Thirdly, the cell phenotype influences the action of allosteric GPCR ligands. Such ligands, in contrast to surmountable agonists and antagonists, act at sites additional to the ligand binding site on the GPCR. They thus modulate receptor function, while maintaining important temporal aspects of cell signaling [15]. Allosteric ligands can have high selectivity between GPCR subtypes, and may even exert actions in their own right by modulating constitutive receptor activity. Since not all allosteric ligands block receptors similarly, and not all agonists are blocked equally, the allosteric function of the GPCR ligand also varies according to the cellular phenotype [15]. Overall, compound efficacy, previously considered to be an exclusive property of the GPCR and its ligand, is now also regarded as an essential property of the cell or tissue in which the receptor is expressed [16]. These considerations highlight the importance of the choice of cell for use in screening of novel GPCR ligands. Furthermore, the ability of the phenotype to modulate cell function specifically adds a level of complexity to the pharmacology of the ligand being screened. This is one of the reasons why cell based functional assays have increased in use in HTS campaigns.

Table 1.
Comparison of Various Approaches Used in GPCR Functional Assays (Adapted from Ref. 3)

| Assay type | Gq coupled receptors | Gi or Gs coupled receptors | Reporter genes | Protein redistribution | Receptor internalization |
| --- | --- | --- | --- | --- | --- |
| Technique | Changes in intracellular calcium | Changes in intracellular cAMP | Changes in NFAT (calcium signaling) or CRE (cAMP signaling) reporter gene activity | β arrestin redistribution | Internalization of CypHer dye GPCR epitope tags |
| Advantages | Highly sensitive automated screening system to detect agonists and antagonists. Allows use of promiscuous G proteins to couple to PLC or chimeric GPCRs and G protein constructs. | Highly sensitive automated screening system to detect agonists and antagonists. Detects elevations in basal cAMP due to constitutive activity. | Highly sensitive and automatable screening systems. Generic screening system for agonists and antagonists. | Many GPCRs induce arrestin translocation upon activation, including several orphans. | Allows measurement of ligand induced internalization. Can detect constitutive internalization. |
| Disadvantages | No changes in basal levels in intracellular calcium with constitutive activity. Difficult to detect inverse agonists. Need to assess agonist efficacy and potential changes in agonist pharmacology. | Only detects Gs and Gi coupled GPCRs. | Downstream from ligand activation and requires prolonged incubation with GPCR ligands, resulting in many false positives. | Cells require over expression of both the GPCR and β arrestin. Modification of the carboxy termini may influence ligand pharmacology. | Requires the use of epitope tagged GPCRs and internalization of antibody/receptor complex. Extensive amounts of data generated during screening campaigns. |
Most GPCR HTS assays employ non-recombinant or recombinant stable cell lines, as opposed to native tissues, partly because the screen requires a large number of clonal cells of constant phenotype. Several non-recombinant cells have the major advantage that they endogenously express GPCRs, presumably in a background appropriate to expression and function of the receptor. However, the endogenous GPCR expression levels are often too low to be used successfully in ligand binding assays, and sensitive functional assays remain the only viable option for their use in HTS. Recombinant cells, alternatively, can be constructed to express specific GPCRs in mammalian systems, such as HEK (human embryonic kidney) or CHO (Chinese hamster ovary) cells. Here, the high expression levels are easily adjusted, particularly when using plasmid expression vectors harboring viral promoters. Typically, these promoters are derived from cytomegalovirus (CMV) or Rous sarcoma virus (RSV), and the vectors contain antibiotic resistance genes allowing for selection of positively transfected clones. Mammalian bicistronic expression vectors are often used, with the advantage that two open reading frames are expressed from one mRNA. In some cases, particularly where cell membranes may be used in a binding assay, the GPCR is transiently expressed. This increases the speed at which cells are produced for the screen, and allows receptors to be expressed at similar levels, facilitating easier comparison between multiple receptors. In instances where very large numbers of cells are needed, for example in a large cell based HTS campaign, stably transfected cells are preferred. The high expression levels of GPCRs are also optimized for several HTS assay formats, including both membrane binding and cellular functional studies. Very high levels of GPCR expression can be attained using baculovirus-insect expression systems. Currently, recombinant baculoviruses, in which the insect cell specific promoter is replaced with a mammalian cell active expression cassette, provide very efficient gene delivery in many mammalian cell types [17]. A recent report [18] has shown simultaneous expression of multiple GPCR genes in HEK 293 or U-2 OS osteosarcoma cells, using a GPCR and a G protein chimera, coupling the receptor to a calcium response. Similar observations have been made in Sf9 cells, using co expression of GPCRs with the promiscuous Gα16 subunit and the luminescent protein aequorin [18]. High levels of receptor expression, such as those obtained with baculoviral recombinant techniques, also render such cells very useful for isolation of membranes for use in binding studies, enhancing the specific binding of the ligand. However, when such membranes are used in low assay volumes, ligand depletion may occur that will complicate analysis.
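The ligand depletion effect mentioned above can be illustrated with the standard single-site mass-action model. The following is a minimal sketch, assuming one class of non-interacting binding sites at equilibrium; the function names and the example concentrations are hypothetical and are not taken from the source.

```python
import math


def bound_concentration(l_total: float, r_total: float, kd: float) -> float:
    """Equilibrium concentration of receptor-ligand complex (same units as inputs).

    Solves B**2 - (L_t + R_t + Kd) * B + L_t * R_t = 0 for the physically
    meaningful (smaller) root, so that free ligand is L_t - B rather than L_t.
    """
    s = l_total + r_total + kd
    return (s - math.sqrt(s * s - 4.0 * l_total * r_total)) / 2.0


def fraction_depleted(l_total: float, r_total: float, kd: float) -> float:
    """Fraction of the added ligand that is removed from solution by binding."""
    return bound_concentration(l_total, r_total, kd) / l_total


if __name__ == "__main__":
    # Hypothetical example: the same 10 fmol of receptor per well,
    # 0.1 nM radioligand, Kd = 0.1 nM. Note that 1 fmol/uL equals 1 nM.
    for volume_ul in (200.0, 20.0):
        r_total_nm = 10.0 / volume_ul
        print(volume_ul, round(fraction_depleted(0.1, r_total_nm, 0.1), 3))
```

With a fixed amount of membrane, reducing the assay volume raises the receptor concentration and therefore the fraction of ligand bound, so the free ligand concentration can no longer be assumed equal to the concentration added.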
Indeed, membrane binding assays may yield different pharmacological data from those derived from binding studies with whole cells, particularly since inverse agonists bind more slowly and with lower affinity to GPCRs. This can cause the membrane-based assay to miss compounds acting in this manner, an issue of high relevance when the screen is directed at finding inverse agonists at constitutively active orphan GPCRs [11].

HIGH THROUGHPUT LIGAND BINDING ASSAYS

Measurement of GPCR Ligand Binding

Radioisotopically labeled GPCR ligands are widely employed in GPCR screening, using three general approaches [19]: firstly, saturation experiments, in which the affinity of the radioligand is determined; secondly, inhibition studies, in which the affinity of a competing unlabeled compound is measured; and thirdly, kinetic experiments, in which the association or dissociation rates of the radioligand are defined [19]. Radioligand binding assays use [3H] or [125I] isotopes incorporated into ligands, which are then used at low, tracer concentrations to monitor specific binding. A competition displacement format is often used in HTS, involving a fixed concentration of radioligand and its displacement from the GPCR by the library compound. Historically, this approach has utilized wash and filtration procedures to separate the bound ligand from the unbound material, with the residual activity measured by liquid scintillation spectrometry. This approach has the advantage that it is very quantitative in terms of ligand pharmacology, and widespread adoption of the technique over the years has resulted in the broad commercial availability of radioligands for several GPCRs. The need to minimize filtration and resultant radioactive waste has also driven adoption of a scintillation proximity assay (SPA) format. In an SPA assay, radiolabeled molecules binding to the GPCR membrane preparation activate the scintillating beads, triggering red or blue light emission. Several polystyrene beads optimized for HTS are available, such that non-specific ligand binding is reduced. These include beads with an yttrium core covered by hydrophilic coatings, or those with wheat germ agglutinin covalently attached to the bead surface, enhancing capture of cell membranes and further reducing non-specific binding [20]. A limited number of non-isotopic approaches have been developed for GPCR ligand binding assays, the majority of which employ fluorescence polarization (FP) detection. GPCR ligands labeled with fluorescent dyes, such as fluorescein or BODIPY TMR, have been used to determine binding to GPCRs via FP detection. FP is calculated from the fluorescence intensities emitted in the vertical and horizontal planes. When fluorescent molecules are excited with polarized light, the degree to which the emitted light retains polarization reflects the rotation that the molecule underwent between excitation and emission. Small molecules rotate rapidly, and the emitted light is random with respect to the plane of emission. When bound to a large protein (such as a GPCR), the ligand rotates slowly and the emitted light retains more of its polarization, which is measured as an increase in FP. The extent of the polarization increase is linearly dependent on the extent of binding [21]. FP assays, apart from eliminating radioactivity, are homogeneous, allowing for a real-time, continuous readout of binding activity [22]. However, the approach requires larger amounts of cell membrane, or higher levels of GPCR expression, than those generally required for radioligand binding protocols [23].
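The FP readout described above can be sketched numerically. The example below uses the conventional definition of polarization from the parallel and perpendicular emission intensities, expressed in millipolarization units (mP). The function names, the optional instrument correction `g_factor` and the example values are assumptions for illustration; the linear interpolation of fraction bound follows the statement in the text and ignores any change in fluorophore brightness on binding.

```python
def polarization_mp(i_parallel: float, i_perpendicular: float,
                    g_factor: float = 1.0) -> float:
    """Fluorescence polarization in mP: 1000 * (I_par - G*I_perp) / (I_par + G*I_perp)."""
    corrected = g_factor * i_perpendicular
    total = i_parallel + corrected
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (i_parallel - corrected) / total


def fraction_bound(mp_observed: float, mp_free: float, mp_bound: float) -> float:
    """Fraction of tracer bound, assuming mP varies linearly between the
    free-tracer and fully-bound reference values (an approximation)."""
    span = mp_bound - mp_free
    if span <= 0:
        raise ValueError("mp_bound must exceed mp_free")
    return (mp_observed - mp_free) / span


if __name__ == "__main__":
    # Hypothetical intensities and reference polarizations.
    print(polarization_mp(300.0, 100.0))      # (300 - 100) / (300 + 100) -> 500.0 mP
    print(fraction_bound(150.0, 50.0, 250.0))  # halfway between references -> 0.5
```

In a competition format, a library compound that displaces the fluorescent ligand lowers the observed mP toward the free-tracer value.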
FP based assays, with few exceptions, are also mostly restricted to peptidergic GPCR ligands, in contrast to the wide variety of isotopically labeled ligands. Fluorescence intensity distribution analysis (FIDA) allows single molecule detection that is sensitive to the changes in intensity that occur when fluorescent ligands bind to membranes. Indeed, the approach appears to require lower GPCR expression levels than FP. When used in a confocal mode, FIDA allows the use of extremely low assay volumes, approaching those used in ultraHTS [24, 25].

Measurement of Guanine Nucleotide Binding

GPCR binding assays do not easily differentiate between compounds that bind and activate the receptor (agonists) and those that bind and occlude the receptor, preventing subsequent activation (antagonists). A radiometric approach used to assess GPCR activation measures guanine nucleotide binding to the liganded GPCR [19, 26]. Agonist binding to GPCRs promotes the exchange of GDP for GTP at the α subunit of heterotrimeric GTP binding proteins (G proteins). Poorly hydrolysable GTP analogs, such as GTPγS or its isotopic analog, [35S] GTPγS, are used to monitor the agonist-dependent activation of G proteins in membranes from cells expressing GPCRs. This technique has been widely used over the years, and a fully automated SPA assay using this technique has been reported for GPCRs coupled to the Gαi subunit [27, 28]. Moreover, the use of GPCR-G protein fusion proteins, expressed as bifunctional polypeptides, allows [35S] GTPγS binding assays to be developed that explore the extent of GPCR oligomerization. When used with complementary pairs of non-functional mutants that reconstitute function when co-expressed, studies can be designed that discriminate between homo- and heterodimerization [29].
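One way to interpret agonist-dependent [35S] GTPγS binding is to express each compound's signal relative to basal binding and to a reference full agonist. The sketch below is an assumption-laden illustration, not a method taken from the source: the function names, the 10% tolerance band and the example counts are all hypothetical.

```python
def relative_efficacy(signal: float, basal: float, full_agonist: float) -> float:
    """Compound effect as a fraction of the basal-to-full-agonist window."""
    window = full_agonist - basal
    if window <= 0:
        raise ValueError("full agonist signal must exceed basal binding")
    return (signal - basal) / window


def classify(efficacy: float, tolerance: float = 0.1) -> str:
    """Coarse label; the tolerance band is an arbitrary illustrative choice."""
    if efficacy < -tolerance:
        return "inverse agonist"
    if efficacy <= tolerance:
        return "no intrinsic activity"  # neutral antagonist or inactive compound
    if efficacy < 1.0 - tolerance:
        return "partial agonist"
    return "full agonist"


if __name__ == "__main__":
    basal, full = 1000.0, 2000.0  # hypothetical counts per minute
    for cpm in (2000.0, 1500.0, 1000.0, 700.0):
        e = relative_efficacy(cpm, basal, full)
        print(cpm, e, classify(e))
```

A compound with no intrinsic activity cannot be distinguished from an inactive one in this readout alone; an antagonist would need to be tested for its ability to reduce the response to an agonist.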
Three different approaches enable measurement of [35S] GTPγS binding to specific Gα-subunit families: (i) co expression of G proteins and receptors in Sf9 insect cells; (ii) construction of GPCR-Gα fusions in cells; (iii) immunocapture techniques. Immunocapture methods, notably, do not require cell engineering for co expression of receptors and G proteins or the creation of fusion constructs. De Lapp has recently reviewed a technique combining immunocapture with SPA technology [30]. This method is easily automated and allows analysis of agonist, partial agonist, antagonist and inverse agonist activity at the level of specific interactions between GPCRs and Gα subunits. However, antibody-capture SPA requires large amounts of diluted antibody for 96-well plate incubations, making the antibody a considerable cost [30]. The identification of non-radiometric alternatives to [35S] GTPγS is difficult, principally due to problems of incorporating a detectable label that does not impair G protein binding [31]. A time resolved fluorometric GTP-binding assay has been described, using a GTP analog labeled with a Europium chelate, followed by a filtration washing step [32]. Currently, however, there are few reports in the literature relating to the use of the technique in HTS. GTP binding assays in general provide data only at discrete intervals and are not always able to resolve the rapid kinetics of heterotrimeric G protein signaling. Real time, fluorescence-based analyses of G protein and RGS-protein signaling offer some advantages in this regard. Currently, the most important fluorescence-based assay uses BODIPY-FL-GTPγS. Upon binding to Gα protein, BODIPY-FL-GTPγS exhibits up to a six-fold increase in fluorescence emission at 510 nm, following excitation at 490 nm [33].

HIGH THROUGHPUT FUNCTIONAL ASSAYS

In the last decade, the use of cell based assays to analyze GPCR-ligand interactions has accelerated [34].
Although isotopically based techniques to measure ligand binding are still widely used (see above), numerous cell based assays are now in mainstream HTS use [35]. Measurement of liganded GPCR effects on cell function, as opposed to binding per se, permits characterization of compounds that modulate receptor activity in the cell environment. Thus, novel agonists are easily discerned from antagonists. Moreover, agents acting on GPCRs by other mechanisms, such as allosteric regulators or G protein activators, can be easily identified. These functional modulators, collectively, are difficult to characterize using binding methods alone [36]. Functional GPCR assays are also a prerequisite for the identification of endogenous or synthetic ligands for orphan GPCRs [1], including those compounds acting as inverse agonists to reduce constitutive G protein coupling [38]. Functional GPCR platforms typically utilize either reporter gene approaches, or assays monitoring the mobilization of cAMP or intracellular calcium [4]. In high throughput screening, all these approaches need to be amenable to automated fluid handling, be scalable to different fluid volumes for use in high density microtiter plate formats, and generate a signal that is generally free from artifactual interference. Since wash and separation steps are an impediment to automated fluid dispensing, HTS functional assays also require protocols to be homogeneous.

Reporter Gene Assays to Measure GPCR Function

GPCR activation is well known to alter gene transcription [4]. Several genes contain elements responsive to second messengers mobilized by GPCRs; these elements can modulate gene expression, providing the basis for reporter gene assays [4]. In reporter gene assays, GPCR signal transduction is monitored using expression systems engineered with cis-acting enhancer elements (DNA sequence motifs targeted by binding partners that promote gene expression) placed upstream of a reporter gene. Reporter genes include β lactamase [39], β galactosidase, luciferase [40] and green fluorescent protein (GFP) [41]. GPCR signaling via changes in cAMP is assayed using the CRE (cAMP response element) enhancer of gene expression, while GPCRs coupled to changes in calcium utilize the calcium-sensitive AP1 (activator protein 1) or NFAT (nuclear factor of activated T cells) elements [39]. The principal advantages of reporter gene assays are the wide linearity and sensitivity of the technique, making them very suitable for detection of weak GPCR agonists or allosteric modulators. Reporter gene assays can also be used in the extremely low assay volumes used for 1536 or 3456 well microtiter plate formats [4, 39]. However, the potential for false positives in the assay, arising from ‘off target’ actions of library compounds on gene transcription during prolonged incubation times, necessitates the use of numerous controls or follow up assays.

Second Messenger Assays to Measure GPCR Function

Measurement of GPCR second messengers usually involves assays to determine accumulation of cAMP [41] or calcium [42]. Intracellular changes in these second messengers are measured either in broken cells following lysis or in intact cells using cell-penetrant dyes, respectively. Although spatial information is lost in such ‘end point’ protocols, in contrast to the imaging based assays (see below), assays of this type provide highly quantitative pharmacological information on ligands that modulate GPCR function. Data such as these are important in defining the action of a novel compound and thus the structure activity relationships of a chemical series [43].
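The quantitative pharmacological information mentioned above is commonly summarized by a concentration-response curve. The sketch below uses the standard four-parameter logistic (Hill) model and a simple log-linear interpolation to estimate the EC50; it is an illustration under stated assumptions (ascending concentrations, a monotonically increasing response, data spanning both plateaus), and the function names and simulated values are hypothetical. A real analysis would normally fit the model by non-linear regression.

```python
import math


def four_parameter_logistic(conc: float, bottom: float, top: float,
                            ec50: float, hill: float = 1.0) -> float:
    """Response at a given agonist concentration (conc > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def interpolate_ec50(concs, responses) -> float:
    """Estimate EC50 by log-linear interpolation at the half-maximal response.

    Assumes concs are ascending, responses increase with concentration, and
    the lowest and highest responses approximate the two plateaus.
    """
    half = (min(responses) + max(responses)) / 2.0
    pairs = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        if r_lo <= half <= r_hi and r_hi > r_lo:
            frac = (half - r_lo) / (r_hi - r_lo)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ec50
    raise ValueError("half-maximal response is not bracketed by the data")


if __name__ == "__main__":
    # Simulated half-log dilution series from 0.1 nM to 100 uM (molar units).
    concs = [1e-10 * 10 ** (i / 2) for i in range(13)]
    responses = [four_parameter_logistic(c, 0.0, 100.0, 1e-7) for c in concs]
    print(interpolate_ec50(concs, responses))
```

Comparing EC50 values and maximal responses across a chemical series is one way in which such data inform structure activity relationships.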
Accumulation of 3′,5′-cAMP

Cyclic AMP, specifically 3′,5′-cyclic AMP, plays an important role in many signal transduction pathways. Consequently, the intracellular levels of cAMP are tightly regulated via the activity of the adenylyl cyclase enzyme family, some members of which can be activated or inhibited by G protein subunits mobilized by GPCR activation. Changes in intracellular cAMP levels correlate with GPCR activation, and measurement of cAMP levels provides a simple functional assay for GPCR screening. There are several assays for the measurement of cAMP, including those in which the accumulation of cAMP is assessed in a homogeneous assay format. These have been reviewed in detail [41] and, for the sake of brevity, will not be discussed further in the present review. However, one approach worth highlighting is based upon β galactosidase (β gal) enzyme fragment complementation (EFC) [44, 45]. β gal is an enzyme extensively used in cellular biology as a biological reporter. EFC occurs when fragments of β gal complement in trans, by intracistronic combination, to form active enzyme [46, 47]. The commercial availability of numerous β gal substrates enables colorimetric, fluorescent or chemiluminescent signals to be generated. In a competitive immunoassay approach, β gal EFC is now widely used for high throughput detection of cAMP [41]. Here, cAMP is chemically conjugated to a small (~5 kDa) peptide fragment derived from the β gal α peptide. This conjugate complements an inactive β gal ω peptide to form active enzyme. Steric hindrance of EFC occurs when an antibody that binds conjugated cAMP is added to the assay, such that little or no active β gal is formed. However, free cAMP generated by the cell competitively displaces the conjugated nucleotide from the antibody, allowing it to complement freely and thus generating a signal.

Accumulation of Inositol Phospholipids

Assessment of Gq coupled GPCRs is undertaken by measurement of changes in phosphoinositide phospholipase C activity. This is often done by measuring turnover of the inositol phospholipid cycle [48], using incorporated tritiated inositol. Activation of the GPCR increases the incorporation of the radioactive inositol in the presence of lithium. This ion uncompetitively blocks inositol monophosphate phosphatase, thereby abrogating the cycle and increasing accumulation of the tritiated isotope in the monophosphate form [48]. This approach has recently been used in conjunction with immobilized metal ion affinity and SPA techniques, to provide a homogeneous platform suitable for HTS automation [49, 50]. The measurement of the second messenger Ins P3 specifically is undertaken differently, using mass assays with GLC (gas liquid chromatography), anion exchange chromatography or HPLC. These techniques, while sensitive, are not adaptable to assays requiring high throughput. The recognition that Ins P3 binds to a specific intracellular receptor provides the basis for a radiometric competition-binding assay. Here, [3H] labeled Ins P3 is displaced from a crude preparation of the Ins P3 receptor, using a competition radioligand binding protocol [51]. This format, again when used with SPA, is high throughput [52], but the economics of isotopic waste disposal arising from high volume screening remains a significant issue. A non-isotopic assay for Ins P3 is now available using the ALPHAscreen technology [53]. This technique is an amplified luminescence assay that employs a donor and an acceptor bead.
When the donor bead is excited with light at 680nm, a photosensitizer coverts O 2 to single oxygen. When two beads are in close proximity, the single oxygen produces a chemiluminescent signal in the acceptor bead, activating bead fluorophores and amplifying the signal. In an Ins P3 assay, the two beads are held in close proximity by a biotinylated Ins P3 molecule, as the donor bead is coated with streptavidin, and the acceptor bead is coated with an Ins P 3 binding protein. In the absence of cell stimulation, a signal is seen. In the presence of free Ins P 3 from the cell, the donor and acceptor beads dissociate and the signal proportionally decreases [53]. Despite the advantage of ALPHAscreen approach as a non-isotopic homogeneous assay technology, the signal can be sensitive to compound quenching and ambient fluctuations in light and temperature. The ALPHAscreen Ins P3 assay is also limited in the number of cells per well, as matrix interferences from cell lysates reduce the signal. Echelon Biosciences have recently utilized the ALPHAscreen assay format, using a binding protein that binds several inositol phosphates, including IP2 and IP4. These cellular metabolites compete with a biotinylated inositol phosphate analog as described above [54]. This assay has an advantage that it detects several phosphoinositols, resulting in slower kinetics and an assay that is more easily automated. However, an extensive evaluation in HTS screens has not been reported to date. A novel assay has been developed at DiscoveRx Corp to measure Ins P3, which uses a fluorescent polarization format. The Ins P3 assay is a competitive binding assay, in
which cellular Ins P3 displaces a fluorescent derivative of Ins P3 from a specific binding protein. The assay measures changes in fluorescence polarization (FP), a single wavelength ratiometric technique, in which the fluorescent derivative of Ins P3 is used as a tracer. The FP Ins P3 assay is performed in crude cell lysates, thereby avoiding laborious separation and filtration steps. It is therefore important that the Ins P3 binding protein exhibits high affinity and selectivity for the D-myo-inositol 1,4,5 Ins P3 isomer over other inositol polyphosphates. The buffer used in the Ins P3 assay is optimized to ensure high affinity binding, and competition binding studies with several substituted inositol phosphates demonstrate that the Ins P3 binding protein is specific for the D-myo-inositol 1,4,5 Ins P3 isomer.
Accumulation of Calcium
Measuring changes in intracellular calcium mobilized by GPCR activation provides a highly sensitive technique for measuring ligand function [55]. Platforms such as the fluorometric imaging plate reader (FLIPR) or functional drug screening system (FDSS) are used with dyes compatible with HTS applications, generally the fluorescent calcium sensitive dyes Fluo-3, Fluo-4 or Calcium 3 [56, 57]. With these dyes, signal enhancements of 5-10 fold are seen between the resting cytosolic calcium levels and those induced by GPCR activation. Real time changes in the GPCR induced calcium transient signal are then determined in microtiter plates using the imaging instruments above. The FLIPR assay has recently been miniaturized to a 1536 well format, reducing the cost of such assays [58]. The sensitivity of the technique is such that functional studies using calcium transients are often undertaken in cells endogenously expressing the GPCR of interest. Alternatively, the technique can be adapted to cell lines engineered to over express both the GPCR and the promiscuous calcium coupling G-proteins Gα15/16 [59, 60].
Other studies have been conducted using Gαq proteins in which the C-terminal amino acids are replaced with those from Gαi/o or Gαs. The substitution permits appropriate receptor/G-protein recognition and measurement of the functional response via calcium mobilization [61]. A promiscuous Gαq subunit, lacking the highly conserved six amino acid amino-terminal extension yet possessing four residues of the Gαi sequence, has also been reported [61]. This sequence is both palmitoylated and myristoylated, resulting in a G protein that specifically targets the plasma membrane [61]. When it is co-expressed with the GPCR, the functional responses of the GPCR are enhanced, increasing the sensitivity of the assay.
Although measuring calcium transients is an important HTS technique, the protocols can be time consuming, particularly when the experimental parameters must be varied from cell to cell. Furthermore, calcium is a second messenger downstream from Gαq coupled GPCR induced activation of PLC. Some compounds in the screening library may modulate intracellular calcium levels by means other than binding to the receptor, or may alter dye fluorescence, resulting in false positive or false negative hits. Consequently, HTS platforms using the calcium sensitive bioluminescent protein aequorin have been developed [62, 63]. The approach exploits the calcium ion binding property of the protein, which results in oxidation of the substrate, coelenterazine, with concomitant light emission. However, the approach requires a pre-incubation period of either four hours at room temperature or 18 hours at 4°C prior to the addition of test compounds. The length of the incubation may also depend on the cell line and GPCR type.
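The 5-10 fold enhancement quoted above for calcium sensitive dyes is usually expressed as a peak response relative to the resting signal. The following is a minimal sketch of that calculation for a single well, not a description of any instrument's software: the function name, the trace values and the number of baseline reads are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def fold_over_baseline(trace, n_baseline):
    """Peak post-stimulus fluorescence divided by mean pre-stimulus fluorescence.

    trace      -- fluorescence readings for one well, in time order
    n_baseline -- number of readings taken before the agonist is added
    """
    if not 0 < n_baseline < len(trace):
        raise ValueError("n_baseline must leave at least one post-stimulus reading")
    baseline = mean(trace[:n_baseline])
    if baseline <= 0:
        raise ValueError("baseline fluorescence must be positive")
    peak = max(trace[n_baseline:])
    return peak / baseline


# Hypothetical readings (arbitrary fluorescence units); agonist added after the 4th read.
well = [100.0, 102.0, 98.0, 100.0, 350.0, 720.0, 640.0, 410.0, 220.0]
ratio = fold_over_baseline(well, n_baseline=4)
print(f"peak response: {ratio:.1f}-fold over baseline")
```

Normalising to each well's own baseline is one simple way to reduce well-to-well differences in dye loading; it does not by itself distinguish receptor-mediated responses from the dye or calcium artefacts mentioned above.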
PROTEIN REDISTRIBUTION ASSAYS
Measurement of the translocation of GPCRs and/or associated proteins employs clonal cells expressing the GPCR of interest, labeled with a suitable fluorescent protein. Activation of the receptor causes redistribution of the label, which is monitored by automated confocal systems and analyzed by imaging algorithms [64]. Analogous experiments are undertaken in fixed cells, in which the proteins are visualized by immunostaining techniques. Here, antibodies are used that recognize either the native protein or an epitope tag fused to the protein. Either assay approach yields spatial information on the proteins after pathway activation, and in the case of intact cell studies, temporal data may also be obtained [64].
GPCR internalization, a ubiquitous property of GPCRs that occurs as part of the processes of receptor desensitization and inactivation [7], can be visualized directly. The technique is facilitated by tagging the receptor with a fluorescent biomarker, generally green fluorescent protein (GFP) [65]. Upon agonist activation, the GPCR-GFP fusion protein moves from the plasma membrane to internalized recycling compartments, forming intensely bright spots [65]. Automated image capture and quantification of these trafficking events permit screening for both natural and surrogate receptor agonists [66]. Although the approach is seemingly straightforward, problems frequently arise, including inappropriate targeting of the GPCR-GFP construct to the plasma membrane and/or high levels of internalized receptor under basal conditions due to constitutive recycling. Furthermore, the pharmacological properties of a GPCR can be altered by the presence of a GFP tag.
Functional cell based GPCR assays also require labels attached to the GPCR or ancillary protein that do not affect function, yet possess sufficient sensitivity such that the GPCR is not required to be over expressed.
Due to the relatively low quantum yield of fluorophores, the use of several fluorescent proteins is limited. Immunostaining procedures, whilst sensitive, are highly dependent upon the quality of the antibodies used, and the immunostaining protocols can be lengthy and cumbersome to automate.
A strategy focused on the internalization pathway exploits the acidic nature of the endosomal recycling compartments [67]. The pH-sensitive cyanine dye CypHer 5 is non-fluorescent at pH 7.4 and brightly fluorescent in acidic environments. This approach uses a small amino-terminal epitope tag on the GPCR, recognized by a CypHer 5 conjugated antibody. Binding of the antibody to cell surface GPCRs fails to produce a fluorescence signal, but upon ligand-dependent internalization to endosomal compartments, intensely bright fluorescent spots of CypHer 5 are observed [67].
Monitoring the cellular translocation of proteins participating in the GPCR desensitization/internalization cycle, specifically β arrestins, has also been developed as an assay format [68]. This approach monitors the change in distribution of a β-arrestin-GFP construct, from a diffuse intracellular localization to aggregated pit-like compartments, as a consequence of ligand-dependent internalization. Theoretically, the technique is applicable across GPCRs, and may overcome limitations associated with direct GPCR tagging [69]. Provided some degree of plasma membrane localization of the GPCR of interest is achieved, β arrestin translocation follows only those receptors committed to the internalization pathway, offering a high signal-to-noise ratio [69]. β Arrestin translocation has also been monitored using bioluminescence resonance energy transfer (BRET), in which a luciferase donor protein transfers energy to a green fluorescent protein [70]. In
this assay, the green fluorescent protein is fused to arrestin and the luciferase to the GPCR, and translocation of the arrestin to the vicinity of the GPCR induces BRET and thus a signal [70]. Indeed, biophysical techniques involving resonance energy transfer, namely FRET (fluorescence resonance energy transfer) or BRET, enable the formation of GPCR/protein complexes to be monitored in living cells in real time [71]. Both FRET and BRET can be detected by microscopy, scanning spectroscopy or a suitable plate reader capable of sequential or simultaneous detection of filtered light emitted within two distinct wavelength windows. Microscopy is often used for detecting FRET, being most suitable for studies involving photobleaching. In contrast, microscopy is rarely used to detect BRET, with the majority of studies using plate reading instrumentation. BRET assays have recently been modified using β arrestin mutants [71]. Here, the BRET2 signal is enhanced by using a phosphorylation independent mutant or β arrestin mutants deficient in their ability to interact with clathrin coated vesicles. Since both mutants dissociate slowly from the GPCR, the BRET signal is enhanced, markedly improving the assay signal and allowing the assay to be adapted as an HTS tool.
Alternatively, weakly interacting β Gal α and ω deletion mutants are fused to the interacting GPCR and β arrestin proteins. After enrichment by FACS analysis, clones are identified with low background expression of β Gal activity but with a ligand dependent increase in enzyme activity, reflecting ligand induced association. Since the signal is generated by enzymatic amplification, over expression of labeled proteins is not necessary, and the cellular environment is not grossly perturbed. Collectively, the approach appears suitable as an HTS system for detection of GPCR agonists or antagonists, and has been configured to perform in 384 well microtiter plates [72].
The assay exhibits agonist potencies similar to those detected in other GPCR assays and is capable of detecting receptor inhibitors with precision characteristics appropriate to HTS.
OTHER GPCR SCREENING ASSAYS
Melanophore cells derived from the neural crest of Xenopus laevis have been developed as a highly sensitive system for GPCR screening, since GPCR activation results in dispersion of the pigment melanin [73]. This occurs irrespective of whether adenylyl cyclase or phospholipase C is activated, whereas inhibition of activation results in aggregation, causing the cells to lighten in color. The sensitivity of the assay is such that, in transiently expressed systems, both the constitutive activity of the GPCR and the activity of inverse agonists can be readily determined. When used in conjunction with mutations of the GPCR that induce constitutive activity, such as the CART (constitutive active receptor technology) approach [74], it can provide a very simple means to assess novel ligands interacting at orphan GPCRs, as well as compounds that allosterically regulate GPCR activation.
There are several other assay technologies now emerging for use in HTS at GPCRs. Although space limitations do not permit full evaluation of each, several are described for the sake of completeness. Cellular dielectric spectroscopy is a label free analysis system, in which changes in radio frequency spectrometry and bioimpedance exhibit a characteristic pattern when GPCRs are activated by an agonist. Potentially, this may provide a generic assay for GPCRs, without the need for chimeric or promiscuous G proteins and without the incorporation of radiometric or fluorometric detection systems. Other analytical techniques to measure ligand binding to membrane receptors such as acoustic, optical surface plasmon resonance biosensing, sedimentation, isothermal
titration calorimetry and differential scanning calorimetry are now emerging. These have recently been reviewed by Cooper [76].
SUMMARY
Functional analysis of GPCR responses is a common approach for high throughput screening of libraries of compounds. Measurement of second messengers (cAMP, Ins P3 or calcium) is readily achieved using homogeneous assays requiring neither radiometric techniques nor heterogeneous assay formats. Moreover, all these assays can be automated for robotic fluid dispensing. Other assay formats measure GPCR activation using the redistribution of a labeled protein (β arrestin or receptor). These assays tend to have lower throughput but are applicable to many GPCR classes. Nonetheless, the ease and robustness of radioligand binding assays, as well as the limited non-radiometric alternatives, will ensure that isotopic techniques also continue to be used. In practice, it is likely that a combination of all these HTS assay formats will be employed in compound identification, optimization and elucidation of mechanism of action.
REFERENCES
[1] Szekeres, P.G. Recept. Channels, 2002, 8, 297-298.
[2] Wise, A., Jupe, S.C., Rees, S. Ann. Rev. Pharmacol. Toxicol., 2004, 44, 43-60.
[3] Dunlop, J., Eglen, R.M. Drug Discov. Today: Technologies, 2004, 4, 61-68.
[4] Conway, B.R., Demarest, K.T. Recept. Channels, 2002, 8, 331.
[5] Drews, J. Science, 2000, 287, 1960-1964.
[6] Kenakin, T.P. Pharmacol. Revs. Comm., 2000, 11, 93-111.
[7] Pierce, K.L., Premont, R.T., Lefkowitz, R.J. Nat. Rev. Mol. Cell Biol., 2002, 3, 639-650.
[8] Pitcher, J.A., Freedman, N.J., Lefkowitz, R.J. Ann. Rev. Biochem., 1998, 67, 653-692.
[9] Chiang, A., Laporte, S.A., Caron, M., Lefkowitz, R.J. Prog. Neurobiol., 2002, 66, 61-79.
[10] Kroeze, W.K., Sheffler, D.J., Roth, B.L. J. Cell. Sci., 2003, 116, 4867-4869.
[11] Kenakin, T.P. Nat. Rev. Drug Discov., 2003, 2, 429-438.
[12] George, S.R., O'Dowd, B.F., Lee, S.P. Nat. Rev. Drug Discov., 2002, 1, 808-810.
[13] Gazi, L., López-Giménez, J.F., Strange, P.G. Curr. Op. Drug Disc. Devel., 2002, 5, 756-763.
[14] Bockaert, J., Roussignol, G., Bécamel, C., Gavarini, S., Joubert, L., Dumuis, A.L., Fagni, L., Marin, P. Biochem. Soc. Trans., 2004, 32, 851-855.
[15] Kenakin, T.P. Recept. Channels, 2004, 10, 51-60.
[16] Kenakin, T.P. Ann. Rev. Pharmacol. Toxicol., 2002, 42, 349-379.
[17] Knight, P.J.K., Pfeifer, T.A., Grigliatti, T.A. Anal. Biochem., 2003, 320, 88-103.
[18] Ames, R., Nuthulaganti, P., Fornwald, J., Shabon, U., van der Keyl, H., Elshourbagy, N. Recept. Channels, 2004, 10, 117-124.
[19] Byland, D.B., Deupree, J.D., Toews, M.L. Meths Mol. Biol., 2004, 259, 1-28.
[20] Rodgers, G. Assay Drug Devel. Tech., 2003, 1, 627-636.
[21] Sportsman, J.R. Meths Enzymol., 2003, 361, 505-529.
[22] Gagne, A., Banks, P., Hurt, S.D. J. Recept. Signal Trans., 2002, 22, 333-314.
[23] Lee, P.H., Bevis, D.J. J. Biomol. Screening, 2000, 5, 415-419.
[24] Rudiger, M., Haupts, U., Moore, K., Pope, A.J. J. Biomol. Screening, 2001, 6, 29-37.
[25] Scheel, A.A., Funsch, B., Busch, M., Gradl, G., Pschorr, J., Lohse, M.J. J. Biomol. Screening, 2001, 6, 11-18.
[26] Ferrer, M., Kolodin, G.D., Zuck, P., Peltier, R., Berry, K., Mandala, S.M., Rosen, H., Ota, H., Ozaki, S., Inglese, J., Strulovici, B. Assay Drug Devel. Tech., 2003, 1, 261-273.
[27] Laitinen, J.T. Curr. Neuropharmacol., 2004, 2, 191-206.
[28] Harrison, C., Traynor, J.R. Life Sci., 2003, 74, 489-508.
[29] Milligan, G., Pascal, G., Carrillo, J. J. Med. Chem. Res., 2004, 13, 18-24.
[30] De Lapp, N.W. Trends Pharmacol. Sci., 2004, 25, 400-401.
[31] Frang, H., Mukkala, V.-M., Syystö, R., Ollikka, P., Hurskainen, P., Scheinin, M., Hemmilä, I. Assay Drug Devel. Tech., 2003, 1, 275-280.
[32] Hemmilä, I.A., Hurskainen, P. Drug Discov. Today: high throughput technologies supplement, 2002, 7, S150.
[33] Gille, A., Seifert, R. Naunyn Schmiedebergs Arch. Pharmacol., 2003, 368, 210-215.
[34] Milligan, G. Drug Discov. Today, 2003, 8, 579-585.
[35] Rees, S. Recept. Channels, 2002, 8, 257-259.
[36] Lazareno, S. Meths Mol. Biol., 2004, 259, 29-46.
[37] Milligan, G. Int. Congress Series, 2003, 1249, 15-25.
[38] Dinger, M.C., Beck-Sickinger, A.G. Meths Principles Med. Chem., 2004, 21, 73-94.
[39] Kunapuli, P., Ransom, R., Murphy, K.L., Pettibone, D., Kerby, J., Grimwood, S., Zuck, P., Hodder, P., Lacson, R., Hoffman, I., Inglese, J., Strulovici, B. Anal. Biochem., 2003, 314, 16-29.
[40] Kornienko, O., Lacson, P., Kunapuli, P., Schneewels, J., Hoffman, I., Smith, T., Alberts, M., Inglese, J., Strulovici, B. J. Biomol. Screening, 2004, 9, 186-195.
[41] Williams, C. Nat. Rev. Drug Discov., 3, 125.
[42] Goetz, A.S., Liacos, J., Yingling, J., Ignar, D.M. J. Pharmacol. Toxicol. Meths, 1999, 42, 225-235.
[43] Entzeroth, M. Curr. Op. Pharmacol., 2003, 3, 522-529.
[44] Golla, R., Seethala, R. J. Biomol. Screening, 2002, 7, 515-525.
[45] Weber, M., Ferrer, M., Zheng, W., Inglese, J., Strulovici, B., Kunapuli, P. Assay Drug Devel. Tech., 2004, 2, 39-49.
[46] Eglen, R.M., Singh, R. Combin. Chem. & HTS, 2003, 6, 313-318.
[47] Blakely, B.T., Rossi, F.M., Tillotson, B., Palmer, M., Estelles, A., Blau, H.M. Nat. Biotechnol., 2000, 18, 218.
[48] Berridge, M.J. Nature, 1993, 361, 315.
[49] Liu, J.J., Hartman, D.S., Bostwick, J.R. Anal. Biochem., 2003, 318, 91-99.
[50] Brandish, P.E., Hill, L.A., Zheng, W., Scolnick, E.M. Anal. Biochem., 2003, 313, 311-318.
[51] Challiss, R.A.J., Batty, I.H., Nahorski, S.R. Biochem. Biophys. Res. Commun., 1988, 157, 684.
[52] Mullinax, T.R., Henrich, G., Kasila, P., Ahern, D.G., Wenske, E.A., Hou, C., Argentieri, D., Bembenek, M.E. J. Biomol. Screening, 1999, 4, 151.
[53] Chelsky, D., Bossé, R., Illy, C. Abstract 10058, 7th SBS Annual Conference and Exhibition, Baltimore, MD, USA, Sept. 10-13, 2001.
[54] Neilsen, P.O., Assis, E.F., Branch, A.M., Drees, B.E. Poster P08064, 9th SBS Annual Conference, Portland, OR, Sept. 21-25, 2003.
[55] Chambers, C., Smith, F., Williams, C., Marcos, S., Liu, Z.H., Hayter, P., Ciaramella, G., Keighley, W., Gribbon, P., Sewing, A. Combin. Chem. & HTS, 2003, 6, 355-362.
[56] Kassack, M.U., Hoefgen, B., Lehmann, J., Eckstein, N., Quillan, J.M., Sadee, W. J. Biomol. Screening, 2002, 7, 233-246.
[57] Zhang, Y., Kowal, D., Kramer, A., Dunlop, J. J. Biomol. Screening, 8, 571-577.
[58] Hodder, P., Mull, R., Cassaday, J., Berry, K., Strulovici, B. J. Biomol. Screening, 2004, 9, 417-426.
[59] Coward, P., Chan, S.D.H., Wada, H.G., Humphries, G.M., Conklin, B.R. Anal. Biochem., 1999, 270, 242-248.
[60] Liu, A.M.F., Ho, M.K.C., Wong, C.S.S., Chan, J.H.P., Anson, H.M., Wong, Y.H. J. Biomol. Screening, 2003, 8, 39-49.
[61] Kostensis, E. J. Recept. Signal Trans., 2002, 22, 267-281.
[62] Dupriez, V.J., Maes, K., Le Poul, E., Burgeon, E., Detheux, M. Recept. Channels, 2002, 8, 319-330.
[63] Le Poul, E., Hisada, S., Miziguchi, Y., Dupriez, V.J., Burgeon, E., Detheux, M. J. Biomol. Screening, 2002, 7, 57-65.
[64] Wylie, P., Bowen, E. Biochemist, 2004, 26, 27-30.
[65] Conway, B.R., Minor, L.K., Xu, J.Z., D’Andrea, M.R., Ghosh, R.N., Demarest, K.T. J. Cell. Physiol., 2001, 189, 341-355.
[66] Milligan, G. Drug Discov. Today, 2003, 8, 579-585.
[67] Adie, E., Francis, M.J., Davies, J., Smith, L., Marenghi, A., Halther, C., Hadingham, K., Michael, N.P., Milligan, G., Game, S. Assay Drug Devel. Tech., 2003, 1, 251-259.
[68] Barak, L.S., Ferguson, S.S.G., Zhang, J., Caron, M.G. J. Biol. Chem., 1997, 272, 27497-27500.
[69] Oakley, R.H., Hudson, C.C., Cruickshank, R.D., Meyers, D.M., Payne, R.E., Rhem, S.M., Loomis, C.R. Assay Drug Devel. Tech., 2002, 1, 21-30.
[70] Bertrand, L., Parent, S., Caron, M., Legault, M., Mireille, J., Erik, A., Stephane, A., Bouvier, M., Brown, M., Houle, B., Menard, L. J. Recept. Signal Trans., 2002, 22, 533-541.
[71] Heding, A. Ex. Rev. Mol. Diagnostics, 2004, 4, 403-411.
[72] Yan, Y-X., Boldt-Houle, D.M., Tillotson, B.P., Gee, M.A., D’Eon, B.J., Chang, X-J., Olesen, C.E., Palmer, M.A.J. J. Biomol. Screening, 2002, 7, 451-459.
[73] Lerner, M.R. Trends Neurosci., 1994, 17, 142-146.
[74] Behan, D.P., Chalmers, D.T. Curr. Op. Drug Disc. Devel., 2001, 4, 548-560.
[75] Cooper, M.A. J. Mol. Recog., 2004, 17, 286-315.
Frontiers in Drug Design & Discovery, 2005, 1, 113-166
Developments in Hyphenated Spectroscopic Methods in Natural Product Profiling
Sylvia Urban (a)* and Frances Separovic (b)
(a) School of Applied Sciences (Applied Chemistry), RMIT University, GPO Box 2476V Melbourne VIC 3001, Australia and (b) School of Chemistry, The University of Melbourne, Parkville Melbourne VIC 3010, Australia
Abstract: Overviews of the principal techniques that have found application in the profiling of natural products are given, including advances in capillary and column trapping LC-NMR-MS. Single hyphenated spectroscopic techniques, like LC-NMR and LC-MS, currently offer robust and efficient approaches for the rapid dereplication of natural product extracts. Multiple hyphenation techniques, such as (HP)LC-UV-NMR-MS-FTIR, now give an effective and comprehensive method for the deconvolution of complex mixtures. Ongoing improvements include miniaturization and cryogenic NMR probes and the hyphenation to capillary scale LC separations to analyse smaller quantities of samples. Extensions in hyphenation techniques required for natural product and other drug discovery programs include the need for LC-13C NMR and the combination of bioassays with already well-established hyphenated separation-spectroscopic techniques, coupled with automated database searching capabilities (data libraries for LC, UV, NMR, MS and other search criteria).
*Corresponding author: E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
INTRODUCTION
Natural products have served as a major source of drugs for centuries, with about half of the pharmaceuticals in use today derived from natural origins [1,2]. Nature is unrivalled in its ability to produce molecules of structural complexity and biological potency, and the earth can be claimed to have been conducting its own combinatorial chemistry for thousands of years. Natural products continue to play a dominant role in the discovery of leads for the development of drugs for the treatment of human diseases. Much remains to be explored, particularly in the marine and microbial environments, from which a host of novel bioactive chemical entities await discovery [3].
The search for new drugs from natural origins (either terrestrial or marine) involves screening extracts for the presence of novel compounds and investigating their biological activities. Suspected novel or bioactive compounds are usually isolated in order to elucidate their structure and to perform further biological and toxicological testing. The path that leads from the intact terrestrial or marine organism to the pure constituents is long, involving work that might last from weeks to years. The general steps include:
• Collection of the marine or terrestrial organism(s)
• Identification or taxonomic classification of the species
• Review of the literature for any previous studies conducted on the species being targeted
• Extraction using different solvents
• Biological screening of the extract
• Purification of these extracts by different preparative and analytical chromatographic techniques, usually with a bioactivity-guided isolation strategy
• Structure elucidation of the constituents using a combination of various spectroscopic (UV-Vis, IR, 1H and 13C NMR, 2D NMR experiments, X-ray diffraction, and MS) and chemical methods
• Pharmacological and toxicological testing of compounds
• Synthesis or semi-synthesis of the bioactive natural product
• Synthesis of analogues with the aim of establishing Structure-Activity Relationships (SAR) and to produce more potent variants of the natural product
The characterisation of natural products in complex mixtures is aided by the application of sophisticated hyphenated spectroscopic techniques, which provide the necessary sensitivity and selectivity as well as structural information on the constituents of interest. Hyphenated techniques offer an efficient scheme for the chemical screening of extracts required to detect new leads, which potentially can be interesting from a chemical viewpoint. Recognition of natural products at the earliest possible stage of separation is also essential in order to avoid the time-consuming isolation of common constituents. With the introduction of High Throughput Screening (HTS) programmes, there is an even more urgent need for efficient and sensitive methodologies to give sufficient on-line information for metabolite structure determination. This review provides an outline of the range of hyphenated spectroscopic methods currently applied to the study of natural products as well as discussion of future advances in the field.
Origin of Natural Products
The use of natural substances, particularly plants, to control disease is an ancient practice that has led to the discovery of more than half of all “modern” pharmaceuticals [4]. Documentation of the use of natural substances for medicinal purposes can be found as far back as 78 A.D., when Dioscorides wrote “De Materia Medica”, describing thousands of medicinal plants [5]. Despite this knowledge, it was not until the early to mid 1800s that a number of important pharmacologically active natural products, such as the cardiac glycosides and a variety of bioactive alkaloids (e.g. morphine, atropine, reserpine and physostigmine), were discovered. Many of these biologically active natural products became important not only for their use directly as therapeutic agents or as prototype lead compounds for the development of new drugs, but also as biochemical probes to unravel the principles
of human pharmacology [6]. We follow with an overview of some important terrestrial and marine natural products that have been discovered.
Higher Plants
Originally, plants were almost the only therapy available to humans. With the development of medicinal chemistry in the early nineteenth century, plants were also the first source of substances to be developed as drugs. For a discussion of the potential of higher plants as a source of new drugs, see the reviews by Hostettmann and Wolfender [7-9]. Classical examples of drugs of plant origin include the antimalarial agent quinine from the bark of Cinchona officinalis (Rubiaceae), the analgesics codeine and morphine from Papaver somniferum (Papaveraceae), atropine from Atropa belladonna and other Solanaceae species, and the cardiac glycoside digoxin from Digitalis sp. (Scrophulariaceae).
Natural compounds have been particularly successful in the field of anticancer drug research. The bisindole alkaloids vinblastine and vincristine from the Madagascar periwinkle (Catharanthus roseus, Apocynaceae) were developed as effective anticancer drugs in the 1960s and are of great importance in the treatment of leukaemia. Further, a semi-synthetic derivative called vinorelbine was developed for the treatment of breast cancer. The diterpenoid paclitaxel, previously known as Taxol, was discovered as part of an NCI (National Cancer Institute, USA) sponsored program. Taxol was first isolated from the stem bark of the Pacific yew Taxus brevifolia (Taxaceae) in the late 1960s, but has since been found in other yew species such as the European yew T. baccata. The drug was approved by the FDA (Food and Drug Administration) in 1992. Taxol was used for the treatment of ovarian cancer resistant to chemotherapy, but its therapeutic applications have since been extended to other gynaecological cancers. For many years the NCI has carried out a program whereby thousands of plant extracts are tested annually [10].
Some of the plant metabolites discovered in the NCI program for the treatment of AIDS include michellamine B from the West-African liana Ancistrocladus korupensis (Ancistrocladaceae), the coumarin derivative calanolide A from the African tropical rainforest tree Calophyllum lanigerum (Guttiferae), the phorbol ester prostratin obtained from Homalanthus nutans (Euphorbiaceae), which is a plant used in the traditional medicine of the Samoan islands for treating yellow fever, and the naphthoquinone trimer conocurvone from the Australian shrub Conospermum incurvum (Proteaceae).
In recent times, plant products have also played a role in the development of new antimalarial agents. An example is the discovery of artemisinin, a sesquiterpene lactone isolated in 1972 by Chinese scientists from qinghao (Artemisia annua, Asteraceae). This plant had been used for over 2000 years in China for the treatment of malaria. Artemisinin represented a completely new chemical class of antimalarial compounds, which showed high activity against resistant Plasmodium strains. Because artemisinin is highly lipophilic, problems were encountered in administering it as a drug, and consequently a series of derivatives were synthesised and are now licensed as drugs.
The potential of higher plants as sources for new drugs is still largely unexplored. Among the estimated 400,000-500,000 plant species worldwide, only a small percentage
has been investigated phytochemically and an even smaller fraction submitted to biological or pharmacological screening [11]. The rapid disappearance of tropical forests and other important vegetation areas means that it is essential to have access to methods for the rapid isolation and identification of bioactive natural products. In spite of the exponential development of synthetic pharmaceutical chemistry, including the use of combinatorial chemistry and microbial fermentation, over 25% of prescribed medicines in industrialised countries are derived directly or indirectly from plants [12]. This proportion can reach 50% for the over-the-counter market, which may include Chinese and naturopathy based products for self-administration.
Micro-Organisms
Antibiotics are among the most important classes of therapeutic agents and have had an enormous impact on both life expectancy and quality of life. With the discovery of the natural penicillins, secondary metabolites of species of the fungus Penicillium, the course of medical history was dramatically changed and the antibiotic era was born. Following Alexander Fleming’s observations in 1928 and the subsequent isolation and characterisation of the active constituent by Howard Florey and Ernst Chain, hundreds of antibiotics have been isolated from numerous micro-organisms [13].
Marine Organisms
Marine natural products are commonly released by organisms as defence agents against predators. These agents are rapidly diluted and, therefore, need to be highly potent to have an effect. Due to this and the immense biological diversity in the sea, it is increasingly being recognised that a huge number of natural products and unique chemical entities with novel biological activities exist in the oceans. Bioprospecting of marine natural products has yielded a considerable number of drug candidates [14].
Most of these molecules are still in preclinical or early clinical development but some, like cephalosporins, cytarabine (Ara-C) and vidarabine (Ara-A), are already on the market [15]. In a 2003 review, Haefner presented an update of the marine natural products and derivatives currently in clinical development [16]. Mechanisms of action for the compounds (e.g. microtubule-interfering agents) were included along with their marine source, chemical class, disease area for treatment, clinical phase status and the pharmaceutical company developing the compound.
Among the marine natural products and derivatives in advanced stages of preclinical development are the cone snail venoms, which act as selective peptide antagonists or agonists of ligand or voltage-gated ion channels and G-protein coupled receptors [17,18]. The high potency of these toxins, which are potentially lethal to humans, may in the future be turned to our advantage. Pharmacologists have been investigating the potential use of these toxins as adjuncts in anaesthesia, as analgesics, or as drugs for the treatment of conditions such as epilepsy, cardiovascular disease and psychiatric disorders [18]. The bryostatins, which are a family of Protein Kinase C (PKC) inhibitors, were isolated from the bryozoan Bugula neritina and are currently undergoing trials for cancer treatment. Bryostatin-1 has been granted Orphan Drug Status by the FDA and has been designated an Orphan Medicinal Product in Europe for treatment of oesophageal cancer in combination with paclitaxel [19,20]. Of the many novel microtubule-interfering agents discovered in marine organisms, dolastatin 10, dolastatin 15, three derivatives and discodermolide have
reached clinical development. Of the marine DNA-interactive compounds active in cytotoxicity screens, only ecteinascidin 743 (ET743, Yondelis), obtained from the colonial ascidian Ecteinascidia turbinata, has gone into clinical development. ET743 was granted Orphan Medicinal Product designation by the European Agency for the Evaluation of Medicinal Products in 2001. Other marine natural product drug candidates in clinical development include APL (dehydrodidemnin B, Aplidin), from the tunicate Aplidium albicans, and kahalalide F, from the sea slug Elysia rufescens (although it is most probably derived from Bryopsis sp., the green alga in its diet), both of which are being developed by PharmaMar. Coproverdine, obtained from a New Zealand ascidian, is one of many examples that have just begun the long process of investigation as a bioactive anticancer lead compound, with an aim to develop either it or a related analogue into a potential drug candidate [21,22]. Some of the natural products isolated from marine invertebrates have been shown or are suspected to be of microbial origin. Marine micro-organisms, whose immense genetic and biochemical diversity is only beginning to be appreciated, are likely to become a rich source of novel chemical entities for the discovery of more effective drugs. In addition, the chemistries of insects and other terrestrial invertebrates are still relatively unexplored as drug discovery leads. Another area of interest is the manipulation of plant genomes in an effort to express a host of natural products that have not been seen before. According to a recent survey by Newman, Cragg and Snader of the NCI, 61% of the 877 small-molecule new chemical entities introduced as drugs worldwide during 1981-2002 can be traced to or were inspired by natural products [2].
These include natural products (6%), natural product derivatives (27%), synthetic compounds with natural product derived pharmacophores (5%) and synthetic compounds designed on the basis of knowledge gained from a natural product or a natural product mimic (23%); the remaining 39% are of totally synthetic origin, Fig. (1).
Fig. (1). Small-molecule new chemical entities introduced as worldwide drugs during the period 1981-2002.
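The category shares quoted above can be checked with a few lines of arithmetic. The sketch below simply confirms that the four natural product related categories add up to the 61% cited in the text; the approximate head counts are derived here from the percentages and the total of 877, and are rounded estimates rather than figures reported in the survey.

```python
# Category shares (percent) as quoted in the text for 1981-2002.
TOTAL_NCES = 877
shares = {
    "natural product": 6,
    "natural product derivative": 27,
    "synthetic, natural product derived pharmacophore": 5,
    "synthetic, designed from natural product knowledge or mimic": 23,
    "totally synthetic": 39,
}

# The five categories should partition the whole set.
assert sum(shares.values()) == 100

# Everything except the totally synthetic class is natural product related.
natural_product_related = sum(
    pct for name, pct in shares.items() if name != "totally synthetic"
)

# Rounded estimates only: the text gives percentages, not counts.
approx_counts = {
    name: round(TOTAL_NCES * pct / 100) for name, pct in shares.items()
}

print(f"natural product related: {natural_product_related}%")
for name, count in approx_counts.items():
    print(f"{name}: ~{count} of {TOTAL_NCES}")
```

Because each count is rounded independently, the estimates need not sum exactly to 877.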
The natural product component is higher in certain therapeutic areas, where 79% of antibacterials and 74% of anticancer compounds are natural products or have been derived from, or inspired by, a natural product, Fig. (1). These numbers are not surprising if we assume that natural products evolved for self-defence. The influence of natural products is significant even in therapeutic areas for which they might not seem directly relevant, such as cholesterol management, diabetes, arthritis and depression. When a new biologically active compound is discovered, it is still far from being accepted as an effective therapeutic agent. Natural product profiling is only the beginning of the drug discovery pipeline, which may last for years and also involves animal, clinical and toxicological testing. Presently, only one of 20,000 new molecules investigated by the pharmaceutical industry is estimated to reach the stage of commercialisation [23].

NATURAL PRODUCT PROFILING

Bioactivity-Guided Isolation Strategy

Since the 1980s, many of the active marine and terrestrial natural products undergoing investigation as drug candidates have been discovered using standard procedures involving biological screening followed by activity-guided fractionation. All fractions are biologically evaluated and those continuing to exhibit activity are carried through further isolation and purification steps until pure active compounds are obtained. The large numbers of pure compounds and natural product extracts generated require a rapid biological screening evaluation process. Specific biological screening, usually based on interaction with selected enzymes or receptors, is the approach known as High Throughput Screening (HTS), which relies heavily on robotics and automation, enabling routine evaluation of tens of thousands of samples.
Subsequently, new assays using a specific target enzyme or receptor are rapidly developed and adapted for HTS, which then makes it possible to rapidly screen very large libraries of compounds. These advancements have led to very short assay “turnovers” in industry of about 2-6 months, wherein a new assay is developed, the entire library is evaluated, another new assay is developed, the library is screened again and so on [6]. The growing interest in natural products is exemplified by the number of attempts by the pharmaceutical industry to introduce them into drug discovery programs. With the introduction of HTS strategies, there is a potential to test large numbers of natural product extracts in a wide variety of bioassays. Since organisms are able to synthesise and produce an unpredictable range of chemical skeletons and novel substances, it is of prime importance to evaluate as many natural products as possible in order to find sources of new drugs or lead compounds. One main strategy in the isolation of new leads consists of the so-called bioactivity-guided isolation, in which pharmacological or biological assays are used to target the isolation of bioactive compounds. A major drawback of the bioassay-guided fractionation strategy is the frequent isolation of previously described compounds. Chemical screening of crude extracts, therefore, constitutes an efficient complementary approach allowing localisation and targeted isolation of new types of compounds with potential bioactivity. This screening procedure also enables recognition of known metabolites at the earliest stage of
separation, thus avoiding time consuming and expensive isolation of common constituents. A combination of chemical and biological screening is the fastest way to arrive at new lead compounds from biological sources. Relatively simple biological or pharmacological tests are needed in order to localise the particular activity of interest in the extract or in the numerous fractions resulting from the different purification steps. The assay also needs to be sensitive, as the active substances may be present in the extract at low concentrations. Chemical screening or the dereplication of crude natural product extracts can be achieved using a number of techniques such as bioautography, counter-current chromatography or by the use of various solid phase chromatography supports followed by HPLC and bioassays. Another method of dereplication is by spectroscopic techniques using either classical individual techniques like UV, MS and NMR or hyphenated techniques like LC-UV-DAD, LC-MS, LC-NMR and LC-NMR-MS.

Dereplication Strategies for Natural Product Profiling

Bioautography

Bioautography combines TLC with a bioassay in situ and allows for the localisation of active constituents in an extract. Spore-producing fungi, such as Aspergillus, Penicillium and Cladosporium spp., can all be used as the target organisms in direct bioautographic procedures. In this method, a TLC plate is run and the solvent allowed to dry. The TLC plate is then sprayed with a mixture of the micro-organism and the nutrient medium. The plate is then incubated in a humid atmosphere and zones of inhibition appear where fungal growth is prevented by the active components of the extract. Direct bioautography is not possible with yeasts such as Candida albicans and thus a rapid agar overlay assay is used. Here an agar layer containing the micro-organism is spread over the TLC plate, enabling transfer of active compounds from the stationary phase into the agar by diffusion.
After incubation the plate is sprayed with methylthiazolyltetrazolium chloride (MTT), which is converted into a purple MTT formazan dye by the yeast [7,8], for visualisation. An advantage of such TLC bioautographic methods is that bioactivities can be associated with specific spots on the plate. This simplifies the localisation of active compounds and helps with the design of subsequent isolation strategies.

Counter-Current Chromatography

Another method that has been applied to the chemical dereplication of natural product extracts is counter-current chromatography. Molecules present in natural product mixtures cover the whole spectrum of polarity and can vary in hydro- and lipophilicity, charge, solubility, stability and size. Counter-current chromatography operates under mild conditions, preventing decomposition and denaturation of valuable components. The absence of solid supports rules out catalytic surface effects and irreversible binding of compounds. In addition, both normal and reversed-phase chromatography (dual mode) can be performed in a single run.
In 2001, Alvi reported an integrated biological-physicochemical system for the identification of active compounds in fermentation broths. Preliminary fractionation of the microbial extract by dual-mode high-speed counter-current chromatography (HSCCC) was coupled with photodiode array (PDA) detection and bioassay [24].

Using Solid-Phase Cartridges and a Bioassay

In the early 1990s the NCI published a chemical screening strategy for the dereplication and prioritization of HIV-inhibitory aqueous natural product extracts [25]. The NCI’s chemical screening protocol utilized various solid-phase extraction cartridges (C4, C18 and G25). By using these support packings, information on the relative polarity and molecular size and weight of the bioactive compound(s) of interest could be ascertained. The chemical screening procedure is simple, rapid and relatively inexpensive. In addition, the method is versatile, in that other phases can be added or substituted to expand or alter the elution matrix or fingerprint. For example, an additional cartridge containing polyamide resins [26] is utilised to screen plant extracts for tannins, since polyphenolics have long been known to be irreversibly retained on polyamide resins, while compounds with fewer phenolic residues, such as flavonoids, can be eluted from polyamide gels with MeOH [27]. An ion exchange cartridge could be added to give information on whether the bioactive compound of interest is charged. The related concept of “chemical screening” was used previously and described by Hammann [28] and Zeeck [29] for secondary metabolites by means of functional group- or compound-class-specific TLC detection reagents. These methods are independent of biological activity and do not take it into account.
As already mentioned, a chemical screening protocol that is coupled to a bioassay is the most appropriate and efficient overall strategy for selection of priority extracts and targeting of potentially novel active chemical classes. Further developments in the NCI chemical screening protocol include HPLC analysis and microtitre plate collection of one of the bioactive solid phase cartridge fractions. From this master plate, daughter plates are created to submit for biological testing. Once active wells are identified, MS of the corresponding microtitre plate fractions is carried out on the master plate. These microtitre plate well positions represent regions of the HPLC chromatogram, which can be used to identify the region of bioactivity. This chemical screening strategy aims to obtain the relative polarity, size and molecular weight of any bioactive compound, followed by UV chromophore information (from PDA detection on HPLC) and, finally, a possible molecular weight for the active natural product(s) present in the extract [30, Munro, private communication].

Classical versus Hyphenated Dereplication Strategies

In the search for new natural products, crude extracts are typically subjected to multistep work-up and isolation procedures, which include various separation methods (including HPLC, column, gel or counter-current chromatography), in order to obtain pure compounds whose structure is then elucidated by using off-line spectroscopic methods such as NMR and MS [31]. The characterisation of a natural product can be summarised by the information obtained from each of the individual spectroscopic techniques as detailed in Fig. (2). With the application of one or more of these individual techniques a dereplication by partial characterisation is possible.
Fig. (2). Characterisation of a natural product by information obtained from individual spectroscopic techniques.
The combination of UV, MS and NMR spectroscopic data has often permitted unambiguous structure determination of pure constituents [21, 32-36]. Other techniques, such as IR or X-ray crystallography, have been used less often and mainly when the other spectroscopic methods failed to give complete structural assignment. As natural product extracts often contain a large number of closely related, and thus difficult to separate, compounds, this classical approach may become very tedious and time-consuming. The direct hyphenation of an efficient separation technique with powerful spectroscopic methods has great potential to speed up the dereplication process. HPLC has been by far the most useful tool for the separation of complex mixtures of small molecules. Reversed-phase HPLC on octadecylsilane (ODS or C18) has come to be recognised as the most broadly applicable of the bonded phases for this purpose. When
interfaced with Diode Array Detection (DAD), HPLC allows an analyst to identify known compounds by comparison of their HPLC retention time and UV spectra. The advent of Electrospray Ionisation Mass Spectrometry (ESI-MS) has provided an MS interface that is applicable to the analysis of a wide range of molecules and is compatible with liquid chromatography [37]. In the last ten years LC-MS has become a widely used tool for the dereplication of natural products [38-40]. LC-MS has become the dereplication tool of choice because the nominal molecular mass of a compound can be used as a search query in nearly all databases. Unfortunately, database searching using only the molecular mass frequently produces a large answer set and rarely results in a definitive identification. LC-MS is often combined in series with DAD, thereby providing UV data to narrow down the answer set, sometimes to a single answer. However, in many cases, more information is required for a confident identification. Advances in NMR methodology have allowed the direct interface of HPLC with NMR. NMR spectral data provide a great deal of structural information about a compound of interest. NMR is capable of discerning structural differences between compounds of the same molecular mass (isobars) or even the same molecular formula (isomers), with the ability to differentiate between constitutional and geometric isomers. The chemical screening of extracts using individual or hyphenated spectroscopic techniques generates a huge amount of information. In order to rationalise and use this approach efficiently with a high sample throughput, the challenge is to find a way to centralise the data for rapid pattern recognition by reference to standard compounds or databases.
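The two-step narrowing described above (a nominal-mass query that returns a large answer set, followed by UV data from DAD to reduce it) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the library entries, compound names, mass values, UV maxima and the 5 nm matching tolerance are hypothetical and do not come from any of the databases discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    nominal_mass: int   # Da
    uv_maxima: tuple    # nm


# Hypothetical in-house library; all values are placeholders.
LIBRARY = [
    Entry("compound A", 286, (254, 365)),
    Entry("compound B", 286, (280,)),
    Entry("compound C", 302, (256, 370)),
]


def search_by_mass(library, mass, tol=0):
    """Return every entry whose nominal mass is within tol Da of the query."""
    return [e for e in library if abs(e.nominal_mass - mass) <= tol]


def filter_by_uv(entries, observed_maxima, tol_nm=5):
    """Keep entries where each observed maximum lies near a library maximum."""
    def matches(entry):
        return all(
            any(abs(obs - ref) <= tol_nm for ref in entry.uv_maxima)
            for obs in observed_maxima
        )
    return [e for e in entries if matches(e)]


# Step 1: mass alone leaves more than one candidate.
mass_hits = search_by_mass(LIBRARY, 286)
# Step 2: UV maxima observed by DAD narrow the answer set.
narrowed = filter_by_uv(mass_hits, (255, 366))

print([e.name for e in mass_hits])
print([e.name for e in narrowed])
```

In this toy library the mass query alone cannot separate the two isobaric entries, whereas the UV filter can. Where neither mass nor UV data discriminates, NMR data would be the next source of information, as the text notes.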
Several natural product databases are commercially available, including Berdy Bioactive Natural Product Data Base, AntiBase, Dictionary of Natural Products by Chapman & Hall and the marine natural products database MarinLit. These databases can be searched primarily by molecular weight, UV and taxonomic classification. Some databases, such as MarinLit, include the ability to search functional groups and NMR data. The combination of the LC-hyphenated techniques, LC-UV, LC-MS and LC-NMR, should ideally enable the complete structural characterisation of any natural product directly within an extract, provided that the corresponding LC peak is clearly resolved. In practice, however, this is not always so and many factors may hinder on-line detection and structure determination of a natural product. Often only partial structure information will be obtained, but this will already provide valuable information for targeting the isolation of new compounds or for the dereplication of known constituents.

INTRODUCTION TO HYPHENATION

Hyphenated spectroscopic techniques have played an important part in the dereplication of natural products. Hyphenation refers to the online combination of a separation technique, primarily Liquid Chromatography (LC) or Capillary Gas Chromatography (GC), and a spectroscopic detection method, which provides structural information on the analytes concerned [41]. LC-MS and GC-MS are the most popular hyphenated techniques in use today. Further to this, the use of LC-MS-MS, LC-NMR and, more recently, LC-NMR-MS has been rapidly increasing. A multi-hyphenated system, LC-UV-NMR-MS-FTIR, has also been successfully implemented [42,43]. The following sections describe the mechanism of operation of these and other hyphenated
spectroscopic techniques, including examples of applications to natural product profiling. (HP)LC HYPHENATED WITH LC DETECTORS High Performance Liquid Chromatography (HPLC), as the best-suited technique for an efficient separation of crude extracts, is routinely used in natural product chemistry. Traditionally Reversed-Phase Liquid Chromatography (RPLC) has served as an effective technique for the separation of many of the components in complex natural product mixtures. The application of Normal-Phase Liquid Chromatography (NPLC) has been extremely limited due to problems associated with dissolving hydrophilic materials in nonaqueous mobile phases. HPLC can be coupled with different spectroscopic detection methods in order to obtain structural information on separated compounds. Although different types of LC detectors exist, such as UV, Refractive Index (RI), fluorescence, electrochemical, Evaporative Light Scattering Detector (ELSD), each method having its own specificity, none permits the detection of all the secondary metabolites encountered in a natural product extract within a single analysis. UV Detectors HPLC coupled with UV detection (LC-UV) has now been used for over two decades for screening natural product extracts and is widely used in many laboratories [44]. The UV spectra of natural products give useful information on the type of constituents and also, as in the case for polyphenols, information on the oxidation pattern. UV spectra of reference compounds or previously isolated bioactive compounds can be placed in a spectral library to create an “in-house” database. This allows known or commonly encountered natural products to be identified or dereplicated very early in a study. Diode Array Detectors UV(DAD) The single wavelength detector has long been replaced by the DAD or PDA detector for the analysis of natural product extracts. 
LC-UV(DAD) plays an important role in many research areas and, specifically, (RP)LC-UV(DAD) is a robust and user-friendly technique, which has become a tool of natural product chemists. LC-UV with Combined Use of UV Shift Reagents In order to determine the position of the hydroxyl groups on the polyphenol skeleton, LC-UV with the aid of post-column addition of UV shift reagents can be performed [45]. These UV reagents have been extensively used for the characterisation of pure constituents. In this method, extracts are first analysed by LC-UV-MS and then the separation is repeated five times by LC-UV(DAD), each time using another type of UV shift reagent. The procedure has been used for the characterisation of numerous xanthones and flavonol glycosides [45,46]. Refractive Index (RI) Detectors The RI detector is considered to be a universal detector and has been extensively applied to the isolation of natural products, but has “gone out of fashion” due to having
less sensitivity compared to UV or UV(DAD) detectors and because RI detectors are not tolerant of gradient elutions by HPLC.

Fluorescence and Electrochemical Detectors

Fluorescence and electrochemical detectors have gained the least popularity, since these approaches are restricted to compounds displaying native fluorescence or possessing an electroactive group, thereby seriously limiting the applicable range of these techniques. This is particularly true when considering natural product extracts.

Evaporative Light Scattering Detector (ELSD)

The ELSD is a popular detection method for profiling of natural product extracts as it can be applied to compounds that contain no chromophore(s). ELSD is considered a universal detector, which can tolerate HPLC gradients and give a quantitative measure of constituents, but it is a destructive technique.

(HP)LC HYPHENATION TECHNIQUES

Over the years there have been three major (HP)LC hyphenated spectroscopic techniques that have found increasing use for natural product profiling: (HP)LC-FTIR, (HP)LC-MS and (HP)LC-NMR.

(HP)LC-FTIR

LC-FTIR systems can be either of the flow-cell or the solvent-elimination type [47]. LC flow-cell-FTIR is useful for the specific detection or recognition of functional groups in major constituents of mixtures but LC solvent elimination-FTIR is the more powerful technique. Until now, most LC-FTIR interfaces have been used only by their designers and so have not found wide application due to limitations in compatibility for optimised separation and detection conditions and the rather complicated interfacing that is required. Recently, FTIR has been incorporated into a multi-hyphenated system to give LC-UV-NMR-MS-FTIR (mentioned above).

(HP)LC-MS

Liquid chromatography combined with Mass Spectrometry (LC-MS) has advanced rapidly from early developments in the 1970s and is now a standard technique [48,49] for determining molecular weights and structures.
There are many commercial systems available and the detection sensitivity can reach very low levels (e.g. ng-µg range is readily attainable). The coupling between LC and MS has not been straightforward since the normal operating conditions of a mass spectrometer (high vacuum, high temperatures, gas-phase operation and low flow rates) are diametrically opposed to those used in HPLC [50]. Hence, on-line coupling of the two techniques has been difficult and different LC-MS interfaces have been built, each with its own characteristics and range of applications. The four most common interfaces used for LC-MS of natural products include thermospray (TSP), continuous flow Fast Atom Bombardment (CF-FAB), Atmospheric Pressure Chemical Ionisation (APCI) and Electrospray Ionisation (ESI) [38]. TSP and APCI allow satisfactory ionisation of moderately polar compounds such as polyphenols
or terpenoids in the mass range of 200-800 Da. For larger polar molecules, such as saponins (MW > 800 Da), CF-FAB or ESI are the methods of choice. All these LC-MS interfaces result in soft ionisation, and useful fragment information is obtained by tandem mass spectrometry (MS-MS) or by multiple stage MS-MS experiments (MSn) on ion trap systems [51,52]. Hostettmann and co-workers report that high-resolution mass spectrometers, such as Q-TOF instruments, have also been used recently for the LC-MS analysis of crude plant extracts, allowing on-line determination of accurate masses. Accurate MS-MS experiments were also recorded and on-line molecular formula assignments for constituents of different plants were determined. LC-MS response is strongly dependent on the nature of the compounds to be analysed, the solvent and buffer used for the separation, the flow rate and the type of interface used. A natural product crude extract represents a complex mixture of metabolites having various physico-chemical properties, thus making it difficult to find LC-MS conditions that are optimum for the ionisation of all constituents. Often it will be necessary to analyse the extract under different ionisation conditions. However, the specific detection of given constituents can be performed at very low detection limits, provided that the correct ionisation method is used. For a comparison of different interfaces and ionisation conditions for the analysis of crude plant extracts, see Wolfender [38, 53-55]. Reversed-phase HPLC in combination with PDA and ESI-MS is the most important combination for the analysis of natural product extracts due to the inherent diversity of compound classes present. For the more polar compounds, the use of hydrophilic interaction chromatography has been evaluated [56].
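The interface guidance given earlier in this section (TSP or APCI for moderately polar compounds of 200-800 Da; CF-FAB or ESI for larger polar molecules above 800 Da) can be written as a small lookup. This is only a sketch of that rule of thumb: the function name and the polarity labels "moderate" and "high" are hypothetical, and compounds outside the two stated cases return no suggestion, reflecting the advice that extracts often need to be analysed under several ionisation conditions.

```python
def suggest_interfaces(mol_weight_da, polarity):
    """Suggest LC-MS interfaces from the rule of thumb quoted in the text.

    polarity is a hypothetical label: "moderate" or "high".
    Returns an empty list when the text gives no specific guidance.
    """
    if polarity == "moderate" and 200 <= mol_weight_da <= 800:
        # e.g. polyphenols or terpenoids
        return ["TSP", "APCI"]
    if polarity == "high" and mol_weight_da > 800:
        # e.g. saponins
        return ["CF-FAB", "ESI"]
    # Not covered by the stated rule: try more than one ionisation mode.
    return []


print(suggest_interfaces(450, "moderate"))
print(suggest_interfaces(1200, "high"))
print(suggest_interfaces(1200, "moderate"))
```

A real method-development decision would also weigh solvent, buffer and flow rate, which the text identifies as strong influences on LC-MS response and which this lookup ignores.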
Some applications of (HP)LC-PDA-MS(ESI) include: the screening of annonaceous acetogenins in bioactive plant extracts [57]; the evaluation of Q-TOF and multiple stage ion-trap MS-MS for the dereplication of flavonoids and related compounds in crude plant extracts [58]; HTS of tocopherols [59]; and the characterisation of the asterosaponin fraction of the starfish Asterias rubens using a combination of matrix-dispersion SPE and direct on-line LC-NMR-MS-MS [60].

Recent Advances

An important development in ESI ionisation and interfacing is the production of low flow rate (<1 µL/min) devices. The main objectives of this are either a reduction in sample consumption or on-line coupling to low flow rate techniques such as capillary electrophoresis (CE) and nanocapillary LC. Although ESI and APCI are used in the majority of applications, two alternative API interfacing ionisation techniques have been proposed: the sonic-spray ionisation interface and atmospheric pressure photoionisation (APPI). APPI is found to significantly extend the application to less polar compounds. Natural product extracts require both ESI and APCI analyses to be performed in order to have the widest applicability. A report by Niessen (2003) describes a recently proposed ESCI source [61]. The system enables scan-wise switching between ESI and APCI during LC-MS acquisition [62,63]. Multichannel electrospray inlets as well as microfabricated microfluidic electrospray devices have been proposed. Microchip-based separation techniques are expected to play an important role in future HTS strategies, as are mass spectrometers capable of more accurate mass determination such as Fourier
Transform Ion-Cyclotron Resonance Mass Spectrometry (FT-ICR-MS). There is also interest in combining bioactivity screening in parallel with MS.

Implementing Bioactivity Screening in LC-MS

The on-line combination of MS characterisation and biological screening based on ligand-receptor or antigen-antibody interactions is an attractive application for natural product drug discovery. A number of on-line approaches have been reported, including affinity Capillary Electrophoresis (CE) coupled to MS (CE-MS) [64], the use of on-line immunoaffinity extraction in combination with coupled column LC-MS-MS [65], and on-line monitoring of biospecific interactions in a homogeneous biochemical assay using ESI-MS [66]. A recent patent describes a structure dereplication process utilizing technology such as HPLC-PDA-MS coupled with high throughput bioassay data [67]. This method for generating, screening and dereplicating natural product libraries for the discovery of therapeutic agents was proposed by Unigen Pharmaceutical Incorporated (USA). A recent review by Niessen outlines the progress of LC-MS in HTS [61].

Flow NMR

Flow NMR spectroscopy is increasingly being utilised in drug discovery and development [68-72]. LC-NMR has become a routine method to resolve and identify components in a mixture, with broad applications in natural products, biochemistry, drug metabolism and toxicological studies. Recently, Direct Injection (DI) NMR (DI-NMR), Flow Injection Analysis (FIA) NMR (FIA-NMR) and LC-NMR have been developed as ways to acquire NMR data without the use of the traditional precision-glass sample tubes. With DI-NMR and FIA-NMR techniques, robotic autosamplers and liquid handlers are interfaced to the NMR spectrometer. Samples in disposable vials and 96-well microtitre plates can be routinely analysed in this way by NMR.
DI-NMR has been applied to the analysis of biofluids, combinatorial chemistry and protein/small molecule mixtures and, to some extent, natural product libraries. Flow NMR techniques can be classified into two categories. The first category subjects the sample to a separation technique, such as chromatography or capillary electrophoresis, before the NMR analysis (LC-NMR and CE-NMR). The second category uses the liquid flow simply as a means to transport the sample into the NMR analysis coil (DI-NMR and FIA-NMR). In the first category, the NMR relies on the separation technique, which makes the NMR spectrometer a sophisticated chromatographic detector; this category is epitomized by techniques such as LC-NMR and LC-NMR-MS. Hence, if the separation is not adequately performed, the NMR spectroscopy is of limited use. In the second category, NMR spectroscopy is the main rationale of the analysis and the liquid flow becomes a way to deliver the sample conveniently. The samples may have undergone chromatography before NMR analysis, but this is not necessarily performed on-line. Techniques in this category include DI-NMR and FIA-NMR. The use of a stationary flow-cell has advantages over NMR tubes in that locking and shimming are more uniform from one sample to the next and the Radio Frequency (RF) coils for NMR can be placed closer to the sample (directly on the glass flow-cell
surface), resulting in higher sensitivity. The use of NMR tubes can be circumvented, saving both money and time compared to a conventional NMR tube auto-changer.

Automated Data Analysis Tools

The ability to process and analyse the vast amounts of data generated through the application of flow NMR spectroscopy is a potential bottleneck to realising its full analytical capacity. This can result from the fast throughput already possible as well as the complex and information rich data generated from biochemical applications (e.g. metabonomics). There has been much development in data handling and interpretation methods, such as the AMIX-Viewer and AMIX-Tools software packages (Bruker BioSpin) [73]. Acquired spectra can be coupled to spectral prediction programs, such as the software developed by Advanced Chemistry Development (ACD) Laboratories, either interactively or automatically. Such software is useful not only for generating organic structures from 1D (1H and 13C) data, but also for the analysis of NMR data from combinatorial libraries.

FIA-NMR

FIA-NMR was first demonstrated in 1997 and described as ‘column-less LC-NMR’ [74-76]. The sample is injected as a plug into a fluid stream and swept into the NMR detector coil by the motion of the liquid. In FIA-NMR, no additional detectors (other than the NMR) are needed. FIA-NMR uses the mobile phase as a hydraulic push solvent, like a conveyor belt, to carry the injected sample from the injector port to the NMR flow-cell. After the pump stops, the spectrometer acquires the scout scan, analyses it and acquires the signal-averaged data (using the Water suppression Enhanced through T1 effects (WET) solvent suppression technique). After completion, a start signal is sent to the solvent pump to flush the old sample from the NMR flow-cell and introduce the next sample. In classic FIA-NMR, as in LC-NMR, the sample always flows in one direction and enters and exits the NMR flow-cell through different ports.
The flow-cell is always full of solvent. An interesting hybrid technique has been developed, called Bruker Efficient Sample Transfer (BEST)-NMR, which is referred to as an FIA technique even though it uses air bubbles to separate plugs of solvent; the air bubbles are usually removed just prior to the analysis. However, the combination of FIA with NMR is a relatively new and largely unexplored technique.

DI-NMR

In contrast to FIA-NMR, DI-NMR injects the sample solution directly into the detector coil of the NMR probe and, since no other solvents are used, the sample can then be recovered without dilution if desired. The DI-NMR injection process is typically performed with a robotic liquid handler, which is controlled by the NMR spectrometer. The advantage of using this kind of hardware for DI-NMR is that it allows the samples to be stored in disposable microtitre plates or vials instead of the precision glass sample tubes of conventional solution-state NMR. An integrated system for running DI-NMR is commercially available and is known as the Versatile Automated Sample Transport (VAST) sample changer (Varian Inc.). These systems are normally capable of analysing samples ranging from 150-350 µL, with concentrations ranging from 1 to 50 mM, which are stored in multiple 96-well
microtitre plates. A typical analysis time may range from 2-7 min, depending upon the solvent viscosity and the sample concentration and volume. The sample is withdrawn from the flow probe in the same way as it entered and is returned to the original sample container, transferred to an alternative container, or flushed to waste. The probe is emptied completely and then rinsed in a similar manner in preparation for the next sample.

Comparison of DI-NMR and FIA-NMR

FIA-NMR and DI-NMR are essentially NMR spectrometers with sample changers that exploit the speed and robustness of flow NMR. Neither technique uses chromatography, so neither is designed to analyse mixtures. However, off-line HPLC microtitre plate analyses of natural product extracts can be evaluated using these techniques. FIA-NMR and DI-NMR have different advantages and disadvantages. The advantages of DI-NMR are a higher signal-to-noise per sample and the consumption of less solvent. The disadvantages of DI-NMR are that a minimal sample volume is needed (if the sample volume is less than the NMR flow-cell volume, the line shape rapidly degrades) and that the sample is not filtered. The advantages of FIA-NMR are that no minimal sample volume is required (since the flow-cell is always full), samples can be filtered (with an in-line filter) and the NMR probe can be rinsed. The disadvantages of FIA-NMR are a lower signal-to-noise (due to sample dilution) and that more solvent is consumed. Both techniques suffer from some degree of carry-over and are subject to blockages from solid particles. FIA-NMR is more useful for repetitive analyses where there is sufficient sample and the sample can be discarded. In contrast, DI-NMR is more appropriate when there is a limited amount of sample and the sample must be recovered.

(HP)LC-NMR

HPLC coupled with NMR (LC-NMR) has taken time to become accepted due to initial high costs and lack of sensitivity.
LC-NMR was developed in 1978 but, for the first 10-15 years, was regarded more as an academic curiosity than as a robust analytical tool. Improvements in pulsed field gradients, solvent suppression, probe technology and the construction of high field magnets have resulted in greater implementation of LC-NMR instruments. The improved sensitivity has reached a point where LC-NMR is now well established [77,78]. Developments of LC-NMR during the mid 1980s to mid 1990s saw the technique evolve into an effective method for the structural analysis of complex mixtures of compounds [79]. The unique advantage of LC-NMR over other NMR techniques is the ability to separate components within a sample in situ. If the component for analysis is unstable to light, air, time or the environment, LC-NMR will most probably be the preferred NMR technique, and it has the potential for on-line structure identification of natural products. NMR spectroscopy is the most powerful spectroscopic technique for obtaining detailed structural information about organic compounds in solution. Coupling LC to NMR is straightforward compared to LC-MS, with the samples flowing in a non-rotating glass tube (usually 60-180 µL) connected at both ends with HPLC tubing. The main problem of LC-NMR is the difficulty of observing sample resonances in the presence of the much
larger resonances of the mobile phase. This problem is even greater in the case of typical LC reversed-phase operating conditions, where more than one protonated solvent is used and when the signals change frequencies during a typical gradient HPLC analysis. These problems were overcome with the development of fast, reliable and powerful solvent suppression techniques such as the WET sequence [80]. In reversed-phase LC-NMR conditions, non-deuterated solvents such as methanol (MeOH) or acetonitrile (MeCN) can be used while water is replaced by deuterium oxide (D2O). High quality 1H NMR spectra can be recorded in two ways, namely using either the stop-flow or the on-flow modes, while two-dimensional (2D) NMR spectra can also be obtained in the stop-flow mode.

HPLC Component of LC-NMR

LC separations are classified as isocratic or gradient, with isocratic elution employing a single unchanging mobile phase to separate analytes in a mixture. For more complex samples, gradient elution, in which the mobile phase composition is systematically changed with time, is typically used for analysis. This change in solvent phase composition optimises the retention factor of individual analytes, thus resulting in better and/or faster separation. As a result, gradient elution has become an integral part of LC separations, particularly for the analysis of natural product extracts. The full utility of gradient LC-NMR has been hindered due to deterioration of NMR spectra by solvent gradients. The changing mobile phase composition in gradient elution creates a solvent gradient across the NMR observe volume. Also, commonly used LC solvents possess different magnetic susceptibilities and, therefore, the solvent gradient causes significant changes in magnetic susceptibility and affects the magnetic field homogeneity. Consequently, this can result in NMR line broadening, loss of coupling information, chemical shift changes and a decrease in signal-to-noise.
One way to reduce these effects is to perform stop-flow NMR so that constant solvent conditions are obtained. A similar method involves trapping of analyte bands inside the NMR coil and acquiring data after a suitable delay time to stabilise the gradient. Specifically designed flow-cells with transfer lines of appropriate length can also be used to obtain high resolution NMR spectra with shallow solvent gradients (for example, a 1-3% change in solvent composition per minute) [81]. All these techniques result in longer analysis times. The use of pulse sequences based on solvent suppression has also been explored as an alternative method to remove the negative features of gradients [82]. A specific pulse sequence is required for each solvent system and gradient. Even minor changes in temperature and flow rate may result in inefficient solvent suppression and line broadening. In an approach by Jayawickrama and co-workers (2003), two identical LC systems are used to deliver the gradient and the reverse gradient, configured in such a way that the solvent gradients are combined before the detector [81]. One LC delivers the separation gradient while the other delivers the exact reverse gradient to the post-column connector. The two fluids are combined using nanolitre volume connectors and delivered to the NMR coil where, ideally, the solvent reaching the NMR detector has a constant composition after mixing, irrespective of the initial solvent composition. The concept was validated using an FIA-NMR system without the use of a column.

NMR Component of LC-NMR

A modern LC-NMR system consists of a high resolution NMR instrument (400-800 MHz) with a dedicated LC-NMR flow probe connected to an HPLC system, equipped
with a valve or a loop collector for stop-flow experiments. The HPLC control unit is connected to the data acquisition system of the NMR for synchronisation of different operations. A sensitive detector, such as a UV detector, is usually coupled to the HPLC before the NMR via a stop-flow valve in order to trigger stop-flow measurements. The HPLC system is located approximately 3 metres from the magnet and is connected to the LC-NMR probe via a long polyetheretherketone (PEEK) capillary with the smallest available internal diameter. The flow-cell consists of a non-rotating glass tube surrounded by the RF coil and connected at both ends with HPLC tubing. The homogeneity is acceptable in the small cell volumes of the flow probes because modern NMR methods reduce the requirement for spinning to optimise field homogeneity [83]. The use of relatively high volume flow probes may cause LC-peak broadening and loss of resolution. Hence, over the years, there has been a trend to reduce the volumes of LC-NMR detection cells. Volumes in the range of 200-500 µL normally used for 1H NMR have been reduced to 40-200 µL for higher field magnets [84].

NMR Flow Probe Design

Conventional NMR flow-cells have an active volume of 60 µL and a total volume of 120 µL. This means that the NMR will only “see” 60 µL of the chromatographic peak. If the flow rate in the HPLC is 1 mL/min, as is typical when 4.6 mm columns are used, only 3.6 seconds of the chromatographic peak will be “seen” by the NMR. One disadvantage of LC-NMR is that chromatographic peaks generally take more than 4 seconds to elute, so that the entire peak is not simultaneously detected by the NMR [85]. Synchronisation of both HPLC and NMR is needed, especially for performing accurate experiments in the stop-flow mode. Software integrating NMR and HPLC is available (VnmrJ software for Varian, and integration of Hystar software and XWINNMR for Bruker), making full automation possible.
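The 3.6 second figure quoted above follows from dividing the active cell volume by the volumetric flow rate. The short Python sketch below reproduces that arithmetic and, as an illustration only, estimates the transfer time through the capillary between the HPLC and the probe. The function names are hypothetical, and the 0.25 mm internal diameter is an assumed value, since the text specifies only "the smallest available internal diameter".

```python
import math


def residence_time_s(active_volume_ul: float, flow_ml_min: float) -> float:
    """Time in seconds that a flowing analyte spends inside the active volume."""
    flow_ul_s = flow_ml_min * 1000.0 / 60.0  # mL/min -> µL/s
    return active_volume_ul / flow_ul_s


def transfer_delay_s(tube_length_m: float, tube_id_mm: float,
                     flow_ml_min: float) -> float:
    """Plug-flow estimate of the transit time through a capillary.

    The tubing is treated as an ideal cylinder; 1 mm^3 equals 1 µL.
    """
    radius_mm = tube_id_mm / 2.0
    volume_ul = math.pi * radius_mm ** 2 * (tube_length_m * 1000.0)
    return volume_ul / (flow_ml_min * 1000.0 / 60.0)


if __name__ == "__main__":
    # Values from the text: 60 µL active volume at 1 mL/min.
    print(f"{residence_time_s(60.0, 1.0):.1f} s in the active volume")
    # Assumed tubing: 3 m long (from the text), 0.25 mm i.d. (hypothetical).
    print(f"{transfer_delay_s(3.0, 0.25, 1.0):.1f} s transfer delay")
```

Under these assumptions the same function shows why a reduced flow rate helps on-flow work: at 0.05 mL/min the residence time in a 60 µL cell rises to 72 seconds. In practice the transfer delay is calibrated experimentally rather than computed, because real tubing and fittings add dead volume.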
There are several challenging aspects to acquiring LC-NMR data. Firstly, all mobile phases in LC-NMR are mixtures and the solvents are rarely fully deuterated. Usually several solvent resonances need to be suppressed in addition to the resonances of the 13C satellites of organic solvents. Also, when the samples are flowing through the probe, solvent suppression sequences such as presaturation can take too long to run. These problems were solved with the development of the WET solvent suppression experiment [86]. Another challenge in LC-NMR is that, in gradient work, the solvent composition by definition changes during the experiment. In reversed-phase LC-NMR one of the cosolvents is almost always water, and the chemical shift of water changes as the solvent composition changes. Also, if the mobile phase is fully protonated (non-deuterated) there will be no 2H lock to keep the frequency constant. Together, these two facts mean that the frequencies for solvent suppression are constantly changing. To compensate for this and to optimise the frequency automatically, the use of a scout scan was developed [86].

LC-NMR Modes of Operation

The types of data acquisition used in LC-NMR include: (i) on-flow or continuous flow, (ii) stop-flow, (iii) time-sliced acquisition, where a peak is moved incrementally through the probe with stop-flow spectra taken at each step, and (iv) on-line collection of peaks in storage loops. There is a fifth technique known as pre-concentration. With the
stop-flow, time-sliced acquisition and on-line loop collection alternatives, peak selection requires the use of a monitoring device, which is typically an LC detector. Stop-flow acquisition is a simpler mode, but the repeated stop/go disturbances of the LC give rise to peak diffusion during long-lasting experiments as well as memory effects if a large peak precedes a small peak. With storage in loops these problems virtually disappear, but decomposition/isomerisation of labile compounds can still create problems when storage is prolonged. Continuous or on-flow acquisition mode measurements occur under dynamic conditions and standard LC equipment can be used, whereby the entire chromatogram is obtained with incremented NMR spectra. However, the very short period of time available for data acquisition and the reduced stability under flowing conditions limit the operation to acquiring spectra for the major sample constituents. In their review, Wilson and Brinkmann suggest that an improvement is to run an initial experiment overnight at a much reduced flow rate of about 0.05 mL/min [87]. With conventional LC-NMR systems, up to 128 scans can then be recorded per spectrum, which is a useful way to obtain an overview of mixtures of closely related natural products.

On-Flow LC-NMR

On-flow LC-NMR measurements are mainly restricted to the direct measurement of the main constituents of a crude extract, often where the LC injection has been overloaded. Wolfender and co-workers have indicated that typically 1-5 mg of crude plant extract needs to be injected on-column [88]. However, the detection limits can be improved by performing the analysis at low flow or by running time-slice experiments over a whole chromatogram, where the flow is stopped at defined time intervals. Both modes of operation enable a higher number of transients per increment to be recorded and, thereby, a significantly better signal-to-noise ratio is obtained.
Another way to improve sensitivity is to perform on-line trace enrichment, whereby on-line Solid-Phase Extraction (SPE) is carried out prior to LC-NMR analysis.

Stop-Flow LC-NMR

Stop-flow mode is more sensitive and allows 2D NMR experiments with longer acquisition times for low concentration components. The stop-flow mode can employ any one of three different kinds of sample handling. First, the samples may be analysed directly as they elute from the column, one chromatographic peak at a time. Second, the LC pump may be programmed to “time-slice” through a chromatographic peak, stopping every few seconds to acquire a new spectrum [70].

Time Slice

Time-slicing involves a series of stops during the elution of the chromatographic peak of interest. It is used when two analytes elute together or have close retention times, or when separation is poor [85]. Time-slicing is useful for resolving multiple components within a peak that are not fully resolved chromatographically or for verifying the purity of a chromatographic peak. The third method is to collect the chromatographic peaks of interest into loops of tubing (off-line) and then flush the intact fractions into the NMR flow probe one at a time as needed [70].
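The sensitivity benefit of the longer acquisition times available in stop-flow and time-slice work can be illustrated with the standard signal-averaging relationship, in which signal-to-noise grows with the square root of the number of co-added transients. This relationship is a general property of averaging uncorrelated noise and is an assumption brought in here, not a result stated in this chapter. The function names in the Python sketch below are hypothetical.

```python
import math


def snr_gain(n_scans: int, n_reference: int = 1) -> float:
    """Relative signal-to-noise gain from co-adding n_scans transients.

    Assumes uncorrelated noise, so the gain scales as sqrt(n_scans / n_reference).
    """
    if n_scans < 1 or n_reference < 1:
        raise ValueError("scan counts must be positive")
    return math.sqrt(n_scans / n_reference)


def scans_for_gain(gain: float) -> int:
    """Smallest number of transients giving at least the requested gain
    over a single scan."""
    if gain < 1.0:
        raise ValueError("gain must be at least 1")
    return math.ceil(gain ** 2)


if __name__ == "__main__":
    for n in (1, 16, 128, 1024):
        print(f"{n:5d} transients -> gain of about {snr_gain(n):.1f}")
```

The quadratic cost is the practical point: under this assumption, doubling the signal-to-noise requires four times as many transients, which is why minor components tend to be studied with the flow stopped.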
Loop-Collection

In this case the analytes must be stable inside the loops during the extended period of analysis. Capillary tubing should be used to avoid peak broadening and loss of analyte “seen” by the NMR spectrometer [85]. A variation of this technique is to trap the eluted peaks onto another chromatographic column to allow concentration of the solute and then re-elute with another solvent into the flow probe as a more concentrated sample [70]. The stop-flow mode makes it possible to acquire a number of transients on a given LC peak. Satisfactory LC-1H NMR spectra of compounds present in the low µg range can be obtained. In this mode, 2D correlation experiments such as Correlation Spectroscopy (COSY), Nuclear Overhauser Effect Spectroscopy (NOESY), Rotating-frame nuclear Overhauser Effect Spectroscopy (ROESY), Heteronuclear Single Quantum Coherence Spectroscopy (HSQC) and Heteronuclear Multiple Bond Correlation Spectroscopy (HMBC) are also possible provided that the concentration of the metabolite is high enough (generally more than 100 µg of sample is required) [85,88]. Stop-flow requires the calibration of the delay time, which is the time required for the sample to travel from the UV detector to the NMR flow-cell, and which depends on the flow rate and the length of the tubing connecting the HPLC with the NMR. Experiments such as WET-COSY and WET-TOCSY (Total Correlation Spectroscopy) can be run since the sample can remain inside the flow-cell for days.

LC-NMR Resolution, Sensitivity and Limits of Detection

Since NMR is a low sensitivity technique, analytical columns are often overloaded when a sample is injected, and this will affect the chromatographic resolution and separation. Another factor that can affect chromatographic performance is the use of deuterated solvents, where in many cases analytes show chromatographic peak broadening and occasionally different retention times from those in protonated solvents.
More chromatographic development is then required in order to obtain reasonable resolution. For the period 1985 to 1994, Hicks compared the amount of sample that could be measured successfully in 2 hours by 1H NMR (with double solvent suppression) using the same experimental conditions in the stop-flow mode (for a compound with a molecular weight of approximately 400, in CH3CN/D2O, with a signal-to-noise of around 10:1 at 500 MHz) [79]. Evidently, the amount of sample required has decreased significantly due to technological improvements. In 1985, 2 µg was required for this analysis. In 1987, 1 µg could be measured with the introduction of a modified coil design. By 1990, this was reduced to 500 ng with the introduction of optimised LC pathways and a modified flow-cell. In 1992, with the introduction of automatic stop-flow timing, 300 ng could be measured, and in 1994 150 ng, due to the introduction of digital filtering and oversampling, Fig. (3). The limit of detection in 2002 for analytes of molecular weight 500 at a 1H observation frequency of 600 MHz was reported to be ~100 ng in stop-flow and loop-storage mode, Fig. (3) [89]. Although continuous flow or on-flow modes use NMR as the real-time detector, sensitivity and resolution are limited by the short residence time of the analytes at 0.5-1.5 mL/min, and typically >10 µg per analyte is needed for quality results at the 1H observation frequency of 500 MHz [90]. Given sufficient material in stop-flow and loop-storage mode, 2D NMR experiments, which are used routinely for characterisation and structural elucidation of natural products, are feasible, e.g. COSY, TOCSY and HSQC.
Fig. (3). The amount of sample able to be measured successfully using LC-NMR.
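The amounts quoted in the text for 1985 to 1994 can be tabulated to make the cumulative improvement explicit. The Python sketch below is only a restatement of those five figures (converted to nanograms); the function and variable names are hypothetical. The 2002 value is left out because it was reported for different conditions (a different molecular weight and a 600 MHz instrument).

```python
# Minimum amount measurable in 2 h by stop-flow 1H LC-NMR, in ng,
# as quoted in the text for the 1985-1994 comparison.
AMOUNT_NG = {1985: 2000.0, 1987: 1000.0, 1990: 500.0, 1992: 300.0, 1994: 150.0}


def fold_improvements(amounts_ng: dict) -> dict:
    """Fold reduction in required sample relative to the earliest year."""
    baseline = amounts_ng[min(amounts_ng)]
    return {year: baseline / amount for year, amount in sorted(amounts_ng.items())}


if __name__ == "__main__":
    for year, fold in fold_improvements(AMOUNT_NG).items():
        print(f"{year}: {AMOUNT_NG[year]:6.0f} ng  ({fold:.1f}-fold vs 1985)")
```

On these figures the amount of sample required fell a little over 13-fold in nine years.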
Typical stop-flow detection limits at the 1H observation frequency of 600 MHz using micro coil probes and capillary systems are in the 5 ng range, but these require an overnight acquisition, with high concentrations of analyte required in the 1.5 µL NMR-active probe volume, as reported by Albert (2002) [90].

Data Handling

The presentation of the resulting LC-NMR data depends upon the type of experiment run. On-flow LC-NMR data are usually displayed as a contour map, like a conventional 2D NMR dataset, but the Fourier transformation (FT) is only applied along one axis (the F2 axis) to give a frequency-versus-elution time plot. Stop-flow LC-NMR data are usually presented as a series of individual 1D spectra, plotted one spectrum per page, although stacked-plot presentations are also used.

NMR Pulse Sequences

Diffusion-ordered spectroscopy (DOSY) has become a popular tool for determining which compounds bind to large-molecule receptors, an application referred to as ‘NMR screening’. DOSY can deconvolute the one-dimensional proton spectrum of a mixture into individual 1D sub-spectra on the basis of the diffusion rate of each compound [91]. NMR screening is being performed using flow NMR and it is logical to try to use DOSY in DI-NMR to verify compound purity [71]. The combination of DOSY and LC-NMR, however, has had limited application to date.

LC-NMR (Other Detectors)

In LC-NMR, other types of in-line detectors (such as UV, fluorescence, radiochemical, RI or ELSD) have been used, either to obtain additional data or to trigger
the HPLC pump to stop at a particular peak. The advantage of incorporating an MS detector in this detector chain is that a sensitive and selective signal can be provided to trigger stop-flow NMR analysis. Size-Exclusion Chromatography NMR (SEC-NMR) has also been demonstrated, as has Ion-Exchange Chromatography NMR (IEC-NMR) [92].

Limitations of LC-NMR

Although LC-NMR is now practically applicable for the direct observation of metabolite resonances in LC reversed-phase systems, not all the information obtained by conventional measurement in standard deuterated NMR solvents can be obtained. Another problem is the use of solvent suppression, where the signals of analytes of interest that reside under the solvent peak will also be suppressed. This can be a major drawback when dealing with unknown constituents. In order to detect all analyte signals, an alternative is to carry out solvent suppression using two independent solvent systems such as MeCN-D2O and MeOH-D2O [93]. A further problem is that the chemical shifts recorded in a reversed-phase solvent will differ slightly from those reported in standard deuterated NMR solvents, and this can be a drawback if precise comparisons with literature data need to be made, as in the identification of previously reported natural products (dereplication). Despite these limitations, the technique is impressive for providing on-line information in crude natural product extract analyses. This technique alone will not provide sufficient spectroscopic information for a complete identification of natural products, and other techniques, such as LC-UV-DAD, LC-MS/MS, LC-IR and LC-CD, are needed to provide complementary information. Ideally, the integration of all these hyphenated techniques in a single set-up with centralised acquisition of the spectroscopic data would permit the complete spectroscopic characterisation of the different metabolites in a single analysis.
LC-13C NMR

LC-NMR mainly implements 1H NMR spectra or 1H-1H correlation experiments. Access to 13C NMR information is possible, but is restricted to a very limited number of cases where the concentration of the LC peak of interest is high and 13C NMR data can be deduced indirectly from inverse detection experiments [94]. Due to the low abundance of the 13C isotope (1.1%), the sensitivity for direct measurement in the LC-NMR mode is not sufficient. Approaches involving transfer of spin polarisation from immobilised nitroxide radicals, known as ‘dynamic nuclear polarisation’, have been reported, allowing LC-13C NMR detection of a mixture of halogenated hydrocarbons, but at present this technique has not been applied to LC-13C NMR of natural products [95]. The inability of LC-NMR to provide reasonable amounts of sample for 13C NMR is still an important limitation of this technique for natural product analysis and full on-line identification of unknowns. Many secondary metabolites, especially those whose characteristic 1H signals are not well separated on the NMR scale, are difficult to analyse by LC-1H NMR. Compounds that exhibit very few 1H signals and have mainly quaternary carbons or hydroxyl substituents in their structure also yield little structural information by LC-1H NMR. However, LC-1H NMR information may be sufficient for the on-line identification of previously known constituents or for studying derivatives of a known core skeleton that are likely to be present in a given extract.
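The size of the 13C sensitivity penalty can be estimated with the usual receptivity expression for spin-1/2 nuclei at the same field, in which the signal scales with the cube of the gyromagnetic ratio multiplied by the natural abundance. The Python sketch below applies it using the 1.1% abundance quoted above. The gyromagnetic ratios and the 99.9% abundance of 1H are standard values supplied here as assumptions, and the function name is hypothetical.

```python
# Gyromagnetic ratios in rad s^-1 T^-1 (standard values, not from this chapter).
GAMMA_1H = 267.522e6
GAMMA_13C = 67.283e6

ABUNDANCE_1H = 0.999   # assumed natural abundance of 1H
ABUNDANCE_13C = 0.011  # natural abundance of 13C quoted in the text (1.1%)


def relative_receptivity(gamma: float, abundance: float,
                         gamma_ref: float = GAMMA_1H,
                         abundance_ref: float = ABUNDANCE_1H) -> float:
    """Receptivity of a spin-1/2 nucleus relative to a reference nucleus.

    Uses receptivity proportional to gamma**3 * abundance; the spin factor
    I(I+1) cancels because both nuclei have spin 1/2.
    """
    return (gamma / gamma_ref) ** 3 * (abundance / abundance_ref)


if __name__ == "__main__":
    r = relative_receptivity(GAMMA_13C, ABUNDANCE_13C)
    print(f"13C receptivity relative to 1H: {r:.2e}")
    print(f"1H is roughly {1.0 / r:,.0f} times more receptive")
```

Under these assumptions direct 13C detection is several thousand times less sensitive than 1H detection for the same number of nuclei, which is consistent with the reliance on inverse-detected experiments described above.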
Structure elucidation of unknown organic compounds is usually performed by the combined use of 1H and 13C NMR spectroscopy. The latter provides direct information about the carbon skeleton of molecules, thus revealing valuable structural features, such as carbonyl and carboxyl moieties, which cannot be deduced by 1H NMR spectroscopy. In on-line HPLC-NMR coupling, the commonly recorded nuclei are 1H and 19F due to their high natural abundance, 99.9 and 100%, respectively. Sensitivity factors currently preclude the direct detection of 13C resonances, as previously discussed. Indirect access to the information content of 13C NMR spectra is obtained in the stop-flow mode, where ‘inverse’ detected 1H-13C correlation spectra can be recorded. Quaternary carbons, which have no directly attached protons, are not detected in these experiments and, therefore, it is of major interest to record 13C NMR spectra. In principle, high quality 13C NMR spectra of neat liquid samples or highly concentrated solutions of organic molecules can be obtained using a flow probe with a 13C RF coil (DI-NMR and FIA-NMR) [90]. However, this technique is not feasible for recording continuous or on-flow 13C NMR spectra of chromatographic peaks (LC-NMR). A major increase in sensitivity needs to be achieved for this application.

Increasing the Sensitivity

For metabolites or natural products present in concentrations that are too low for on-flow studies, pre-concentration is the way forward. Alternative strategies are presented below.

a) SPE-NMR

In Solid-Phase Extraction coupled to NMR (SPE-NMR), an HPLC system performs an initial separation of the analytes but, as they elute, the HPLC peaks are individually diverted to an in-line SPE cartridge.
Solvent conditions are adjusted so that the solutes are retained on this small cartridge (or column) and, once the column trapping is complete, the solutes are eluted with an appropriate solvent (which can be deuterated for ease of NMR analysis) as a concentrated plug that is pumped directly into the NMR detector flow-cell. This idea, published in 1998 [96,97], has been applied to the analysis of natural products [98,99] and resembles FIA-NMR but has the added feature of an initial HPLC separation step.

b) 2D-LC-NMR

Another method of pre-concentration is the technique referred to as 2D-LC, which to date has been applied to LC-NMR-MS [100]. Two columns are connected in series, whereby compounds eluting from an HPLC column are trapped using D2O on a second column by means of a diverting valve and back-flushed into the LC-NMR flow-cell using an appropriate deuterated organic solvent. This approach has the advantage of concentrating the chromatographic peak, which improves the sensitivity of LC-NMR and minimises the spectral overlap between solvent and solute. This process can be timed using automatic switching so that transfer to the second column occurs only when a target peak is eluting from the first column. This results in trapping and, as with SPE, multiple injections increase the amount trapped and therefore improve the NMR spectra. The second column used is smaller in diameter, which helps to concentrate the sample to an appropriate NMR volume. The procedure
employed for column trapping requires the chromatogram to be run twice, with the elution times noted on the first occasion.

c) Column Trapping

Strategies to increase the sensitivity of LC-NMR by trapping minor components on the chromatographic column, followed by subsequent elution, were described in 1998 [96,97]. Each of these pre-concentration techniques has advantages and results in a several-fold increase in sensitivity. In LC-SPE the chromatography is separated from the spectrometry by the SPE step, so that the HPLC peak shape is not of importance. Non-deuterated solvents can be used for the chromatography so that the MS is not affected by molecular weight changes, and deuterated solvents are introduced for NMR spectrometry. In 2D-LC, peak shape retains significance since the eluting peaks are being analysed. However, the overall length of time of an experiment is probably less than for LC-SPE, where multiple trapping and cartridge drying are often involved.

d) Column Switching Technique

An on-line sample preparation system using column-switching HPLC for the structure elucidation of compounds in mixtures by NMR was described in 2000 [101]. The system consists of three HPLC sections, one separating the compounds of interest, another trapping the separated compounds on a trapping column using H2O, and the third replacing H2O with D2O or a deuterated solvent. The system allows easy separation of trace amounts of compounds from a mixture. The procedure can also achieve concentration of the separated compounds and allows for the use of a deuterated solvent such as CD3OD for NMR measurement.

Overview of LC-NMR to Natural Product Profiling

LC-NMR hyphenation combines a separation step with the acquisition of spectroscopic data of individual compounds.
In the past, the technique has been used for “speeding up” the structural elucidation of compounds in complex mixtures and several applications have been reported in the field of natural product analysis (see review by Wolfender and Hostettmann, 1998) [72]. More recently, a number of applications of LC-NMR for the analysis of both plant and marine natural products have been reported [102-104]. The traditional way to separate a complex mixture such as a natural product extract and examine its individual components is to perform a chromatographic separation off-line, collect the individual fractions, evaporate them to dryness (to remove mobile phase), redissolve them in a deuterated solvent and examine them by conventional NMR, using microcells and microprobes if needed. When compounds are volatile, unstable or air sensitive, as is the case for many natural products, this technique is no longer appropriate. LC-NMR offers a means for immediate analysis after an on-line separation.

Applications of LC-NMR to Natural Product Profiling

In 1997 LC-NMR began to find wider use in natural products analysis. These early reports of the application of LC-NMR to natural products involved the characterisation of isomeric mixtures produced by exposing a pure natural product to light or heat. LC-NMR was used to determine the structure of the photo-isomerisation product of the bioinsecticide azadirachtin [105] and to characterise the geometric isomers of vitamin A
acetate produced upon exposure to heat [106]. The azadirachtin study employed stop-flow LC-NMR measurements, while the vitamin A acetate study used on-flow measurements to detect the resonances of olefinic protons in order to determine the cis or trans configurations of double bonds. LC-NMR studies of natural products have progressed to include the characterisation of individual components in crude or partially purified extracts and have demonstrated the full power of the hyphenated technique in eliminating the need to isolate individual components from a crude extract for subsequent NMR experiments. Spring et al. in 1995 published one of the first applications, the characterisation of the sesquiterpene lactones in Zaluzania grayana by on-flow and stop-flow LC-NMR experiments [107]. The characterisation of a wide range of plant natural products, including prenylated flavones from Monotes engleri [108], monoterpene dimers from Lisianthius seemannii [109] and naphthoquinones from Cordia linnaei [110], has also been reported. In addition, naphthylisoquinoline [111,112] and pyrrolizidine [93] alkaloids, sesquiterpene lactones [113-116], phenylphenalenones [117], taxanes [118], lignans [119], glycosides [120,121] and other compounds [122-127] have been identified from LC-NMR data. Most of these reports described the identification of known members of these structure classes by LC-NMR, but the technique has also been used to determine the structures of new analogues of known compounds [107-109]. In these examples, stop-flow measurements employing 2D NMR experiments such as COSY, TOCSY, HSQC and NOESY have been applied to determine the structure of novel compounds [8,72,128]. Wolfender and co-workers [55] have investigated various plant species in which they characterised, among other classes of compounds [125], prenylated flavanones [108], secoiridoids [109,124], naphthoquinones [110], pyrrolizidine alkaloids [93], benzophenones [129] and ‘quinone methide’ diterpenes [130].
Other groups have dealt with naphthylisoquinoline alkaloids [131-133], sesquiterpene lactones [113,135,136], triterpene saponins [137], taxanes [118], lignans [119], phenylphenalenones [117], polyhydroxy steroids [138], fasciculol triterpenes [139], tocopherols and tocotrienols [126], carotenoid isomers [140], flavonoids [141,142], hop bitter acids [122,143] as well as betacyanin pigments [144]. Most applications published so far deal with the characterisation of plant-derived mixtures, while applications to secondary metabolites of micro-organisms [145,146] or marine natural products [8,147,148] are still relatively uncommon [107]. In natural products analysis, the stop-flow mode is most frequently selected to acquire 1H spectra of the compounds of interest or, if further structural information is required, to acquire 2D 1H NMR spectra. In many cases, an on-flow NMR chromatogram (usually at flow rates between 0.3 and 1 mL/min) is recorded beforehand, either to screen for the presence of particular groups of compounds or to gain a general overview of the sample composition. Heteronuclear LC-NMR experiments such as HSQC and HMBC on a natural product have been reported in the literature [108] for highly enriched fractions. ‘Time-sliced’ stop-flow and on-flow approaches at low flow rates have been applied to natural product extracts in order to combine the advantages of both the on-flow mode (a ready overview of the entire sample) and the stop-flow mode (sufficient acquisition time for minor compounds) [129,132,147,148]. The sensitivity of the NMR experiment depends on molar concentration, so for a given mass of sample, molecules with a lower molecular weight give a more intense
signal. However, larger molecules usually possess shorter relaxation times, which allow faster spectral accumulation. Publications to date have focused mainly on the characterisation and identification of small molecules up to a molecular weight of about 700. Nevertheless, LC-NMR has been successfully applied to identify saponins with molecular weights of up to 1400 [147,148] and to characterise larger biomolecules such as glycosphingolipids [149]. LC-NMR has also been applied to marine natural products; in 2000 it was used to identify the marine alkaloid aaptamine [37]. Aaptamine was found to be the active component in the crude dichloromethane extract of the sponge Aaptos sp. UV maxima obtained from HPLC profiles, together with mass data, were used to search the marine natural products database MarinLit, which returned four possible matching compounds. NMR data obtained from the LC-NMR run then unequivocally identified the compound. An HPLC-NMR investigation of exchangeable protons in low molecular mass natural products has been reported [150]. Model alkaloids or crude plant extracts were dissolved in 2H2O-1H2O-MeCN (deuterium oxide-water-acetonitrile) or 2H2O-MeCN and, after direct injection or chromatographic separation, examined in a 60 µL NMR flow probe. Exchangeable amino protons initially detected by HPLC-MS(ESI) were subsequently identified and investigated by stop-flow 1H NMR, 2D TOCSY and NOESY. Another application of LC-NMR is in biosynthetic studies that employ feeding experiments with stable isotope-labelled compounds. 1H NMR spectra allow determination of the amount of isotopic label incorporated into metabolites by observing the signals that arise from J-coupling of protons to 13C-labelled nuclei [151,152].
Extended Hyphenation
(HP)LC-NMR Combined with (HP)LC/CD
a) Relative Stereochemistry
Elucidation of the relative configuration of purified compounds by on-line methods was reported in 1998 [131].
NMR signals and mass fragments do not intrinsically reveal chiral characteristics, so information concerning the absolute configuration is not directly available. A method that allows the assignment of absolute configurations of compounds in an extract matrix would complete the structural elucidation without the need for isolation and purification. Circular Dichroism (CD) spectroscopy is widely used for the assignment of absolute configuration if experimental data from structurally related compounds are available or if the structures fit empirical CD rules [153]. In the past, detectors based on Optical Rotatory Dispersion (ORD) have been used, but limitations of this technique include low sensitivity and lack of compatibility with gradient elution. CD is based on the difference in absorption between right and left circularly polarized light and appears only in the presence of a chromophore, which gives CD a higher intrinsic stability and sensitivity compared to ORD. Bringmann et al. reported the on-line coupling of RPLC to CD spectroscopy for the stereochemical analysis of crude plant extracts [133]. The method was applied to the tropical liana Habropetalum dawei (Dioncophyllaceae), the extract of which was initially investigated by HPLC-NMR and HPLC-ESI-MS-MS. From these combined
investigations, two unknown natural products were detected for which the configuration remained undetermined. Subsequent HPLC-CD experiments permitted the assignment of the absolute configuration of the two metabolites by empirical analysis of the CD data. The combination of NMR, MS and CD data permitted the complete structural elucidation of two new metabolites, phylline and 5’-O-methyldioncopeltine A, in the extract without any isolation or sample purification prior to the coupling experiments. A photometric screening method for dimeric naphthylisoquinoline alkaloids has also been reported, whereby the complete structure, including relative configuration, was elucidated on-line by LC-MS, LC-NMR and LC-CD on the extract [134].
b) Absolute Stereochemistry
Classically, chemical methods such as Mosher’s ester synthesis have frequently been used for the characterisation of various isolated natural products bearing secondary alcohol functions [154]. In this method, the 1H NMR spectra of the (R)- and (S)-2-methoxy-2-phenyl-2-(trifluoromethyl)acetic acid (MTPA) ester derivatives of the analyte are compared. The difference in the chemical shifts of the (S)-MTPA and (R)-MTPA diastereoisomers (ΔδH = δS − δR) indicates whether the alcohol function is (R) or (S), based on established conformational models [155]. Until recently Mosher’s ester synthesis was used only in combination with standard NMR. A recent study of crude reaction mixtures by LC-UV-MS and LC-NMR has shown this combination to be a rapid and sensitive method for absolute configuration determination. This on-line method can be applied to a restricted amount of sample or to a fraction that is difficult to obtain in pure form. The absolute configuration was determined using only a few milligrams of natural product. This technique was successfully applied to the determination of the absolute configuration of two α-pyrones isolated from the root bark of Ravensara crassifolia (Lauraceae) [156].
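The ΔδH = δS − δR comparison that underlies the Mosher method can be illustrated with a short calculation. The following is only a minimal sketch: the function names are hypothetical and the chemical shifts are invented for illustration, not taken from any of the cited studies.

```python
# Sketch of the Mosher (MTPA) ester comparison: delta_delta_H = delta_S - delta_R.
# All chemical shifts below are made-up illustrative values, not literature data.

def mosher_differences(shifts_s, shifts_r):
    """Return {proton: delta_S - delta_R} in ppm for protons assigned in both esters."""
    common = sorted(set(shifts_s) & set(shifts_r))
    return {h: round(shifts_s[h] - shifts_r[h], 3) for h in common}


def split_by_sign(differences, threshold=0.0):
    """Group protons by the sign of delta_delta_H; values within the threshold stay unassigned."""
    positive = [h for h, d in differences.items() if d > threshold]
    negative = [h for h, d in differences.items() if d < -threshold]
    return positive, negative


# Hypothetical 1H shifts (ppm) of the (S)- and (R)-MTPA esters of one secondary alcohol.
s_ester = {"H-2a": 1.72, "H-2b": 1.65, "H-4": 2.31, "H-5": 5.88}
r_ester = {"H-2a": 1.66, "H-2b": 1.60, "H-4": 2.39, "H-5": 5.97}

diffs = mosher_differences(s_ester, r_ester)
positive, negative = split_by_sign(diffs)
print(diffs)
print("positive delta-delta:", positive)
print("negative delta-delta:", negative)
```

Protons with positive and negative ΔδH values fall on opposite sides of the carbinol centre; translating that pattern into an (R) or (S) descriptor still requires the conformational model referred to above [155].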
Another example of the use of the Mosher esterification method in conjunction with LC-NMR analysis, for the elucidation of the absolute configuration of micro quantities of available pure substances, was reported in 2003 [156]. The absolute configuration of a novel tetrahydrophenanthrene from Heliotropium ovalifolium was determined by LC-NMR of its Mosher esters [157].
LC-UV-NMR-ELSD
The evaporative light scattering detector (ELSD) fulfils the requirements for a sensitive, universal, conventional detector and was investigated by Petritis et al. (2002) as a potential on-line monitoring detector for LC-NMR, using underivatized amino acids as model compounds [158]. These compounds were selected because they contain strong, weak or no chromophore groups (e.g. phenylalanine/serine/taurine) or even fluorophore groups (tryptophan). This is particularly important for natural products, as many bioactive compounds do not possess a UV chromophore. Indeed, ELSDs are frequently used together with UV or UV(DAD) detectors in the isolation of natural products. The ELSD, like MS, is a destructive method and should be coupled in parallel, not in series, with NMR. An important parameter that must be set carefully in on-line stop-flow LC-NMR is the exact time difference between the arrival of the chromatographic peak at the monitoring detector and at the LC-NMR flow-cell, because the signal from the detector is used to trigger the automated NMR acquisition sequence, including the interruption of chromatographic elution. In addition, the quantity directed
towards the parallel detector (MS or ELSD) should be minimised to increase NMR sensitivity. Petritis et al. pioneered the use of ELSD prior to NMR acquisition for the on-line monitoring of the chromatographic separation of compounds with a weak or no UV chromophore.
LC-NMR-CLND
Another detector worth investigating as an in-line monitoring detector for LC-NMR experiments, especially for nitrogen-containing compounds such as alkaloid natural products, is the Chemiluminescence Nitrogen Detector (CLND). This detector is well adapted for sensitive detection in amino acid analysis [158].
Applications of LC-NMR Combined with LC-MS to Natural Product Profiling
Application of both LC-NMR and LC-MS to a partially purified extract of Vernonia fastigiata led to the direct identification of nine antibacterial sesquiterpene lactones without isolation of individual compounds [114]. The rapid structural analysis of both major and minor components of this class of compounds demonstrated the power of structure-guided screening as a complementary method to bioassay-guided screening. In LC-NMR, significant segments of the proton spectrum are lost through solvent suppression (2.0 ± 0.125 ppm for the acetonitrile signal and ~4.5 ppm for the water signal). A method was reported to obtain a complete set of proton resonances: to recover information about the suppressed signals, the LC-NMR spectra were run again in a methanol/water gradient. Comparison of the resulting spectra with those obtained in acetonitrile/water allowed the assignment of all proton resonances of the corresponding compounds using just two LC-NMR runs. A review by Hostettmann and Wolfender (1999) described the application of LC-NMR and LC-MS in the search for new bioactive compounds from plants of the Americas [159], while Sandvoss (2002) provided an overview of the application of LC-NMR and LC-NMR-MS hyphenation to natural products analysis [107].
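The two-run strategy described above for recovering resonances lost to solvent suppression can be sketched as a simple window check. This is an illustration under stated assumptions: the acetonitrile window and the water position are the values quoted in the text, whereas the methanol position (3.3 ppm), the half-widths used for methanol and water, and the analyte shifts are assumed for the example. Keeping the water signal at the same position in both runs is a simplification.

```python
# Sketch of the two-run strategy for recovering resonances lost to solvent suppression.
# The acetonitrile window (2.0 +/- 0.125 ppm) and water position (~4.5 ppm) are from the
# text; the methanol position (3.3 ppm) and the 0.125 ppm half-widths used for methanol
# and water are assumptions made for illustration.

ACETONITRILE_WATER_RUN = [(2.0, 0.125), (4.5, 0.125)]  # (centre, half-width) in ppm
METHANOL_WATER_RUN = [(3.3, 0.125), (4.5, 0.125)]


def is_suppressed(shift_ppm, windows):
    """True if a resonance falls inside any suppression window of a run."""
    return any(abs(shift_ppm - centre) <= half_width for centre, half_width in windows)


def recovered_by_second_run(shifts_ppm, first_run, second_run):
    """Resonances hidden in the first run but observable in the second."""
    return [s for s in shifts_ppm
            if is_suppressed(s, first_run) and not is_suppressed(s, second_run)]


def hidden_in_both(shifts_ppm, first_run, second_run):
    """Resonances that neither run can observe."""
    return [s for s in shifts_ppm
            if is_suppressed(s, first_run) and is_suppressed(s, second_run)]


# Hypothetical resonances (ppm) of one analyte.
analyte = [0.95, 1.98, 2.10, 3.35, 4.45, 6.20]
print(recovered_by_second_run(analyte, ACETONITRILE_WATER_RUN, METHANOL_WATER_RUN))
print(hidden_in_both(analyte, ACETONITRILE_WATER_RUN, METHANOL_WATER_RUN))
```

Under these assumptions the resonances near 2.0 ppm are recovered by the methanol/water run, while a resonance under the water signal would remain hidden in both runs.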
LC-NMR together with LC-UV-MS was used by the group of Hostettmann and Wolfender (2003) to achieve on-line identification of unstable cinnamoyl catalpol glycoside esters from the plant Jamesbrittenia fodina (Scrophulariaceae) [160]. An on-line identification of the antifungal constituents of Erythrina vogelii was carried out using LC-NMR and LC-MS-MS [SU79]. The chemical screening strategy with integrated antifungal bioassays permitted on-line identification of numerous constituents and provided a useful and efficient peak-guided isolation procedure. On-line and off-line HPLC-NMR and HPLC-MS analyses of extracts of Jasminum subtriplinerve and Terminalia macroptera have been conducted [161]. The bioactive compounds were isolated by bioassay-guided separation techniques, such as on-line HPLC-bioassays, and the structures of the bioactive principles from these extracts were determined by MS and NMR. Biologically active natural products were also isolated by means of on-line LC-bioassay, LC-NMR and LC-MS from Cussonia barteri (Araliaceae), Terminalia macroptera (Combretaceae), Jasminum subtriplinerve (Oleaceae) and Petunia hybrida (Solanaceae) [162].
The use of (HP)LC-NMR and other hyphenated techniques such as (HP)LC-MS-MS for the identification of natural products from plant sources has been reviewed by Wolfender and Hostettmann [8,72]. Plant products, including polyphenols and bitter components from Gentianaceae species, have been characterised [127]; this work included the assignment of stereochemistry at a double bond in a new secoiridoid glycoside, seemannodide [163]. Other studies include the identification of antifungal materials from the African plant Swertia calycina [127,163], compounds from the Leguminosae family [55], prenylated flavanones from dichloromethane extracts of Monotes engleri [108], naphthoquinones from Cordia linnaei [110], pyrrolizidine alkaloids from Senecio species [125] and antioxidant compounds from the leaves of Orophea enneandra [93]. New naphthylisoquinoline alkaloids from a root extract of Ancistrocladus likoko have also been identified using directly coupled (HP)LC-NMR spectroscopy [111]. Other natural product studies that have used (HP)LC-NMR include the characterisation of vitamin derivatives [77], saponins from Bacopa monniera [137], antibacterial sesquiterpene lactones from an extract of Vernonia fastigiata [136], components from Hypericum perforatum L. [164] and ecdysteroids from Silene otites [138]. In the latter study, directly coupled (HP)LC-NMR-MS was also used to identify additional compounds. The metabolism of 2,3,10,11-oxygenated protoberberine alkaloids was studied in cell cultures of Corydalis species, and the structures were determined by LC-NMR and LC-MS analyses [165]. LC-NMR has become an important technique for the investigation of natural products. The application of LC-NMR for screening plant constituents has been described in numerous publications [166]. LC-NMR is generally considered a very powerful (although not a fast) technique and has been exploited further by hyphenating it with MS to produce LC-NMR-MS.
(HP)LC-NMR-MS
It is seldom possible to solve the structure of a novel compound by NMR alone. Common functional groups such as carboxylic acids, phenols and amino groups are NMR-silent in many solvents because of proton-deuterium exchange. Nitro groups and sulfate conjugates do not contain protons; although these functional groups are not directly detectable in proton NMR spectra, they can be readily detected by MS. Conversely, MS data might give molecular formulae that are insufficient to assign the molecular structure of an unknown compound unambiguously. In the more difficult cases, closely eluting isobars and isomers are indistinguishable by LC-MS. Therefore, NMR and MS data on the same analyte are crucial for structural information. When different isolates, such as metabolites, are analysed, one cannot always be certain that the NMR and the MS data apply to the same analyte. To avoid this ambiguity, LC-MS and LC-NMR are combined. In practice, either LC-MS or LC-NMR on its own can solve many analytical problems, but there are occasions when each fails and additional techniques are required. In these cases using both LC-MS and LC-NMR can provide further information and, usually, structural identity. A natural progression is to combine the two into one integrated LC-NMR-MS system. In addition, MS in scanning or MS-MS mode is also used to provide fragment ion data on selected peaks to correlate with the NMR data. Apart from the gain in time and sample efficiency, the most important advantage of the doubly hyphenated LC-NMR-MS set-up over individual LC-MS and LC-NMR set-ups is the unequivocal assignment of the MS data to the NMR peaks. LC-MS and LC-NMR chromatograms of the sample are sometimes difficult to correlate because slightly different chromatograms are obtained in the two systems. The differences arise from the higher amounts of sample injected in LC-NMR compared to LC-MS, the different chromatographic behaviour of deuterium oxide compared to water, or different gradient-forming units in the respective chromatographic systems. The doubly hyphenated technique involving the coupling of MS and NMR was first described in 1995 [167], and a fully integrated commercial LC-NMR-MS system was launched in 1999 [100].
LC-UV-MS-NMR
UV detection after the HPLC column is an advantage and thus, strictly speaking, the technique becomes LC-UV-NMR-MS. The steps involve acquisition of an HPLC chromatogram with the generation of a UV profile for the extract, followed by NMR detection and finally MS detection. Alternatively, the NMR and MS detections can be synchronised so that the peak of interest is delivered to the NMR and MS detectors simultaneously (Fig. (4)). The UV detector can be either a variable wavelength instrument or a DAD capable of providing UV spectra to complement the NMR and MS data. Although not essential, the UV detector provides a very convenient means of monitoring the separation and helps the spectroscopist to determine accurately the delay times between the analyte exiting the chromatographic system and reaching the spectrometers. Solvent selection is a compromise between the ideal requirements of each instrument.
For (HP)LC-NMR the use of inorganic buffers, such as sodium phosphate, for pH modification is preferred because no additional signals are introduced into the NMR spectrum. However, this type of buffer system is currently incompatible with most (HP)LC-MS systems using an ESI interface. An alternative acidic modifier is trifluoroacetic acid (TFA), which has no non-exchangeable protons to complicate the NMR spectrum; with acidic analytes, however, ion suppression is observed in the MS even at high sample concentrations. Formic acid is a suitable compromise between the needs of MS and NMR. The single non-exchangeable proton of formic acid gives a sharp, readily suppressible NMR singlet at δ 8.5, and formic acid enables MS data to be acquired for acidic analytes. As previously discussed, insensitivity is the major disadvantage of NMR, with relatively large amounts of analyte required to give good signals compared with MS. For 1H NMR, 1-5 µg on-column in flow mode will produce a good signal, whereas only ng amounts or less are needed for MS. Splitting the column flow unevenly, with 2-5% directed to the MS and the remainder to the NMR spectrometer, overcomes this difference (Fig. (4)) [100]. Recent studies have devoted attention to the use of superheated water (or D2O) as the eluant in LC-NMR-MS [87]. While the detection performance is increased, the approach
is not suitable for all analytes because the elevated temperatures degrade thermolabile compounds, which include many natural products.
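Returning to the uneven flow split described above (2-5% of the eluant to the MS, the remainder to the NMR), the arithmetic can be made explicit with a minimal sketch. The function name is hypothetical, and the calculation assumes that the analyte divides between the two detectors in the same proportion as the flow.

```python
# Sketch: how an on-column amount divides between NMR and MS for a given flow split.
# Assumes the analyte is split in the same proportion as the eluant flow.

def split_amounts(on_column_ug, ms_fraction):
    """Return (micrograms reaching the NMR, nanograms reaching the MS)."""
    if not 0.0 <= ms_fraction <= 1.0:
        raise ValueError("ms_fraction must lie between 0 and 1")
    to_nmr_ug = on_column_ug * (1.0 - ms_fraction)
    to_ms_ng = on_column_ug * ms_fraction * 1000.0
    return to_nmr_ug, to_ms_ng


# The on-column range (1-5 micrograms) and split range (2-5%) are those quoted in the text.
for on_column_ug, ms_fraction in [(1.0, 0.02), (5.0, 0.05)]:
    nmr_ug, ms_ng = split_amounts(on_column_ug, ms_fraction)
    print(f"{on_column_ug} ug on-column, {ms_fraction:.0%} to MS: "
          f"{nmr_ug:.2f} ug to NMR, {ms_ng:.0f} ng to MS")
```

Even at the smallest split the MS still receives tens of nanograms, while the NMR keeps almost all of the material.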
Fig. (4). Schematic of the hyphenated spectroscopic technique, LC-NMR-MS.
In order to obtain sufficient accumulation time for NMR spectra, on-flow LC-NMR-MS investigations can be performed as overnight experiments at eluant flow rates lower than usual (e.g. 0.05 mL/min). This allows a number of scans (~128) to be recorded per spectrum. Sandvoss et al. report that, for both early and late eluting asterosaponins, such a low flow rate did not result in peak broadening [135]. Another instrumental conflict is the effect of the magnetic field of the NMR instrument (typically corresponding to 1H frequencies of 400-600 MHz, and up to 800 MHz) on the MS. Wilson indicated that the high stray magnetic field from the NMR magnet can have adverse effects on the MS, which in practice means that the mass spectrometer is best sited outside the 5 Gauss line [167]. Modern superconducting NMR magnets are actively shielded, so this problem should not occur for commercial systems, but for older NMR or lab-built systems positioning the mass spectrometer away from the NMR magnet is critical. In the case of (HP)LC-NMR-MS experiments there are some additional considerations. The principal MS ionisation method has been ESI in either positive or negative ion mode (using either single quadrupole or ion-trap mass spectrometers), which puts further constraints on the chromatographic solvent systems. Running the instruments in series allows all NMR experiments to be completed, with either an on-flow or a stop-flow procedure, before the sample-destructive MS analysis begins. However, any peaks trapped between the NMR and MS when the flow is stopped may disperse before MS analysis. Series operation also causes the NMR flow-cell and connections to be operated at pressures higher than
designed, with the consequence that leaks are more likely. Running in series also fails to take advantage of the ability of the mass spectroscopist to flag up peaks of interest quickly [168]. Operation of the instruments in parallel is advantageous. The delay between the appearance of the peak in the UV detector and detection by the NMR and MS can be manipulated such that the analyte enters both spectrometers simultaneously. Alternatively, the analyte can be detected first in the MS, enabling an informed decision on whether to obtain a stop-flow NMR spectrum for a particular peak. Reviews by Wilson [167] and Elipe [85] suggested that MS data should be obtained using an off-line approach such as LC-MS, since NMR data collection in the stop-flow mode can take hours or days depending on the complexity of the structure and the amount of sample. As in the case of LC-NMR, the void volumes in a coupled system are high. The NMR flow-probe or the storage loop, which is located further downstream, is reached approximately 10-40 seconds after the peak first appears in the LC detector, and this delay needs to be calibrated. A problem with the use of deuterated solvents such as D2O with MS is that deuterium can exchange with protons on ionisable groups. The practical consequence is that the wrong molecular mass may be determined unless the mass spectroscopist takes exchange into account.
Sample Recovery in LC-NMR or LC-NMR-MS
After separation of the sample and acquisition of spectroscopic data, the sample can be collected in a fraction collector. Further investigation, especially by different MS techniques, can follow, and the collected fractions can be put through a bioassay.
Automation
To date the majority of LC-NMR-MS applications use stop-flow mode. In this mode the sample is stationary, as in conventional NMR. Once the analyte reaches the active NMR probe volume, the flow is stopped and the subsequent eluant is simply diverted either to waste or to on-line storage loops.
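Stopping the flow at the right moment depends on the detector-to-probe delay mentioned above, which in practice is calibrated experimentally. A first estimate follows from the transfer volume and the flow rate, as in the minimal sketch below. The function names, the capillary dimensions and the 200 µL dead volume are illustrative assumptions, not values from the cited work.

```python
import math

# Sketch: estimating the detector-to-NMR transfer delay from volume and flow rate.
# The capillary dimensions and dead volume used below are assumed example values.

def capillary_volume_ul(length_m, inner_diameter_mm):
    """Internal volume of a cylindrical capillary in microlitres (1 mm^3 = 1 uL)."""
    radius_mm = inner_diameter_mm / 2.0
    return math.pi * radius_mm ** 2 * (length_m * 1000.0)


def transfer_delay_s(volume_ul, flow_ml_per_min):
    """Time in seconds for the eluant to sweep a given volume at a given flow rate."""
    flow_ul_per_s = flow_ml_per_min * 1000.0 / 60.0
    return volume_ul / flow_ul_per_s


# Assumed example: 1.5 m of 0.25 mm i.d. capillary at 0.5 mL/min.
capillary_ul = capillary_volume_ul(1.5, 0.25)
print(f"capillary volume: {capillary_ul:.1f} uL")
print(f"capillary transit: {transfer_delay_s(capillary_ul, 0.5):.1f} s")

# Assumed example: 200 uL total dead volume between detector and flow cell.
print(f"total delay: {transfer_delay_s(200.0, 0.5):.1f} s")
```

An estimate of this kind only gives a starting value; the calibrated delay also reflects connector and flow-cell volumes and any splitting of the flow.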
Loop-storage, as discussed earlier, is a valuable advance whereby chromatographic peaks are trapped on-line in a 36-loop cassette (Bruker BioSpin) or in an array of loops (Varian Inc.), with trapping determined by a UV or MS detection threshold [169,170]. Automation directs peaks to the loops and controls the transfer of loop contents, one at a time, to the NMR flow probe for stop-flow analysis. Loop-storage avoids the on-column peak diffusion that otherwise reduces NMR sensitivity when several analytes require long NMR acquisitions. Less sample is required because all the analytes in a single injection are stored and acquired later under automation. With the Bruker-NMR-Mass spectrometry Interface (BNMI) integrated system, MS experiments can be performed on the loop contents while the NMR acquires data. The loop-storage mode is a highly efficient mode of operation. Manual, semi-automated or fully automated operation is possible in both stop-flow and loop-storage modes. For flexibility, it is possible to switch between modes during a run. The goal of LC-NMR-MS is to target important analytes for efficient NMR analysis. In natural product drug discovery programs, where novel diverse molecules are present in complex mixtures, UV and MS provide insight for further NMR experiments. Routine acquisition of LC-NMR-MS data can be significantly improved by optimisation of the chromatographic separation conditions. Typically, analytical scale reversed-phase
HPLC uses columns of 2-4.6 mm i.d. × 5-250 mm length and flow rates of 0.2-1.5 mL/min. Mobile phase mixtures employing deuterated water and non-deuterated organic modifiers are routinely employed (acetonitrile/D2O or methanol/D2O) together with efficient solvent suppression routines. Occasionally a fully deuterated organic solvent is used to minimise solvent artefacts in the NMR spectrum. It is seldom possible by 1D 1H NMR to characterise co-eluting analytes fully when NMR signals from two or more compounds of similar concentration overlap. On-line MS detection can be used to deconvolute co-eluting components, but only if they are not isobars or isomers. If sufficient material is available, 2D 1H-1H NMR is useful for determining correlations between the resonances within the co-eluting analytes but can be time consuming. Therefore, optimising the chromatographic resolution is frequently the best solution. Until recent years, interfacing LC with NMR and MS has been difficult, but many problems, such as siting the LC and MS physically close to the NMR magnet, have now been resolved by the use of shielded magnets. A benchtop ion trap MS can now be placed <1 m from the centre of a 500 MHz NMR magnet without adverse effects on performance. The short distance also limits LC peak broadening and results in enhanced NMR sensitivity. In practice, a simple hyphenation of LC to NMR and MS is achieved using a post-column splitter, which directs 90-95% of the flow to the NMR via a 1-2 m capillary and the remainder to the MS. An alternative is the valve-switching interface termed the BNMI, which is a computer-controlled splitter with a double dilutor that provides an appropriate make-up flow for optimal ionisation in the MS. The BNMI also permits proton-deuterium exchange, which simplifies the MS spectra that would otherwise be obtained in LC-NMR-MS. The BNMI also plays an important role in LC-NMR-MS loop-storage mode, in which a portion of the loop contents, on transfer to the NMR, can be stored in a delay loop.
During NMR acquisition, the dilutor then slowly infuses analytes into the MS. These integrated LC-NMR-MS systems are highly versatile, operating as needs dictate. For instance, LC-MS can be used routinely during longer NMR experiments. In the area of natural products, the hyphenated technique LC-NMR-MS has been applied as a rapid screening method in the search for unknown marine natural products in chromatographic fractions [171] and for the separation and characterisation of natural products of plant origin [172,173].
LC-NMR-MS and Natural Product Profiling
In drug discovery from natural products it is crucial that duplicate isolation of known or undesirable compounds from complex extracts is avoided. Explosive growth in the use of LC-NMR and LC-NMR-MS for dereplication has taken place in the past few years [37,55]. While the majority of publications have identified known structural classes to evaluate the use of LC-NMR and LC-MS for dereplication, only a few have reported interesting novel bioactive analogues. The first application of LC-NMR-MS to natural products analysis was reported in 1999 [174]. The additional MS information allowed the identification of a further ecdysteroid in an extract of Silene otites, which could not be identified by LC-NMR alone [138]. Further applications of this double hyphenation technique dealt with the identification of naphthodianthrones [164] and flavone glycosides [164,175] in natural product extracts. In recent years a number of studies on the use of LC in combination
with NMR and MS detection in a single setup have been reported [87,168,176]. In view of the demanding nature of the non-target approach that is generally prevalent in natural product analysis, LC-NMR-MS would be expected to be the method of choice. With regard to LC implementation with NMR and MS, there appear to be two “schools” of thought. Researchers such as Wolfender and Hostettmann prefer to use the two spectroscopic techniques separately. LC-NMR is generally the most demanding technique and may compromise the use of other hyphenated technique(s) [177,178]. In this approach, data from LC-UV(DAD) and LC-MS-MS are acquired at the same time, while LC-NMR and LC-micro-fractionation are performed in a separate second analysis. Sample loading can thus be optimised for each technique, and D/H exchange problems during MS detection do not occur. In addition, 10% of each micro-fraction (used for a bioassay) is kept for post-run UV-MS analysis in case of doubt regarding the attribution of LC peaks. This precautionary measure points to the weakness of the approach, namely the problem of peak correlation between the two runs, especially if minor peaks are considered. At present, two stand-alone LC-MS and LC-NMR setups seem to be preferred by workers over a single hyphenated system. A recent demonstration of the potential of having both NMR and MS available in one LC run was described in a study on the Baltic starfish, Asterias rubens [171]. Previously unreported asterosaponins were identified from sub-fractions after rapid sample preparation by matrix solid-phase dispersion. Despite the high complexity and close analogy of the structures, the targeted isolation and off-line structural elucidation of seven new compounds was achieved, and up to 17 individual constituents could be characterised in a single chromatogram of an asterosaponin fraction.
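The D/H exchange problem noted above for MS detection in deuterated eluants can be made concrete with a short mass calculation. The sketch below assumes positive-ion ESI, complete exchange of all labile protons, and a deuteron as the charge carrier in D2O. The function names and the example neutral mass are hypothetical.

```python
# Sketch: expected m/z shift when an analyte is measured in a D2O-containing eluant.
# Assumes positive-ion ESI, complete exchange of labile H for D, and D+ as charge carrier.

H_MASS = 1.007825         # monoisotopic mass of 1H, Da
D_MASS = 2.014102         # monoisotopic mass of 2H, Da
ELECTRON_MASS = 0.000549  # Da


def mz_protonated(neutral_mass):
    """m/z of [M+H]+ measured in a protonated eluant."""
    return neutral_mass + H_MASS - ELECTRON_MASS


def mz_deuterated(neutral_mass, n_exchangeable):
    """m/z of the fully exchanged [M(Dn)+D]+ ion measured in a deuterated eluant."""
    exchange_shift = n_exchangeable * (D_MASS - H_MASS)
    return neutral_mass + exchange_shift + D_MASS - ELECTRON_MASS


def exchangeable_count(mz_in_h2o, mz_in_d2o):
    """Number of labile protons inferred from the two measurements."""
    return round((mz_in_d2o - mz_in_h2o) / (D_MASS - H_MASS)) - 1


# Hypothetical analyte: neutral monoisotopic mass 300.1000 Da, three labile protons.
mz_h = mz_protonated(300.1000)
mz_d = mz_deuterated(300.1000, 3)
print(f"[M+H]+ in H2O:     {mz_h:.4f}")
print(f"[M(D3)+D]+ in D2O: {mz_d:.4f}")
print("labile protons inferred:", exchangeable_count(mz_h, mz_d))
```

Read the other way round, the same shift is what allows labile protons to be counted when spectra in protonated and deuterated eluants are compared.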
An application of LC-NMR-MS was also reported in the separation and characterisation of two secoisolariciresinol diglucoside isomers in flaxseed [172]. LC-NMR-MS hyphenation was applied to the screening of an asterosaponin subfraction from the starfish Asterias rubens for unknown compounds. This method was successful in indicating the presence of unidentified asterosaponins in the sample and subsequently resulted in the isolation of four new compounds, ruberosides A-D. The structures of the compounds were elucidated using 1D and 2D NMR techniques. In the case of asterosaponins, several complicating factors were indicated. Conventional screening methods, such as column chromatography-TLC or HPLC-PDA, are not sufficiently specific, as asterosaponins are difficult to separate and show little UV absorbance [147]. (HP)LC-MS methods employing soft ionisation techniques such as ESI are well suited in principle but yield little or no information on the various structural isomers. Also, the response under ESI or APCI conditions depends strongly on the chemical nature of the compounds. 1H NMR spectra can be used to check the purity of isolated compounds, but when applied to the analysis of mixtures even 2D NMR spectra are difficult to interpret because of their complexity. An LC-NMR-MS method was developed in 2000 to characterise subfractions of toxic glycosidic constituents of starfish (asterosaponins), which were not sufficiently separated by classical preparative column chromatography [147]. Asterosaponins are Δ9(11)-3β,6α-dioxygenated steroids with a sulfate group attached at C3 and an oligosaccharide chain containing five or six sugar units at C6. Individual representatives of this homogeneous class differ in their steroidal sidechain or their sugar moiety, while the steroidal nucleus is common to all asterosaponins. There are about 100 different
combinations of sugar moiety and sidechain described in the literature. Dereplication by LC-NMR-MS allowed recognition of novel asterosaponins at an early stage of the dereplication process. Within 2 days, starting from the intact animal material, an overview of the composition of the asterosaponin fraction was obtained by Wolfender et al. [177]. In this study, LC-MS of asterosaponins did not permit identification because identical molecular masses and fragmentation patterns were observed. In such situations, however, LC-NMR-MS facilitates the identification of unknown compounds in the presence of known compounds and thereby prevents unnecessary re-isolation. Natural product isolations will also benefit from faster dereplication using automated loop-storage LC-NMR-MS, as illustrated by the identification of nine closely eluting and isomeric aporphine alkaloids in a Taiwanese plant of the genus Litsea using 50 times less material than conventional experiments using 5 mm tubes and an NMR spectrometer of the same field strength [170]. Automated loop-storage LC-NMR-MS also led to rapid identification of the main alkaloid classes from extracts of four medicinal plants from the Cape Peninsula region [176]. Natural product investigations using non-hyphenated approaches can take several months; in comparison, these results were achieved in 36 hours using fully automated loop-storage mode on a 500 MHz spectrometer with only 150-300 mg of each extract. Microbial production of secondary metabolites is another facet of natural product isolation. Identification of novel warfarin metabolites from Streptomyces rimosus [169] and the identification of the antibiotic aristeromycin from Streptomyces citricolor [145] were achieved by LC-NMR studies. Similarly, LC-NMR and LC-MS were used to identify bioactive molecules from marine sources such as the sponge Aaptos sp. [37] and the starfish Asterias rubens [147].
Clearly the discovery of novel molecular diversity from such phytochemical, microbial and marine sources will be greatly facilitated by advances in LC-NMR-MS. The factors that hamper complete on-line identification of unknown natural products are mainly linked to the present limitations of LC-NMR. LC-UV or LC-MS provide comparable types of spectroscopic information to those recorded for pure constituents and sensitivity does not compromise the HPLC separation. LC-UV-MS and LC-NMR are very efficient for the recording of spectra in crude reaction mixtures at the microgram level. LC-NMR-MS cannot fully replace LC-MS, LC-NMR or even NMR techniques for the structural elucidation of compounds. Cases will always exist where purification of the analyte(s) is required, for example when the structural problem is too complex or the separation of the chromatographic peak is not suitable. All techniques need to be available to the analyst so that the one appropriate for each structural problem can be chosen.
LC-NMR-MS and Beyond
LC-SPE-NMR-MS
Solid-phase extraction (SPE) has been integrated inline in the LC-NMR-MS sequence. Unlike most SPE applications, which are carried out on crude sample extracts in order to obtain a mixture of analytes, the SPE cartridges are positioned after the HPLC column so that eluting components can be trapped on their own cartridges. Thus, SPE acts as a fraction collector where multiple injections can be undertaken in order to accumulate the target compounds. The individual components in the cartridges can then
be washed before elution to the NMR and MS. The sensitivity advantages are significant: a single trapping step has improved the signal-to-noise ratio for NMR by up to a factor of four. An approach combining Matrix Solid-Phase Dispersion (MSPD) extraction with LC-NMR-MS hyphenation was first applied in 2001 for screening of the asterosaponin fraction of the starfish Asterias rubens [60]. The intention was to demonstrate that MSPD could be successfully used with on-line LC-NMR-MS to provide information about the compounds in the asterosaponin fraction. In particular, the on-flow LC-NMR-MS approach was carried out to assess whether adequate separation of closely related asterosaponins could be achieved with spectroscopic information of sufficient quality to permit identification. Use of an on-line LC-UV-SPE-NMR-MS system with a cryogenic NMR flow probe was reported in 2003 for the automated analysis of a Greek oregano extract [166]. Combining the data provided by the UV, MS and NMR spectra, the flavonoids taxifolin, aromadendrin, eriodictyol, naringenin, and apigenin, the phenolic acid rosmarinic acid, and the monoterpene carvacrol were identified. The automated technique was found to be very useful for natural product analysis and the sensitivity enhancement due to the use of the cryogenic NMR flow probe led to significantly reduced NMR acquisition times. To further increase LC-NMR sensitivity, the SPE method for concentrating the compounds of interest was used for peak storage after the LC separation and prior to NMR analysis. The SPE unit allows the use of normal protonated solvents for the LC separation and fully deuterated solvents for flushing trapped compounds into the NMR probe. Therefore, solvent suppression was no longer required and multiple trappings of the same compound from repeated LC injections were used to solve the problem of low concentration and to obtain 2D heteronuclear NMR spectra.
When compared to the classical uses of UV, MS and NMR spectroscopy applied to pure natural products, ideally the integration of all these techniques in their hyphenated forms (LC-UV, LC-MS and LC-NMR) in a single setup with centralised acquisition of the spectroscopic data should permit the complete spectroscopic characterisation of different metabolites in a mixture in a single analysis. Further, other existing hyphenated techniques, such as LC-IR and LC-CD, may also provide valuable additional information. The combination of all these techniques in a single setup is possible and the creation of a ‘total analysis device’ has been demonstrated in the case of on-line HPLC-UV(DAD)-NMR-MS-FTIR analyses [178]. The coupling of all these different techniques (especially LC-NMR-MS) is not an easy task since operating conditions that are compatible with all of them have to be found. The possibility of acquiring all data during a single analysis gives an opportunity to efficiently associate the set of on-line spectroscopic data with a given peak and makes the processing easier to perform.
Extended Hyphenation
LC-UV(DAD)-NMR-FTIR-MS
LC-UV(DAD)-NMR-MS with the parallel sequence is generally preferred to the serial NMR-plus-MS sequence and the next extension was to include FTIR by collecting the effluent from an NMR instrument on-line with associated solvent evaporation and
subsequent off-line FTIR detection. The triple hyphenation of (HP)LC-NMR-IR-MS for collecting the eluant for subsequent FTIR measurements was described by Wilson et al. [42] in 1999. The suitability of an integrated LC-UV-NMR-FTIR-MS system for the analysis of natural products was assessed using ecdysteroids, whereby two compounds, 20-hydroxyecdysone and polypodine B, were unequivocally identified [43]. A Time-Of-Flight (TOF) based LC-UV(DAD)-NMR-FTIR-TOF-MS system was designed with IR spectra obtained on-flow to study a reasonably concentrated mixture of four non-steroidal anti-inflammatory drugs including ibuprofen and naproxen [179]. This was undertaken as a “proof of concept” and was successful in that good quality spectroscopic data and accurate masses were obtained for all analytes. In these studies stop-flow 1H NMR was used. They showed that diagnostic UV, IR, MS and 1H NMR spectra could be obtained on-flow and in a single run for quantities of material on the order of 50-100 µg. Here the layout of the instrumentation was not fully optimised and further work [180] indicated that a five- to ten-fold improvement in performance could be achieved by paying attention to minimising band broadening. This ability to obtain four sets of correlated data is a very exciting development, particularly for the field of natural products chemistry.
Other Hyphenated Spectroscopic Techniques
GC Hyphenation
Among the separation-identification setups, GC-MS was the first hyphenated technique to become useful for routine research and development purposes and has been used to profile metabolites for over forty years. Neither GC-FTIR nor GC coupled to an Atomic Emission Detector (GC-AED) has achieved the same popularity as GC-MS. FTIR is suitable for functional-group recognition, but much less so for true analyte identification. However, GC-FTIR-MS systems have shown utility in the analysis of natural products such as flavours, fragrances and essential oils.
A number of reports discuss the application of GC hyphenated techniques [181,182]. For complex samples typically subjected to GC-FTIR-MS, a more sophisticated GC analysis was needed, and hence a GC-GC setup with a cryogenic trap between the two GC columns was used [183]. The recent introduction of GC × GC-TOF-MS has led to markedly successful separations. Recently there has also been renewed attention to the application of coupled-column techniques to enhance resolution with typical examples, including (normal phase) LC-GC and, increasingly, GC × GC (2D-GC) and LC × LC (2D-LC), also known as comprehensive techniques (see review by Wilson and Brinkmann) [87]. The use of GC-MS is restricted to compounds that are sufficiently volatile to pass through GC columns at temperatures of up to 400°C, the upper limit of ‘high temperature’ capillary columns. In natural product research, GC-MS is most frequently used to analyse essential oils. Applicability of GC-MS is limited to less polar compounds with molecular masses below 1 kDa. In many cases, natural products have to be derivatised prior to the chromatographic separation whereby the chromatographic separation, appearance of mass ions and fragmentations are all affected by the choice of derivatising agents used for substitution of polar groups.
SFC-MS
A short time after the development of commercial instruments for capillary supercritical fluid chromatography (SFC), normally using CO2 as the fluid and equipped with the usual gas chromatographic detectors, coupling with MS was achieved using direct introduction systems. The restrictions typical of capillary SFC, such as the low sample capacity, which also reduces the efficiency of the mass detection, have limited applications of this technique.
SFC-NMR
The coupling of NMR with SFC was reported in 1994 [184] and an on-line SFC-NMR system was proposed in 1995 where SFC processes of roasted coffee and black pepper samples were monitored using 2D 1H NMR spectra [185]. The temperature and pressure conditions need to be maintained to keep the mobile phase in supercritical form in the flow-cell, but the problem with solvent signals is eliminated compared to LC-NMR.
HILIC-MS(ESI)
A technique known as Hydrophilic Interaction Chromatography coupled with ESI ionisation mass spectrometry (HILIC-MS(ESI)) was developed in 1999 to meet the need of evaluating highly polar natural products that cannot be retained on traditional reversed-phase stationary phases [56]. Specifically, amide, polyhydroxyethyl aspartamide and cyclodextrin based packings provided superior performance for the analysis of a set of polar natural product compounds, where the properties of the mobile-phase buffers also greatly affected the separations.
LC-ICP-MS
The hyphenation of (RP)LC and Inductively Coupled Plasma Mass Spectrometry (LC-ICP-MS) has become a routine methodology in many laboratories. The technique is most popular for the study of metal-containing compounds but is also well suited for the analysis of halogen-, P- and S-containing compounds.
RECENT ADVANCES AND THE FUTURE OF HYPHENATED SPECTROSCOPIC TECHNIQUES
NMR Probe Developments
The major recent NMR probe developments include the “cryo-probe”, the use of small-scale micro-coils for data acquisition of very small samples, and the introduction of multiplex probes. The most significant recent advance has been the cryogenic cooling of the NMR radio frequency coils and electronics to give greatly enhanced sensitivity. The first cryogenic probe built in flow configuration and its application in an LC-NMR-MS study of acetaminophen metabolites in urine was reported in 2003 [186]. Previously undetected metabolites of acetaminophen were directly observed in a 15-minute on-flow experiment using 40% less material than formerly reported. The probe provides superior sensitivity over conventional non-cryogenic flow NMR probes, allowing analysis of lower concentrations of metabolites or natural products. The strategy is applicable to samples containing mass-limited analytes, such as natural product extracts. NMR detection limits have advanced significantly with the recent introduction of the so-called cryo-probes (Varian Inc. and Bruker Biospin) whereby the probe electronic
components are cryogenically cooled to approximately 20 K but the sample remains at ambient temperature. Recently, commercial LC-NMR-MS systems involving the use of this cryo-probe technology have become available and, when used in an LC-SPE-NMR-MS system, the enhancement is about 20-fold; similar enhancements are achieved in 2D-LC systems. Most of these innovations were necessary to increase the sensitivity of NMR detection compared to MS. The importance of MS data in these combined experiments should not be underestimated, as in many cases correlation of molecular ion masses with NMR data is sufficient for compound recognition. This can be achieved at low resolution using an ion trap or single quadrupole mass spectrometer or in high-resolution mode for accurate mass determination with a TOF mass detector. In other experiments, the NMR structural data are insufficient, so the full MS or MS-MS spectra are recorded to aid compound identification. The decision to record MS-MS spectra can be made automatically when a certain condition is met, such as the appearance of a particular molecular ion. Such data-dependent scanning is also available in NMR spectrometry in a different form. NMR data systems can be programmed to recognise a certain combination of peaks in an on-flow run and then initiate a stop-flow run in 2D NMR mode for improved sensitivity.
IFC-Interchangeable-Flow-Cell Probes
One common problem with flow NMR is that after the probe has been used for thousands of samples, the sample cell can become contaminated. One of the first symptoms of a contaminated sample cell is that the quality of solvent suppression becomes degraded and, as has been suggested in Keifer’s review [70], this occurs because the contaminants create a magnetic-susceptibility distortion that degrades the NMR line shape. Efforts to rigorously wash the cell are not always successful, and it is advantageous to allow the user to change the flow-cell.
This capability became available with Varian’s Interchangeable Flow-Cell (IFC) probes. The IFC probe design allows users to change the flow-cell as well as to acquire NMR data without the flow-cell in place. In the latter situation, the probe behaves as a micro-probe. The dual-purpose nature of the IFC probes not only allows users to service their own flow probe, but also allows the probe to be a fully functional high-resolution probe, even if a new clean flow-cell is not available. A limitation of DI-NMR is that repetitive injections can eventually clog the NMR flow-cell. This has been addressed by the development of user-changeable flow-cells, which led to the interchangeable flow-cell probe.
Larger Probes
Probe and Hardware Developments
A larger volume (120 µL) flow probe has been optimised for NMR screening applications using 15N-HSQC data and DI-NMR methods [187].
Smaller Probes
Micro-coils
Micro-coils are NMR receiver coils with diameters ≤ 1 mm that are wrapped directly onto the capillary that contains the sample. Although micro-coils have been employed for
over 20 years, their direct application to capillary-scale analytical separations and to high resolution NMR spectroscopy did not occur until the mid-1990s [188-190]. A four-coil flow-through multiplex sample NMR probe was described in 1999, with potential to increase throughput [191]. Further probe development involving small-scale micro-coils has allowed NMR data to be acquired on samples ranging in size from roughly 1 µL down to 1 nL. The primary justification for this development was to enable applications in flow NMR, more specifically to allow NMR data to be acquired with capillary HPLC and capillary electrophoresis applications. Several reports have discussed the applications of smaller-volume ‘micro-coil’ flow probes that have active volumes of about 1-5 µL. One application is an analysis of a selected natural product library to identify known taxane compounds using 5-50 µg of compound dissolved in 3 µL of solvent [192]. A Triple-Resonance (TXI) (1H, 13C, 15N) high resolution capillary probe with 2.5 µL NMR-active sample volume was described in 2002 for applications to mass and volume limited samples and for coupling LC to NMR [193]. Miniaturised NMR applications have been reported with discrete capillary sample tubes using a 1 mm TXI microliter probe designed for a 600 MHz instrument. The limited sensitivity of the NMR detector often requires that on-line LC-NMR is performed in the stop-flow mode and, since NMR and HPLC operate on different time scales, it could be preferable to collect HPLC peaks first and then analyse them one by one in the NMR probe. Off-line peak storage can be carried out using storage loops, solid-phase extraction cartridges or small LC columns. An alternative is to use discrete sample tubes for the peak storage, with 1 mm capillary NMR tubes used to store HPLC fractions.
The NMR capillaries with micro-fractionated LC peaks can easily be collected and efficiently measured with a TXI probe using existing NMR automation and sample-changing routines. This off-line approach offers advantages such as the minimisation of potential dead volume effects caused by long transfer capillaries and by valve switching and fluidic problems. Also, LC peak dilution is kept to an absolute minimum, which is crucial for obtaining a good signal-to-noise ratio with mass-limited samples. LC peaks can be directly fractionated into 1 mm capillaries if fully deuterated eluants are used. Alternatively, if the separation is carried out using protonated solvents, the peaks can be temporarily “stored” in a well plate and the protonated eluant can be exchanged with deuterated solvent before transferring the LC peak into the 1 mm capillary. The fractionation system can be operated under inert conditions with a nitrogen or argon atmosphere to prevent sample decomposition. For some applications, such as extremely air and light sensitive samples, a closed on-line system is advantageous. The probe allows the measurement of 1D 1H NMR or 2D and 3D inverse heteronuclear NMR experiments with high sensitivity, speed and quality using only a few nanomoles (µg) of compound.
Advances in Separations
Capillary Separations
Small scale hyphenated separation techniques, such as Capillary Liquid Chromatography (CLC), Capillary Electrophoresis (CE) and Capillary Electrochromatography (CEC), have revolutionised the ability to separate components in small samples. These techniques can provide faster analysis times, higher separation efficiencies and greater
pre-concentration abilities starting from less material. Sample amounts as small as picomoles have been analysed using capillary-NMR (CNMR) techniques.
Capillary Separations and Micro-coil NMR Probes
Miniaturisation of separation techniques is potentially important in hyphenated NMR, particularly for mass-limited samples, such as natural products in drug discovery. Micro-coil probes have a high filling factor as the RF coil is wound directly onto the cell. Coupling capillary LC (C(HP)LC) and CEC or CE to these probes requires less sample and eluant and shorter analysis times [78,194]. A higher sample loading than in CE-NMR is also achievable. Recently Bruker reported a commercial CLC-NMR system using a CNMR probe with a 1.5 µL active NMR volume in automated stop-flow mode to obtain 1H NMR spectra at 600 MHz with detection limits of 5-25 ng per drug metabolite in an extract of urine 4 hours after a 1 g dose of acetaminophen [195]. Undoubtedly, as the newer CNMR techniques become commercialised they will be used at the discovery stage for mass-limited samples. Capillary Isotachophoresis (CITP) and CE have also been successfully coupled to NMR micro-coils for improved sample concentration and trace impurity analysis. In general, LC performed with columns of less than 0.5 mm diameter is considered CLC. Chromatographic peak volumes for CLC processes range from ~1 µL to ~100 µL [196].
CLC-NMR
One of the requirements for CLC-NMR is that the sample needs to be soluble in a volume of approximately 5 µL or less, which is not always possible. Due to the small solvent volumes involved, most CLC-NMR experiments are performed with deuterated mobile phase, which eliminates the need for solvent suppression and allows access to the total chemical shift range. Another advantage of micro-column CLC techniques is an enhanced detection performance since there is a higher concentration of analytes for a given injected amount.
To avoid any adverse effects of the magnetic field on LC instrumentation, a long transfer line is usually required to couple the capillary column to the NMR probe, but this is not a problem with shielded magnets.
Miniaturisation of NMR Probes Hyphenated to Capillary Scale Separations
The miniaturisation of NMR samples and hyphenation to a variety of capillary-scale analytical separation techniques including LC, CE and CITP has been described [197]. Capillary based LC and micro-coil NMR have compatible flow rates and sample volume requirements. A typical capillary LC on-column flow rate is 5 µL/min or less while the autosampler-injected analyte volume is 0.1 µL or more and accurate flow is achieved through a capillary of 50 µm i.d. The NMR flow-cell has a total volume of 5 µL and the micro-coil observe volume is 1 µL. A typical injected sample amount for CLC-NMR analysis of a small molecular weight compound is a few µg (nmol) or less. In addition, CLC-NMR enables steep solvent gradients of water/organic mobile phases, as high as 30% per min, to be used. Interfacing CE with nL-volume NMR flow-probes has also been described. By electrophoretic concentration of dilute µL-volume samples into higher concentration nL-volume bands, CITP extends the range of feasibility for micro-coil NMR experiments. Apart from the concentration factor of up to two orders of magnitude that CITP offers, another advantage is the ability to separate charged components in mixtures.
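As a rough illustration of why these flow rates and cell volumes are compatible, the time an analyte band spends in a given volume can be estimated as volume divided by flow rate. The sketch below assumes ideal plug flow with no band broadening, which is a simplification; the function name is hypothetical and only the 5 µL/min, 5 µL and 1 µL figures are taken from the text.

```python
def residence_time_s(volume_ul: float, flow_ul_per_min: float) -> float:
    """Time in seconds for the mobile phase to sweep a volume once.

    Assumes ideal plug flow (no band broadening or mixing).
    """
    if volume_ul < 0 or flow_ul_per_min <= 0:
        raise ValueError("volume must be >= 0 and flow rate must be > 0")
    return 60.0 * volume_ul / flow_ul_per_min


if __name__ == "__main__":
    flow = 5.0  # uL/min, typical capillary LC on-column flow rate
    # 1 uL micro-coil observe volume and 5 uL total flow-cell volume
    for label, volume in (("observe volume", 1.0), ("flow-cell", 5.0)):
        print(f"{label}: {residence_time_s(volume, flow):.0f} s at {flow} uL/min")
```

Under these assumptions an analyte remains in the observe volume for only a matter of seconds during an on-flow run, which is one way to see why stop-flow operation is often chosen when more scans are needed.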
The C(HP)LC-NMR separation of an extract of black beetles was carried out in which one hundred black beetles were extracted with methanol [198]. Continuous or on-flow measurements showed the presence of two major compounds, which were later assigned by stop-flow measurements to be 2-ethylbenzo-1,4-quinone and 2-ethyl-3-methoxy-benzo-1,4-quinone.
CE-MS
CE is one of the most efficient separation techniques in the liquid phase. The on-line coupling of CE-MS allows simultaneous exploitation of the high separation efficiency of CE and the identification capability of mass detection. The main drawback, which has limited the applications of CE-MS, is the use of very low sample volumes (nL range) and, as a consequence, the high detection limits in terms of concentration. Efforts to improve the sensitivity have included the use of different mass detectors, such as TOF spectrometers and Matrix Assisted Laser Desorption Ionisation (MALDI) coupled to TOF (MALDI-TOF), and narrower capillaries or on-line preconcentration procedures. CE-MS has been used for the identification of a number of non-steroidal anti-inflammatory drugs and their metabolites in SPE extracts of human urine [199]. A variety of MS techniques were used, including tandem mass spectrometry (MS-MS) to profile the extracts and negative ion ESI MS for their detection. Low cone voltages produced rapidly detectable [M-H]- ions while higher voltages allowed the production of a range of diagnostic fragments via in-source Collision Induced Dissociation (CID).
A solvent compensation method based on FIA was used recently to obtain high quality NMR spectra with a solvent gradient. A binary solvent system containing D2O and CD3OD with a 10% methanol per min solvent composition gradient was used, and some NMR line broadening and chemical shift changes were noted [81].
Creating a second equal but reverse gradient and combining the two solvent gradients before the NMR detector kept the composition of solvent reaching the NMR flow-cell constant. Using this approach, methods can be developed to generate high quality NMR spectra during on-flow gradient LC-NMR experiments. The ultimate applicability of this approach depends on the ability to compensate for the disturbance of the solvent gradient and reverse gradient by a pair of LC columns.
Proposed Hyphenation
The ultimate in multi-hyphenated spectroscopic techniques for natural product profiling would be LC-UV(DAD)-NMR-ELSD-MS-CD-FTIR coupled on-line to a bioassay, Fig. (5), using a cryogenic probe for increased sensitivity. The hyphenated operation begins after off-line LC optimisation of the separation, with consideration of suitable buffers in the mobile phase that will not interfere with the individual spectroscopic techniques. The LC is coupled to a DAD to detect chromophores, but as an option a CLND can be used to selectively identify nitrogen-containing compounds. Separated peaks enter the cryogenic probe in the NMR, using first the on-flow method for an initial profile run. In this initial run an overview of the analyte can be obtained as well as initial MS data. This also allows any necessary fine-tuning of the mobile phase. Here, the flow needs to be split to send approximately 90% to the NMR, 5% to the MS and the other 5% to an ELSD. The ELSD, being a
Fig. (5). Schematic of proposed on-line hyphenated spectroscopic techniques coupled to on-line bioassay. Different options are available for the proposed hyphenation, such as: [1] Chemiluminescence Nitrogen Detector (CLND); [2] cryogenic probe; [3] Electrospray Ionisation (ESI) or Atmospheric Pressure Chemical Ionisation (APCI) mass spectrometry.
universal and quantitative detector, will be able to identify additional components not observed by the DAD or CLND. The second run would adopt the stop-flow method using a multiple column-trapping method, followed by transfer to storage loops to obtain 1D and 2D NMR data. The separated compounds can be stored, if desired, in the loops until all analyses are completed. Again, a splitter is used in the stop-flow method, but with a 95% to 5% split to the NMR and MS only or, if the ELSD is used, the split mentioned previously. Once NMR acquisition is completed for a compound, the sample is transferred out of the NMR flow-probe. At this point, the eluting compound enters another on-line splitting valve sending approximately 90% to the FTIR for functional group confirmation and 10% to the CD, enabling relative and/or
stereochemical determinations. Finally, the compound is recovered from the FTIR via a fraction collector and, by robotic means, is transferred to a master microtitre plate. Once all the desired compounds are collected and transferred, the microtitre plate is evaporated to dryness in a vacuum chamber or by using a stream of nitrogen. From these, daughter plates are made and put through HTS or other bioassays for evaluation. While this may be efficient in terms of run time, sample size and data correction, an obvious disadvantage of such an extended hyphenated system is that it may be very inefficient in the utilisation of expensive spectrometers. It is not clear whether the benefits of such complex hyphenated systems outweigh the costs of assembling such an array of instruments. A single spectroscopic technique or any combination of hyphenated techniques will not always provide an unequivocal solution. This applies to both the dereplication and the characterisation of novel natural products. In addition, no single or multiple hyphenated spectroscopic approach will be able to completely study all constituents in a natural product extract, as this process depends on the selectivity and sensitivity of the detector. In all dereplication strategies (hyphenated or non-hyphenated), a requirement is that specific databases need to be consulted. Other considerations, such as capillary separation methods, 2D-LC, SPE, 13C NMR and database extensions, could be applied to hyphenated spectroscopic techniques for the dereplication process in the future.
Future Hyphenations
The possibilities for adding extra detectors to an LC-NMR-MS system include (HP)LC-UV-NMR-MS-FTIR, which has been successfully attempted [43]. LC has been linked simultaneously to two MS detectors, such as ICP-MS and TOF-MS, to give a hyphenated system LC-UV(DAD)-ICP-MS-TOF-MS for the analysis of drug metabolites in rat urine [200].
Work is in progress to create LC-NMR-MS systems using ICP-MS alone and with the dual detectors ICP-MS-TOF-MS. ICP-MS is useful for the detection of specific atoms such as bromine, arsenic and selenium. Further additions that have been discussed include CD and fluorescence detectors as well as the use of different separation technologies, such as SFC, CE or CEC, linked to multiple detectors. Capillary LC-NMR, CE-NMR and CEC-NMR will be increasingly employed since their operation at low flow rates will enable the use of fully deuterated solvents. High magnetic field strengths with cryogenically cooled NMR probes and preamplifiers provide unsurpassed NMR sensitivity. In 2003, the highest field used for LC-NMR-MS was 800 MHz [201] and LC-NMR has yet to be reported at 900 MHz. An alternative for increasing sensitivity on existing LC-NMR systems is the on-line SPE or 2D-LC add-on, particularly for routine field strengths of 400-600 MHz. In the integrated LC-SPE-NMR or 2D-LC-NMR systems the analytes are detected post-column by UV or MS and automatically trapped on SPE cartridges in 96 well plate format or, in the case of 2D-LC-NMR, on a second miniature LC column. Multiple trapping of a given peak on a given SPE cartridge or a given second miniature LC column is possible. For LC-SPE-NMR the cartridge is dried under nitrogen and the contents are finally eluted from the cartridge into the NMR flow-probe. These are more
economical procedures as non-deuterated solvents and a wider choice of HPLC buffers are now permissible for the chromatography, with only the final transfer volume of 200-500 µL using deuterated solvent. Exchangeable protons can be observed in the NMR spectrum, which can aid assignment, and, since the probe only receives the same deuterated solvent, shimming is trivial and solvent suppression is mostly not required. There is an automated wash and dry procedure so that cross-contamination is avoided. The signal-to-noise ratio is improved up to 6.8-fold on triple trapping compared with the conventional loop collection mode for a 60 µL probe at 500 MHz and the procedure is fully automated. As stated earlier, the cryogenic cooling of the RF coils and electronics to give greatly enhanced sensitivity is the most significant recent advance in NMR spectroscopy [186,202]. This is of critical importance in drug discovery from natural products, where the constituents present are often mass-limited. A recent application of a cryogenic probe to the identification of acetaminophen metabolites in human urine using LC-NMR-MS reported three metabolites previously undetected at 500 MHz in a 15-minute on-flow experiment [186]. MS data combined with stop-flow spectra for greater signal-to-noise ratios enabled unambiguous assignment of these compounds. LC-NMR-MS in automated loop-storage mode clearly provides a versatile analytical platform for the analysis of complex mixtures such as natural product extracts. The most demanding structure determination problems require 2D and 13C NMR data and the continuing pace of SPE, 2D-LC and cryogenic probe developments will no doubt drive these applications forward. In the future, automated MS-directed LC-SPE-NMR-MS or 2D-LC-NMR-MS systems equipped with cryogenic probes should provide the state-of-the-art in NMR sensitivity for natural product drug discovery.
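To put the sensitivity gains quoted above into practical terms, a short calculation can convert a signal-to-noise gain into an approximate saving in acquisition time. The sketch below is illustrative only and relies on the standard signal-averaging assumption that signal-to-noise grows with the square root of the number of co-added scans, with all other conditions unchanged; the function names are hypothetical.

```python
import math


def time_saving_factor(snr_gain: float) -> float:
    """Factor by which acquisition time shrinks for the same final S/N.

    Assumes S/N scales with the square root of the number of scans,
    so a gain g in per-scan S/N reduces the scans needed by g squared.
    """
    if snr_gain <= 0:
        raise ValueError("snr_gain must be positive")
    return snr_gain ** 2


def scans_needed(reference_scans: int, snr_gain: float) -> int:
    """Scans needed after the gain to match the reference experiment."""
    return max(1, math.ceil(reference_scans / time_saving_factor(snr_gain)))


if __name__ == "__main__":
    # 6.8-fold: triple trapping versus loop collection, as quoted in the text
    print(f"time saving for 6.8-fold gain: {time_saving_factor(6.8):.1f}x")
    print(f"scans replacing 1024 reference scans: {scans_needed(1024, 6.8)}")
```

On this assumption, even a modest gain in signal-to-noise translates into a much larger reduction in measurement time, which is why trapping and cryogenic probes make 2D and 13C experiments more feasible for mass-limited samples.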
(HP)LC-13C NMR
Two approaches are under development to achieve LC-13C NMR: one uses immobilised free radicals to reduce the apparent T1 relaxation times and the other exploits electron-13C magnetisation transfer by employing dynamic nuclear polarisation [90]. A modification of the design of the flow-cell could lead to a further increase in sensitivity, wherein the addition of a proton RF coil for selective excitation of the protons before entering the 13C flow-cell enables the full build-up of the nuclear Overhauser effect (nOe) in a continuous or on-flow system. The combination of this NMR flow-cell design with the use of immobilised paramagnetic species could lead to the realisation of continuous-flow 13C NMR spectra of separated species.
Databases
The chemical screening of extracts by various hyphenated spectroscopic techniques generates a large amount of information. In order to rationalise and use this approach efficiently with a high sample throughput, the challenge is to find a means to centralise all the data necessary for pattern recognition with reference to a library of standard compounds. In particular, LC-UV and LC-MS-MS or LC-MSn databases have to be built up and LC-NMR analysis should be used as a complement for studying, in more detail, compounds that are not found in these databases. Integration of high resolution on-line LC-bioassays is crucial in this strategy to permit an efficient localisation of the bioactive
compounds in the crude extracts and allow their subsequent on-line, full or partial, identification.

CONCLUSION
Natural products will continue to be very important lead compounds. LC-hyphenated techniques play an increasingly important role as a strategic tool to support natural product investigations, while the last two decades have seen the development of sensitive hyphenated spectroscopic techniques for the analysis of natural product mixtures. Hyphenated techniques provide a great deal of preliminary information about the content and nature of the constituents of crude extracts, which is critical when large numbers of samples need to be processed and the isolation of known compounds is to be avoided. LC-MS analysis of crude extracts is not straightforward due to the great variety of constituents. No MS interface allows an optimum ionisation of all the metabolites within a single crude extract, since the response is compound dependent. Often different ionisation modes or different interfaces are necessary to obtain a complete understanding of the extract composition. Artefacts can also be formed within the LC-MS interface and may lead to incorrect molecular weight assignments. The combination of LC-UV and LC-MS information can be very helpful in the first step of dereplication, especially when this information is combined with taxonomic classifications for cross reference against natural product databases. LC-NMR allows for the recording of complementary on-line structure information, which is particularly important in situations when LC-UV-MS data are insufficient for unambiguous peak identification. Further hyphenation of other spectroscopic techniques, such as IR spectroscopy, for analyte identification has been demonstrated. In addition, other types of MS detection techniques will be increasingly used in (HP)LC-NMR-MS, such as TOF and ICR-MS, which allow accurate masses and, hence, empirical molecular formulae to be determined.
Another development is the coupling of HPLC to inductively coupled plasma MS (ICP-MS) to provide a specific atom detector. The future trend to hyphenate multiple techniques, such as LC with NMR, MS and even IR, is expected to occur on the capillary scale. The ability to analyse additional nuclei, such as 13C, 31P and 15N, is expected to be implemented in the future to further increase the analytical utility of the hyphenated spectroscopic techniques currently available. The development of probes with multiple coils connected in parallel may result in the simultaneous acquisition of NMR data from several samples. With further improvement in sensitivity, on-line C(HP)LC-NMR in conjunction with C(HP)LC-MS and, further to this, the double hyphenation C(HP)LC-NMR-MS are likely to become the major tools for the analysis not only of natural products but also of problems encountered in genomics, proteomics and drug metabolomics.

ACKNOWLEDGEMENTS
The authors gratefully acknowledge Dr R.A. Tinker for assistance in the preparation of this manuscript.
ABBREVIATIONS
NCI = National Cancer Institute
USA = United States of America
FDA = Food and Drug Administration
AIDS = Acquired Immune Deficiency Syndrome
HIV = Human Immunodeficiency Virus
DNA = Deoxyribonucleic Acid
SAR = Structure-Activity Relationships
PKC = Protein Kinase C
HTS = High-Throughput Screening
UV = Ultraviolet spectroscopy
UV-Vis = Ultraviolet-Visible spectroscopy
CD = Circular Dichroism
ORD = Optical Rotatory Dispersion
NMR = Nuclear Magnetic Resonance spectroscopy
MS = Mass Spectrometry
IR = Infra-Red spectrometry
FT = Fourier Transformation
FTIR = Fourier-Transform Infra-Red
GC = Gas Chromatography
SFC = Supercritical Fluid Chromatography
SEC = Size-Exclusion Chromatography
HILIC = Hydrophilic Interaction Chromatography
IEC = Ion-Exchange Chromatography
TLC = Thin Layer Chromatography
MTT = Methyl Thiazolyl Tetrazolium chloride
(HP)LC = High Pressure Liquid Chromatography
RPLC = Reversed-Phase Liquid Chromatography
NPLC = Normal Phase Liquid Chromatography
ODS = Octa-Decyl Silane
HSCCC = High-Speed Counter Current Chromatography
AED = Atomic Emission Detection
SPE = Solid-Phase Extraction
SPME = Solid-Phase Microextraction
MSPD = Matrix Solid-Phase Dispersion
PDA = Photo-Diode Array detection
DAD = Diode Array Detection
ELSD = Evaporative Light Scattering Detector
CLND = Chemiluminescence Nitrogen Detector
RI = Refractive Index
TSP = Thermospray
PB = Particle Beam
HN = Heated Nebulizer
APPI = Atmospheric Pressure Photo-Ionisation
ESI MS = Electrospray Ionisation Mass Spectrometry
API = Atmospheric-Pressure Ionisation
APCI = Atmospheric-Pressure Chemical Ionisation
TOF = Time-Of-Flight mass spectrometry
MALDI = Matrix-Assisted Laser Desorption Ionisation
CID = Collision Induced Dissociation
ESCI = Electrospray and Chemical Ionisation spectrometry
ICP = Inductively Coupled Plasma
ICR = Ion-Cyclotron Resonance
CITP = Capillary Isotachophoresis
FTMS = Fourier Transform ion cyclotron Mass Spectrometer
DI = Direct Injection
FIA = Flow Injection Analysis
ACD = Advanced Chemistry Development
WET = Water Suppression Enhanced through T1 effects
VAST = Versatile Automated Sample Transport sample changer
BEST = Bruker Efficient Sample Transfer
BNMI = Bruker-NMR-Mass spectrometry Interface
nOe = nuclear Overhauser effect
COSY = Correlation Spectroscopy
DOSY = Diffusion-Ordered Spectroscopy
TOCSY = Total Correlation Spectroscopy
NOESY = nuclear Overhauser effect Spectroscopy
ROESY = Rotating-frame nuclear Overhauser Effect Spectroscopy
HSQC = Heteronuclear Single Quantum Coherence Spectroscopy
HMBC = Heteronuclear Multiple Bond Correlation Spectroscopy
PFG = Pulsed-Field Gradient
SLP = Shifted Laminar Pulse
PEEK = Poly Ether Ether Ketone
RF = Radio Frequency
TFA = Tri-Fluoro Acetic Acid
MTPA = Methoxy-(Trifluoromethyl)-2-Phenyl-Acetic acid
C(HP)LC = Capillary Liquid Chromatography
CNMR = Capillary NMR
CEC = Capillary Electro-Chromatography
CE = Capillary Electrophoresis
TXI = Triple-resonance high resolution NMR capillary probe
IFC = Interchangeable Flow-Cell probes
H2O = Water
D2O = Deuterium oxide
CH3OH = Methanol
CD3OD = Deuterated methanol
CH3CN = Acetonitrile
H/D = Hydrogen/Deuterium exchange
2D = Two-dimensional
REFERENCES
[1] Cragg, G. M.; Newmann, D. J.; Snader, K. M. J. Nat. Prod. 1999, 60, 52-60.
[2] Newmann, D. J.; Cragg, G. M.; Snader, K. M. J. Nat. Prod. 2003, 66, 1022-1037.
[3] Colwell, R. R. Biotech. Adv. 2002, 20, 215-228.
[4] Soejarto, D. D.; Farnsworth, N. R. Perspectives Biol. Med. 1989, 32, 244-256.
[5] Tyler, V. E.; Brady, L. R.; Robbers, J. E. Pharmacognosy, (Ninth Edition), Lea & Febiger, Philadelphia 1988.
[6] Clark, A. M. Pharmaceutical Research 1996, 13, 1133-1144.
[7] Hostettmann, K.; Marston, A. Phytochemistry Reviews 2003, 1, 275-285.
[8] Hostettmann, K.; Potterat, O.; Wolfender, J.-L. Chimia 1998, 52, 10-17.
[9] Hostettmann, K.; Terreaux, C. Chimia 2000, 54, 652-657.
[10] Woerdenbag, H. J.; Pra, N.; Wallaart, W. van Uden.; Beekman, A. C.; Lugts, C. B. Pharm. World. Sci. 1994, 16, 169-180.
[11] Wolfender, J.-L.; Terreaux, C.; Hostettmann, K. Pharmaceutical Biology 2000, 38, 41-54.
[12] Newmann, D. J.; Cragg, G. M.; Snader, K. M. Nat. Prod. Rep. 2000, 17, 215-234.
[13] CRC Handbook of Antibiotic Compounds; Berdy, J. Ed.; CRC Press Inc: Boca Raton, Florida 1980.
[14] Urban, S.; Hickford, S. J.; Blunt, J. W.; Munro, M. H. G. Current Organic Chemistry 2000, 4, 765-807.
[15] Rayl, A. J. S. The Scientist 1999, 13, 1.
[16] Haefner, B. Drug Discovery Today 2003, 8, 536-544.
[17] Jones, R. M.; Bulaj, G. Curr. Pharm. Des. 2000, 6, 1249-1285.
[18] McIntosh, J. M.; Jones, R. M. Toxicon 2001, 39, 1447-1451.
[19] Hale, K. J.; Hummersone, M. G.; Manaviazar, S.; Frigerio, M. Nat. Prod. Rep. 2002, 19, 413-453.
[20] Clamp, A.; Jayson, G. C. Anticancer Drugs 2002, 13, 673-683.
[21] Urban, S.; Blunt, J. W.; Munro, M. H. G. J. Nat. Prod. 2002, 65, 1371-1373.
[22] Munro, M. H. G.; Blunt, J. W.; Urban, S.; Gravalos, D.; “Antitumour Carbazoles and Coproverdine Isolation” PCT Int. Appl. (W00248107), 2002.
[23] Hostettmann, K. ‘Tout savoir sur le pouvoir des plantes, sources de médicaments’ Favre SA, Lausanne 1997.
[24] Alvi, K. A. J. Liquid Chromatogr. & Related Tech. 2001, 24, 1765-1773.
[25] Cardellina, J. H.; Munro, M. H. G.; Fuller, R. W.; Manfredi, K. P.; McKee, T. C.; Tischler, M.; Bokesch, H. R.; Gustafson, K. R.; Beutler, J. A.; Boyd, M. R. J. Nat. Prod. 1993, 56, 1123-1129.
[26] Tan, G. T.; Pezzuto, J. M.; Kinghorn, A. D.; Hughes, S. H. J. Nat. Prod. 1991, 54, 143-154.
[27] Wall, M. E.; Taylor, M.; Ambrosio, L.; Davis, K. J. Pharm. Sci. 1969, 58, 839-841.
[28] Brabley, S.; Hammann, P.; Kluge, H.; Wink, J.; Kricke, P.; Zeeck, A. J. Antibiot. 1991, 44, 797-800.
[29] Noltemeyer, M.; Sheldrick, G. M.; Hoppe, M-V.; Zeeck, A. J. Antibiot. 1982, 35, 549-555.
[30] Munro, M. H. G.; Marine Chemistry Group, University of Canterbury, Christchurch, New Zealand 2002 (private communication).
[31] Hostettmann, K.; Wolfender, J.-L.; Terreaux, C. Pharmaceutical Biology 2001, 39, 18-32.
[32] Urban, S.; Leone, P. D. A.; Carroll, A. R.; Fechner, G. A.; Smith, J.; Hooper, J. N. A.; Quinn, R. J. J. Org. Chem. 1999, 64, 731-735.
[33] Harwood, D. T.; Urban, S.; Blunt, J. W.; Munro, M. H. G. Natural Prod. Research 2003, 17, 15-19.
[34] Urban, S.; Butler, M. S.; Capon, R. J. Aust. J. Chem. 1994, 47, 1919-1924.
[35] Urban, S.; Hobbs, L.; Hooper, J. N. A.; Capon, R. J. Aust. J. Chem. 1995, 48, 1491-1494.
[36] Urban, S.; Capon, R. J. Aust. J. Chem. 1996, 49, 711-713.
[37] Bobzin, S. C.; Yang, S.; Kasten, T. P. J. Chromatogr. B Biomed Sci. Appl. 2000, 748, 259-267.
[38] Wolfender, J.-L.; Rodriguez, S.; Hostettmann, K.; Wagner-Redeker, W. J. Mass Spectrom and Rapid Commun. Mass Spectrom. 1995, S35-S46.
[39] Shigematsu, N. J. Mass Spectrom. Soc. Jpn, 1997 45, 295-300.
[40] Pannell, L. K.; Shigematsu, N. Am. Lab. 1998, 30, 28, 30.
[41] Hyphenation: Hype and Fascination; Brinkman, U. A. Th. (Ed); Elsevier, Amsterdam 1999.
[42] Ludlow, M.; Louden, D.; Handley, A.; Taylor, S.; Wright, B.; Wilson, I. D. J. Chromatogr. A 1999, 857, 89-96.
[43] Louden, D.; Handley, A.; Taylor, S.; Lenz, E.; Miller, S.; Wilson, I. D.; Sage A.; Lafont R. J. Chromatogr. 2001, 910, 237-246.
[44] Hostettmann, K.; Domon, B.; Schaufelberger, D.; Hostettmann, M. J. Chromatogr. 1984, 283, 137-147.
[45] Wolfender, J.-L.; Hostettmann, K. J. Chromatogr. 1993, 647, 191-202.
[46] Ducrey, B.; Wolfender, J.-L.; Marston, A.; Hostettmann, K. Phytochemistry 1995, 38, 129-137.
[47] Somsen, G. W.; Gooijer, C.; Brinkmann, U. A. Th. J. Chromatogr. A 1999, 856, 213-242.
[48] Willoughby, R.; Sheehan, E.; Mitrovich, S. A. Global View of LC/MS, (Second Edition) Global View Publishing, Pittsburgh 2002.
[49] Liquid Chromatography-Mass Spectrometry: An Introduction; Ardrey, R. E.; John Wiley & Sons, Inc., Chichester and New York 2003.
[50] Liquid Chromatography-Mass Spectrometry (2nd Edition); Niessen, W. M. A.; Dekker, New York 1999.
[51] Perret, C.; Wolfender, J.-L.; Hostettmann, K. Phytochem. Anal. 1999, 10, 272-278.
[52] Martinet, A.; Ndjoko, K.; Terreaux, C.; Marston, A.; Hostettmann, K.; Schutz, Y. Phytochem. Anal. 2001, 12, 48-52.
[53] Hostettmann, K.; Wolfender, J.-L.; Bioactive Compounds from Natural Sources 2001, 31-68.
[54] Wolfender, J.-L.; Maillard, M.; Hostettmann, K. Phytochem. Anal. 1994, 5, 153-182.
[55] Wolfender, J.-L.; Rodriguez, S.; Hostettmann, K. J. Chromatogr. A 1998, 794, 299-316.
[56] Strege, M. A. Anal. Chem. 1998, 70, 2439-2445.
[57] Gu, Z. M.; Zhou, D. W.; Wu, J.; Shi, G.; Zeng, L.; McLaughlin, J. L. J. Nat. Prod. 1997, 60, 242-248.
[58] Wolfender, J.-L.; Waridel, K.; Ndjoko, K.; Hobby, K. R.; Major, H. L.; Hostettmann, K. Analysis 2000, 28, 895-906.
[59] Perri, E.; Mazzotti, F.; Raffaelli, A.; Sindona, G. J. Mass Spectrom. 2000, 35, 1360-1361.
[60] Sandvoss, M.; Weltring, A.; Preiss, A.; Levsen, K.; Wuensch, G. J. Chromatogr. A 2001, 917, 75-86.
[61] Niessen, W. M. A. J. Chromatogr. A 2003, 1000, 413-436.
[62] Sigel, M. M.; Tabei, K.; Huang, J.; Balogh, M. P.; Jackson, M. R.; Proceedings of the 50th ASMS Conference on Mass Spectrometry and Allied Topics, June 2-6, 2002, Orlando, FL, 2002.
[63] Gallagher, R. T.; Balogh, M. P.; Davey, P.; Jackson, M. R.; Southern, L. J. Anal. Chem. 2003, 75, 973-977.
[64] Chu, Y.-H.; Kirby, D. P.; Karger, B. L. J. Am. Chem. Soc. 1995, 117, 5419-5420.
[65] Nedved, M. L.; Habibi-Goudarzi, S.; Ganem, J. D.; Henion, J. D. Anal. Chem. 1996, 68, 4228-4236.
[66] Hogenboom, A. C.; de Boer, A. R.; Derks, R. J. E.; Irth H. Anal. Chem. 2001, 73, 3816-3823.
[67] Jia, Qi.; Hong, Mei-Feng. (Unigen Pharmaceuticals, Inc., USA). PCT Int. Appl. 2003.
[68] Stockman, B. J. Current Opinion in Drug Discovery & Development 2000, 3, 269-274.
[69] Keifer, P. A. Integrated Drug Discovery Technologies 2002, 485-541.
[70] Keifer, P. A. Progress in Drug Research 2000, 55, 137-211.
[71] Keifer, P. A. Current Opinion in Chemical Biology 2003, 7, 388-394.
[72] Wolfender, J.-L.; Ndjoko, K.; Hostettmann, K. Current Organic Chemistry 1998, 2, 575-596.
[73] Bruker Instruments, AMIX software package http://www.bruker.com/.
[74] Keifer, P. A. Mag. Reson. Chem. 2003, 41, 509-516.
[75] Keifer, P. A. Drugs Future 1998, 23, 301-317.
[76] Keifer, P. A. Drug Discov. Today 1997, 2, 468-478.
[77] Albert, K. J. Chromatogr. A 1999, 856, 199-211.
[78] Gfrorer, P.; Schewitz, J.; Pusecker, K.; Bayer, E. Anal. Chem. 1999, 71, 315A-321A.
[79] Hicks, R. P. Current Medicinal Chemistry 2001, 8, 627-650.
[80] Smallcombe, S. H.; Patt, S. L.; Keifer, P. A. Journal of Magnetic Resonance, Series A 1995, 117, 295-303.
[81] Jayawickrama, D. A.; Wolters, A. M.; Sweedler, J. V. Analyst 2003, 128, 421-426.
[82] Laude, D. A.; Wilkins, C. L. Anal. Chem. 1987, 59, 546-551.
[83] Wolfender, J.-L.; Ndjoko, K.; Hostettmann, K. Phytochemical Analysis 2001, 12, 2-22.
[84] Behnke, B.; Schlotterbeck, G.; Tallarek, U.; Strohschein, S.; Tseng, L.-H.; Keller, T.; Albert, K.; Bayer, E. Anal. Chem. 1996, 68, 1110-1115.
[85] Elipe, S.; Victoria, M. Analytica Chimica Acta 2003, 497, 1-25.
[86] Smallcombe, S. H.; Patt, S. L.; Keifer, P. A. J. Magn. Reson. A 1995, 117, 295-303.
[87] Wilson, I. D.; Brinkman, U. A. Th. J. Chromatogr. A 2003, 1000, 325-356.
[88] Wolfender, J.-L.; Ndjoko, K.; Hostettmann, K.; Methods in Polyphenol Analysis 2003, 128-156.
[89] Two-Dimensional NMR Spectroscopy. Applications for Chemists and Biochemists (Second Edition); Croasmun, W. R.; Carlson, R. M. K.; New York, VCH, 1994.
[90] On-Line LC-NMR and Related Techniques; Albert, Klaus (Ed); Publisher: John Wiley & Sons Ltd, Chichester, UK, 2002.
[91] Hoodge, P.; Monvisade, P.; Morris, G. A.; Preece, I. Chem. Commun. 2001, 1, 239-240.
[92] Dear, G. J.; Plumb, R. S.; Sweatman, B. C.; Parry, P. S.; Roberts, A. D.; Lindon, J. C.; Nicholson, J. K.; Ismail, I. M. J. Chromatogr. B Biomed Sci. Appl. 2000, 748, 295-309.
[93] Ndjoko, K.; Wolfender, J.-L.; Röder, E.; Hostettmann, K. Planta Med. 1999, 65, 562-566.
[94] Garo, E.; Wolfender, J.-L.; Hostettmann, K.; Hiller, W.; Antus, S.; Mavi, S. Helv. Chim. Acta 1998, 81, 754-763.
[95] Stevenson, S.; Dorn, H. C. Anal. Chem. 1994, 66, 2993-2999.
[96] Griffiths, L.; Horton, R. Magn. Reson. Chem. 1998, 36, 104-109.
[97] de Koning, J. A.; Hogenboom, A. C.; Lacker, T.; Strohschein, S.; Albert, K.; Brinkman, U. A. Th. J. Chromatogr. A 1998, 813, 55-61.
[98] Nyberg, N. T.; Baumann, H.; Kenne, L. Magn. Reson. Chem. 2001, 39, 236-240.
[99] Nyberg, N. T.; Baumann, H.; Kenne, L. Anal. Chem. 2003, 75, 268-274.
[100] Down, S. Spectroscopy Europe 2004, 16, 8, 10, 12, 14.
[101] Yokoyama, Y.; Kishi, N.; Tanaka, M.; Asakawa, N. Analytical Sciences 2000, 76, 1183-1188.
[102] Vogler, B.; Spring, O.; Recent Research Developments in Phytochemistry 2000, 4, 207-222.
[103] Hostettmann, K.; Hostettmann, M.; Rodriguez, S.; Wolfender, J.-L. Biology-Chemistry Interface 1999, 65-101.
[104] Bobzin, S. C.; Yang, S.; Kasten, T. P. Journal of Industrial Microbiology & Biotechnology 2000, 25, 342-345.
[105] Johnson, S.; Morgan, E. D.; Wilson, I. D.; Spraul, M.; Hofmann, M. J. Chem. Soc. Perkin Trans. 1. 1994, 1, 1499-1502.
[106] Albert, K.; Schlotterbeck, G.; Braumann, U.; Händel, M.; Spraul, M.; Krack, G. Angew. Chem. Int. Ed. Engl. 1995, 34, 1014-1016.
[107] Spring, O.; Buschmann, H.; Vogler, B.; Schilling, E. E.; Spraul, M.; Hofmann, M. Phytochemistry 1995, 39, 609-612.
[108] Garo, E.; Wolfender, J.-L.; Hostettmann, K.; Hiller, W.; Antus, S.; Mavi, S. Helvetica Chimica Acta 1998, 81, 754-763.
[109] Rodriguez, S.; Wolfender, J.-L.; Hostettmann, K.; Stoeckli-Evans, H.; Gupta, M. P. Helv. Chim. Acta 1998, 81, 1393-1403.
[110] Ioset, J. R.; Wolfender, J.-L.; Marston, A.; Gupta, M. P.; Hostettmann, K. Phytochem. Anal. 1999, 10, 137-142.
[111] Bringmann, G.; Ruckert, M.; Saeb, W.; Mudogo, V. Magn. Reson. Chem. 1999, 37, 8-102.
[112] Bringmann, G.; Ochse, M.; Herderich, M.; Guenther, C.; Wolf, K.; Teltschik, F.; Rueckert, M. Pharmaceutical and Pharmacological Letters 1998, 8, 1-4.
[113] Spring, O.; Heil, N.; Vogler, B. Phytochemistry 1997, 46, 1369-1373.
[114] Vogler, B.; Klaiber, I.; Roos, G.; Walter, C. U.; Hiller, W.; Sandor, P.; Kraus, W. J. Nat. Prod. 1998, 61, 175-178.
[115] Spring, O.; Zipper, R.; Reeb, S.; Vogler, B.; Da Costa, F. B. Phytochemistry 2001, 57, 267-272.
[116] Spring, O.; Zipper, R.; Klaiber, I.; Reeb, S.; Vogler, B. Phytochemistry 2000, 55, 255-261.
[117] Hölscher, D.; Schneider, B. Phytochemistry 1998, 50, 155-161.
[118] Schneider, B.; Zhao, Y.; Blitzke, T.; Schmitt, B.; Nookandeh, A.; Sun, X.; Stöckigt, J. Phytochem. Anal. 1998, 9, 237-244.
[119] Zhao, Y.; Nookandeh, A.; Schneider, B.; Sun, X.; Schmitt, B.; Stockigt, J. J. Chromatogr. A 1999, 837, 83-91.
[120] Hostettmann, K.; Wolfender, J.-L. Stud. Plant. Sci. 1999, 6, 233-260.
[121] Gavidia, I.; Seitz, H. U.; Perez-Bermudez, P.; Vogler, B. Phytochemical Analysis 2002, 13, 266-271.
[122] Höltzel, A.; Schlotterbeck, G.; Albert, K.; Bayer, E. Chromatographia 1996, 42, 499-505.
[123] Hostettmann, K.; Wolfender, J.-L.; Rodriguez, S. Planta Medica 1997, 63, 2-10.
[124] Wolfender, J.-L.; Rodriguez, S.; Hostettmann, K.; Hiller, W. Phytochemical Analysis 1997, 8, 97-104.
[125] Cavin, A.; Potterat, O.; Wolfender, J.-L.; Hostettmann, K.; Dyatmyko, W. J. Nat. Prod. 1998, 61, 1497-1501.
[126] Strohschein, S.; Rentel, C.; Lacker, T.; Bayer, E.; Albert, K. Anal. Chem. 1999, 71, 1780-1785.
[127] Hostettmann, K.; Wolfender, J.-L. Pesticide Science 1997, 51, 471-482.
[128] Hostettmann, K.; Potterat, O.; Wolfender, J.-L. Pharmazeutische Industrie 1997, 59, 339-347.
[129] Ferrari, J.; Terreaux, C.; Wolfender, J.-L.; Hostettmann, K. Chimica 2000, 54, 652-657.
[130] Schaller, F.; Wolfender, J.-L.; Hostettmann, K.; Mavi, S. Helvetica Chimica Acta 2001, 84, 222-229.
[131] Bringmann, G.; Günther, C.; Schlauer, J.; Rückert, M. Anal. Chem. 1998, 70, 2805-2811.
[132] Bringmann, G.; Rückert, M.; Messer, K.; Schupp, O.; Louis, A. M. J. Chromatogr. A 1999, 837, 267-272.
[133] Bringmann, G.; Messer, K.; Wohlfarth, M.; Kraus, J.; Dumbuya, K.; Rueckert, M. Anal. Chem. 1999, 71, 2678-2686.
[134] Bringmann, G.; Wohlfarth, M.; Rischer, H.; Heubes, M.; Saeb, W.; Diem, S.; Herderich, M.; Schlauer, J. Anal. Chemistry 2001, 73, 2571-2577.
[135] Sanvoss, M.; In On-Line LC-NMR and Related Techniques. Albert, K (Ed). 2002, 111-128.
[136] Vogler, B.; Klaiber, I.; Roos, G.; Walter, C. U.; Hiller, W.; Sandor, P.; Krauss, W. J. Nat. Prod. 1998, 61, 175-178.
[137] Renukappa, T.; Roos, G.; Klaiber, I.; Vogler, B.; Krauss, W. J. Chromatogr. A 1999, 847, 109-116.
[138] Wilson, I. D.; Morgan, E. D.; Lafont, R.; Wright, B. J. Chromatogr. A 1998, 799, 333-336.
[139] Kleinwachter, P.; Luhmann, U.; Schlegel, B.; Heinze, S.; Hartl, A.; Kiet, T.; Tam, G. Journal of Basic Microbiology 1999, 39, 345-349.
[140] Dachtler, M.; Glaser, T.; Kohler, K.; Albert, K. Anal. Chem. 2001, 73, 667-674.
[141] Vilegas, W.; Vilegas, J. K.; Dachtler, M.; Glaser, T.; Albert, K.; Phytochem. Anal. 2000, 11, 317-321.
[142] Santos, L. C.; Dachtler, M.; Andrade, F. D. P.; Albert, K.; Vilegas, W. Fresenius J. Anal. Chem. 2000, 368, 540-542.
[143] Pusecker, K.; Albert, K.; Bayer, E. J. Chromatogr. A 1999, 836, 245-252.
[144] Stintzing, F. C.; Conrad, J.; Klaiber, I.; Beifuss, U.; Carle, R. Phytochemistry 2004, 65, 415-422.
[145] Abel, C. B.; Lindon, J. C.; Noble, D.; Rudd, B. A.; Sidebottom, P. J.; Nicholson, J. K. Anal. Biochem. 1999, 270, 220-230.
[146] Kleinwächter, P.; Martin, K.; Groth, I.; Dornberger, K. J. High Resol. Chromatogr. 2000, 23, 609-612.
[147] Sandvoss, M.; Pham, L. H.; Levsen, K.; Preiss, A.; Mugge, C.; Wunsch, G. European Journal of Organic Chemistry 2000, 1253-1262.
[148] Sandvoss, M.; Weltring, A.; Preiss, A.; Levsen, K.; Wuensch, G. J. Chromatogr. A 2001, 917, 75-86.
[149] Bäcker, A. E.; Thorbet, S.; Rakotonirainy, O.; Hallberg, E. C.; Olling, A.; Gustavsson, M.; Samuelsson, B. E.; Soussi, B.; Glycoconj. J. 1999, 16, 45-58.
[150] Bringmann, G.; Wohlfarth, M.; Heubes, M. J. Chromatogr. A 2000, 904, 243-249.
[151] Schmitt, B.; Schneider, B. Phytochemistry 1999, 52, 45-53.
[152] Schmitt, B.; Schneider, B. Phytochem. Anal. 2001, 12, 43-47.
[153] Circular Dichroism; Nakanishi, K.; Berova, N.; Woody, R. W.; VCH Publishers, New York 1994.
[154] Rassmussen, H. B.; Christensen, S. B.; Kvist, L. P.; Kharazmi, A.; Huansi, A. G. J. Nat. Prod. 2000, 63, 1295-1296.
[155] Trost, B. M.; Belletire, J. L.; Godleski, S.; McDougal, M. C.; Balkovec, J. M. J. Org. Chem. 1986, 2370-2374.
[156] Queiroz, E. F.; Wolfender, J.-L.; Raoelison, G.; Hostettmann, K. Phytochem. Anal. 2003, 14, 34-39.
[157] Guilet, D.; Guntern, A.; Ioset, J.-R.; Queiroz, E. F.; Ndjoko, K.; Foggin, C. M.; Hostettmann, K. J. Natl. Prod. 2003, 66, 17-20.
[158] Petritis, K.; Gillaizeau, I.; Elfakir, C.; Dreux, M.; Petit, A.; Bongibault, N.; Luijten, W. Journal of Separation Science 2002, 25, 593-600.
[159] Hostettmann, K.; Wolfender, J.-L.; Editor(s): Hostettmann, Kurt; Gupta, Mahabir, P.; Marston, Andrew. Chemistry, Biological and Pharmacological Properties of Medicinal Plants from the Americas, Proceedings of the IOCD/CYTED Symposium, Panama City, Feb. 23-26, 1999, Meeting Date 1997, 19-41.
[160] Cogne, A.-L.; Queiroz, E. F.; Wolfender, J.-L.; Marston, A.; Mavi, S.; Hostettmann, K. Phytochemical Analysis 2003, 14, 67-73.
[161] Kraus, W.; Ngoc, Luu Hoang.; Conrad, J.; Klaiber, I.; Reeb, S.; Vogler, B. Phytochemistry Reviews 2003, 1, 409-411.
[162] Kraus, W. Journal of Toxicology, Toxin Reviews 2003, 22, 495-508.
[163] Rodriguez, S.; Wolfender, J.-L.; Hostettmann, K.; Stoeckli-Evans, H.; Gupta, M. P. Helv. Chim. Acta 1998, 81, 1393-1403.
[164] Hansen, S. H.; Jensen, A. G.; Cornett, C.; Bjornsdottir, I.; Taylor, S.; Wright, B.; Wilson, I. D. Anal. Chem. 1999, 71, 5235-5241.
[165] Iwasa, K.; Kuribayashi, A.; Sugiura, M.; Moriyasu, M.; Lee, D.-U.; Wiegrebe, W. Phytochemistry 2003, 64, 1229-1238.
[166] Exarchou, V.; Godejohann, M.; Van B, Teris, A.; Gerothanassis, I. P.; Vervoort, J. Analytical Chemistry 2003, 75, 6288-6294.
[167] Wilson, I. D. J. Chromatogr. A 2000, 892, 315-327.
[168] Lindon, J. C.; Nicholson, J. K.; Wilson, I. D. J. Chromatogr. B Biomed Sci. Appl. 2000, 748, 233-258.
[169] Cannell, R. J. P.; Rashid, T.; Ismail, I. M.; Sidebottom, P.; Knaggs, A. R.; Marshall, P. S. Xenobiotica 1997, 27, 147-157.
[170] Tseng, L.-H.; Braumann, U.; Godejohann, M.; Leec S.-S.; Albert, K. J. Chin. Chem. 2000, 47, 1231-1236.
[171] Sandvoss, M.; Weltring, A.; Preiss, A.; Levsen, K.; Wuensch, G. J. Chromatogr. A 2001, 917, 75-86.
[172] Fritsche, J.; Angoelal, R.; Dachtler, M. J. Chromatogr. A 2002, 972, 195-203.
[173] Bailey, N. J. C.; Stanley, P. D.; Hadfield, S. T.; Lindon, J. C.; Nicholson, J. K. Rapid Commun. Mass Spectrom. 2000, 14, 679-684.
[174] Wilson, I. D.; Morgan, E. D.; Lafont, R.; Shockor, J. P.; Lindon, J. C.; Nicholson, J. K.; Wright, B. Chromatographia 1999, 49, 374-378.
[175] Lommen, A.; Godejohann, M.; Venema, D. P.; Hollmann, P. H. C.; Spraul, M. Anal. Chem. 2000, 72, 1793-1797.
[176] Corcoran, O.; Spraul, M. Drug Discovery Today 2003, 8, 624-631.
[177] Queiroz, E. F.; Wolfender, J.-L.; Atindehou, K. K.; Traore, D.; Hostettmann, K. J. Chromatogr. A 2002, 974, 123-134.
[178] Wolfender, J.-L.; Ndjoko, K.; Hostettmann, K. J. Chromatogr. A 2003, 1000, 437-455.
[179] Louden, D.; Handley, A.; Taylor, S.; Lenz, E.; Miller, S.; Wilson, I. D.; Sage, A. Anal. Chem. 2000, 72, 3922-3926.
[180] Louden, D.; Handley, A.; Taylor, S.; Sinclair, I.; Lenz, E.; Wilson, I. D. Analyst 2001, 126, 1625-1629.
[181] Mangia, A. Current Status and Future Trends in Analytical Food Chemistry, Proceedings of the European Conference on Food Chemistry, 8th, Vienna, Sept. 18-20. 1995, 1 62-69.
[182] Kite, G. C.; Veitch, N. C.; Grayer, R. J.; Simmonds, M. S. J. Biochemical Systematics and Ecology 2003, 31, 813-843.
[183] Krock, K. A.; Wilkins, C. L. J. Chromatogr. A 1994, 678, 265-277.
[184] Albert, K.; Braumann, U.; Tseng, L. H.; Nicholson, G.; Bayer, E.; Spraul, M.; Hofmann, M.; Dowle, C.; Chippendale, M. Anal. Chem. 1994, 66, 3042-3046.
[185] Braumann, U.; Handel, H.; Albert, K.; Ecker, R.; Spraul, M. Anal. Chem. 1995, 67, 930-935.
[186] Spraul, M.; Freund, A. S.; Nast, R. E.; Withers, R. S.; Maas, W. E.; Corcoran, O. Anal. Chem. 2003, 75, 1546-1551.
[187] Haner, R. L.; Llanos, W.; Mueller, L. J. Magn. Reson. 2000, 143, 69-78.
[188] Lacey, M. E.; Subramanian, R.; Olson, D. L.; Webb, A. G.; Sweedler, J. V. Chemical Reviews 1999, 99, 3133-3152.
[189] Peck, T. L.; Magin, R. L.; Lauterbur, P. C. J. Magn. Reson. B 1995, 108, 114-124.
[190] Li, Y.; Wolters, A.; Malawey, P.; Sweedler, J.; Webb, A. Anal. Chem. 1999, 71, 4815-4820.
[191] MacNamara, E.; Hou, T.; Fisher, G.; Williams, S.; Raftery, D. Anal. Chim. Acta 1999, 397, 9-16.
[192] Eldridge, G. R.; Vervoort, H. C.; Lee, C. M.; Cremin, P. A.; Williams, C. T.; Hart, S. M.; Goering, M. G.; O’Neil-Johnson, M.; Zeng, L. Anal. Chem. 2002, 74, 3963-3971.
[193] Schlotterbeck, G.; Ross, A.; Hochstrasser, R.; Senn, H.; Kuhn, T.; Marek, D.; Schett, O. Anal. Chem. 2002, 74, 4464-4471.
[194] Pusecker, K.; Schewitz, J.; Gfroerer, P.; Tseng, L.-H.; Albert, K.; Bayer, E. Anal. Chem. 1998, 70, 3280-3285.
[195] Corcoran, O.; Wilkinson, P. S.; Godejohann, M.; Braumann, U.; Hofmann, M.; Spraul, M. American Laboratory, Perspectives in Chromatography 2002, 34, 18-21.
[196] Jayawickrama, D. A.; Sweedler, J. V. J. Chromatogr. A 2003, 1000, 819-840.
[197] Peck, T. L.; Sweedler, J. V.; Micro Total Analysis Systems 2001, Proceedings µTAS 2001 Symposium, 5th, Monterey, CA, United States, Oct. 21-25. 2001, 417-419.
[198] Schefer, A. B.; Rapp, E.; Wahrendorf, M. S.; Wink, M.; Ferreira, A. G.; Bayer, E.; Albert, K. In On-Line LC-NMR and Related Techniques; Albert, Klaus (Ed); Publisher: John Wiley & Sons Ltd, Chichester, UK 2002, 239-242.
[199] Ashcroft, A. E.; Major, H. J.; Lowes, S.; Wilson, I. D. Analytical Proceedings 1995, 32, 459-462.
[200] Corcoran, O.; Nicholson, J. K.; Lenz, E. M.; Abou-Shakra, F.; Castro-Perez, J.; Sage, A. B.; Wilson, I. D. Rapid Commun. Mass Spectrom. 2000, 14, 2377-2384.
[201] Sidelmann, U. G.; Braumann, U.; Hofmann, M.; Spraul, M.; Lindon, J. C.; Nicholson, J. K.; Hansen, S. H. Anal. Chem. 1997, 69, 607-612.
[202] Styles, P.; Soffe, N. F.; Scott, C. A.; Cragg, D. A.; Row, F.; White, D. J.; White, P. C. J. J. Magn. Reson. 1984, 60, 397-404.
Frontiers in Drug Design & Discovery, 2005, 1, 167-195
The Role of Kinetics in High Throughput Screening for Drugs
Agustina Gómez-Hens* and Mª Paz Aguilar-Caballos
Department of Analytical Chemistry, Faculty of Sciences, “Marie Curie” Building Annex, Campus of Rabanales, University of Córdoba, 14071-Córdoba, Spain

Abstract: A systematic study of the incidence of kinetics in high throughput screening for drug design and discovery is presented and discussed. This study includes dynamic aspects of the techniques and assays used in this complex research field, kinetic methods applied to the selection of drug candidates, and recent advances in the elucidation of drug-receptor interaction mechanisms and in high throughput pharmacokinetic screening.
INTRODUCTION
Combinatorial technologies have made available large compound collections from which physiologically active drugs can be obtained. In addition, an increasing number of new molecular targets is emerging from advances in molecular biology and the development of genome and proteome programs. As a result, improvements in high throughput screening (HTS) capabilities are required to evaluate more compounds against more targets and to select the correct drug candidates in the early steps of the complex drug discovery process. Many of these HTS assays are based on techniques or methodologies in which kinetic aspects play a significant role, so that time-dependent variables must be controlled to obtain adequate results. Homogeneous formats are typically used in these assays, avoiding separation and washing steps and allowing rapid analytical throughput for kinetic characterization. This characterization is required because kinetics and structure are interrelated: optimum kinetics depend on physicochemical properties.

Absorption, distribution, metabolism and excretion (ADME) are essential kinetic processes in drug discovery. Thus, drug efficacy depends on the oral delivery rate and also on first-pass metabolism rates and liver or renal clearance rates. Drug metabolism can give rise to clinically active or toxic metabolites as well as drug-drug interactions. The kinetics of formation and clearance of each metabolite are important parameters in the global profile of a drug. An understanding of metabolic rates is essential to explain the results obtained from in vivo experiments, and also to develop modeling strategies suitable for preliminary lead selection and/or optimization [1]. All these studies are required because about 40% of drug candidates fail in clinical trials due to inadequate pharmacokinetic and metabolism properties [2].
A number of in vitro, in vivo and in silico approaches have been described to achieve high throughput pharmacokinetic screening [3]. HTS assays for metabolic screening of small drugs are frequently based on end-point measurements of enzymatically produced products because steady-state assays are easier to automate than kinetic assays [4]. However, instruments that can perform kinetic assays for metabolite identification are now commercially available, which helps to improve the selectivity of some traditional enzymatic assays. The aim of this article is to emphasize the kinetic aspects involved in the different techniques and methodologies applied in HTS, to present an overview of the most recent kinetic methods reported in this research area and to highlight the importance of kinetics for the whole drug discovery process.

*Corresponding author. Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.

DYNAMIC ASPECTS OF TECHNIQUES AND ASSAYS USED IN HTS
Many analytical techniques or approaches are perceived to be kinetic in nature or to involve key kinetic components. Thus, chromatography and electrophoresis are differential migration techniques in which thermodynamic and kinetic aspects are responsible for the final separation. The chromatographic separation occurs as a result of repeated sorption/desorption processes during the movement of the sample components along the column, so that the differences in the distribution constants of the individual components allow the separation. However, kinetics play a double role, as it is necessary to take into account both the time required to reach each distribution equilibrium and the differential displacement rate of the sample components, which is ultimately responsible for the separation process. Several kinetic factors contribute to band broadening, which determines the chromatographic efficiency.
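The kinetic contributions to band broadening considered in this section, diffusion along the column and resistance to mass transfer, are commonly summarised by the van Deemter equation, H(u) = A + B/u + C·u, where H is the plate height and u the mobile phase linear velocity. The equation is not given in the text and is introduced here only as a standard illustration; the A term (flow-path effects) and the coefficient values below are hypothetical.

```python
import math


def plate_height(u, a, b, c):
    """van Deemter plate height: A (flow paths) + B/u (diffusion) + C*u (mass transfer)."""
    return a + b / u + c * u


def optimum_velocity(b, c):
    """Velocity that minimises H, obtained from dH/du = -B/u**2 + C = 0."""
    return math.sqrt(b / c)


def minimum_plate_height(a, b, c):
    """Plate height at the optimum velocity: A + 2*sqrt(B*C)."""
    return a + 2.0 * math.sqrt(b * c)


# Hypothetical coefficients in arbitrary but mutually consistent units.
A, B, C = 1.0, 4.0, 1.0
u_opt = optimum_velocity(B, C)
print("optimum velocity:", u_opt)
print("minimum plate height:", minimum_plate_height(A, B, C))
print("plate height at twice the optimum:", plate_height(2 * u_opt, A, B, C))
```

The diffusion term dominates at low velocities, where the solute spends a long time in the column, and the mass transfer term dominates at high velocities, which is the trade-off that fast HTS separations have to balance.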
Thus, a major source of band broadening in chromatographic separations is the resistance to mass transfer in both the stationary and mobile phases, which prevents equilibrium from being reached instantaneously. Molecular diffusion, a time-dependent process of concentration equalization and an important physical kinetic process, also contributes to band broadening, and its effect increases with the time the solute spends in the column. Liquid chromatography (LC) and capillary electrophoresis (CE) are widely applied in HTS for drug discovery, as they provide information about the biological activity of large series of compounds, including binding to selective targets such as enzymes, receptors or membrane proteins [5]. Both techniques have also been applied to the separation of drug enantiomers [6], which is of great interest to the pharmaceutical industry for obtaining enantiomerically pure compounds. One of the most useful detection systems in LC is mass spectrometry (MS), as it allows a large number of compounds to be screened rapidly. For instance, the usefulness of LC with tandem MS (LC/MS/MS) has been shown in HTS for the determination of selected estrogen receptor modulators in human plasma, reaching a throughput of 2000 samples/h [7].

The shape and quality of the analytical signal in flow systems, such as unsegmented continuous-flow systems, usually named flow injection analysis (FIA), are directly influenced by dispersion, which occurs as a result of mass transfer and mixing during transport of the sample and reagent(s) by the carrier stream to a detector. Axial and radial
diffusion, which is the most significant mechanism of mass transport, greatly contributes to dispersion in these flow systems, together with convection, which is another time-dependent process of equalizing density and temperature. Both diffusion and convection processes can be altered when chemical reactions occur during transport. Measurements in flow systems are obtained when neither physical (homogenization) nor chemical equilibrium has been attained by the time the sample zone reaches the continuous detector, so the corresponding determinations are doubly kinetic in nature. An example of the application of an FIA system in HTS is the flow probe proposed for the direct transfer of the sample from a reservoir into an NMR detector, which significantly increases sample throughput and is of interest for screening candidate drug compounds [8].

All luminescent phenomena are intrinsically kinetic processes, as they involve the emission of ultraviolet, visible or near-infrared radiation from a molecule or an atom as a result of the transition from an electronically excited state to a lower energy state, usually the ground state. For instance, in photoluminescence processes, which require electromagnetic radiation as the excitation source and involve the transition from a singlet (fluorescence) or a triplet (phosphorescence) electronically excited state, measurements are carried out under conditions of dynamic change controlled by the rates of the deactivation processes that follow excitation. The excited states responsible for luminescence phenomena have finite lifetimes that can be measured by time-resolved methodologies and used as additional analytical parameters. The temporal information given by the fluorescence lifetime can usually be employed in combination with other parameters to identify compounds.
Also, if two or more luminescent compounds have similar absorption or emission spectra but different luminescence decay times, the compounds can be determined by differential kinetic analysis. Resolving spectrally similar fluorophores on the basis of differences in lifetime is suitable when other methods for simultaneous determination fail, when the components in the sample are difficult to separate, or when chemical treatment of the sample would increase analysis time or introduce contamination. It is very unlikely that two components will have the same lifetimes as well as the same excitation and emission spectra.

Luminescence-based assays have gained widespread acceptance and are increasingly incorporated into HTS programs [9-13]. They have emerged as an alternative to radiolabel-based assays, as they approach the sensitivity of radioactive detection and are easy to operate, which makes them amenable to miniaturization. Fluorescent and chemiluminescent techniques such as fluorescence polarization (FP), fluorescence correlation spectroscopy (FCS), time-resolved fluorescence resonance energy transfer (TR-FRET) and bioluminescence resonance energy transfer (BRET) offer the advantage of a homogeneous format, which is usually the format of choice for HTS and can be implemented in microtiter plate readers. The advantages and limitations of several of these techniques for the establishment of miniaturized homogeneous screening assays have been widely discussed [9,11].

FP spectroscopy involves kinetic aspects, as measurements are related to the time-averaged rotational motion of molecules. When a molecule is excited by polarized light, the emitted light will also be polarized provided that the molecule does not rotate during the time elapsed between excitation and emission.
The emission of polarized radiation from this molecule will mainly depend on the lifetime of the excited state compared with the time required for rotational motion, as well as on environmental variables such as
viscosity and temperature. Small fluorescent molecules rotate rapidly and normally exhibit no fluorescence polarization, while the rotational motion of macromolecules is slower and the fluorescence remains polarized. Thus, if small fluorescent molecules bind to larger molecules, their rotational diffusion is reduced and the retention of polarization is correspondingly increased, Fig. (1). For typical fluorescent molecules such as fluorescein, polarization is not observed if the molecule is rotating rapidly in solution under the conditions typically used in biological assays, such as aqueous buffers and room temperature. If the fluorescence polarization of a fluorescein-labeled ligand is measured, a low polarization will be observed when the ligand is free in solution, and a high polarization will be observed when the ligand is bound to a macromolecule such as a receptor or antibody, so that the extent of binding can be obtained.
Fig. (1). FP principle. (F: fluorescent label).
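As a minimal sketch of how an FP reading might be computed and sanity-checked, the following Python functions evaluate the degree of polarization in millipolarization units, mP = 1000 (A - B)/(A + B), from the parallel (A) and perpendicular (B) intensities, and flag values outside the 0 to 500 mP range expected for biological samples. The function names and the intensity values are illustrative assumptions, not part of any instrument software. The bound-fraction helper is a linear interpolation between the free and bound tracer values; it is only approximate, since anisotropy rather than polarization is strictly additive, and it assumes equal brightness of free and bound tracer.

```python
def polarization_mp(parallel, perpendicular):
    """Degree of polarization in mP from parallel (A) and perpendicular (B) intensities."""
    total = parallel + perpendicular
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (parallel - perpendicular) / total


def is_plausible(mp, low=0.0, high=500.0):
    """True when the value lies in the range expected for a biological sample."""
    return low <= mp <= high


def approx_fraction_bound(mp, mp_free, mp_bound):
    """Approximate bound fraction by linear interpolation (hypothetical helper)."""
    return (mp - mp_free) / (mp_bound - mp_free)


# Illustrative (assumed) intensities in arbitrary units.
free = polarization_mp(1050.0, 950.0)    # free tracer, weakly polarized
bound = polarization_mp(1300.0, 700.0)   # fully bound tracer, strongly polarized
sample = polarization_mp(1175.0, 825.0)  # test well
print(free, bound, sample, approx_fraction_bound(sample, free, bound))
```

Because the reading is a ratio of intensities, scaling A and B by the same factor (for example, a change in lamp intensity) leaves the result unchanged.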
FP is usually expressed as the degree of polarization in millipolarization units: mP = 1000 [(A - B)/(A + B)], in which A and B are the fluorescence intensities measured in the planes parallel and perpendicular, respectively, to the plane of the incident polarized light. This ratiometric measurement minimizes variations caused by fluctuations in lamp intensity or interferences caused by fluorescence quenching. No biological system can show polarization below 0 mP or above 500 mP. A value outside this range is a clear sign of an error, either in the instrument or, more likely, in the components of the assay sample, such as highly fluorescent or scattering compounds.

FP technology has been used in basic research and commercial diagnostic assays for many decades, but has been widely used in drug discovery only in the past few years. Originally, FP assays for drug discovery were developed for single-tube analytical instruments, but the technology was rapidly converted to HTS assays when commercial plate readers with equivalent sensitivity became available. Homogeneous FP has been implemented in instruments for HTS using 96-, 384- or up to 1536-well plates, which enables a new range of FP assays to be carried out and shows that miniaturization is an
inherent capability of FP. These instruments accurately measure the polarization of fluorescein at concentrations below 100 pM with a precision better than 7 mP [14]. The extent of interference in FP assays varies with the nature of the compounds being screened. Thus, fluorescent compounds can cause interference, but the use of red-shifted fluorophores instead of the commonly used fluorescein may be the best way to minimize fluorescent interferences as well as light scattering [15,16]. A number of FP assays for drug discovery research have been described, including ligand-receptor binding and enzyme assays [17-23]. A multiplexed assay, which allows different parameters to be monitored in a single assay volume, has been described using FP for screening steroid hormone receptors [24]. The assay relies on the displacement of spectrally distinct fluorescent ligands (tracers) from recombinant human estrogen receptor alpha and progesterone receptor ligand-binding domain. The binding of a test compound to its corresponding receptor is established by a decrease in the FP of the tracer that is specific to that receptor. Independent confirmation of binding to each of the receptors is possible because of the distinct spectral properties of the tracers.

Kinetics are also present in time-resolved methods based on chemical systems with relatively long luminescence lifetimes, in which measurements are obtained in the phosphorescence mode, using preset delay and gating times. Lanthanide chelates, mainly europium(III) and terbium(III) chelates, have been widely used in these methods because they exhibit a special behavior as a result of the intramolecular energy transfer from the excited triplet state of the ligand to the emitting level of the central ion. This process, named lanthanide-sensitized luminescence, gives rise to a large Stokes shift and narrow emission bands, affording spectral discrimination of the analytical signal.
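The effect of a delay and a gate in a time-resolved measurement can be illustrated with a small numerical sketch. It assumes single-exponential decays and integrates each decay analytically over the window from the delay to the delay plus the gate. The lifetimes (10 ns for the background, 1 ms for the lanthanide chelate), the relative initial intensities and the window settings are illustrative assumptions chosen only to show the orders of magnitude involved.

```python
import math


def gated_signal(i0, tau, delay, gate):
    """Integrate i0 * exp(-t / tau) over the window [delay, delay + gate]."""
    return i0 * tau * (math.exp(-delay / tau) - math.exp(-(delay + gate) / tau))


# Illustrative (assumed) values: a background 1000 times brighter at t = 0
# but decaying in 10 ns, and a lanthanide chelate decaying in 1 ms.
background = gated_signal(i0=1000.0, tau=10e-9, delay=50e-6, gate=400e-6)
lanthanide = gated_signal(i0=1.0, tau=1e-3, delay=50e-6, gate=400e-6)
print(f"background: {background:.3e}  lanthanide: {lanthanide:.3e}")
```

Under these assumptions the short-lived background has decayed completely before the gate opens, whereas a substantial part of the long-lived emission is still collected.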
Long-lived emissions are usually monitored at a fixed time after flash illumination, which enables temporal discrimination of the analytical signal, avoiding scatter, Raman and any fluorescent background signals. Also, this time-resolved mode increases the analytical signal, as it is integrated over a longer period than in the fluorescence mode, which results in improved detection limits [25,26].

Most applications of lanthanide-sensitized luminescence in HTS involve the use of fluorescence resonance energy transfer (FRET). FRET is a versatile approach in which a fluorescent donor transfers its electronic energy non-radiatively to an acceptor, which re-emits light at another wavelength. As a result, the fluorescence intensity of the donor decreases, while the fluorescence of the acceptor increases. This process only takes place if the emission spectrum of the donor molecule and the absorption spectrum of the acceptor molecule overlap sufficiently. The efficiency of the process depends on the distance between the donor and the acceptor, which should range between 1 and 10 nm, and on their relative orientation. Direct coupling of the donor and the acceptor is unnecessary if the fluorescence lifetime of the donor is longer than the duration of the energy-transfer process. The energy transfer rate constant, $k_{et}$, depends on the inverse sixth power of the distance between the fluorophores involved in the coupling: $k_{et} = (1/\tau_D)\,(R_0/R)^6$, where $\tau_D$ is the donor fluorescence lifetime, $R_0$ the characteristic transfer distance or Förster distance, and $R$ the actual distance between donor and acceptor. The FRET efficiency ($E$) can be deduced from the amplitude-averaged lifetimes of the donor in the presence ($\tau_{DA}$) and absence ($\tau_D$) of the acceptor: $E = 1 - (\tau_{DA}/\tau_D)$ [11]. One of the first homogeneous fluorescence assays applied in HTS was based on the use of FRET between two visible-
wavelength dyes, such as fluorescein and tetramethylrhodamine, but autofluorescence interferences have restricted its use [11]. However, this limitation can be avoided by using FRET in association with time-resolved fluorescence (TR-FRET), which has allowed the design of a homogeneous technology suitable for HTS applications. A lanthanide chelate, usually a europium cryptate chelate, characterized by a long-lived fluorescence emission, is used as the donor and a cross-linked allophycocyanin, a 105 kDa phycobiliprotein, is used as the acceptor, Fig. (2). The basics of this technology and its applications to the study of molecular interactions, including HTS, have been widely reviewed [27,28]. Several applications of FRET and TR-FRET in HTS have been described, many of them involving enzymatic assays [9,10,29-31], which are discussed below. Miniaturized homogeneous immunoassays using TR-FRET, which can be suitable for HTS, have been described [32,33]. Human interleukin-13 (IL-13), secreted from NK3.3 cells stimulated with interleukin-2, was detected using a biotinylated anti-IL-13 monoclonal antibody, a different anti-IL-13 monoclonal antibody labeled with europium cryptate, and cross-linked allophycocyanin conjugated with streptavidin [32]. The assay was carried out in a 384-well assay plate, in which both cell stimulation and FRET detection can be performed successively. The detection limit for IL-13 was estimated to be less than 600 pg/ml. The method can be suitable for screening IL-13 production inhibitors and discovering anti-allergic drugs. Interferon-γ (IFN-γ) has also been determined using a similar method [33], in which the reagents are added to microplate wells where NK3.3 cells are being cultured and the production of IFN-γ is stimulated with interleukin-12. This assay was applied in HTS for IFN-γ production inhibitors.
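The distance dependence that underlies FRET and TR-FRET assays can be explored numerically. The sketch below starts from the energy transfer rate constant, k_et = (1/τ_D)(R_0/R)^6, and from the lifetime expression for the efficiency, E = 1 - τ_DA/τ_D. It additionally assumes that energy transfer simply adds to the other donor deactivation pathways, so that 1/τ_DA = 1/τ_D + k_et, which gives E = 1/(1 + (R/R_0)^6). The function names and the numerical values (a 1 ms donor lifetime and a 5 nm Förster distance) are illustrative assumptions.

```python
def transfer_rate(tau_d, r0, r):
    """Energy transfer rate constant k_et = (1 / tau_D) * (R0 / R)**6."""
    return (1.0 / tau_d) * (r0 / r) ** 6


def donor_lifetime_with_acceptor(tau_d, r0, r):
    """Donor lifetime when transfer competes with the other decay pathways."""
    return 1.0 / (1.0 / tau_d + transfer_rate(tau_d, r0, r))


def efficiency_from_lifetimes(tau_da, tau_d):
    """FRET efficiency E = 1 - tau_DA / tau_D."""
    return 1.0 - tau_da / tau_d


def efficiency_from_distance(r0, r):
    """Equivalent closed form E = 1 / (1 + (R / R0)**6)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


TAU_D = 1e-3  # assumed donor lifetime, s
R0 = 5.0      # assumed Forster distance, nm
for r in (2.5, 5.0, 7.5, 10.0):
    tau_da = donor_lifetime_with_acceptor(TAU_D, R0, r)
    print(r, efficiency_from_lifetimes(tau_da, TAU_D))
```

The efficiency is 50% when R equals R_0 and falls off steeply on either side, which is why the technique reports on proximity in the 1 to 10 nm range.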
A TR-FRET assay has recently been described for the identification of inhibitors of heat-shock protein 90 (Hsp90), which is required for the stability and function of a number of proteins, many of which are involved in cancer development [34]. This assay allowed the identification of several potent and reversible inhibitors.

Fluorescence correlation spectroscopy (FCS) is another fluorimetric technique suitable for miniaturized HTS using homogeneous assay formats [35]. FCS is based on the measurement of fluorescence fluctuations, which can be due to the diffusion of the fluorophore in the excitation volume or to a change in the fluorescence quantum yield caused by a chemical reaction. The burst of fluorescence emitted by a molecule passing through a small volume of space defined by a sharply focused laser beam, typically one femtoliter, is measured. The photons emitted in each burst are recorded in a time-resolved mode by a highly sensitive single-photon detection device. These measurements are carried out using confocal optics to provide the highly focused excitation light and background rejection required for single-molecule detection. This technique can be used to measure kinetic properties of single molecules in drops of solution or in cells and to measure diffusion coefficients and chemical reaction rates, taking advantage of the differences in the translational diffusion of small versus large molecules. Small molecules diffuse rapidly through the volume and thus yield short bursts of light. Binding of these small molecules to larger molecules reduces their translational diffusion and correspondingly increases the duration of the light bursts. Thus, molecules of different size, and hence different diffusion coefficients, remain in the detection volume for different periods of time. Deconvolution of the emission patterns in a sample by appropriate software can yield the relative amounts of the bound and unbound states of a fluorescently tagged ligand.
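The link between molecular size and residence time in the detection volume can be sketched with two textbook relations that are not given in the text and are introduced here as assumptions: the Stokes-Einstein equation for the diffusion coefficient of a sphere, D = k_B T/(6πηr), and the lateral diffusion time through a focus of radius w, τ_D = w²/(4D). The two-component autocorrelation function uses a simplified two-dimensional diffusion model with equal brightness of both species. The hydrodynamic radii, the beam radius and the viscosity are illustrative values.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def diffusion_coefficient(radius_m, temperature_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein diffusion coefficient of a sphere, in m^2/s."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def diffusion_time(radius_m, beam_radius_m=0.25e-6):
    """Mean lateral transit time through the focus, tau_D = w^2 / (4 D)."""
    return beam_radius_m ** 2 / (4.0 * diffusion_coefficient(radius_m))


def autocorrelation(lag, n_molecules, fraction_bound, tau_free, tau_bound):
    """Two-component, two-dimensional diffusion model of G(lag)."""
    free = (1.0 - fraction_bound) / (1.0 + lag / tau_free)
    bound = fraction_bound / (1.0 + lag / tau_bound)
    return (free + bound) / n_molecules


# Assumed hydrodynamic radii: 0.5 nm for a free ligand, 4 nm for its complex.
tau_free = diffusion_time(0.5e-9)
tau_bound = diffusion_time(4.0e-9)
print(f"free: {tau_free:.2e} s  bound: {tau_bound:.2e} s")
print(autocorrelation(1e-4, 5.0, 0.3, tau_free, tau_bound))
```

In this model the diffusion time scales linearly with the hydrodynamic radius, so a useful contrast between free and bound ligand requires a large difference in size. Fitting measured correlation curves with such a model is one way the bound fraction could be recovered.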
This technique is sensitive to
fluorophore concentrations in the nanomolar to femtomolar range, and requires that the lifetime of the fluorophore be shorter than its diffusion time through the confocal volume. Some applications of FCS involve the measurement of receptor-ligand interactions, DNA-protein interactions, nucleic acid hybrid formation and various enzymatic reactions [35-37]. Better selectivity can be obtained using dual-color fluorescence cross-correlation spectroscopy [38,39]. Although the instrumentation is more complex, as two coaxial laser beams are required, it allows the detection of molecules that bear two different fluorescent labels. These fluorophores should have excitation and emission spectra that do not overlap, to avoid energy transfer as well as cross-talk between the two channels. This technique has been employed to analyze the kinetics of nucleic acid and peptide modifications catalyzed by nucleases, polymerases and proteases [39].

Chemiluminescence (CL) is probably the most markedly kinetic of the luminescent phenomena, as the factors affecting the emission intensity are a combination of luminescence and chemical reaction rate aspects. Molecules in the electronically excited state are produced by a chemical, biochemical (bioluminescence, BL) or electrochemical (electroluminescence) reaction, so that the emission intensity, $I_L$, is directly proportional to the rate of the reaction involved, $dC/dt$, in molecules reacting per second, according to the equation: $I_L = \Phi_L\,(dC/dt)$. The proportionality constant $\Phi_L$ is the CL quantum yield or efficiency, in photons emitted per molecule reacting. This CL efficiency is the product of the excitation quantum yield, expressed as excited states produced per molecule reacting, and the emission quantum yield, expressed as photons emitted per excited state.
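The proportionality between emission intensity and reaction rate can be made concrete with a small sketch. It assumes, purely for illustration, that the light-producing reaction is first order with rate constant k, so that the rate is k C_0 exp(-kt) and the intensity is Φ_L times this rate. The function names and the numerical values are hypothetical.

```python
import math


def cl_intensity(t, c0, k, phi):
    """Photon emission rate at time t for a first-order reaction."""
    rate = k * c0 * math.exp(-k * t)  # molecules reacting per second
    return phi * rate


def integrated_emission(t_end, c0, k, phi):
    """Photons emitted between t = 0 and t_end (analytic integral)."""
    return phi * c0 * (1.0 - math.exp(-k * t_end))


C0 = 1e9    # assumed number of analyte molecules
PHI = 0.01  # assumed CL quantum yield, photons per molecule reacting
for k in (0.05, 0.5):  # assumed rate constants, 1/s
    peak = cl_intensity(0.0, C0, k, PHI)
    total = integrated_emission(600.0, C0, k, PHI)
    print(f"k = {k}: peak = {peak:.3e} photons/s, total = {total:.3e} photons")
```

Under this assumption the peak intensity scales with the rate constant, whereas the total light emitted after the reaction has run to completion depends only on the quantum yield and the amount of analyte. A peak-height reading is therefore more sensitive to conditions that change the reaction rate than an integrated reading.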
Although the lifetimes of the excited singlet states responsible for CL and BL emissions are typically in the nanosecond range, the emission of light persists for as long as the CL or BL reaction takes place. Thus, both the photophysics of the electronically excited products of the reaction and the kinetics of the reaction are involved in the luminescence emission. Two important factors to be considered in developing a luminescent method are the efficiency of the luminescent reaction, which ultimately affects sensitivity and detection limits, and the reaction kinetics, which dictate precision and throughput. In practice, both can be affected by experimental conditions such as solvents, concentrations, pH and reagent purity. A significant advantage of CL and BL assays is that light is emitted by a specific reaction involving the analyte, which avoids the background signal from the sample matrix usually present in fluorimetric techniques.

Because luminescence measurements are dynamic in nature, being based on the detection of a transient light emission, all CL and BL methods might be classified as kinetic methods. The analytical parameters most frequently employed in these methods are the peak intensity and the integrated area under all or part of the light emission-time curve, both of which require control of the interval between initiation of the reaction and data acquisition. The usefulness of CL and BL in HTS has recently been reviewed [40]. Several applications have focused on the determination of inhibitors of various enzymatic systems [41-46], some of which are discussed below. Firefly luciferase is widely used as a reporter for monitoring promoter activity in the control of gene expression. Luciferase catalyzes the ATP-dependent oxidation of luciferin, generating a flash of
green light, with maximal emission at 563 nm. This system has been used in HTS to screen for drug candidates affecting a specific aspect of cellular physiology. A BL assay suitable for HTS has been described using the luciferase/luciferin system for the determination of antimycobacterial drugs in vitro and in infected macrophages [47]. Other BL systems applied in HTS involve the use of aequorin and the green fluorescent protein (GFP) [48].

A special BL methodology is BRET, which is a Förster resonance energy transfer process occurring from a bioluminescent donor protein to a fluorescent acceptor protein. When the donor and the acceptor are brought into close proximity to one another (on the order of 1-10 nm) and are correctly oriented, the former transfers its energy directly to the latter in a nonradiative way. Since the acceptor emits light at a different wavelength than the donor, the energy transfer can be easily detected by measuring the ratio of the acceptor emission intensity to the donor emission intensity. BRET allows protein-protein interactions and intracellular signaling events to be monitored in live cells. It is theoretically applicable to the study of the activation state of any receptor that undergoes polymerization or conformational change, as the process strictly depends on the molecular proximity between donor and acceptor. In a BRET system, the first protein partner is fused to Renilla luciferase (Rluc), whereas the second protein partner is fused to a fluorescent protein such as the yellow fluorescent protein (YFP) [49], which is a yellow mutant of GFP. If the two partners do not interact, only one signal can be detected after addition of the luciferase substrate coelenterazine, which is oxidized, giving rise to a blue luminescence flash with emission at 485 nm. If the two partners interact, resonance energy transfer occurs and an additional signal emitted by the YFP at 530 nm can be detected, Fig. (2).
Fig. (2). FRET and BRET principles. (EuCh: Eu-cryptate chelate; AL: allophycocyanin; Rluc: Renilla luciferase; Coe: coelenterazine; Coeox: oxidized coelenterazine).
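The ratiometric BRET readout, that is, the acceptor emission intensity divided by the donor emission intensity, can be sketched as follows for the Rluc/YFP pair, with the donor read at 485 nm and the acceptor at 530 nm. Subtracting the ratio measured in cells that express the donor fusion alone is a common background correction but is an assumption here, as it is not described in the text; the function names and counts are hypothetical.

```python
def bret_ratio(acceptor_530, donor_485):
    """Ratio of acceptor (530 nm) to donor (485 nm) emission intensity."""
    if donor_485 <= 0:
        raise ValueError("donor emission must be positive")
    return acceptor_530 / donor_485


def net_bret(sample_ratio, donor_only_ratio):
    """Ratio corrected for donor emission detected in the acceptor channel."""
    return sample_ratio - donor_only_ratio


# Illustrative (assumed) luminescence counts.
donor_only = bret_ratio(acceptor_530=1000.0, donor_485=10000.0)
interacting = bret_ratio(acceptor_530=3000.0, donor_485=10000.0)
print(donor_only, interacting, net_bret(interacting, donor_only))
```

Because both channels are read from the same well, the ratio is largely independent of the number of cells and of the decay of the luminescence signal with time.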
BRET and FRET are two analogous methodologies, but they show some differences [49]. BRET eliminates the need for an excitation light source and its associated problems such as high background caused by the autofluorescence emitted by endogenous sample
components and photobleaching of the donor fluorophore, which results in loss of signal with time [50,51]. Also, the overlapping absorption and emission spectra of the fluorescent donor used in FRET could give rise to direct excitation of the acceptor fluorophore, which would complicate the interpretation of the results. However, FRET permits, under microscopic observation, the visualization of protein-protein interactions at the subcellular level in a single living cell [52], which is difficult to perform with BRET methodology.

BRET has been used to monitor the activity of the insulin receptor [53,54], which is a glycoprotein composed of two α-subunits and two β-subunits linked by disulfide bonds. Binding of insulin to the α-subunits of the receptor induces the autophosphorylation of the β-subunits on tyrosine residues, which stimulates the tyrosine kinase activity of the receptor toward intracellular substrates, thereby allowing the transmission of the signal. In pathological states such as diabetes and obesity, the effect of insulin on its target tissues is markedly reduced, which is often associated with a decrease in the tyrosine kinase activity of the receptor. The discovery of new molecules capable of stimulating the tyrosine kinase activity of the receptor may be of considerable importance for the treatment of insulin-resistant or insulin-deficient patients. Human insulin receptor cDNA was fused to either Rluc or YFP coding sequences. The conformational change induced by insulin on its receptor could be detected as an energy transfer process, so that the BRET signal parallels insulin-induced autophosphorylation of the fusion receptor [54]. Also, antibodies that activate or inhibit the autophosphorylation of the receptor have similar effects on the BRET signal. The method can be applied in HTS for the discovery of drugs with insulin-like properties, as it allows the effects of agonists on insulin receptor activity to be studied.
APPLICATION OF KINETIC METHODOLOGY IN HTS

Kinetic methodology plays a significant role in HTS, as about half of all marketed drugs act as inhibitors of enzymatic systems [55], in which kinetic parameters such as the apparent maximal rate (Vmax) and the Michaelis-Menten constant (Km) are important indicators of the interactions between a substrate and its corresponding enzyme. The inhibitory effect of a drug is usually studied by measuring the decrease in enzymatic activity, which is obtained from the rate of the enzymatic reaction. The characterization of the type of inhibition (competitive, uncompetitive or noncompetitive), which is studied using kinetic measurements, is a required step in the design and development of new drugs [56]. A competitive inhibitor binds exclusively to the free enzyme, so that the inhibitor and substrate compete with each other for binding to the enzyme. Thus, for instance, enzyme kinetic measurements have recently identified phenyl-thiazolylurea-sulfonamides as a new class of potent competitive inhibitors of bacterial phenylalanyl-tRNA synthetase [57], which is an essential enzyme that catalyzes the transfer of phenylalanine to the phenylalanyl-specific tRNA, a key step in protein biosynthesis. Also, the kinetic study of fluorogenic substrates for the protease activities of botulinum neurotoxins (BoNTs), which are zinc metalloproteases that cleave neuronal proteins involved in neurotransmitter release and are among the most toxic natural products known, has allowed the characterization of a new competitive inhibitor of BoNT B protease activity [58]. Taking into account that Km defines the substrate concentration at which half of Vmax is obtained under steady-state conditions, the
inhibition would be more easily detected at substrate concentrations below Km. When the inhibition is uncompetitive, the inhibitor binds exclusively to the enzyme-substrate (ES) complex or to some species that occurs along the reaction pathway subsequent to ES complex formation. In this case the inhibitor is more easily detected at substrate concentrations above Km. Finally, a noncompetitive inhibitor binds to both the free enzyme and the ES complex, or later species in the reaction pathway. This type of inhibitor is less sensitive to substrate concentration than competitive and uncompetitive inhibitors are. Generally, for screening purposes, where the aim is to identify a wide group of potential inhibitors and it is not clear which type of inhibitor will be most effective in vivo, the assays are carried out at a substrate concentration equal to Km.

Inhibition is usually quantified for screening purposes in terms of the IC50, which represents the concentration of inhibitor that reduces the reaction rate by 50% under specific assay conditions. As inhibition is higher in the initial phase of the reaction than in the later stages, the initial rate is the parameter of choice for studying the inhibition mechanism, since better sensitivity can be obtained. However, the fixed-time method is more often used in HTS than the initial-rate method, which is justified by its lower cost. In this instance, the time point must be chosen in the initial phase of the reaction. A model for choosing an appropriate substrate conversion for enzymatic assays in HTS has been given [59], in which the relationship between the IC50 value for an inhibitor and the percentage of substrate conversion was obtained using a first-order kinetic model under conditions that obey Michaelis-Menten kinetics. The 384-well microplate is the dominant format for kinetic assays in HTS, although the use of the 1536-well format continues to grow [60].
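The dependence of inhibitor detectability on substrate concentration can be illustrated with the standard steady-state rate equations for the three inhibition types. These equations and the resulting IC50 expressions are textbook relations introduced here as assumptions, since they are not written out in the text; the noncompetitive case is the simplest one, in which the inhibitor binds the free enzyme and the ES complex with the same constant Ki. The function names and parameter values are hypothetical.

```python
def rate(s, i, vmax, km, ki, mode):
    """Steady-state rate at substrate s and inhibitor i for one inhibition type."""
    if mode == "competitive":
        return vmax * s / (km * (1.0 + i / ki) + s)
    if mode == "uncompetitive":
        return vmax * s / (km + s * (1.0 + i / ki))
    if mode == "noncompetitive":
        return vmax * s / ((km + s) * (1.0 + i / ki))
    raise ValueError(f"unknown inhibition mode: {mode}")


def ic50(s, km, ki, mode):
    """Inhibitor concentration that halves the rate at substrate s."""
    if mode == "competitive":
        return ki * (1.0 + s / km)
    if mode == "uncompetitive":
        return ki * (1.0 + km / s)
    if mode == "noncompetitive":
        return ki
    raise ValueError(f"unknown inhibition mode: {mode}")


KM, KI = 10.0, 1.0  # assumed constants, same concentration units
for mode in ("competitive", "uncompetitive", "noncompetitive"):
    values = [ic50(s, KM, KI, mode) for s in (0.1 * KM, KM, 10.0 * KM)]
    print(mode, values)
```

Under these assumptions a competitive inhibitor shows its lowest IC50 below Km and an uncompetitive inhibitor above Km, while the noncompetitive IC50 is independent of substrate concentration. At a substrate concentration equal to Km the competitive and uncompetitive IC50 values are both twice Ki, which is consistent with the choice of this concentration when the inhibition type is not known in advance.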
A microplate bioassay has been described for students to learn the principles and techniques used by drug discovery HTS researchers, including enzyme inhibition, Michaelis-Menten kinetics and Lineweaver-Burk plots [61]. This drug discovery experiment involves the determination, using kinetic methodology, of the dissociation constant for binding of the carboxypeptidase A inhibitor 2-benzylsuccinic acid. Kinetic studies carried out in a microtiter plate have shown the usefulness of several 5,5’-dithiobis(2-nitrobenzamides) as alternative substrates for trypanothione reductases and thioredoxin reductases, which are enzymes central to cellular thiol metabolism and related to psoriasis, cancer, and autoimmune diseases [62]. The natural substrates of these enzymes are very expensive and difficult to obtain, while the proposed substrates can be easily synthesized. They are converted to their corresponding chromophoric thiolates, so that enzyme activity can be obtained from colorimetric measurements. The method is amenable to automation and suitable for inhibitor screening.

Flow systems are also a useful option for the determination of enzymatic inhibitors in HTS. Thus, a fluorescence flow injection assay has been described for determining the catalytic inhibition of DNA topoisomerase II [63], which is an ATP-dependent nuclear and mitochondrial enzyme that alters DNA topology by catalyzing the passage of an intact DNA double helix through a transient double-stranded break made in a second helix. This enzyme is the target for a number of the most important anticancer drugs. The proposed method is faster and more sensitive than the agarose gel electrophoresis and radioactive centrifugation assays [64].
Protein kinases are an important group of targets for drug design, as they play crucial roles in the regulation of many signal transduction pathways, and aberrant activities of these enzymes have been implicated in several diseases such as cancer, inflammation and immune disorders. However, finding selective and safe inhibitors is difficult, as their active sites share a high level of similarity. A recent review summarizes the literature on chemical libraries directed towards protein kinase inhibitors [65]. The interest in the identification of new inhibitors of these enzymes is shown by the numerous methods recently described [21,41,42,66-69]. For instance, a series of indenopyrazole-based cyclin-dependent kinase (CDK) inhibitors has been studied [67]. Kinetic data showed that these compounds are competitive with respect to ATP and bind in the kinase ATP pocket. A series of semicarbazide-based inhibitors are highly potent against CDK2 and CDK4 while maintaining selectivity against other relevant serine/threonine kinases.

Also, there are several kinetic assays for screening drug candidates as potential inhibitors of cytochrome P450 (CYP) isoenzymes [70-72]. These isoenzymes are involved in the oxidative metabolism of more than 50% of all clinically administered drugs, as they play an important role in metabolizing and detoxifying endogenous and exogenous compounds. Polymorphic variants of these isoenzymes can affect the metabolism of some drugs, leading to toxic effects and adverse drug reactions. CYP2C9 is the principal human cytochrome P450 isoenzyme involved in the oxidative metabolism of many drugs, including nonsteroidal anti-inflammatory compounds. A kinetic method has been described for identifying compounds that are potential substrates for CYPs [73]. The method is based on the detection of NAD(P)(H) consumption or generation in cells expressing a NAD(P)(H)-dependent target enzyme.
Also, diverse fluorogenic substrates have been used as prototypic probes to obtain in vitro CYP2C9 metabolic rates and kinetic parameters, such as apparent Km, Vmax, and Vmax/Km ratios, for allelic variants of CYP2C9 [72]. Measuring inhibition of the fluorescent signal in the presence of competitive drugs allows the detection of isozyme-specific substrates and inhibitors.

Several kinetic assays suitable for HTS have recently been described using fluorimetric techniques such as FP, TR-FRET and FCS [14,21,23,29-31,35]. For instance, FRET has been used for the assay of N-acyl hydrazone inhibitors of human immunodeficiency virus-1 (HIV-1) reverse transcriptase (RT)-associated ribonuclease H (RNase H) activity, which is an underexplored target for antiretroviral development [30]. The assay substrate was an 18-nucleotide 3’-fluorescein-labeled RNA annealed to a complementary 18-nucleotide 5’-Dabcyl-modified DNA. This RNA/DNA hybrid substrate showed extremely low background fluorescence, but signal enhancements of approximately 50-fold were obtained upon complete hydrolysis by HIV-1 RT-RNase H. The fluorescein-labeled ribonucleotide fragment readily dissociated from the complementary DNA at room temperature, with immediate generation of a fluorescent signal. A TR-FRET method has been described for the identification of diverse inhibitors of viral and bacterial helicases [29], which are an important class of targets for the development of novel anti-infective agents, as they are responsible for the unwinding of double-stranded DNA, facilitated by the binding and hydrolysis of 5’-nucleoside triphosphates. The usefulness of FP and TR-FRET in structure-activity relationship (SAR) studies has been shown using Src kinase as a model system [74]. Both techniques allow the detection of subnanomolar inhibitors of this enzyme.
The determination of enzymatic inhibitors has also been proposed using CL and BL assays [40-46]. These methods are based either on the detection of the end products of the enzymatic reaction or on the direct, real-time evaluation of the rate of the enzyme-catalyzed reaction through its coupling with a suitable luminescent system. The latter possibility allows not only enzyme inhibition measurements but also evaluation of the reaction kinetics, which is useful for the study of enzyme inhibition mechanisms. For instance, the fixed-time method has been used for the CL detection of protein kinase activity [41], which could be applied to identify novel classes of protein kinase inhibitors. The assay involves the use of biotinylated enzyme substrate peptides captured on a streptavidin-coated microtiter plate and monoclonal antibodies to detect their phosphorylation. Another CL kinetic method has been described for the determination of acetylcholinesterase inhibitors using coupled enzymatic reactions involving acetylcholinesterase, choline oxidase and horseradish peroxidase, with luminol as the CL substrate [44]. Also, a gas-phase CL analyzer has been used to study nitric oxide synthase activity and to estimate the inhibition kinetics by measuring the formation of nitric oxide [45]. The method is a suitable alternative to the use of radiolabeled materials. A BL method has been proposed for the screening of potential inhibitors of proteases using an aequorin fusion protein that incorporates an optimum natural cleavage site [46]. This protein was immobilized onto microtiter plate wells and used as the substrate for the HIV-1 protease, giving a decrease of the BL signal as a result of the proteolytic bond cleavage and the release of aequorin from the solid phase.
The scintillation proximity assay (SPA) technology, which involves the use of microscopic beads impregnated with a scintillant and with receptor molecules immobilized on their surface, has also been applied in HTS involving enzymatic systems [69,71,75-77]. Thus, an assay has been developed for the HTS of large compound libraries to identify inhibitors of poly(ADP-ribose) polymerase-1 (PARP-1) [76], an important enzyme involved in DNA repair. The assay allows the determination of IC50 values and is adaptable to the kinetic evaluation of lead molecules. The assay requires the binding of PARP-1 to a double-stranded DNA oligonucleotide, which yields the active enzyme. Using NAD+ and 3H-NAD+ as substrates, activated PARP-1 synthesizes labelled poly(ADP-ribose) chains. Once the reaction is stopped, ADP-ribose polymers are brought into proximity with pretreated microplate wells, resulting in signal amplification, which is detected by a scintillation plate reader. Another SPA assay has been described for the identification of small-molecule inhibitors of STK15, a centrosome-associated serine/threonine kinase [77]. The signal-to-noise ratio obtained was three times better than the value obtained using a TR-FRET assay evaluated with the same reagents.

Mass spectrometry (MS) is a useful technique to study enzyme kinetics and inhibition mode. Reaction products can be directly and quantitatively detected, eliminating the requirement for an intrinsic or extrinsic chromophore or fluorophore or a radioactive probe, or for secondary enzymatic reactions irrelevant to the target enzyme reaction. LC/MS/MS has been applied for routine screening of drug candidates as potential inhibitors of several human CYP isozymes [78,79]. A 96-well format was used to carry out the enzymatic reaction, which was stopped after incubation, and the solution was analyzed by LC/MS/MS.
The applicability of LC-MS to the study of enzyme kinetics and the evaluation of inhibitors has also been demonstrated using the
enzyme uridine diphosphate N-acetyl-muramyl-L-alanine ligase (MurC), an essential enzyme in the bacterial peptidoglycan biosynthetic pathway [80].

Microchip CE is a recent alternative approach for the rapid screening of combinatorial compounds in enzyme inhibition formats [81-83], as the separations are faster than conventional CE and comparable in efficiency. Microchannels with dimensions in the 10-100 micrometer range are fabricated in glass or fused-silica substrates using photolithography and etching techniques borrowed from the semiconductor industry. The pumping mechanism is electroosmotic, which provides accurate control of the small sample and reagent volumes required and eliminates the need for pumps and valves. Microchip CE has been used with a model enzyme assay that involves the hydrolysis of fluorescein-β-D-glucuronide (FMG) by β-glucuronidase (an acid hydrolase), which liberates fluorescein [82]. FMG and fluorescein are separated and fluorimetrically detected. The inhibition of the enzyme by the competitive inhibitor D-saccharic acid-1,4-lactone was also determined. The usefulness of microchip CE for multiplexed screening has been shown by the electrophoretic separation of four cationic inhibitors of acetylcholinesterase [84].

The use of biosensors in HTS of drugs is a suitable option for monitoring real-time interactions between biomolecules and drugs. Biosensors based on surface plasmon resonance (SPR) technology have been widely used to obtain kinetic parameters that may serve as useful indicators of subtle differences in binding strength [85]. SPR is a phenomenon occurring in a thin metal film when an incident light beam strikes the surface at a particular angle. Depending on the thickness of a molecular layer at the metal surface, the SPR phenomenon results in a graded reduction in intensity of the reflected light.
This system uses polarized light and can detect slight changes in optical resonance that occur when molecules bind to or dissociate from an immobilized target molecule. Kinetic information on the binding interaction provides more information than equilibrium assays alone, as has been shown in the study of several enzyme-inhibitor interactions using these biosensors [86-88]. Thus, the performance of a commercial SPR biosensor has been evaluated by analyzing the binding of a number of small-molecule inhibitors to the enzyme carbonic anhydrase II [88].

The development of enantioselective enzymes is an interesting new approach that could be useful in HTS for the kinetic resolution of racemic mixtures [89-91]. The method starts from a wild-type enzyme that catalyzes a given reaction of interest, but not enantioselectively. The gene that encodes the wild-type enzyme is first subjected to random mutagenesis using a molecular biological method. Upon inserting the library of mutant genes in an appropriate microorganism, mutant enzymes (variants) are expressed and individually screened for activity and enantioselectivity in the reaction of interest. The mutant gene of the optimal enzyme variant is then subjected once more to mutagenesis/expression/screening. This process identifies sensitive positions in the enzyme which are responsible for improved enantioselectivity. A lipase has been used as the catalyst in the hydrolytic kinetic resolution of a chiral ester. Also, electrochemical sensors based on potentiometry and amperometry have been used as detectors in enantioselective HTS of drugs using flow injection and sequential injection analysis techniques [92].

The usefulness of nuclear magnetic resonance (NMR) in the design of new pharmaceuticals, the characterization of drug-receptor interactions and metabolite
identification has been described [93,94]. NMR has been used for characterizing the product or products of an enzymatic reaction and for gaining insight into the kinetics of the reaction [95,96]. Usually, these studies require high substrate concentrations owing to the low sensitivity of the NMR technique, and only strong inhibitors can be detected. However, the NMR method called three fluorine atoms for biochemical screening (3FABS) has extended the capabilities of NMR, enabling rapid, efficient and reliable high-throughput functional screening for the identification of enzymatic inhibitors [97]. The substrate is tagged with a CF3 moiety and 19F NMR spectroscopy is used for the detection of the substrate and product components. This approach has been applied to screen multiple enzymes at the same time using different CF3-containing substrates, which allows the selectivity of an inhibitor for one target enzyme to be studied in the presence of other enzymes of the same family.

KINETIC ASPECTS OF DRUG-RECEPTOR INTERACTIONS

The pharmacological role of receptor theory and its evolution have been reviewed recently [98]. G-protein coupled receptors (GPCRs) are seven-transmembrane helical proteins that interact with molecules involved in intercellular communication. Approximately 50% of clinical drugs and many abused drugs have GPCRs as their targets, which makes them especially interesting for drug discovery programs. Several models have been proposed to quantitatively simulate GPCR behavior, such as the extended ternary complex model. This approach describes a receptor that can exist in both active and inactive states, depending on its ability to activate G proteins. Kinetic studies are used in pharmacology to understand the dynamics of physical interactions between different parts of the signal transduction pathway.
These studies are often carried out using membrane fragments rather than whole cells, although there are two substantial differences: 1) membrane assays are more stable over time; 2) they are not influenced by cellular processes such as receptor down-regulation, modification or desensitization. For these reasons, the results obtained in these studies can be biased [99]. The kinetic behavior of ligands is different for positive agonists, negative agonists or antagonists. Negative agonists bind more slowly than positive agonists because the receptor must release the G-protein and change to the inactive receptor conformation, which are two steps in addition to the release of the tracer ligand and the binding of the test ligand. Thus, kinetic measurements can provide information about drug efficacy in GPCR systems. Some receptors are coupled to G-proteins prior to the binding of agonists. A kinetic model has been developed to study the effect of the kinetics of receptor and G-protein association and dissociation on the predicted unoccupied sites [100]. It was observed that changes in the reaction kinetics can affect the dose-response curve, which cannot be observed using an equilibrium model [101].

Receptors can follow two routes, as they can be degraded intracellularly or be recycled to the external surface in an active form. A mathematical approach based on the use of initial-rate equations for the recycling process can be derived from steady-state approaches to enzyme-substrate interactions and from a method for multi-step reactions [102]. This model has been validated using the endocytosis of trapped label (14C-sucrose-LDL) by the LDL receptor of cultured hepatic (Hep-G2) cells.

The use of kinetic methodologies for investigating rapid neurotransmitter chemical reactions on cell surfaces in the microsecond-millisecond time region using lasers has
been reviewed [103]. Further literature on drug-receptor interactions can be found in several review articles [104-106], and some kinetic models encompassing receptor-mediated endocytosis have also been developed [107-111].

There are many real-time methodologies used in HT drug-receptor interaction studies. Receptor activity can be measured directly or indirectly after drug-receptor interactions, as Table (1) shows.

Table 1. Real-Time Methodologies Used for the Study of Drug-Receptor Interactions

Direct: SPR biosensors; gene reporter assays
Indirect (first messengers): GTP assays
Indirect (second messengers): cAMP assays; intracellular calcium; ion channels
Direct assays are based on biosensing systems, mainly using SPR measurements, and on gene reporter assays using BL or fluorescent systems. Indirect measurements are based on systems involving the determination of first or second messengers, such as guanosine 5'-triphosphate (GTP) [112,113], or cyclic adenosine monophosphate (cAMP) [112,114,115], intracellular calcium [116-118] or other ion channels [119-122], respectively.

SPR biosensors have been widely reported for the kinetic analysis of ligand-receptor interactions [123-125]. The use of kinetics gives the opportunity to optimize lead compounds based on binding rates, which are likely to be important aspects of drug potency. Rate constants ($k_s$) can be calculated using the following equation [126]: $RU(t) = R_{eq}\,[1 - \exp(-k_s (t - t_0))]$, where RU is the response in arbitrary binding units (resonance units), $t_0$ is the time at the start of sample injection, and $R_{eq}$ is the equilibrium response. Since $k_s = k_{ass} C + k_{diss}$, where C is the concentration of injected protein and $k_{diss}$ is the dissociation rate constant, the association rate constant $k_{ass}$ can be obtained from the slope of the plot of $k_s$ versus concentration. This technique is also useful for the analysis of structure/function relationships of target/inhibitor interactions, providing more information than equilibrium assays alone. This was verified by the analysis of small molecules, such as sulfonamides, which are carbonic anhydrase inhibitors, using a commercial biosensor [88]. Measurements can be obtained in less than 100 s.

Another technique used in biosensor technology is plasmon waveguide resonance (PWR) [127]. It can be used to study the kinetics of conformational changes associated with drug-receptor and G protein-receptor interactions for GPCRs. It has been found using this technique that the kinetics of drug binding are distinct among various drug classes.
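Returning to the SPR rate-constant relation quoted above, the two-step analysis (estimate ks for each injected concentration, then take kass from the slope of ks versus C) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the association phase is taken to follow RU(t) = Req(1 - exp(-ks(t - t0))), Req is treated as known for each sensorgram, the data are noise-free synthetic values, and all function names and constants are hypothetical rather than taken from ref. [126].

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def response(t, req, ks, t0=0.0):
    """Assumed association-phase response RU(t) = Req*(1 - exp(-ks*(t - t0)))."""
    return req * (1.0 - math.exp(-ks * (t - t0)))


def estimate_ks(times, rus, req, t0=0.0):
    """Estimate ks from the slope of ln(1 - RU/Req) versus (t - t0)."""
    xs = [t - t0 for t in times]
    ys = [math.log(1.0 - ru / req) for ru in rus]
    slope, _ = linear_fit(xs, ys)
    return -slope


def rate_constants(concs, ks_values):
    """Fit ks = kass*C + kdiss and return (kass, kdiss) as (slope, intercept)."""
    return linear_fit(concs, ks_values)


if __name__ == "__main__":
    # Hypothetical constants, used only to generate synthetic sensorgrams.
    kass_true, kdiss_true, req = 1.0e5, 1.0e-2, 100.0  # M^-1 s^-1, s^-1, RU
    concs = [50e-9, 100e-9, 200e-9, 400e-9]            # injected concentrations, M
    times = [5.0 * i for i in range(1, 11)]            # seconds after injection
    ks_values = []
    for c in concs:
        ks = kass_true * c + kdiss_true
        rus = [response(t, req, ks) for t in times]
        ks_values.append(estimate_ks(times, rus, req))
    print(rate_constants(concs, ks_values))
```

With real sensorgrams the equilibrium response varies with concentration and the data are noisy, so a nonlinear fit of the full curve would normally be preferred over this linearization; the intercept of the same plot gives an estimate of kdiss.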
These kinetic differences have been attributed to the participation of the lipid bilayer in the agonist activation process, including changes in both protein conformation and bilayer structure, as experiments with β-adrenoceptors demonstrate. GPCRs are very hydrophobic and they need a lipid environment to maintain their native conformation. An SPR method, referred to as the capture and reconstitution method, has been developed to assemble GPCRs onto a biosensor
chip surface [128]. Human chemokine receptors CXCR4 and CCR5 have been used as model systems and captured in the presence of alkane groups. Structure-sensitive antibodies were used to check the efficiency of the reconstitution process, which is carried out in a lipid environment to maintain structural and functional activity. A whole cycle takes around two thousand seconds and these biosensor systems can perform 200 assays/day.

The binding of a ligand to GPCRs catalyzes the exchange of guanosine 5'-diphosphate (GDP) for GTP and the dissociation of the G-protein heterotrimer, giving rise to the GTP-bound α-subunit and free βγ-complexes, which signal to downstream targets. This receptor activation process has been studied using two luminescent techniques whose kinetic fundamentals have been discussed above, namely FRET [112] and TR [113]. In the FRET assay, two GFP variants were covalently bound to different G-protein subunits. When the heterotrimer is intact, the FRET intensity is at its maximum, and it decreases when a ligand is bound as a consequence of G-protein dissociation. TR-FRET has also been applied to study the activation of nuclear receptors (NRs) [129], which are important targets for drug discovery as they form a group of ligand-dependent transcription factors that mediate the effects of hormones and other endogenous ligands to regulate the expression of specific genes. Peroxisome proliferator-activated receptor has been used as a model system for characterizing the ligand-dependent interaction between nuclear receptors and nuclear receptor coactivators. A TR membrane assay containing a receptor, an agonist, GDP and Eu-GTP showed that the activity of the receptor could be studied, and the results obtained were consistent with those obtained using scintillation counting [113].

The use of fluorescence or BL detection techniques has allowed the real-time determination of receptors or the visualization of receptor trafficking inside the cell.
A microfluidic system has been built to use the luciferase reporter gene activity in the reporter cell line HFF11, based on HeLa cells, as a model system [130]. This microfluidic instrument is based on the cascade of intracellular reactions, which ends in the expression of Photinus luciferase. This system continuously supplies fresh cell medium, thus influencing the fast kinetics. It seems that only a short ligand stimulation time is necessary to obtain a strong activation of the reporter system, and the microfluidic system could be useful for following intracellular kinetics in real time.

GFP has been used for tagging GPCRs or as a reporter gene expression marker [112,131-134]. GFP acts as a reporter gene for in vivo and in vitro assays to measure cell death in setups involving relocalization of GFP fusion proteins. Single-cell analysis performed by flow cytometry allows direct evaluation of apoptosis [132]. Kinetic measurements of the decrease of enhanced Green Fluorescent Protein (EGFP) would allow a distinction to be made between induction of cell death and cell growth arrest. This assay may be more convenient for screening a large panel of drugs or genes on a limited number of cell lines. GFP has also been studied as a label for a fusion protein family (FPR) for exploring the kinetic disassembly mechanism on beads used as sensors in flow cytometry [133]. The use of this technique for real-time HTS of G-protein coupled receptors has been extensively reviewed [135-137]. The throughput of assays using conventional flow cytometry has been increased by using automated liquid handling systems. The sample throughput has changed from 2 to 1000 samples per minute in a commercial system recently developed [137]. These systems can be used for monitoring kinetics on the subsecond scale, to study drug-receptor kinetics or receptor solubilization. However,
there is still some improvement to be made, such as the minimization of liquid carryover and sample residue in these commercial systems [138], which could be achieved by reducing the size of the components of the system and by treating the PVC tubing to modify the wettability of its surface.

Second messenger assays are also useful for the real-time determination of GPCR activity. The monitoring of cyclic nucleotides such as cAMP [112,114,115] or the determination of inorganic phosphate [139] has been reported for this purpose. A G-protein FRET assay [112] allowed the examination of the kinetics of the transient loss of FRET on addition and removal of cAMP, using adequate stopped-flow systems and sensitive fluorometers on the subsecond time scale. Thus, a dose-response curve of cAMP and its analogues that bind to the CAR1 receptor can be obtained. A scintillation proximity assay (SPA) has been proposed to determine adenosine triphosphatase (ATPase) activity, and also substrate or product concentrations [139]. These assays are designed in such a way that only the substrate or the product bound to a bead can produce a significant scintillation signal, so a physical separation step is unnecessary. The utility of this SPA procedure for mechanistic studies was assessed by studying the initial rate of the system. The inorganic phosphate released by the action of ATPase binds to molybdate to form the phosphomolybdate anion, which in turn binds to the scintillation beads.

The measurement of intracellular calcium ion concentration has also been used as an indicator of receptor activation. Although it has been widely described for end-point assays, some kinetic studies have also been developed in flow systems [116,117] or multi-well formats [115,118]. In all of them, luminescent or fluorescent indicators of calcium were used, such as aequorin [118], fura-2 [117] or fluo-4 acetoxymethyl ester [115].
Initial rates of calcium response can be used as a rapid analytical methodology for the discrimination of different drugs [117] using a cell-based assay with a flow-injection renewable-surface technique. Also, the delay introduced in the Ca2+-aequorin interaction kinetics by a Ca2+ chelator has been used to develop kinetic methodologies in multi-well format, as the timing of sample handling and plate reading are of the same order of magnitude [118]. Other devices based on microfluidic systems have been described for potassium channels [121,122].

HIGH THROUGHPUT PHARMACOKINETIC SCREENING

Kinetics are present in the whole process of absorption, distribution, metabolism, excretion and toxicity (ADMET) of a compound. Three types of kinetics are involved in pharmacokinetic screening: first-order, zero-order and Michaelis-Menten kinetics. Most absorption, diffusion, permeation and excretion processes can be described using first-order or linear kinetics. This means that the transformation rate of the drug depends on its concentration, following the equation $dC/dt = -kC$, where C is the drug concentration, k is the first-order rate constant and t is time. The negative sign indicates that the concentration decreases with time. Excretion can also follow zero-order kinetics, in which the excretion rate is independent of the drug concentration. The rate equation can be expressed as $dC/dt = -k_0$, where $k_0$ is the zero-order rate constant. The well-known Michaelis-Menten rate equation, under drug excess conditions, is very useful, as most biotransformation processes are catalyzed by enzymatic systems. Fig. (3) shows the most relevant kinetic parameters involved in the different stages of an ADMET process, which are discussed below.
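The three rate laws named above can be compared in a short Python sketch. It uses the closed-form solutions of the first-order and zero-order equations and the standard Michaelis-Menten rate expression. The function names and the numerical values in the usage section are hypothetical and serve only as an illustration.

```python
import math


def first_order_conc(c0, k, t):
    """Solution of dC/dt = -k*C: C(t) = C0*exp(-k*t)."""
    return c0 * math.exp(-k * t)


def first_order_half_life(k):
    """Half-life ln(2)/k, which does not depend on the starting concentration."""
    return math.log(2.0) / k


def zero_order_conc(c0, k0, t):
    """Solution of dC/dt = -k0: C(t) = C0 - k0*t, floored at zero."""
    return max(c0 - k0 * t, 0.0)


def michaelis_menten_rate(c, vmax, km):
    """dC/dt = -Vmax*C/(Km + C).

    Approaches first-order behaviour (slope Vmax/Km) when C << Km and
    zero-order behaviour (rate Vmax) when C >> Km.
    """
    return -vmax * c / (km + c)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    c0, k, k0 = 100.0, 0.1, 5.0
    t_half = first_order_half_life(k)
    print(t_half, first_order_conc(c0, k, t_half))  # concentration halves at t_half
    print(zero_order_conc(c0, k0, 4.0))             # linear decline
    print(michaelis_menten_rate(1000.0, 2.0, 1.0))  # close to -Vmax when C >> Km
```

The limiting behaviour of the Michaelis-Menten expression is why a saturable enzymatic process can look first-order at low drug concentrations and zero-order at high ones.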
Fig. (3). The role of kinetics at the different stages of absorption, metabolism and excretion studies in early drug discovery. (kIAM, kILC: retention factors for immobilized artificial membrane (IAM) chromatography and immobilized liposome chromatography (ILC), respectively.)
The determination of the absorption rate at early stages is important since it can be used to calculate the bioequivalence of a drug [140]. The study of the solubility of potential pharmaceutical compounds is an important aspect of HTS, as the rate-limiting step for the absorption of drugs from the gastrointestinal tract is frequently the dissolution of a solid dosage form. The rate at which a compound dissolves is given by the modified Noyes-Whitney equation [141]: $dC/dt = (AD/l)(c_0 - c_1)$, where dC/dt is the variation of the concentration with time, $c_0$ and $c_1$ are the concentrations at distance 0 from the solid particle (saturated solubility) and at distance l, the thickness of the diffusion layer, respectively, A is the surface area available for dissolution, and D is the diffusion coefficient. Kinetics are present in these studies for any dosage form. Controlled-release polymer dosage forms are especially indicated for rapidly metabolized drugs. Variables affecting the modulation rate are polymer composition, compression force and hydrodynamic conditions, among others [142]. Automatic measurements of kinetic aqueous solubility can be made using an instrument with robotic sample handling, multiwell format and nephelometric detection [143], which allows more than 300 compounds/day to be analyzed.

Immobilized Artificial Membrane (IAM) chromatography uses stationary phases constituted by phospholipids covalently bound to silica supports. Thus, the polar head of the phospholipids can reproduce the hydrophobic, ion-pairing and hydrogen-bonding interactions that take place between analytes and biological membranes. The interest of IAM chromatography in the early drug discovery process as a powerful in vitro tool for high throughput screening, and its applications, have been reviewed [144]. This technique enables the study of non-specific binding of biomolecules to cell membranes through hydrophobic interactions, and of specific binding between biomolecules and analytes.
IAM has been widely applied to calculate the hydrophobicity parameters for a great variety of compounds [144-152]. The results obtained with this technique have been compared to those obtained by reversed-phase liquid chromatography [146,147], micellar liquid chromatography and micellar electrokinetic capillary chromatography (MEKC). MEKC and IAM chromatography correlated well,
which is probably due to similarities between the organized media provided by surfactants and phospholipids. Stationary phases with immobilized receptors have been developed [150-152] to show their usefulness for drug-receptor interactions. Immobilized liposome chromatography (ILC) has also been used for membrane partitioning studies and its uses have been recently reviewed [153]. The predictions made using ILC can be more realistic, as liposomes constitute a hydrophobic region sandwiched between two interfacial hydrophilic layers, as happens in biological membranes.

Permeability studies are another important issue to be taken into account in drug absorption screening. Drug permeation often involves non-steady-state conditions. Solutes sometimes have different rates of adsorption to membranes and they can form aggregates in solution or within the membranes, thus affecting permeability. Surface changes of membranes can give rise to attraction or repulsion forces. Apparent permeability ($P_{app}$) has a kinetic component, as it can be calculated according to the equation $P_{app} = (1/(A C_0))(dQ/dt)$, where $C_0$ is the initial concentration of drug, A is the surface area of the filter/cell, and dQ/dt is the rate of drug permeation. Cell lines have traditionally been used for permeation assays, and results from Caco-2 cell lines correlate well with in vivo procedures, but assays using these cells often feature a low throughput, as three weeks are needed for them to grow. Madin-Darby canine kidney (MDCK) cells constitute an alternative to Caco-2 cells for permeability assays [154-156], taking into account that only three days are needed to achieve an appropriate cell culture. The results correlate well with those provided by Caco-2 cells and also with those of in vivo assays, which demonstrates their practical usefulness in early drug discovery. A high-throughput assay used for permeation studies is the Parallel Artificial Membrane Permeation Assay (PAMPA).
It is a membrane-based assay involving the use of two plates separated by a porous filter made from an inert material coated with a lipid solution to prepare the artificial membrane. Wells from both plates are filled with drug and buffer solutions, respectively, and incubated [157]. Afterwards, the drug concentration in both plates is measured by UV spectrophotometry [158-161] or by LC/MS [160] to determine the rate of permeation, which is governed by the effective permeability of the drug. LC/MS is very useful for compounds that lack absorbing chromophore groups, and the influence of some drug solubilizers, which are commonly used as excipients, can be checked without any interference, unlike with the UV-vis method [160]. In most instances, the results obtained with this technique correlated well with those provided by Caco-2 cell monolayers for absorption processes based on passive drug transport, which is the most usual phenomenon. The simplicity, low cost and high throughput of this technique make it a useful tool in the early drug discovery process for the selection of potential lead candidates, compared to Caco-2 cell assays. However, PAMPA assays cannot replace the use of cell lines, since some phenomena such as drug efflux, first-pass metabolism and protein transport have to be studied using cell lines. Additionally, new technical improvements, such as the individual stirring of the wells [162], have increased the intrinsic high throughput of this technique. This approach is particularly interesting in the case of lipophilic molecules, for which the diffusion of the drug across the water layer on both sides of the membrane is the rate-limiting step. Thus, the incubation time for lipophilic compounds is reduced from 15 h to 15 min. PAMPA assays have been
efficiently automated in the 96-well format and fully automated equipment is now commercially available.

Membrane proteins such as P-glycoprotein can act as active transporters of drugs, giving rise to drug efflux out of the cell. Substrates of this protein are highly hydrophobic and therefore have a high diffusion rate. There is a balance between passive diffusion across the membrane bilayer and active transport. A continuous fluorescence assay using tetramethylrosamine was developed to quantify drug transport in proteoliposomes of defined phospholipids [163]. The transport rate was observed in real time at time intervals of 150 s. This method was more advantageous than the fixed time-point rapid filtration technique because the latter needed multiple samples, one at each time-point. The elucidation of carrier mechanisms might help to develop theoretical models for predicting drug absorption when drug efflux occurs.

Mathematical models have been developed to simulate oral drug absorption [140,164-166]. One model described the gastrointestinal transit and absorption processes to analyze and predict plasma concentration profiles after oral administration [164]. For this simulation, the gastrointestinal tract was divided into eight segments, considering first-order absorption kinetics and the absorbability measurements defined by the absorption rate constant. Plasma concentration-time profiles were predicted using a convolution method for four model drugs. The results obtained showed that this model can be applied in any instance, including drugs undergoing first-pass metabolism or absorption. Another model involves a physiologically based method, which considers the intestine as a continuous tube, and the function chosen follows a Gaussian distribution [165]. Two input parameters are needed: the intestinal permeability coefficient and the solubility in intestinal fluids. The concentration-time curves obtained were very similar to the values reported in the literature.
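The idea behind the first model above, a gastrointestinal tract divided into segments with first-order absorption, can be illustrated with a deliberately simplified Python sketch. It is not the model of ref. [164]: it assumes the same first-order absorption constant (ka) and transit constant (kt) in every segment, treats drug leaving the last segment as unabsorbed, ignores dissolution and metabolism, and uses simple Euler integration. All names and parameter values are hypothetical.

```python
def fraction_absorbed(ka, kt, n_segments=8, t_end=48.0, dt=0.01):
    """Euler integration of n first-order transit segments in series.

    Each segment loses drug by absorption (rate ka) and by transit to the
    next segment (rate kt). A unit dose starts in the first segment.
    """
    amounts = [0.0] * n_segments
    amounts[0] = 1.0
    absorbed = 0.0
    for _ in range(int(round(t_end / dt))):
        inflow = 0.0  # drug arriving from the previous segment in this step
        for i in range(n_segments):
            a = amounts[i]
            out_transit = kt * a * dt
            out_absorb = ka * a * dt
            amounts[i] = a - out_transit - out_absorb + inflow
            absorbed += out_absorb
            inflow = out_transit
    return absorbed


def fraction_absorbed_closed_form(ka, kt, n_segments=8):
    """Closed form for identical segments: 1 - (kt/(kt + ka))**n."""
    return 1.0 - (kt / (kt + ka)) ** n_segments


if __name__ == "__main__":
    ka, kt = 1.0, 2.0  # hypothetical rate constants, per hour
    print(fraction_absorbed(ka, kt))
    print(fraction_absorbed_closed_form(ka, kt))
```

Under these assumptions each segment passes on a fixed fraction kt/(kt + ka) of the drug it receives, so the numerical and closed-form results should agree. A plasma concentration-time profile would additionally require a disposition model, for instance the convolution step mentioned in the text.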
A great number of drugs are eliminated from the body via metabolism and/or excretion. Lipophilic molecules may move from plasma to hepatic cytosol by simple or facilitated diffusion. Kinetics are related to metabolism by means of plasma concentration-time profiles. Areas under the curve (AUC) can be calculated using the trapezoidal rule and extrapolated to infinity using the relationship [167]: $[AUC]_0^{\infty} = [AUC]_0^{t} + C_t/k$, where $[AUC]_0^{\infty}$ is the total area under the curve, $[AUC]_0^{t}$ is the area under the curve up to the last measurable point, $C_t$ is the plasma concentration at the last measurable point and k is the elimination rate constant for the terminal phase. Plasma clearance (Cl) can be determined by dividing the intravenous dose by $[AUC]_0^{\infty}$. Plasma half-lives ($t_{1/2}$) can be determined by logarithmic-linear least-squares regression of the terminal phase of plasma concentration-time profiles.

Hepatic transport is important in drug screening because extensive hepatic uptake or enhanced biliary excretion may be desirable features for a potential drug candidate [168]. The in vitro models used for the study of hepatic metabolism are cultured hepatocytes and microsomes, although sometimes the results can be affected by inter-individual variability. The use of rapid methods for the evaluation of drug metabolic processes has been reviewed [169,170]. One of the indicators used to predict drug metabolism is the activity of liver enzymes, mainly cytochrome P450s (CYP450s). However, it has been recently reported that these enzymes do not follow typical Michaelis-Menten kinetics, as they are affected by allosteric effects [1], and the prediction of kinetics by kinetic modeling is often very complicated, as the results do not correlate with in vivo measurements. Kinetic parameters
were calculated and these assays were also used to predict adverse drug reactions. The study of enzyme kinetics is important at the early stages of drug discovery, since the sensitivity or selectivity shown towards a target can be used to assess the utility of a compound. A high throughput method for measuring enzyme kinetics has been implemented on a commercial liquid handler platform [4]. It uses a heating block for the incubation of 96 samples at 37 °C, so that inhibition rates can be measured under physiological conditions, unlike the conventionally used end-point assays at room temperature. The calculation of inhibition and Michaelis constants of known inhibitors and substrates, respectively, showed the validity of this method, as the results obtained were comparable to those reported in the literature. As mentioned above, LC coupled to tandem MS is a valuable tool for planning activities in HTS, as it combines quantitative and qualitative data to accelerate the determination of metabolites in a single analysis [171,172]. Tandem mass spectrometry provides useful information about metabolite formation using M, which is the parent molecule mass, and M + 14, M + 16, M + 32 and M + 176, corresponding to keto formation, hydroxylation, dihydroxylation and glucuronidation, respectively [171]. Thanks to multiple reaction monitoring, metabolites can be resolved from parent molecules without any interference, mainly in microsomes and hepatocytes. This information is useful in metabolism studies for modifying the structure of a compound into one less susceptible to biotransformation. Procedures commonly used to enhance the throughput of metabolism studies are cassette dosing and sample pooling [173]. Cassette dosing or N-in-one studies appeared some years ago as a means to enhance the throughput of the in vivo screening of drugs.
This approach consists of the simultaneous dosing of several drugs in order to reduce the number of assays. The advantages and shortcomings of this approach have been extensively reviewed [174]. The main limitation of these studies is drug-drug interactions, which are basically kinetic interactions and give erroneous values for the bioavailability, oral absorption or clearance of drugs. These errors can be false positives or false negatives, depending on drug-plasma protein interactions or on the presence in the cassette of drugs that inhibit transport proteins or liver enzymes. These interactions can be minimized or avoided by lowering the initial drug concentration, even to nM levels, and by reducing the number of cassette components. LC/MS/MS has been extensively used to determine the compounds and their metabolites involved in these N-in-one approaches [167,171,172,175-184]. Some of these methods include automated solid phase extraction coupled on-line to the chromatographic system, which has also increased the throughput of these determinations [179,184]. The use of a commercial liquid handling sampler adapted to the 96-well format to achieve automated sample preparation, including protein precipitation, has been reported [179]. This approach has been applied to individual compounds and to cassette dosing for the pharmacokinetic screening of VLA-4 antagonists in the range 1-5 ng/mL. Staggered parallel LC/MS/MS is an approach based on two LC systems coupled to the same mass spectrometric detector, which is switched between them according to the compounds to be detected [177,178,182]. This procedure doubles the throughput of the system, so that the analysis of 20 compounds can be done in an overnight assay [178]. Urinary excretion-time data, together with serum concentration-time data and pharmacodynamics, have been established by some institutions, such as the United States Food and Drug Administration (FDA), as parameters indicating bioavailability [185].
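Concentration-time data of this kind are usually reduced to the non-compartmental parameters defined earlier in this section: the AUC by the trapezoidal rule plus the extrapolated term C_t/k, the clearance as intravenous dose divided by the total AUC, and the half-life from a log-linear fit of the terminal phase. The following is a minimal sketch under stated assumptions: the function names, the choice of the last three points as the terminal phase and the synthetic data are all illustrative and are not taken from [167].

```python
import math


def auc_trapezoid(times, concs):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:]))


def terminal_rate_constant(times, concs, n_points=3):
    """Least squares slope of ln(C) versus t over the last n_points samples.
    Returns k = -slope (assumes these points lie in the terminal phase)."""
    ts = times[-n_points:]
    ys = [math.log(c) for c in concs[-n_points:]]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    return -sxy / sxx


def nca_summary(times, concs, iv_dose, n_points=3):
    """AUC to the last point, AUC extrapolated to infinity, clearance, half-life."""
    k = terminal_rate_constant(times, concs, n_points)
    auc_0_t = auc_trapezoid(times, concs)
    auc_0_inf = auc_0_t + concs[-1] / k          # extrapolated term C_t / k
    return {
        "k": k,
        "t_half": math.log(2.0) / k,
        "auc_0_t": auc_0_t,
        "auc_0_inf": auc_0_inf,
        "clearance": iv_dose / auc_0_inf,
    }


if __name__ == "__main__":
    # Synthetic mono-exponential profile, C = 10 * exp(-0.2 t); illustrative only
    times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0]
    concs = [10.0 * math.exp(-0.2 * t) for t in times]
    print(nca_summary(times, concs, iv_dose=100.0))
```

For a convex, decaying profile the linear trapezoidal rule slightly overestimates the true area, which is why denser sampling or a log-trapezoidal variant is often preferred in the elimination phase.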
However, the lack of information about the relationship between serum concentration and renal clearance has limited the use of urinary excretion-time profiles, especially when renal clearance is not linear. To study the errors made in the assessment of renal clearance, an open one-compartment kinetic model was used, which considered first-order absorption and linear or nonlinear clearance. Nonlinear clearance was related to saturation, but it may also be associated with other mechanisms of transport-mediated reabsorption. It is concluded that prior knowledge of the relationship between serum concentration and renal clearance is required. The kinetic information obtained through the assays involved in pharmacokinetic screening is a useful tool for building mathematical and computational models. These approaches constitute in silico screening, which can be very valuable at early stages if the models can accurately predict in vivo results. Some models that take into account the reaction mechanisms of the ADMET processes and the physicochemical properties of drug candidates have been developed in recent years [186-190]. These models have been named physiologically based pharmacokinetic (PBPK) models. A generic PBPK model based on conventional mass balance differential equations of drug disposition in the different compartments has been proposed [186]. The input parameters are in vitro data on metabolism, plasma-protein binding and lipophilicity, which are available in early drug discovery. Ultimately, this model relies on a scaling approach for the translation of in vitro data to in vivo situations. The comparison of simulated and experimental data on plasma and tissue concentration-time profiles can provide mechanistic information if the key ADME processes are correctly described by the model.
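For orientation, the mass balance for a single tissue compartment in models of this type is commonly written in the textbook form below. This is a sketch under the assumption of a well-stirred, perfusion-limited, non-eliminating tissue; the exact equations of [186] are not reproduced here, and the symbols ($V_T$ tissue volume, $C_T$ tissue concentration, $Q_T$ tissue blood flow, $C_A$ arterial concentration, $K_{p,T}$ tissue-to-plasma partition coefficient) are introduced only for this illustration.

```latex
V_T \frac{dC_T}{dt} = Q_T \left( C_A - \frac{C_T}{K_{p,T}} \right)
```

Eliminating organs such as the liver carry an additional clearance term on the right-hand side, which is where scaled in vitro metabolism data would enter.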
This PBPK model was evaluated in ten different tissues for three structurally unrelated drugs, two lipophilic bases (diazepam, propranolol) and one neutral compound (ethoxybenzamide). Some discrepancies between predicted and in vivo data were found in specific tissues, which were ascribed to transport phenomena and to accumulation processes in tissue. Thus, a deeper study of ADME processes is needed. An integral model based on the combination of in silico and in vitro prediction tools for the whole ADME process has been reported [187]. It is an iterative closed-loop model, which justifies its assumptions at each stage through the comparison of the predicted values with in vivo data. The work focuses first on disposition and then on oral absorption, in order to predict both phenomena in humans. The development of these models has been extended to the prediction of the toxicity of some compounds [188]. It is assumed that the integration of in vitro data and quantitative structure-activity relationship (QSAR) models is possible thanks to biokinetic and toxicodynamic modeling. The proposed scheme consists of four stages: a) evaluation based on physicochemical properties and chemical functionality, b) biokinetic modeling and basal toxicity testing, c) selection of tests at high tissue concentrations and d) choice from a battery of tests taking into account legislative requirements. The higher predictive ability of conceptual models based on subcellular pharmacokinetics compared with model-free QSAR methods has been discussed [189]. The lack of sufficient experimental data for the validation of a model can be addressed using a cross-validation technique, which consists of iterative calculations of the predicted magnitude when a group of data points is left out. The main limitation of this approach is that it cannot predict values outside the parameter space.
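The cross-validation procedure can be sketched as follows. This is an illustrative example only: it uses a straight-line fit as a stand-in for the model (the subcellular pharmacokinetic model of [189] is not reproduced), leaves out one point at a time (a group size of one), and summarizes the result with the cross-validated statistic q², which is a common choice in QSAR work but is an assumption here rather than something stated in the source.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def leave_one_out_q2(xs, ys):
    """Refit the model with each point omitted in turn, predict the omitted
    point, and return q2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    y_mean = sum(ys) / len(ys)
    total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / total


if __name__ == "__main__":
    # Hypothetical descriptor/activity pairs, for illustration only
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(f"q2 = {leave_one_out_q2(xs, ys):.3f}")
```

Because every omitted point lies inside or at the edge of the range spanned by the remaining data, a high q² from this procedure says little about extrapolation.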
To overcome this shortcoming, a leave-extremes-out cross-validation technique is used, in which the maximum and minimum values are omitted from the calculations. The results obtained from
this study show that the predictions of this model are better than those of approaches based on empirical models (neural networks, polynomial regression) for magnitudes outside the parameter space. This is of great interest for predicting the behavior of large sets of different compounds, a situation in which model-free empirical QSAR approaches usually fail.

FINAL REMARKS

The extensive literature devoted to drug discovery emphasizes recent developments and advances in HTS such as miniaturization, automation, homogeneous formats, microarray platforms and in silico screening. However, the significant role of kinetics in this research field is frequently omitted or hidden, although control of the time-dependent variables is essential in most of its steps to obtain satisfactory results. This article has tried to give an overview of the dynamic aspects involved in many of the techniques, methodologies and assays applied in HTS, in drug-receptor mechanisms and in the ADMET properties of drug candidates.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge financial support from the Spanish Ministerio de Ciencia y Tecnología (MCyT) (Grant No. BQU2003-03027).

ABBREVIATIONS

HTS = High Throughput Screening
ADME = Absorption, Distribution, Metabolism, Excretion
LC = Liquid Chromatography
CE = Capillary Electrophoresis
MS = Mass Spectrometry
FIA = Flow Injection Analysis
FP = Fluorescence Polarization
FCS = Fluorescence Correlation Spectroscopy
TR-FRET = Time-Resolved Fluorescence Resonance Energy Transfer
IL-13 = Human Interleukin-13
IFN-γ = Interferon-γ
BRET = Bioluminescence Resonance Energy Transfer
NRs = Nuclear Receptors
CL = Chemiluminescence
BL = Bioluminescence
GFP = Green Fluorescent Protein
Rluc = Renilla luciferase
YFP = Yellow Fluorescent Protein
BoNTs = Botulinum Neurotoxins
ES = Enzyme-Substrate
CDK = Cyclin-dependent kinase
CYP = Cytochrome P450
HIV-1-RT-RNase H = Human Immunodeficiency Virus-1-Reverse Transcriptase-associated ribonuclease H
SAR = Structure-Activity Relationship
SPA = Scintillation Proximity Assay
PARP-1 = poly (ADP-ribose) polymerase-1
STK = Serine/Threonine Kinase
MurC = N-acetyl-muramyl-L-alanine-ligase
FMG = Fluorescein β-D-Glucuronide
SPR = Surface Plasmon Resonance
NMR = Nuclear Magnetic Resonance
3-FABS = Three-Fluorine Atoms for Biochemical Screening
GPCRs = G-protein coupled receptors
LDL = Low-density lipoprotein
GTP = Guanosine 5’-triphosphate
cAMP = Cyclic adenosine monophosphate
PWR = Plasmon Waveguide Resonance
GDP = Guanosine 5’-diphosphate
EGFP = Enhanced Green Fluorescent Protein
FPR = Fusion protein
ADMET = Absorption, distribution, metabolism, excretion and toxicity
IAM = Immobilized Artificial Membrane
ILC = Immobilized Liposome Chromatography
MDCK = Madin-Darby Canine Kidney
PAMPA = Parallel Artificial Membrane Permeability Assay
AUC = Area under the Curve
CL = Plasma Clearance
FDA = United States Food and Drug Administration
QSAR = Quantitative Structure Activity Relationship
REFERENCES

[1] Atkins, W.M. Drug Discov. Today, 2004, 9, 478-484.
[2] Kennedy, T. Drug Discov. Today, 1997, 2, 436-444.
[3] Chaturvedi, P.R.; Decker, C.J.; Odinecs, A. Curr. Opin. Chem. Biol., 2001, 5, 452-463.
[4] Saraswat, L.D.; Caserta, K.A.; Laws, K.; Wei, D.; Jones, S.S.; Adedoyin, A. J. Biomol. Screen., 2003, 8, 544-554.
[5] Bertucci, C.; Bartolini, M.; Gotti, R.; Andrisano, V. J. Chromatogr. B, 2003, 797, 111-129.
[6] Millot, M.C. J. Chromatogr. B, 2003, 797, 131-159.
[7] Zweigenbaum, J.; Henion, J. Anal. Chem., 2000, 72, 2446-2454.
[8] Spraul, M.; Hofmann, M.; Ackermann, M.; Nicholls, A.W.; Damment, S.J.P.; Haselden, J.N.; Shockcor, J.P.; Nicholson, J.K.; Lindon, J.C. Analyst, 1997, 122, 339-341.
[9] Burbaum, J.J.; Sigal, N.H. Curr. Opin. Chem. Biol., 1997, 1, 72-78.
[10] Silverman, L.; Campbell, R.; Broach, J.R. Curr. Opin. Chem. Biol., 1998, 2, 397-403.
[11] Pope, A.J.; Haupts, U.M.; Moore, K.J. Drug Discov. Today, 1999, 4, 350-362.
[12] Sundberg, S.A. Curr. Opin. Biotech., 2000, 11, 47-53.
[13] Jaeger, S.; Brand, L.; Eggeling, C. Curr. Pharm. Biotech., 2003, 4, 463-476.
[14] Sportsman, J.R.; Leytes, L.J. Drug Discov. Today, 2000, 1, 27-32.
[15] Turek-Etienne, T.C.; Small, E.C.; Soh, S.C.; Xin, T.A.; Gaitonde, P.V.; Barrabee, E.B.; Hart, R.F.; Bryant, R.W. J. Biomol. Screen., 2003, 8, 176-184.
[16] Harris, A.; Cox, S.; Burns, D.; Norey, C. J. Biomol. Screen., 2003, 8, 410-420.
[17] Lynch, B.A.; Loiacono, K.A.; Tiong, C.L.; Adams, S.E.; MacNeil, I.A. Anal. Biochem., 1997, 247, 77-82.
[18] Seethala, R.; Menzel, R. Anal. Biochem., 1998, 255, 257-262.
[19] Burke, T.J.; Loniello, K.R.; Beebe, J.A.; Ervin, K.M.; PanVera, L.L.C.; Madison, W.I. Comb. Chem. High T. SCR., 2003, 6, 183-194.
[20] Allen, M.; Reeves, J.; Mellor, G. J. Biomol. Screen., 2000, 5, 63-69.
[21] Parker, G.J.; Law, T.L.; Lenoch, F.J.; Bolger, R.E. J. Biomol. Screen., 2000, 5, 77-88.
[22] Li, Z.; Mehdi, S.; Patel, I.; Kaooya, J.; Judkins, M.; Zhang, W.; Diener, K.; Lzada, A.; Dunnington, D. J. Biomol. Screen., 2000, 5, 31-38.
[23] Levine, L.M.; Michener, M.L.; Toth, M.V.; Holwerda, B.C. Anal. Biochem., 1997, 247, 83-88.
[24] Blommel, P.; Hanson, G.T.; Vogel, K.W. J. Biomol. Screen., 2004, 9, 294-302.
[25] Ci, Y.X.; Liu, Y.Z. Anal. Chem., 1995, 67, 1785-1788.
[26] Panadero, S.; Gómez-Hens, A.; Pérez-Bendito, D. Anal. Chim. Acta, 1996, 329, 135-141.
[27] Bazin, H.; Trinquet, E.; Mathis, G. J. Biotechnol., 2002, 82, 233-250.
[28] Bazin, H.; Preaudat, M.; Trinquet, E.; Mathis, G. Spectrochim. Acta A, 2001, 57, 2197-2211.
[29] Earnshaw, D.L.; Moore, K.J.; Greewood, C.J.; Djaballah, H.; Jurewicz, A.J.; Murray, K.J.; Pope, A.J. J. Biomol. Screen., 1999, 4, 239-248.
[30] Parniak, M.A.; Min, K.L.; Budihas, S.R.; Le Grice, S.F.J.; Beutler, J.A. Anal. Biochem., 2003, 322, 33-39.
[31] Hemmila, I. J. Biomol. Screen., 1999, 4, 303-307.
[32] Enomoto, K.; Araki, A.; Nakajima, T.; Ohta, H.; Dohi, K.; Preaudat, M.; Seguin, P.; Mathis, G.; Suzuki, R.; Kominami, G.; Takemoto, H. J. Pharm. Biomed. Anal., 2002, 28, 73-79.
[33] Enomoto, K.; Aono, Y.; Mitsugi, T.; Takahashi, K.; Suzuki, R.; Preaudat, M.; Mathis, G.; Kominami, G.; Takemoto, H. J. Biomol. Screen., 2000, 5, 263-268.
[34] Zhou, V.; Han, S.; Brinker, A.; Klock, H.; Caldwell, J.; Gu, X. Anal. Biochem., 2004, 331, 349-357.
[35] Auer, M.; Moore, K.J.; Meyer-Almes, F.J.; Guenther, R.; Pope, A.J.; Stoeckli, K.A. Drug Discov. Today, 1998, 3, 457-465.
[36] Sterrer, S.; Henco, K. J. Recept. Signal Transduct. Res., 1997, 17, 511-520.
[37] Pramanik, A. Curr. Pharm. Biotech., 2004, 5, 205-212.
[38] Kettling, U.; Koltermann, A.; Schwille, P.; Eigen, M. Proc. Natl. Acad. Sci. USA, 1998, 95, 1416-1420.
[39] Rarbach, M.; Kettling, U.; Koltermann, A.; Eigen, M. Methods, 2001, 24, 104-116.
[40] Roda, A.; Guardigli, M.; Pasini, P.; Mirasoli, M. Anal. Bioanal. Chem., 2003, 377, 826-833.
[41] Lehel, C.; Daniel-Issakani, S.; Brasseur, M.; Strulovici, B. Anal. Biochem., 1997, 244, 340-346.
[42] Vainshtein, I.; Silveria, S.; Kaul, P.; Rouhani, R.; Eglen, R.M.; Wang, J. J. Biomol. Screen., 2002, 7, 507-514.
[43] Ross, H.; Armstrong, C.G.; Cohen, P. Biochem. J., 2002, 366, 977-981.
[44] Andreani, A.; Cavalli, A.; Granaiola, M.; Guardigli, M.; Leoni, A.; Locatelli, A.; Morigi, R.; Rambaldi, M.; Recanatini, M.; Roda, A. J. Med. Chem., 2001, 44, 4011-4014.
[45] Maurer, T.S.; Fung, H. Nitric Oxide, 2000, 4, 372-378.
[46] Roda, A.; Russo, C.; Pasini, P.; Piazza, F.; Feroci, G.; Kricka, L.J.; Baraldini, M. J. Biolumin. Chemilumin., 1998, 13, 327-337.
[47] Deb, D.K.; Srivastava, K.K.; Srivastava, R.; Srivastava, B.S. Biochem. Bioph. Res. Co., 2000, 279, 457-461.
[48] Deo, S.K.; Daunert, S. Fresenius J. Anal. Chem., 2001, 369, 258-266.
[49] Boute, N.; Jockers, R.; Issad, T. Trends Pharmacol. Sci., 2002, 23, 351-354.
[50] Tsien, R.Y.; Bacskai, B.J.; Adams, S.R. Trends Cell Biol., 1993, 3, 243-245.
[51] Cubitt, A.B.; Heim, R.; Adams, S.R. Trends Biochem. Sci., 1995, 20, 448-455.
[52] Wouters, F.S.; Verveer, P.J.; Bastiaens, P.I.H. Trends Cell Biol., 2001, 11, 203-211.
[53] Boute, N.; Pernet, K.; Issad, T. Mol. Pharmacol., 2001, 60, 640-645.
[54] Issad, T.; Boute, N.; Pernet, K. Biochem. Pharmacol., 2002, 64, 813-817.
[55] Hopkins, A.L.; Groom, C.R. Nat. Rev. Drug Discov., 2002, 1, 727-730.
[56] Copeland, R.A. Anal. Biochem., 2003, 320, 1-12.
[57] Beyer, D.; Kroll, H.P.; Endermann, R.; Schiffer, G.; Siegel, S.; Bauser, M.; Pohlmann, J.; Brands, M.; Ziegelbauer, K.; Haebich, D.; Eymann, C. Antimicrob. Agents Chemother., 2004, 48, 525-532.
[58] Schmidt, J.J.; Stafford, R.G. Appl. Environ. Microb., 2003, 69, 297-303.
[59] Wu, G.; Yuan, Y.; Hodge, N. J. Biomol. Screen., 2003, 8, 694-700.
[60] Fox, S.; Farr-Jones, S.; Sopchak, L.; Boggs, A.; Comley, J. J. Biomol. Screen., 2004, 9, 354-358.
[61] Wentland, M.P.; Raza, S.; Gao, Y. J. Chem. Educ., 2004, 81, 398-400.
[62] Davioud-Charvet, E.; Becker, K.; Landry, V.; Gromer, S.; Logé, C.; Sergheraert, C. Anal. Biochem., 1999, 268, 1-8.
[63] Barnabé, N.; Hasinoff, B.B. J. Chromatogr. B, 2001, 760, 263-269.
[64] Haldane, A.; Sullivan, D.M. Methods Mol. Biol., 2001, 95, 13-23.
[65] Kimmich, R.D.A.; Park, C.W.K. Comb. Chem. High T. SCR., 2003, 6, 661-672.
[66] McGovern, S.L.; Shoichet, B.K. J. Med. Chem., 2003, 46, 1478-1483.
[67] Nugiel, D.A.; Vidwans, A.; Etzkorn, A.M.; Rossi, K.A.; Benfield, P.A.; Burton, C.R.; Cox, S.; Doleniak, D.; Seitz, S.P. J. Med. Chem., 2002, 45, 5224-5232.
[68] Zaman, G.J.R.; Garritsen, A.; de Boer, Th.; Van Boeckel, C.A.A. Comb. Chem. High T. SCR., 2003, 6, 313-320.
[69] Mallari, R.; Swearingen, E.; Arnold Ow, W.L.; Young, S.W.; Huang, S.G. J. Biomol. Screen., 2003, 8, 198-204.
[70] Miller, V.P.; Stresser, D.M.; Blanchard, A.P.; Turner, S.; Crespi, C.L. Ann. N. Y. Acad. Sci., 2000, 919, 26-32.
[71] Delaporte, E.; Slaughter, D.E.; Egan, M.A.; Gatto, G.J.; Santos, A.; Shelley, J.; Price, E.; Howells, L.; Dean, D.C.; Rodrigues, A.D. J. Biomol. Screen., 2001, 6, 225-231.
[72] Marks, B.D.; Thompson, D.V.; Goossens, T.A.; Trubetskoy, O.V. J. Biomol. Screen., 2004, 9, 439-449.
[73] Tsotsou, G.E.; Cass, A.E.G.; Gilardi, G. Biosens. Bioelectron., 2002, 17, 119-131.
[74] Newman, M.; Josiah, S. J. Biomol. Screen., 2004, 9, 525-532.
[75] He, X.; Mueller, J.P.; Reynolds, K.A. Anal. Biochem., 2000, 282, 107-114.
[76] Dillon, K.J.; Smith, G.C.M.; Martin, N.M.B. J. Biomol. Screen., 2003, 8, 347-352.
[77] Sun, C.; Newbatt, Y.; Douglas, L.; Workman, P.; Aherne, W.; Linardopoulos, S. J. Biomol. Screen., 2004, 9, 391-397.
[78] Racha, J.K.; Zhao, Z.S.; Olejnik, N.; Warner, N.; Chan, R.; Moore, D.; Satoh, H. Drug. Metab. Pharmacokin., 2003, 18, 128-138.
[79] Peng, S.X.; Barbone, A.G.; Ritchie, D.M. Rapid Commun. Mass Spectrom., 2003, 17, 509-518.
[80] Deng, G.; Gu, R.F.; Marmor, S.; Fisher, S.L.; Jahic, H.; Sanyal, G. J. Pharm. Biomed. Anal., 2004, 35, 817-828.
[81] Cohen, C.B.; Chin-Dixon, E.; Jeong, S.; Nikiforov, T.T. Anal. Biochem., 1999, 273, 89-97.
[82] Starkey, D.E.; Han, A.; Bao, J.J.; Ahn, C.H.; Wehmeyer, K.R.; Prenger, M.C.; Halsall, H.B.; Heineman, W.R. J. Chromatogr. B, 2001, 762, 33-41.
[83] Guijt, R.M.; Baltussen, E.; Van Dedem, G.W.K. Electrophoresis, 2002, 23, 823-835.
[84] Hadd, A.G.; Jacobson, S.J.; Ramsey, J.M. Anal. Chem., 1999, 71, 5206-5212.
[85] Karlsson, R. J. Mol. Recognit., 2004, 17, 151-161.
[86] Markgren, P.O.; Hämäläinen, M.; Danielson, U.H. Anal. Biochem., 2000, 279, 71-78.
[87] Thurmond, R.L.; Wadsworth, S.A.; Schafer, P.H.; Zivin, R.A.; Siekierka, J.J. Eur. J. Biochem., 2001, 268, 5747-5754.
[88] Myszka, D.G. Anal. Biochem., 2004, 329, 316-323.
[89] Reetz, M.T. Tetrahedron, 2002, 58, 6595-6602.
[90] Reetz, M.T. Sci. Prog., 2000, 83, 157-172.
[91] Reetz, M.T. Pure Appl. Chem., 2000, 72, 1615-1622.
[92] Stefan, R.I.; Van Staden, J.F.; Aboul-Enein, H.Y. Comb. Chem. High T. SCR., 2000, 3, 445-454.
[93] Pochapsky, S.S.; Pochapsky, T.C. Curr. Top. Med. Chem., 2001, 1, 427-441.
[94] Fejzo, J.; Lepre, C.; Xie, X. Curr. Top. Med. Chem., 2003, 3, 81-97.
[95] Percival, M.D.; Withers, S.G. Biochemistry, 1992, 31, 505-512.
[96] Evans, J.N.S. Biomolecular NMR spectroscopy, Oxford University Press, 1995, pp. 237-340.
[97] Dalvit, C.; Ardini, E.; Fogliatto, G.P.; Mongelli, N.; Veronesi, M. Drug Discov. Today, 2004, 9, 595-602.
[98] Kenakin, T. Trends Pharmacol. Sci., 2004, 25, 186-192.
[99] Woolf, P.J.; Kenakin, T.P.; Linderman, J.J. J. Theor. Biol., 2001, 208, 403-418.
[100] Shea, L.D.; Neubig, R.R.; Linderman, J.J. Life Sci., 2000, 68, 647-658.
[101] Chen, H.C.; Lai, R.W. Pharmaceut. Res., 2003, 47, 163-173.
[102] Harwood, Jr. H.J.; Pellarin, L.D. Biochem. J., 1997, 323, 649-659.
[103] Hess, G.P. Biophys. Chem., 2003, 100, 493-506.
[104] Palmer, G.C.; Widzowski, D. Amino Acids, 2000, 19, 151-155.
[105] Perillo, M.A. Recent Res. Dev. Biophys. Chem., 2002, 2, 105-121.
[106] Agnati, L.F.; Ferre, S.; Carmen, L.; Franco, R.; Fuxe, K. Pharmacol. Rev., 2003, 55, 509-550.
[107] Sugiyama, Y.; Kato, Y. Proceedings of the International Symposium on Controlled Release of Bioactive Materials, 1996, 23rd, 99-100.
[108] Kato, Y.; Takeshi, S.; Kuwabara, T.; Sugiyama, Y. J. Control Release, 1996, 39, 191-200.
[109] Chauhan, S.S.; Liang, X.J.; Su, A.W.; Pai-Panandiker, A.; Shen, D.W.; Hanover, J.A.; Gottesman, M.M. Brit. J. Cancer, 2003, 88, 1327-1334.
[110] Oikawa, K.; Watanabe, T.; Higuchi, S. Xenobiotica, 2000, 30, 693-705.
[111] De Diesbach, P.; N’Kuli, F.; Berens, C.; Sonveaux, E.; Monsigny, M.; Roche, A.C.; Courtoy, P.J. Nucleic Acids Res., 2002, 30, 1512-1521.
[112] Janetopoulos, C.; Devreotes, P. Methods, 2002, 27, 366-373.
[113] Frang, H.; Mukkala, V.M.; Syystö, R.; Ollikka, P.; Hurskainen, P.; Scheinin, M.; Hemmilä, I. Assay Drug Dev. Technol., 2003, 1, 275-280.
[114] Gabriel, D.; Vernier, M.; Pfeifer, M.J.; Dasen, B.; Tenaillon, L.; Bouhelal, R. Assay Drug Dev. Technol., 2003, 1, 291-303.
[115] Reinscheid, R.K.; Kim, J.; Zeng, J.; Civelli, O. Eur. J. Pharmacol., 2003, 478, 27-34.
[116] Burchiel, S.W.; Edwards, B.S.; Kuckuck, F.W.; Lauer, F.T.; Prossnitz, E.R.; Ransom, J.T.; Sklar, L.A. Methods, 2000, 21, 221-230.
[117] Hodder, P.S.; Ruzicka, J. Anal. Chem., 1999, 71, 1160-1166.
[118] Grant, S.K.; Bansal, A.; Mitra, A.; Feigher, S.D.; Dai, G.; Kaczorowski, G.J.; Middleton, R.E. Anal. Biochem., 2001, 294, 27-35.
[119] Falconer, M.; Smith, F.; Surah-Narval, S.; Congrave, G.; Liu, Z.; Hayter, P.; Ciaramella, G.; Keighley, W.; Haddock, P.; Waldron, G.; Sewing, A. J. Biomol. Screen., 2002, 7, 460-465.
[120] Haruyama, T.; Bongsebandhu-Phubhakdi, S.; Nakamura, I.; Mottershead, D.; Keinaenen, K.; Kobatake, E.; Aizawa, M. Anal. Chem., 2003, 75, 918-921.
[121] Sinclair, J.; Pihl, J.; Olofsson, J.; Karlsson, M.; Jardemark, K.; Chiu, D.T.; Orwar, O. Anal. Chem., 2002, 74, 6133-6138.
[122] Shieh, C.C.; Trumbull, J.D.; Sarthy, J.F.; McKenna, D.G.; Parihar, A.S.; Zhang, X.F.; Faltynek, C.R.; Gopalakrishnan, M. Assay Drug Dev. Technol., 2003, 1, 655-663.
[123] Ramakrishnan, A.; Sadana, A. Methods for Affinity-Based Separation of Enzymes and Proteins, 2002, pp. 195-216.
[124] Myszka, D.G.; Rich, R.L. PSTT, 2000, 3, 310-317.
[125] Karlsson, R. J. Mol. Recognit., 2004, 17, 151-161.
[126] Natsume, T.; Hirota, J.; Yoshikawa, F.; Furuichi, T.; Mikoshiba, K. Biochem. Bioph. Res. Co., 1999, 260, 527-533.
[127] Tollin, G.; Salamon, Z.; Hruby, V.J. Trends Pharmacol. Sci., 2003, 24, 655-659.
[128] Stenlund, P.; Babcock, G.J.; Sodroski, J.; Myszka, D.G. Anal. Biochem., 2003, 316, 243-250.
[129] Zhou, G.; Cummings, R.; Hermes, J.; Moller, D.E. Methods, 2001, 25, 54-61.
[130] Davidsson, R.; Boketoft, A.; Bristulf, J.; Kotarsky, K.; Olde, B.; Owman, C.; Bengtsson, M.; Laurell, T.; Emnéus, J. Anal. Chem., 2004, 76, 4715-4720.
[131] Kallal, L.; Benovic, J.F.L. Trends Pharmacol. Sci., 2000, 21, 175-180.
[132] Steff, A.M.; Fortin, M.; Arguin, C.; Hugo, P. Cytometry, 2001, 45, 237-243.
[133] Edwards, B.S.; Oprea, T.; Prossnitz, E.R.; Sklar, L.A. Curr. Opin. Chem. Biol., 2004, 8, 392-398.
[134] Lin, X.; Gately, D.P.; Hom, D.; Misako, M.; Los, G.; Howell, S.B. Int. J. Cancer, 2001, 91, 555-562.
[135] Waller, A.; Simons, P.; Prossnitz, E.R.; Edwards, B.S.; Sklar, L.A. Comb. Chem. High T. SCR., 2003, 6, 389-397.
[136] Nolan, J.P.; Lauer, S.; Prossnitz, E.R.; Sklar, L.A. Drug Discov. Today, 1999, 4, 173-180.
[137] Waller, A.; Simons, P.C.; Biggs, S.M.; Edwards, B.S.; Prossnitz, E.R.; Sklar, L.A. Trends Pharmacol. Sci., 2004, 25, 663-669.
[138] Bartsch, J.W.; Tran, H.D.; Waller, A.; Mammoli, A.A.; Buranda, T.; Sklar, L.A.; Edwards, B.S. Anal. Chem., 2004, 73, 3810-3817.
[139] Jeffrey, J.A.; Sharom, J.R.; Fazekas, M.; Rudd, P.; Welchner, E.; Thauvette, L.; White, P.W. Anal. Biochem., 2002, 304, 55-62.
[140] Endrenyl, L.; Csizmadia, F.; Tothfalusi, L.; Chen, M.L. Pharmaceut. Res., 1998, 5, 1292-1299.
[141] Ravin, L.J. Radebaugh, Preformulation. In: Remington’s Pharmaceutical Technology, 18 edn.; Mack Publ., PA, 1990, pp. 212, 1437, 1439.
[142] Kim, H.; Fassihi, R. J. Pharm. Sci., 1997, 86, 323-328.
[143] Dehring, K.A.; Workman, H.L.; Miller, K.D.; Mandagere, A.; Poole, S.K. J. Pharm. Biomed. Anal., 2004, 36, 447-456.
[144] Thiravyam, G.; Saranjit, S. PSTT, 2000, 3, 406-416.
[145] Braddy, A.C.; Janaky, T.; Prokai, L. J. Chromatogr. A, 2002, 966, 81-87.
[146] Detroyer, A.; Vander Heyden, Y.; Cambré, I.; Massart, D.L. J. Chromatogr. A, 2002, 986, 227-238.
[147] Pehourcq, F.; Jarry, C.; Bannwarth, B. J. Pharm. Biomed. Anal., 2003, 33, 137-144.
[148] Ward, R.S.; Davies, J.; Hodges, G.; Roberts, D.W. J. Chromatogr. A, 2003, 1007, 67-75.
[149] Barbato, F.; di Martino, G.; Grumetto, L.; La Rotonda, M.I. Eur. J. Pharm. Sci., 2004, 22, 261-269.
[150] Moaddel, R.; Cloix, J.F.; Ertem, G.; Wainer, I.W. Pharmaceut. Res., 2002, 19, 104-107.
[151] Farideh, B.; Wainer, I.W. Anal. Chem., 2003, 75, 4480-4485.
[152] El-Gendy, A.M.; Adejare, A. Int. J. Pharm., 2004, 280, 47-55.
[153] Gómez-Hens, A.; Fernández-Romero, J.M. Trends Anal. Chem., in press.
[154] Irvine, J.D.; Takahashi, L.; Lockhart, K.; Cheong, J.; Tolan, J.W.; Selick, H.E.; Grove, J.R. J. Pharm. Sci., 1999, 88, 28-33.
[155] Goh, L.B.; Spears, K.J.; Yao, D.; Ayrton, A.; Morgan, P.; Roland-Wolf, C.; Friedberg, T. Biochem. Pharmacol., 2002, 64, 1569-1578.
[156] Tang, F.; Horie, K.; Borchardt, R.T. Pharmaceut. Res., 2002, 19, 773-779.
[157] Ruell, J. Modern Drug Discov., 2003, 6, 28-30.
[158] Ruell, J.A.; Tsinman, O.; Avdeef, A. Chem. Pharm. Bull., 2004, 52, 561-565.
[159] Kerns, E.H.; Di, L.; Petusky, S.; Farris, M.; Rob, L.; Jupp, P. J. Pharm. Sci., 2004, 93, 1440-1453.
[160] Hanlan, L.; Sabus, C.; Carter, G.T.; Du, C.; Avdeef, A.; Tischler, M. Pharmaceut. Res., 2003, 20, 1820-1826.
[161] Saitoh, R.; Sugano, K.; Takata, N.; Tachibana, T.; Higashida, A.; Nabuchi, Y.; Aso, Y. Pharmaceut. Res., 2004, 21, 749-755.
[162] Avdeef, A.; Nielsen, P.E.; Tsinman, O. Eur. J. Pharm. Sci., 2004, 22, 365-374.
[163] Peihua, L.; Liu, R.; Sharom, F.J. Eur. J. Biochem., 2001, 268, 1687-1697.
[164] Kimura, T.; Higaki, K. Biol. Pharm. Bull., 2002, 25, 149-164.
[165] Willman, S.; Schmitt, W.; Keldenich, J.; Dressman, J.B. Pharmaceut. Res., 2003, 20, 1766-1771.
[166] Dokoumetzidis, A.; Macheras, P. Pharmaceut. Res., 1998, 15, 1262-1269.
[167] Mclaughling, D.A.; Olah, T.A.; Gilbert, J.D. J. Pharm. Biomed. Anal., 1997, 15, 1893-1901.
[168] Priyamvada, C.; Brouwer, K.L.R. Pharmaceut. Res., 2004, 21, 719-735.
[169] Bertrand, M.; Jackson, P.; Walther, B. Eur. J. Pharm. Sci., 2000, 11, S61-S72.
[170] Ansede, J.H.; Thakker, D.R. J. Pharm. Sci., 2004, 93, 239-255.
[171] Tiller, P.R.; Romanyskin, L.A. Rapid Commun. Mass Spectrom., 2002, 16, 1225-1231.
[172] Rajanikanth, M.; Madhusudanan, K.P.; Gupta, R.C. Rapid Commun. Mass Spectrom., 2003, 17, 2063-2070.
[173] Hop, C.E.C.A.; Wang, Z.; Chen, Q.; Kwei, G. J. Pharm. Sci., 1998, 87, 901-903.
[174] Manitpsikul, P.; White, R.E. Drug Discov. Today, 2004, 9, 652-658.
[175] Cox, K.A.; Dunn-Meynell, K.A.; Korfmacher, W.A.; Brooke, L.; Nomeir, A.A.; Lin, C.C.; Cayon, M.N. Drug Discov. Today, 1999, 4, 232-237.
[176] Xinchun, T.; Ita, I.E.; Wang, J.; Pivnichny, J.V. J. Pharm. Biomed. Anal., 1999, 20, 773-784.
[177] Wu, J.T. Rapid Commun. Mass Spectrom., 2001, 15, 73-81.
[178] Korfmacher, W.A.; Veals, J.; Dunn-Meynell, K.; Zhan, X.; Tucker, G.; Cox, K.A.; Lin, C.C. Rapid Commun. Mass Spectrom., 1999, 13, 1991-1998.
[179] Tong, X.S.; Wang, J.; Zheng, S.; Pivnichny, J.V. J. Pharm. Biomed. Anal., 2004, 35, 867-877.
[180] Bryant, M.S.; Korfmacher, W.A.; Shiyong, N.; Nardo, C.; Nomeir, A.A.; Lin, C.C. J. Chromatogr. A, 1997, 777, 61-66.
[181] Colwell, L.F.; Tamrakopoulos, C.S.; Wang, P.R.; Pivnichny, F.V.; Shih, T.L. J. Chromatogr. B, 2002, 772, 89-98.
[182] Ohkawa, T.; Ishida, Y.; Kanaoka, E.; Takahashi, K.; Okabe, H.; Matsumoto, T.; Nakamoto, S.; Tamada, J.; Koike, M.; Yoshikawa, T. J. Pharm. Biomed. Anal., 2003, 31, 1089-1099.
[183] Raynaud, F.I.; Fischer, P.M.; Nutley, B.P.; Goddard, P.M.; Lane, D.P.; Workman, P. Mol. Cancer Ther., 2004, 3, 353-362.
[184] Beaudry, F.; Le Blanc, J.C.Y.; Coutu, M.; Brown, N.K. Rapid Commun. Mass Spectrom., 1998, 12, 1216-1222.
[185] Thompson, G.A.; Toothaker, R.D. Pharmaceut. Res., 2004, 21, 781-784.
[186] Poulin, P.; Theil, F.P. J. Pharm. Sci., 2002, 91, 1358-1370.
[187] Theil, F.P.; Guentert, T.W.; Haddad, S.; Poulin, P. Toxicol. Lett., 2003, 138, 29-49.
[188] Blaauboer, B.J. Toxicol. Lett., 2003, 138, 161-171.
[189] Balaz, S.; Lukacova, V. J. Mol. Graph. Model., 2002, 20, 479-490.
[190] Poulin, P.; Theil, F.P. J. Pharm. Sci., 2000, 89, 16-35.
Frontiers in Drug Design & Discovery, 2005, 1, 197-209
Assessment of iDEA pkEXPRESS™ for the Prediction of Caco-2 Permeabilities

Cheng-Pang (Matt) Hsu, Garry W. Caldwell*, John A. Masucci, Zhengyin Yan and David M. Ritchie

Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh & McKean Roads, P.O. Box 776, Spring House, PA 19477, USA

Abstract: The ability of iDEA pkEXPRESS™ 1.1 to predict Caco-2 permeabilities in silico was assessed. We found that when the software was used in a high/low classification scheme with a cutoff Caco-2 permeability value of 5 nm/s, the ability of the tool to evaluate large data sets was acceptable. Using a Caco-2 library of 666 compounds, approximately 78% of the predictions in a high/low classification scheme were correct. In addition, the average fold error of these correct predictions was approximately 3. After removing compounds contained in the training set (275 compounds), the prediction ability of the tool was still acceptable: overall, 76% of the predictions in a high/low classification scheme were correct, with an average fold error of approximately 3. The software could be useful for analyzing hits after high-throughput screening of large, structurally diverse compound libraries. Unfortunately, the software was unable to provide a measure of confidence in its predictions. Without a way to identify poor predictions, it is difficult to evaluate small data sets using the software.
INTRODUCTION

The strategy invoked by most pharmaceutical companies to identify high-risk pharmacokinetic drug candidates early in the drug discovery process involves using an array of in vitro absorption, distribution, metabolism and excretion (ADME) assays [1-4]. The basic approach has been to terminate ADME-deficient drug candidates in drug discovery before these compounds undergo prolonged and expensive preclinical and clinical drug development. In following this philosophy, it was anticipated that the success rates of drug candidates in the preclinical and clinical phases of drug development would improve, since time and resources would be allocated to drug candidates with real potential. Because many ADME properties of drug candidates are interrelated, and are also related to the drug’s pharmacodynamic (PD) properties, the usefulness of the strategy has suffered: drug discovery scientists are faced with understanding and assimilating large volumes of multivariate ADME data. While some successes have arisen from this approach, simultaneous optimization of these properties has proven to be a challenging process [5]. Typically, a compromise between ADME and PD is obtained rather than an optimal value for each property.
*Corresponding author: E-mail:
[email protected] Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
198 Frontiers in Drug Design & Discovery, 2005, Vol. 1
Hsu et al.
It has been suggested that physiologically based (PB) computational models could be used to integrate these multivariate ADME data into more manageable information for assessing the overall pharmacokinetic risk profile of drug candidates, thereby improving the ADME/PD optimization process [6]. Recently, the predictions of human in-vivo oral absorption for 28 known drugs using two commercial physiologically based software packages (iDEA™ and GASTROPLUS™) were investigated [7]. The input data for the PB absorption models were combinations of experimental (thermodynamic) solubility data and either experimental permeability data (human colon carcinoma cell line; Caco-2) or in-silico Caco-2 permeability data generated from two-dimensional (2D) molecular representations. The predictions were assessed primarily by comparing the predicted fraction absorbed to the experimental value according to the following absorption classification scheme: low < 33%, medium 33-66%, and high > 66%. Using experimental solubility data and in-silico Caco-2 permeability predictions as input, the iDEA™ software package assigned 20 out of 28 drugs to the correct absorption class, an overall correct classification rate of approximately 71%. When experimental Caco-2 data were used as input, the iDEA™ software package assigned 22 out of 28 drugs to the correct absorption class (79% correct). These results were extremely interesting since they implied that in-silico Caco-2 data generated from only two-dimensional (2D) molecular structures were comparable to experimental Caco-2 data for the prediction of human absorption. However, as pointed out by the authors, the iDEA™ software package did not disclose its internal drug training set, and the 28 drugs used in the evaluation have been well studied in the literature, so these 28 drugs might have been part of the internal training set. If they were, this could have biased the conclusions of the study.
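The three-class scheme and the success rates quoted above can be restated as a short Python sketch. The function names are illustrative only and are not part of any iDEA™ or GASTROPLUS™ interface; treating values of exactly 33% and 66% as "medium" is an assumption, since the text does not say where the boundaries fall.

```python
def absorption_class(fraction_absorbed_pct: float) -> str:
    """Scheme quoted in the text: low < 33%, medium 33-66%, high > 66%.

    Assumption: the boundary values 33 and 66 are assigned to "medium".
    """
    if fraction_absorbed_pct < 33:
        return "low"
    if fraction_absorbed_pct <= 66:
        return "medium"
    return "high"


def percent_correct(n_correct: int, n_total: int) -> float:
    """Share of drugs assigned to the correct absorption class, in percent."""
    return 100.0 * n_correct / n_total


# Counts reported in reference [7] as summarized in the text.
print(f"in-silico Caco-2 input:    {percent_correct(20, 28):.0f}% correct")
print(f"experimental Caco-2 input: {percent_correct(22, 28):.0f}% correct")
```

A prediction counts as correct in this scheme only when the predicted and experimental fractions absorbed fall in the same class, so a small numerical error near a boundary can still be scored as a miss.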
In the present study we more fully evaluate the in-silico predictions of Caco-2 permeabilities using the iDEA™ software package. The experimentally determined Caco-2 results of over six hundred compounds are used to assess the predictiveness of the iDEA™ software package. A discussion is presented that examines the usefulness of this in silico method for making decisions in a drug discovery environment.
METHODS
Permeability, Caco-2
All experimental Caco-2 permeability data were generated at Absorption Systems (Exton, PA, USA). The Caco-2 monolayers were grown to confluence on collagen-coated micro-porous polycarbonate membranes in 12-well plates [8]. The inserts (1.13 cm²) were seeded with approximately 60,000 cells per well, with the majority of cell growth occurring between 3 and 8 days. Cells grown for 21 – 30 days (passage number 50 – 60) had a transepithelial electrical resistance (TEER) between 450 and 650 Ω·cm². The TEER measurements were used to check for monolayer integrity. Any cell monolayer with a TEER below 200 Ω·cm² was rejected. The permeability assay buffer was Hank’s Balanced Salt Solution containing 10 mM N-(2-hydroxyethyl)piperazine-N′-(2-ethanesulfonic acid) (HEPES) and 15 mM glucose at a pH of 7.0-7.2. Dosing solution concentrations of compounds were 10 µM in assay buffer. Cells were dosed on the apical side (A-to-B) and incubated at 37°C with 5% CO2 and 90% relative humidity. At two time points, 1 and 2 hours, a 200-µL aliquot was taken from the receiver chamber and replaced with fresh assay buffer, with each determination being performed in duplicate. All samples
were assayed by standard liquid chromatography mass spectrometry (LC/MS) methods using electrospray ionization [9]. The apparent permeability (Papp), expressed in nm/s, was calculated under sink conditions according to standard procedures [10]. Several controls were used to monitor the experimental data. Permeability through a cell-free (blank) membrane was studied to determine non-specific binding and free diffusion of the compound through the collagen-coated micro-porous polycarbonate membranes. Compounds with non-specific binding problems were removed from the database. Lucifer yellow flux (< 4 nm/s) was also measured for each monolayer after it had been exposed to the test compounds, to ensure that the cell monolayers were not damaged during the flux period. This test allowed the removal of false-positive data. Positive controls were used daily to monitor the day-to-day variability of the assay. For example, the measured permeability of atenolol was between 2 and 5 nm/s while that of propranolol was between 150 and 250 nm/s. The in-vivo absorption mechanisms of the compounds used in this study were unknown. That is, the compounds could be absorbed in vivo by both passive and carrier-mediated mechanisms.
iDEA pkEXPRESS™ 1.1
Several companies produce software that simulates human physiology. The iDEA software (Lion Bioscience Inc., San Diego, CA) was investigated in this study. Two versions of the software (iDEA™ 2.0 and iDEA pkEXPRESS™ 1.1) were used in the present evaluation. The absorption modules of both software packages assumed a passive absorption mechanism. Each was evaluated and found to produce the same results even though the training sets contained different numbers of compounds. Therefore, only the results from iDEA pkEXPRESS™ 1.1 will be discussed, since it is the latest version on the market.
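The apparent permeability mentioned in the assay description is commonly computed under sink conditions as Papp = (dQ/dt) / (A · C0), where dQ/dt is the rate of appearance in the receiver chamber, A is the insert area and C0 is the initial donor concentration. The following Python sketch applies that textbook relationship to the insert area (1.13 cm²) and dosing concentration (10 µM) given above. The exact procedure of reference [10] is not reproduced in the text, so the formula, the function names and the receiver amounts in the example are assumptions for illustration only.

```python
INSERT_AREA_CM2 = 1.13   # insert area stated in the text
DOSE_CONC_UM = 10.0      # dosing concentration stated in the text (µM)


def transport_rate_nmol_per_s(q1_nmol, t1_s, q2_nmol, t2_s):
    """Slope dQ/dt from two cumulative receiver amounts.

    Assumption: the amounts are already corrected for the aliquots
    that were removed and replaced with fresh buffer.
    """
    return (q2_nmol - q1_nmol) / (t2_s - t1_s)


def papp_nm_per_s(dq_dt_nmol_per_s, area_cm2=INSERT_AREA_CM2, c0_um=DOSE_CONC_UM):
    """Apparent permeability under sink conditions: Papp = (dQ/dt) / (A * C0)."""
    c0_nmol_per_cm3 = c0_um  # 1 µM = 1 nmol/mL = 1 nmol/cm^3
    papp_cm_per_s = dq_dt_nmol_per_s / (area_cm2 * c0_nmol_per_cm3)
    return papp_cm_per_s * 1e7  # 1 cm = 1e7 nm


# Hypothetical cumulative receiver amounts at the 1 h and 2 h sampling times.
rate = transport_rate_nmol_per_s(0.4068, 3600, 0.8136, 7200)
print(f"Papp = {papp_nm_per_s(rate):.1f} nm/s")
```

With only two sampling times the slope rests on a single interval, which is one reason the chapter later restricts itself to a high/low classification instead of finer permeability categories.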
Since many details of the algorithms used in iDEA pkEXPRESS™ 1.1 are kept proprietary, only the information contained in brochures or reference manuals is available. However, the underlying principles of the software package are described in the literature [6]. iDEA pkEXPRESS™ 1.1 contains a structure-based model for the in silico prediction of Caco-2 permeabilities. It is assumed that this Caco-2 model uses a statistical pattern recognition technique trained on experimental Caco-2 permeability data from approximately 779 compounds. These Caco-2 permeability training data were produced within the Lion organization or came from a consortium of pharmaceutical companies. Our company was part of this consortium and contributed 275 compounds to the training set of iDEA pkEXPRESS™ 1.1. The iDEA™ Caco-2 assay protocol was similar to the assay used here, except that Lion’s studies were performed under stirring conditions using an orbital shaker at 50 oscillations per minute, and 100 µL aliquots taken from the receiver side of the monolayer at 30, 50, 70 and 90 minutes post-dosing were used to calculate the apparent permeability. Since the interlaboratory variability of measuring Caco-2 permeabilities can be significant [8], a Caco-2 calibration procedure supplied in the iDEA™ software package was used to compare data sets. The iDEA™ manual recommended comparing Lion’s Caco-2 data, which had been measured using the iDEA™ Caco-2 assay protocol, with Caco-2 values measured in other laboratories. This data set contained ten compounds including acyclovir (6.2 nm/s), amiloride (18.0 nm/s), atenolol (10.7 nm/s), foscarnet
(5.3 nm/s), hydrochlorothiazide (9.2 nm/s), ketoprofen (186 nm/s), bretylium (3.9 nm/s), metoprolol (266 nm/s), naproxen (211 nm/s) and propranolol (287 nm/s). Our Caco-2 permeability data were within 2- to 3-fold of Lion’s data except for atenolol, which showed approximately a 5-fold difference. There was significantly more scatter between the data sets in the lower permeability range (i.e., below approximately 10 nm/s) than in the higher permeability range (i.e., above approximately 150 nm/s). The total least squares regression analysis of the log-log correlation of the data sets had a slope of 1.02 and an intercept of 0.12. Based on these results, we felt that there was no significant difference between the data sets. Therefore, our experimental Caco-2 data were compared unchanged to the iDEA pkEXPRESS™ 1.1 predicted values.
Characteristics of Caco-2 Database
Our Caco-2 database contained 666 compounds that were synthesized at Johnson & Johnson Pharmaceutical Research & Development, LLC. There were 25 distinct structural families of compounds covering five therapeutic areas. In Figs. (1-3), the molecular weight, in silico Log P and experimental Caco-2 apparent permeabilities of all 666 compounds are shown. The average molecular weight for the database was 480 Daltons, with molecular weights ranging from 209 to 871 Daltons. Compounds contained the elements C, H, N, O, S, F and Cl. Most of the compounds were lipophilic, with an average Log P of 4.5. The Log P values ranged from –1.8 to 10. The distribution of the Caco-2 apparent permeabilities was as follows: 146 compounds with Papp < 5 nm/s, 125 compounds with Papp between 5 and 24 nm/s, 165 compounds with Papp between 25 and 100 nm/s and 230 compounds with Papp > 100 nm/s.
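The total least squares regression used in the calibration comparison above treats both laboratories' measurements as uncertain and minimizes perpendicular distances to the line. A minimal Python sketch of a two-variable orthogonal fit on log-transformed values follows. The closed-form slope is the standard orthogonal-regression result and is not taken from the iDEA™ manual; the second-laboratory values in the example are synthetic (exactly twice the Lion values quoted in the text) and are not the study's measurements.

```python
import math


def total_least_squares(x, y):
    """Orthogonal (total least squares) line fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


# Lion reference permeabilities (nm/s) for the ten calibration compounds,
# in the order listed in the text.
lion = [6.2, 18.0, 10.7, 5.3, 9.2, 186, 3.9, 266, 211, 287]
# Synthetic second data set: every value exactly 2-fold higher (illustration only).
other_lab = [2 * v for v in lion]

slope, intercept = total_least_squares(
    [math.log10(v) for v in lion],
    [math.log10(v) for v in other_lab],
)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

On a log-log scale a constant fold difference between laboratories shows up in the intercept, while a slope near one indicates that the difference does not change across the permeability range.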
Fig. (1). The distribution of molecular weight for compounds used in this investigation.
Fig. (2). The distribution of Log P for compounds used in this investigation.
Fig. (3). The distribution of Caco-2 permeabilities (nm/s) for compounds used in this investigation.
Method for Calculating Success Criteria
Using the structure-based absorption model in iDEA pkEXPRESS™ 1.1 and only 2D molecular structure descriptors of the compounds (ISIS/Draw; Elsevier MDL, CA, USA), the in silico Caco-2 permeabilities were calculated and compared to experimental results. The predicted Caco-2 permeability values were evaluated by calculating the average-fold error (equation 1).
$$\text{Average-fold error} = 10^{\frac{\sum \log(\text{predicted}/\text{actual})}{N}} \qquad (1)$$
If iDEA pkEXPRESS™ 1.1 predicted the actual Caco-2 value perfectly, the average-fold error would equal one. If iDEA pkEXPRESS™ 1.1 predicted values 100% above or 50% below the actual value, the average-fold error would equal two, and so on. Prediction methods with average-fold errors <2 are typically considered successful [11]. To further facilitate the assessment of the predictions, the in silico Caco-2 permeabilities were categorized into a high/low scheme. Refining the predictions into more complicated categorization schemes, such as high, medium and low, was deemed imprudent since only two time points were used to measure the experimental apparent Caco-2 permeabilities and the day-to-day variability in the experimental permeabilities was on average 2- to 3-fold. Due to the sigmoidal nature of the relationship between the in-vivo fraction absorbed and the in-vitro Caco-2 Papp values, the high/low cutoff value selected for our data has typically been between 5 and 10 nm/s [8]. In the present study, we use a cutoff permeability value of 5 nm/s to sort the data into a high/low classification scheme. It should be noted that there was no significant difference in the results when a cutoff value of 10 nm/s was used.
RESULTS AND DISCUSSION
Experimental Caco-2 Data Set
The log-log correlation of the predicted and experimental Caco-2 permeability values is shown in Figs. (4 and 5). As evident in Fig. (4), the correlation between these data sets is poor (r² = 0.30; N = 666). For visualization purposes, a unity line was drawn in Fig. (4). The lines adjacent to the unity line in Fig. (4) are 0.5 log units away from it. For these types of predictions, an error in the range of ± 0.5 log units is acceptable. A significant amount of data falls within this 0.5 log unit range. Therefore, the average fold error and its distribution were calculated for all 666 compounds.
The average fold error between the predicted and experimental Caco-2 permeabilities was 4.82, with errors ranging from 1 to over 1000. The distribution of the fold error was: 203 compounds (30%) with a fold error < 2, 104 compounds (16%) with a fold error between 2.0 and 2.9, 61 compounds (9%) with a fold error between 3.0 and 3.9, 32 compounds (5%) with a fold error between 4.0 and 5.0, and 266 compounds (40%) with a fold error > 5. Based on the criterion that the average fold error should be < 2 to be considered successful, iDEA pkEXPRESS™ 1.1 cannot be used to predict absolute Caco-2 permeability values. However, a large fraction of the results (60%) had a fold error of 5 or less. This result implied that iDEA pkEXPRESS™ 1.1 should be able to sort compounds into a high/low classification scheme.
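The fold-error statistics above can be sketched in Python as follows. Equation 1 as printed shows no absolute-value bars; the sketch takes the absolute value of each log ratio, an assumption made so that over- and under-predictions both count as error, in line with the statement that a prediction 100% above or 50% below the actual value gives a fold error of two. The function names and the handling of bin boundaries (a fold error of exactly 5.0 falls in the 4.0-5.0 bin) are likewise assumptions; the counts are those reported in the text.

```python
import math


def fold_error(predicted, actual):
    """Fold error of a single prediction; 1 means a perfect prediction."""
    return 10 ** abs(math.log10(predicted / actual))


def average_fold_error(pairs):
    """Equation 1 over (predicted, actual) pairs, with |log| per compound (assumption)."""
    total = sum(abs(math.log10(p / a)) for p, a in pairs)
    return 10 ** (total / len(pairs))


def fold_error_bin(fe):
    """Assign a fold error to one of the bins used in the text."""
    if fe < 2:
        return "<2"
    if fe < 3:
        return "2.0-2.9"
    if fe < 4:
        return "3.0-3.9"
    if fe <= 5:
        return "4.0-5.0"
    return ">5"


# Counts reported in the text for the full 666-compound data set.
REPORTED_COUNTS = {"<2": 203, "2.0-2.9": 104, "3.0-3.9": 61, "4.0-5.0": 32, ">5": 266}
total = sum(REPORTED_COUNTS.values())
for label, count in REPORTED_COUNTS.items():
    print(f"{label:>8}: {count:3d} compounds ({100 * count / total:.0f}%)")
```

Because the average is taken over logarithms, it is a geometric mean of the per-compound fold errors, so a few very large errors raise it less than they would raise an arithmetic mean.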
Fig. (4). The log-log correlation of experimental and iDEA pkEXPRESS™ 1.1 predicted Caco-2 permeability values (nm/s). This graph contains a unity line and two adjacent lines that are 0.5 log units away from the unity line.
In Fig. (5), the log-log correlation of the predicted and experimental Caco-2 permeabilities is divided into four quadrants. A permeability cutoff value of 5 nm/s (0.5 × 10⁻⁶ cm/s) was used as the criterion to divide the data. The upper right quadrant represents a true positive result (TP, or high). In this quadrant the experimental Caco-2 permeability was greater than 5 nm/s and the predicted Caco-2 permeability was also greater than 5 nm/s. The lower left quadrant represents a true negative result (TN, or low). In this quadrant the experimental Caco-2 permeability was less than 5 nm/s and the predicted Caco-2 permeability was also less than 5 nm/s. The lower right quadrant represents a false negative result (FN). In this quadrant the experimental Caco-2 permeability was greater than 5 nm/s but the predicted Caco-2 permeability was less than 5 nm/s. Finally, the upper left quadrant represents a false positive result (FP). In this quadrant the experimental Caco-2 permeability was less than 5 nm/s but the predicted Caco-2 permeability was greater than 5 nm/s. The distribution and percentage of compounds classified correctly with the 5 nm/s cutoff value are summarized in Table (1). The experimental data set contained 666 compounds, with 520 compounds classified > 5 nm/s and 146 compounds classified < 5 nm/s. The iDEA pkEXPRESS™ 1.1 software package predicted 491 compounds (FP
Fig. (5). The log-log correlation of experimental and iDEA pkEXPRESS™ 1.1 predicted Caco-2 permeability values (nm/s). This graph is divided into four quadrants. The acronyms are TP (true positives), FN (false negatives), TN (true negatives) and FP (false positives).
Table 1. Predictions of Caco-2 Permeabilities Using iDEA pkEXPRESS™ 1.1
| Method | Total Number of Compounds | True Positives (High) | False Negatives | True Negatives (Low) | False Positives |
|---|---|---|---|---|---|
| Experimental | 666 | 520 | | 146 | |
| iDEA | 666 | 433 | 87 | 88 | 58 |
| % Correct | | 83% | | 60% | |
| Average fold error | | 3 | 31 | 3 | 19 |
plus TP) with Caco-2 values > 5 nm/s and 175 compounds (TN plus FN) classified as low (< 5 nm/s). This percent distribution was similar to the experimental results. Of the 491 compounds predicted to be high, 433 were true positives (TP or high). In other words, 83% of the 520 experimentally high compounds were predicted correctly. Of the 175 compounds predicted to be low, 88 were true negatives (TN or low), which corresponds to 60% of the 146 experimentally low compounds. These results are very good considering that
only 2D ISIS/Draw structural representations of the compounds were used as input into the model. In addition, a false negative percentage of only 17% (87 out of 520 compounds) was a significant advantage of the model. In this type of classification scheme, a false negative designation implies that a potentially reasonable compound is eliminated and not considered for further analysis. Thus, a low false negative percentage is necessary for these types of predictions. On the other hand, a false positive percentage of 40% (58 out of 146 compounds) implies that compounds with potentially poor absorption are retained. False positive designations are less of a problem since they can be eliminated later using additional preclinical assays. The overall percent correct was 78% (i.e., high plus low (521) out of 666 compounds). It was also noted that when the software package predicted the correct classification category (TP or TN) for a compound, the Caco-2 permeability values had an average fold error of approximately 3, with an error range from 1 to 30. When the predicted Caco-2 permeability was in the wrong classification category, the average fold error was significantly larger (Table 1). These results were consistent with having a reasonable permeability cutoff value for the classification scheme.
Removal of the Training Set from the Experimental Set
Since some of our experimental data were used to train the iDEA pkEXPRESS™ 1.1 software, these data were removed to further examine the predictive ability of the software package. The log-log correlation of the predicted and experimental Caco-2 permeability values for this data set is shown in Fig. (6). As evident in Fig. (6), the correlation between these data sets is again poor (r² = 0.25; N = 391). The average fold error and its distribution were calculated for all 391 compounds.
The average fold error between the predicted and experimental Caco-2 permeabilities was 5.24 and the error ranged from 1 to over 900-fold. The distribution of the fold error was: 113 compounds (29%) with a fold error < 2, 59 compounds (15%) with a fold error between 2.0 and 2.9, 39 compounds (10%) with a fold error between 3.0 and 3.9, 20 compounds (5%) with a fold error between 4.0 and 5.0, and 160 compounds (41%) with a fold error > 5. These results were similar to those for the full data set of 666 compounds. The distribution and percentage of compounds classified correctly with the 5 nm/s cutoff value are summarized in Table (2). A total of 275 compounds were removed from the original data set. Therefore, the experimental data contained a total of 391 compounds, with 335 compounds classified > 5 nm/s (high) and 56 compounds classified < 5 nm/s (low). Of the 335 experimentally high compounds, 273 were predicted correctly as true positives (81% correct). Of the 56 experimentally low compounds, 25 were predicted correctly as true negatives (45% correct). A false negative percentage of 19% (62 out of 335 compounds) and a false positive percentage of 55% (31 out of 56 compounds) were observed. The overall percent correct was 76% (i.e., high plus low (298) out of 391). When the software package predicted the correct classification of a compound, the Caco-2 permeability values had an average fold error of approximately 3. These results were again similar to those for the full data set. The log-log correlation of the predicted and experimental Caco-2 permeability values for the compounds contained in the training set is shown in Fig. (7). As evident in Fig. (7),
Fig. (6). The log-log correlation of experimental and iDEA pkEXPRESS™ 1.1 predicted Caco-2 permeability values (nm/s). The experimental data do not contain compounds used in the iDEA pkEXPRESS™ 1.1 training set.
Table 2. Predictions of Caco-2 Permeabilities Using iDEA pkEXPRESS™ 1.1 (Compounds not Contained in the Training Set)
| Method | Total Number of Compounds | True Positives (High) | False Negatives | True Negatives (Low) | False Positives |
|---|---|---|---|---|---|
| Experimental | 391 | 335 | | 56 | |
| iDEA | 391 | 273 | 62 | 25 | 31 |
| % Correct | | 81% | | 45% | |
| Average fold error | | 3 | 33 | 3 | 20 |
the correlation between these data sets is again poor (r² = 0.36; N = 275). The average fold error and its distribution were calculated for all 275 compounds. The average fold
error between the predicted and experimental Caco-2 permeabilities was 4.28 and the error ranged from 1 to over 600. The distribution and percentage of compounds classified correctly with the 5 nm/s cutoff value are summarized in Table (3). The experimental data contained a total of 275 compounds, with 185 compounds classified > 5 nm/s (high) and 90 compounds classified < 5 nm/s (low). Of the 185 experimentally high compounds, 160 were predicted correctly as true positives (86% correct). Of the 90 experimentally low compounds, 63 were predicted correctly as true negatives (70% correct). A false negative percentage of 14% (25 out of 185 compounds) and a false positive percentage of 30% (27 out of 90 compounds) were observed. The overall percent correct was 81% (i.e., high plus low (223) out of 275), with an average fold error of approximately 3. These results were somewhat better than those for the full 666-compound data set and the 391-compound data set. These results demonstrate the completeness of the iDEA pkEXPRESS™ 1.1 training set.
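The percentages reported for the three data sets follow directly from the quadrant counts in Tables 1-3, as the Python sketch below shows. The class and function names are illustrative; assigning a value exactly equal to the 5 nm/s cutoff to the low class is an assumption, since the text only defines "greater than" and "less than".

```python
from dataclasses import dataclass


def quadrant(experimental, predicted, cutoff=5.0):
    """Label one compound TP, FN, TN or FP (values at the cutoff count as low)."""
    if experimental > cutoff:
        return "TP" if predicted > cutoff else "FN"
    return "FP" if predicted > cutoff else "TN"


@dataclass(frozen=True)
class Quadrants:
    tp: int
    fn: int
    tn: int
    fp: int

    @property
    def total(self):
        return self.tp + self.fn + self.tn + self.fp

    def pct_high_correct(self):
        """True positives as a share of experimentally high compounds."""
        return 100 * self.tp / (self.tp + self.fn)

    def pct_low_correct(self):
        """True negatives as a share of experimentally low compounds."""
        return 100 * self.tn / (self.tn + self.fp)

    def pct_overall_correct(self):
        return 100 * (self.tp + self.tn) / self.total


# Counts taken from Tables 1-3.
DATA_SETS = {
    "all 666 compounds": Quadrants(tp=433, fn=87, tn=88, fp=58),
    "not in training set": Quadrants(tp=273, fn=62, tn=25, fp=31),
    "in training set": Quadrants(tp=160, fn=25, tn=63, fp=27),
}

for name, q in DATA_SETS.items():
    print(f"{name:>20}: high {q.pct_high_correct():.0f}%, "
          f"low {q.pct_low_correct():.0f}%, overall {q.pct_overall_correct():.0f}%")
```

The "% Correct" rows of the tables use the experimental class as the denominator, so 100% minus each value gives the false negative and false positive percentages quoted in the text.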
Fig. (7). The log-log correlation of experimental and iDEA pkEXPRESS™ 1.1 predicted Caco-2 permeability values (nm/s). The experimental data contain only compounds used in the iDEA pkEXPRESS™ 1.1 training set.
Use of in Silico Caco-2 Models in Drug Discovery
The iDEA pkEXPRESS™ 1.1 software package has several significant advantages in a drug discovery environment; however, it cannot be used to predict absolute Caco-2
permeability values since it is not capable of flagging unreasonable predictions. The average fold error between the predicted and experimental Caco-2 permeabilities was approximately 5, with the error ranging over two orders of magnitude. The software contains a parameter called the measurement of uncertainty (MOU) that is intended to provide a measure of confidence in the prediction. The MOU denotes good predictions as high or medium and poor predictions as low. The MOU was high for 661 out of 666 compounds, with 4 compounds being medium and 1 compound being low. Based on our comparison between experimental and predicted data, we would have expected to see many more low-confidence MOU predictions. We estimated that over 200 of the 666 compounds had extremely poor Caco-2 predictions. Since the MOU failed to identify poor predictions, we conclude that it cannot reliably be used to sort the data. Furthermore, it must be concluded that the use of iDEA pkEXPRESS™ 1.1 for the in silico prediction of Caco-2 permeabilities is extremely risky for small data sets.
Table 3. Predictions of Caco-2 Permeabilities Using iDEA pkEXPRESS™ 1.1 (Compounds Contained in the Training Set)
| Method | Total Number of Compounds | True Positives (High) | False Negatives | True Negatives (Low) | False Positives |
|---|---|---|---|---|---|
| Experimental | 275 | 185 | | 90 | |
| iDEA | 275 | 160 | 25 | 63 | 27 |
| % Correct | | 86% | | 70% | |
| Average fold error | | 3 | 16 | 3 | 29 |
CONCLUSION
The results from this study suggest that when iDEA pkEXPRESS™ 1.1 is used to evaluate larger test sets (>600 compounds), approximately 75% of the predictions in a high/low classification scheme are correct. In addition, the average fold error of the correct predictions is approximately 3 or 4, with the error ranging over only one order of magnitude. The software could be useful in analyzing hits after high-throughput screening of large, structurally diverse compound libraries. Used in this way, the in silico Caco-2 model contained in the iDEA pkEXPRESS™ 1.1 software package could provide guidance in the selection of compounds with potentially good absorption. At the right stage in a drug discovery process, the ability to correctly categorize compounds into a high/low scheme can be as valuable as the absolute answer. There is a need within drug discovery for in silico screens to characterize the ADME properties of early drug candidates. The dream of many drug discovery scientists is to generate virtual compound libraries by using predictive ADME software to design drug candidates that are potentially well behaved in humans. While the in silico Caco-2 model contained in the iDEA pkEXPRESS™ 1.1 physiologically based software package represents a considerable advance, the accurate prediction of data (<2-fold error) and the flagging of unreasonable predictions are still problematic.
ABBREVIATIONS
ADME = Absorption, distribution, metabolism, and excretion
PD = Pharmacodynamic
PB = Physiologically based
Caco-2 = Human colon carcinoma cell line
2D = Two-dimensional
TEER = Transepithelial electrical resistance
LC/MS = Liquid chromatography mass spectrometry
Papp = Apparent permeability
TP = True positive result
TN = True negative result
FP = False positive result
FN = False negative result
MOU = Measurement of uncertainty
REFERENCES
[1] Humphrey, M. J. Drug Metab. Rev. 1996, 28(3), 473-489.
[2] Caldwell, G. W. Cur. Opin. Drug Dis. Dev. 2000, 3(1), 30-41. [Published erratum appears in Cur. Opin. Drug Dis. Dev. 2000, 3(2), 250].
[3] Caldwell, G. W.; Ritchie, D. M.; Masucci, J. A.; Hageman, W.; Yan, Z. Cur. Top. Med. Chem. 2001, 1(5), 353-366.
[4] Caldwell, G. W.; Yan, Z.; Masucci, J. A.; Hageman, W.; Leo, G.; Ritchie, D. M. Pharm. Dev. Regul. 2003, 1(2), 117-132.
[5] Smith, D.; Schmid, E.; Jones, B. Clin. Pharm. 2002, 41(13), 1005-1019.
[6] Grass, G. M.; Sinko, P. J. Adv. Drug Del. Rev. 2002, 54, 433-451.
[7] Parrott, N.; Lavé, T. Eur. J. Pharm. Sci. 2002, 17, 51-61.
[8] Hidalgo, I. J. Cur. Topic Med. Chem. 2001, 1(5), 385-401.
[9] Caldwell, G. W.; Easlick, S. M.; Gunnet, J.; Masucci, J. A.; Demarest, K. J. Mass Spectrom. 1998, 33, 607-614.
[10] Stenberg, P.; Norinder, U.; Luthman, K.; Artursson, P. J. Med. Chem. 2001, 44(12), 1927-1937.
[11] Obach, R. S.; Baxter, J. G.; Liston, T. E.; Silber, B. M.; Jones, B. C.; Macintyre, F.; Rance, D. J.; Wastall, P. J. Pharmacol. Exp. Ther. 1997, 283(1), 46-58.
Frontiers in Drug Design & Discovery, 2005, 1, 211-229
Exploring the Viability of Metabonomic Urinalysis as a Toxicity Screen Within a Pharmaceutical Drug Discovery Division
Gregory C. Leo*, Garry W. Caldwell, William Hageman, Becky Hastings, Beata Starosciak, Kristin Snyder, Jaclyn Scowcroft, Aaron Krikava
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Abstract: This study investigated the applicability of metabonomic urinalysis as a toxicological screen in a drug discovery setting, as a means to reduce the attrition of new chemical entities in pre-clinical and clinical development. The model hepatotoxins α-naphthylisothiocyanate (ANIT) and thioacetamide (TA) were dosed in rats to validate our methodologies. The toxins generated significant deviations from the normal metabolic profile of the rat urine, and principal component analysis (PCA) readily showed a separation of controls from animals given acute doses. Spectral changes observed upon ANIT dosing reflected previous literature results, showing a large decrease in tricarboxylic acid cycle intermediates and the appearance of bile in the urine. The spectral changes observed for TA reflected glycosuria, consistent with known toxicological observations. A test set of drug candidates that had failed in preclinical testing because of target organ toxicity was then submitted to metabonomic urinalysis. Metabonomic urinalysis failed to classify 3 of the 5 compounds as having toxicological problems. This deficiency can be attributed to the acute dosing protocol and possibly to the need to fine-tune data collection times. Metabonomic urinalysis as a toxicological screening tool in our environment, a pharmaceutical drug discovery department, appears not to be useful due to a high rate of false positive results and the large amount of material required for a single acute dose in rats.
INTRODUCTION
Pharmaceutical companies strive to discover and develop drugs that are safe and efficacious, but the process is scientifically complex and financially risky [1, 2]. Much of the financial cost is incurred through the high attrition of new compound entities in pre-clinical and clinical development. The reasons for failure of these compounds generally fall into one or more of the following categories: poor efficacy, safety deficiencies or economic reasons. Our group has been working on a new pre-clinical paradigm for drug development based on the premise that efficacy and safety issues reveal themselves in pharmacokinetics, toxicokinetics, drug-drug interactions and drug metabolism studies
*Corresponding author: E-mail:
[email protected] Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
[3]. Detecting defects in these areas at an early stage of drug discovery would be highly valuable. In this regard, metabonomics was investigated as a way to follow changes in the endogenous pattern of metabolites found in rat urine after dosing rats with drug discovery compounds. Metabonomics is the marriage of NMR (nuclear magnetic resonance) methods with pattern recognition tools, as promulgated by the team of investigators at Imperial College [4]. This report applies metabonomics early in the drug discovery process as a filter for highlighting those compounds that alter normal rat metabolism, thus implying toxicological problems. There are previous reports that use metabonomics for toxicity screening [5, 6], but the emphasis was placed more on known toxins than on small molecule drug candidates. The methodology for NMR data collection and data reduction by pattern recognition tools has been reported in detail and reviewed in the literature and will not be enumerated here [4, 7, 8]. Metabonomics has demonstrated utility with respect to toxins, and this paper explores its limitations in a drug discovery setting. The much-studied toxins ANIT and TA were used to validate the methodology, and the procedures were then applied to several typical drug candidates. Five different drug candidates that had failed because of unfavorable toxic side effects were chosen, plus two marketed compounds. It was felt necessary to perform a dose response study because these compounds would represent compounds coming through the drug discovery process and would not be as well characterized as the textbook examples of ANIT and TA. The dose range would be based on knowledge and experience gained from previous biological assays. This was important given the wide variability in activities. At the outset it became apparent that studies involving single doses required a prohibitive amount of material, approximately one gram.
Depending on the protocol chosen, chronic dosing could require 10-50 grams of material and was not considered in this report. This paper describes the application of metabonomics to the study of urine collected from rats challenged with compounds found in the literature and compounds generated internally. Two model toxins (ANIT and TA) were chosen as benchmarks to establish the methodology. Two marketed drugs (amiodarone and rosiglitazone) were used as representatives of typical pharmaceutical drugs. Amiodarone is known to cause phospholipidosis upon chronic dosing [9]. Rosiglitazone is not one of the thiazolidinediones that cause hepatotoxicity, and adverse effects were not anticipated at the doses administered [10]. The remaining five compounds were in-house compounds chosen as representative drug candidates that had been rejected for development because of adverse toxicity in preclinical trials. Our results from the ANIT and TA studies gave spectra comparable to those from other laboratories. Using a test group of two marketed compounds and five drug candidates, we show some limitations of metabonomics in the drug discovery environment when an acute dosing protocol is used.
MATERIALS AND METHODS
All the procedures involving animals were consistent with the guidelines (Guide for the Care and Use of Laboratory Animals) published by the National Research Council August, 2001 and approved by the animal ethics committee (IACUC, Institutional Animal Care and Use Committee) at Johnson & Johnson Pharmaceutical Research & Development, LLC.
Animal Handling and Urine Collection
Twenty male Sprague-Dawley rats (Charles River, Wilmington, MA) were used for each study. Their tails were numbered with a permanent marker pen so that the urine spectra could be assigned to a given rat. They were group housed (5 rats/cage) under controlled conditions (20-22 °C; 50-65% relative humidity and 12 h (hour) light cycle) with free access to food, certified diet 5001 (LabDiet, Purina Mills, St. Louis, MO), and water during the times when urine was not being collected. To collect urine the rats were transferred to metabolism cages (Lab Products) for a period of seven hours in which the urine was collected at room temperature after 4 h and/or after 7 h. Graduated, 50 ml (milliliter) plastic Corning tubes were used to collect the urine and the volumes were recorded. The urine collection was done during the rats’ light (resting) cycle and no food was provided during this time period. Eliminating food at this time removed problems with food contamination of the urine. Bacterial growth was inhibited by adding 1.0 ml of a 1% sodium azide solution or 0.5 ml of 2% sodium azide to the collection tubes and by freezing the urine after collection. Metabolism cages were rinsed daily during the study.
ANIT Dosing and Urine Collection Time Points
Four groups of four rats were given 25, 50, 100, or 150 mg/kg (milligram/kilogram) doses of ANIT dissolved in corn oil by gavage. An additional group of four rats was given only vehicle. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-28 h, 28-31 h, 48-52 h, 52-55 h, 72-76 h and 240-244 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 310-371 grams.
Thioacetamide Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 50, 100 or 200 mg/kg of TA dissolved in methylcellulose by gavage.
A separate group of five rats was given only methylcellulose. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-30.5 h, 48-55 h, 72-79 h and 95-101 h. The age of the rats at the initiation of this study was 66-71 days and the weights ranged from 275-375 grams.
Amiodarone Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 100, 300 or 600 mg/kg of amiodarone dissolved in methylcellulose by gavage. A separate group of five rats was given only methylcellulose. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and 96-102 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 306-369 grams.
Rosiglitazone Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 30, 100 or 300 mg/kg of rosiglitazone dissolved in methylcellulose by gavage. A separate group of five rats was given only methylcellulose. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and
96-102 h. The age of the rats at the initiation of this study was 60-65 days and the weights ranged from 253-318 grams.
Compound 1 Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 50, 100 or 150 mg/kg of compound 1 dissolved in methylcellulose by gavage. A separate group of five rats was given only methylcellulose. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and 96-102 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 305-360 grams.
Compound 2 Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 10, 30 or 100 mg/kg of compound 2 dissolved in sterile water by gavage. A separate group of five rats was given only sterile water. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and 96-102 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 314-383 grams.
Compound 3 Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 30, 150 or 300 mg/kg of compound 3 dissolved in methylcellulose by gavage. A separate group of five rats was given only methylcellulose. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and 96-102 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 290-355 grams.
Compound 4 Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 50, 100 or 300 mg/kg of compound 4 dissolved in sterile water (adjusted pH 8-10) by gavage. A separate group of five rats was given only sterile water.
Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and 96-102 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 309-380 grams.
Compound 5 Dosing and Urine Collection Time Points
Three groups of five rats were dosed with 3, 30 or 300 mg/kg of compound 5 dissolved in methylcellulose by gavage. A separate group of five rats was given only methylcellulose. Urine was collected from the twenty rats prior to dosing for a six hour period and at the following time points after dosing: 0-7 h, 24-31 h, 48-55 h, 72-79 h and 96-102 h. The age of the rats at the initiation of this study was 72-77 days and the weights ranged from 305-369 grams.
Proton NMR Analysis
The preparation of the urine for NMR analysis was based upon a protocol reported in the literature [11]. To minimize chemical shift variation due to differences in the urine pH, 300 µl of 0.2 M (Molar), pH 7.4 phosphate buffer was added to 600 µl (microliter)
aliquots of the rat urine. After 10 minutes the samples were centrifuged at 10,000 rpm (revolutions per minute) for 10 minutes, and 800 µl of the supernatant was removed and placed in 1.5 ml sample vials. Before the vials were capped with septum caps, 50 µl of a 1 mg/ml solution of TSP (3-trimethylsilylpropionic-(2,2,3,3-d4)-acid) in D2O (deuterium oxide) was added to each sample. The NMR data were acquired on a Bruker DMX-400 spectrometer (400.03 MHz (megahertz) 1H frequency) interfaced with Bruker’s BEST flow injection accessory (i.e., a Gilson 215 liquids sample handler). The 5 mm (millimeter) broadband flow probe used had a single cell inverse configuration. The injection volume was 600 µl. Sample information was input into Bruker’s ICONNMR automation software using a spreadsheet. The pulse sequence used was a 1D-NOESY (one dimensional nuclear Overhauser effect spectroscopy) with water pre-saturation (64 Hz field strength) during the re-cycle delay (3 s (seconds)) and the mixing delay (0.08 s) [12]. The acquisition time was 2 seconds using 32K complex points for a 20 ppm sweep width. The number of scans collected for each urine sample was 256. The data were processed using 0.3 Hz exponential line broadening. Residual water signal was eliminated using the method of Marion, Ikura and Bax [13]. Spectra were manually rephased and then baseline corrected before the data were reduced using the AMIX software by Bruker. Data reduction involved dividing the spectra into 255 segments, each 0.04 ppm wide, for the spectral window ranging from -0.2 to 10.0 ppm. The region from 4.5 to 6.2 ppm was excluded from the data analysis. The sum of the intensities for each segment was used to define each variable, and each variable was scaled to the total intensity. Endogenous metabolite resonance assignments, when made, were based upon published literature chemical shift values [14].
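The data reduction described above (fixed-width segments, an excluded region, scaling to total intensity) can be illustrated with a short sketch. This is not the AMIX implementation; the function name `bin_spectrum`, the synthetic spectrum, and the rule that any segment overlapping the 4.5 to 6.2 ppm region is dropped are assumptions made for illustration only.

```python
import numpy as np

def bin_spectrum(ppm, intensity, lo=-0.2, hi=10.0, width=0.04,
                 exclude=(4.5, 6.2), tol=1e-9):
    """Sum intensities in fixed-width ppm segments and scale to total intensity."""
    n_bins = int(round((hi - lo) / width))        # 255 segments for the values in the text
    edges = lo + width * np.arange(n_bins + 1)    # segment boundaries in ppm
    # A weighted histogram sums the intensities of all points falling in each segment.
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    # Assumption: a segment is dropped if it overlaps the excluded region at all.
    overlaps = (edges[1:] > exclude[0] + tol) & (edges[:-1] < exclude[1] - tol)
    kept = sums[~overlaps]
    centers = (0.5 * (edges[:-1] + edges[1:]))[~overlaps]
    return centers, kept / kept.sum()             # each variable scaled to total intensity

# Hypothetical usage with a synthetic spectrum (one narrow peak on a flat baseline).
ppm_axis = np.linspace(10.0, -0.2, 16384)
signal = 0.001 + np.exp(-((ppm_axis - 2.5) / 0.01) ** 2)
centers, variables = bin_spectrum(ppm_axis, signal)
```

Each row of the resulting data table would then hold the scaled segment intensities for one urine sample, which is the form the pattern recognition methods operate on.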
Parent drug and/or related metabolites in the urine were identified by comparison with the spectrum of the parent drug; isolation was not carried out for further verification.
Chemometric Analysis
The reduced data described above were used for principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA) using Pirouette software, version 3.10 by Infometrix, Inc. (Woodinville, WA). These methods use multivariate statistics as a means to visualize changes that occur in large quantities of data. They are well documented and have been applied in the areas of analytical science and, specifically, metabonomics [15]. The computations were carried out on a desktop personal computer. For the PCA the preprocessing option chosen was mean-center, with no rotations or transforms performed. SIMCA was performed using the pre-dose urine spectra as the model group. The SIMCA model group was limited to the rats used for a given study. The number of spectra used to define the SIMCA model ranged from 15 to 20. If n was less than 20 it was usually because of instrument problems (poor water suppression). The vehicle spectra were used to monitor whether the urine metabolite pattern shifted because of circumstances not attributable to the drug substance. As seen in the Results section, this was observed in some of the studies, where the vehicle data did not fall within the region defined by the model.
RESULTS
Initial efforts to do these metabonomic studies indicated that the urine collected had to be free of contamination. Following published protocols, there were problems with food contamination that were resolved by removing the food during urine collection
periods as noted above in the methods section. Food contamination produced many peaks in the sugar region of the proton spectrum as well as lactate and acetate peaks. An interesting observation was the variation of the metabolite patterns in the control urine. This variation often involved citrate, taurine, and 2-oxoglutarate, in agreement with previous observations [16]. These variations could not be correlated with events in the lab, such as lab cleaning. When the variation occurred, it was usually a group-related event seen in most of the rats used for the study (e.g., 18 of 20). A vehicle control group was always used so that, if a group of rats was startled or upset during the course of an experiment, drug effects could be separated from environmental effects. Large variation in the gut microflora, as noted by the loss of hippurate [17, 18], was not usually seen in the control spectra. As a benchmark, rats were dosed with ANIT at one sub-acute and three acute doses. Proton spectra following the time dependence of the urinary metabolite changes for one of the rats given the 150 mg/kg ANIT dose are shown in Fig. (1).
Fig. (1). Proton NMR spectra of rat urine from a rat dosed with 150 mg/kg of ANIT. Changes in the urine metabolite profile are presented as a function of time. Spectrum a is the pre-dose urine spectrum. Spectra b through f are the urine spectra 0-7 hr., 24-28 hr., 48-52 hr., 72-76 hr. and 240-244 hr. after dosing, respectively. The spectra are scaled to the reference peak from TSP at 0.0 ppm.
The appearance and disappearance of peaks follows the original metabonomics study of ANIT. Some of the most notable spectral changes are the loss of citrate, 2-oxoglutarate and succinate (resonances in the 3.1 to 2.4 ppm region) throughout the major portion of the study and the appearance of bile acids (1.9 to 0.7 ppm) at the point
of maximum ANIT induced liver damage 24 hours after dosing. There were no resonances observed in the aromatic region of the proton spectra at 400 MHz that could be attributed to ANIT or to α-aminonaphthalene, its metabolite. The PCA trajectories of the dose responses are shown in Fig. (2) (top).
Fig. (2). The PCA plot (top) of the metabolic trajectory calculated from the 1H NMR spectra of urine for rats given different doses of ANIT. The bottom chart displays the same data using SIMCA to create a model based on the predose urine spectra. Both graphs use the same coding: open boxes for pre-dose samples, open circles for the 25 mg/kg samples, plus signs for the 50 mg/kg samples, x’s for the 100 mg/kg samples, filled triangles for the 150 mg/kg samples and filled boxes for the vehicle (corn oil) treated rats.
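A PCA plot of this kind can be sketched as follows. This is a minimal illustration of mean-centered PCA with no rotation or scaling, computed by singular value decomposition; it is not the Pirouette implementation, and the function name `pca_scores` and the random stand-in data are assumptions.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Mean-centered PCA via SVD; rows of X are samples, columns are spectral segments."""
    Xc = X - X.mean(axis=0)                            # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates (factor 1, factor 2, ...)
    loadings = Vt[:n_components]                       # contribution of each segment to each factor
    explained = (s ** 2) / np.sum(s ** 2)              # fraction of variance per factor
    return scores, loadings, explained[:n_components]

# Hypothetical usage: 20 samples by 212 scaled segments of random stand-in data.
rng = np.random.default_rng(0)
X_demo = rng.random((20, 212))
scores, loadings, explained = pca_scores(X_demo)
```

Plotting the first column of `scores` against the second, with one symbol per dose group and time points joined in order, would give a trajectory display of the type shown in the top panel.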
The spectra from the rats given the sub-acute dose, 25 mg/kg (open circles), showed all the data points remaining very close to the principal component space of the pre-dose
spectra (open boxes). The pre-dose points are centered about 0 along factor 2 and between 0.01 and 0.03 along factor 1. At 50 mg/kg (plus signs) marked changes are seen for two of the rats after 24 and 28 hours. These are the four plus signs in the upper left branch of the data in the PCA chart. All the rats in this group showed a drop in the tricarboxylic acid cycle intermediates, and the two rats most affected also had bile acids in their urine spectra. After two days the urine profiles of the 50 mg/kg rats had mostly returned to the pre-dose region. Trajectories from the groups dosed with 100 (x’s) and 150 mg/kg (filled triangles) ANIT exhibited points in the upper left region of the chart due to the appearance of bile in the urine and the absence of citric acid cycle intermediates. After 48 hours, the two high dose groups exhibited points in another region of the principal component space (the region less than zero along factor 2 and between -0.02 and -0.04 along factor 1). This reflected the partial recovery from ANIT toxicity at higher doses, and the corresponding spectra were distinguished by the continued absence of citric acid cycle intermediates, an increase in creatine and the disappearance of the bile acid peaks. The loss of citric acid cycle metabolites was suggestive of damage to acinar zone 1 of the liver, which has been described as predominantly responsible for citric acid cycle metabolism, but no further experiments were performed to substantiate this. The bottom part of Fig. (2) shows the same data analyzed using SIMCA. The SIMCA model is generated from the predose urine spectra of 20 rats and represents normal rat metabolism. The data used to generate the SIMCA model are in the lower left corner (open boxes). The horizontal line cutting through the data is the boundary between samples that fit the model criteria (below the line) and those that do not. The line represents the 95% confidence limit.
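The idea behind this one-class use of SIMCA can be sketched as follows: a PCA model is fitted to the pre-dose spectra only, and each later sample is judged by how far it lies from that model. The sketch is an assumption-laden simplification and not the Pirouette algorithm. In particular, the limit here is taken as the empirical 95th percentile of the training residual distances, whereas SIMCA implementations typically derive the limit from an F-test on residual variances; the names `fit_class_model` and `residual_distance` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_class_model(X_train, n_components=3, quantile=0.95):
    """Fit a PCA model to the reference (pre-dose) class and set a distance limit."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings spanning the class subspace
    resid = Xc - Xc @ P @ P.T                    # part of each sample the model cannot explain
    dist = np.sqrt((resid ** 2).sum(axis=1))
    # Assumption: empirical percentile used in place of the F-test based limit.
    return {"mean": mean, "P": P, "limit": float(np.quantile(dist, quantile))}

def residual_distance(model, X):
    """Distance of each row of X from the class model (larger means a poorer fit)."""
    Xc = X - model["mean"]
    resid = Xc - Xc @ model["P"] @ model["P"].T
    return np.sqrt((resid ** 2).sum(axis=1))

# Hypothetical usage: 20 synthetic pre-dose spectra and one perturbed sample.
rng = np.random.default_rng(1)
predose = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(20, 50))
model = fit_class_model(predose)
perturbed = predose[:1].copy()
perturbed[0, 0] += 5.0                           # mimic a large change in one segment
outside = residual_distance(model, perturbed) > model["limit"]
```

Plotting the distances against collection period, with the limit drawn as a horizontal line, reproduces the layout of the SIMCA displays: points below the line fit the pre-dose model and points above it do not.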
For those samples that do not fit the model, the metabolism is viewed as being perturbed from its normal state, since specific toxicity models were not generated. This approach highlights compounds that alter normal metabolism as potential problems. An advantage of this format is that the data are shown as a dose response. A dose response was observed, and after ten days most of the rats had completely recovered. It is not clear why the vehicle data in the SIMCA plot are at the edge of the model limits. It may be a stress-related phenomenon, or with time it may be related to aging [16]. The other well-characterized toxin used was TA, which is known to damage specific regions of the liver and the kidney. Fig. (3) shows the rat urine spectra 55 hours after dosing with 200 mg/kg. This result is consistent with previously published data illustrating the large metabolic disruption induced by TA, in particular glycosuria. The large increase in the urine sugar concentration was easily recognized by the presence of the α- and β-anomeric protons of glucose (5.3 and 4.7 ppm, respectively) in an uncluttered region of the proton spectrum. These are the large resonances on each side of 5 ppm. A less pronounced elevation of urinary amino acids was manifested by the increased intensity of amino acid methyl resonances (0.8-1.5 ppm) as well as other signals. Our urine collection was terminated before full recovery was observed for those rats given the higher doses. From the SIMCA analysis a dose response was seen (Fig. (4)), and the two highest doses elicited a similar response, suggesting saturation. The offset in the data points for the dosed animals during the 0-7 hour period was in large part due to the presence of an intense methyl resonance from acetone. The change in the urine metabolite profile induced by the toxin builds to a maximum at the second and third day of the study.
The rats given the lowest dose were not as affected and their associated points in the SIMCA plot are equivalent to vehicle by
Fig. (3). Proton NMR spectra of rat urine from a rat dosed with 200 mg/kg of thioacetamide. The urine was collected for the period 48 – 55 hours after the rat was dosed.
Fig. (4). SIMCA display generated from the proton NMR spectra of rat urine from rats given thioacetamide. The open boxes are pre-dose urine samples and the open circles, plus signs and filled triangles are for the rats dosed with 50, 100 or 200 mg/kg of TA, respectively. The filled boxes are from rats given vehicle (methylcellulose).
day three. The vehicle dosed animals at the 24-31 hour time period fell outside the model generated using the pre-dose urine. Vehicle data points that did not fall within the model limits were also observed in other studies. This possibly reflects stress felt by the vehicle-treated animals relative to the other animals, or the group as a whole may have been affected by an event unrelated to the dosing. The test compounds reflected a typical range of compounds in a drug discovery environment: amiodarone, rosiglitazone, an α-2 adrenergic based analgesic (compound 1), a Ca receptor agonist (compound 2), a p-38 inhibitor (compound 3), an anti-epileptic (compound 4), and a vasopressin inhibitor (compound 5). Rosiglitazone and amiodarone are marketed drugs. Amiodarone, an antiarrhythmic drug, was found to induce hepatic phospholipidosis upon chronic administration [9, 19]. Rosiglitazone, an antidiabetic drug, is a member of the thiazolidinedione class that does not show the hepatotoxicity of the related troglitazone [10]. The last five compounds were in-house drug discovery candidates that were brought forward to development but were discontinued because of toxicological concerns. Compounds 1, 2 and 4 caused convulsions, and compounds 3 and 5 generated liver toxicity (neutrophilic infiltrations for compound 3 and hepatobiliary toxicity for compound 5). The amiodarone data are presented in Fig. (5) using SIMCA to predict outliers. The first time point, 0-7 hours, showed the data for the dosed as well as the vehicle animals falling outside the model limit. Inspection of the urine spectra from the vehicle treated rats showed that citrate and 2-oxoglutarate peak intensities dropped. This was not a result of the vehicle used, methylcellulose [20], but it was viewed as a change in the rats’
Fig. (5). SIMCA display generated from the proton NMR spectra of rat urine from rats given amiodarone. The open boxes are pre-dose urine samples and the open circles, plus signs and filled triangles are from rats dosed with 100, 300 or 600 mg/kg of amiodarone, respectively. The filled boxes are from rats given vehicle (methylcellulose).
metabolism that occurred during that collection period unrelated to dosing. This sensitivity of the metabonomics method reflected the need to have a control group alongside the dosed group. The offset for the dosed animals reflects similar drops in citrate and 2-oxoglutarate but also included an increase in phenylacetylglycine and trimethylamine-N-oxide. The high dose point at 0.0015 (along the vertical axis) in the 0-7 hour time period was low relative to the other high doses because it did not contain an increase in phenylacetylglycine and all of its metabolite concentrations appeared to be reduced, citrate and 2-oxoglutarate more than the others. The two and three day time periods for the rats given the highest dose moved away from the model, suggesting a delayed onset of a toxicological response. The low dose point, marked by a 1, at the 96-103 hour time period was an outlier having a spectrum with several intensity differences relative to control spectra. Fig. (6) shows the SIMCA test set versus the dosed and vehicle treated rats for the rosiglitazone study. The urine samples used for generating the model had, in general, lower levels of citrate and elevated levels of phenylacetylglycine relative to the urine collected after dosing. The urine volume for the predose rats was about 2-3 times the volume that is normally collected (5-7 ml). After the dosing commenced, urine volumes returned to those normally observed. The urine from the dosed animals and the vehicle treated animals are not distinct from one another relative to the SIMCA model generated from the predose urine samples. No dose dependence is observed. If a SIMCA
Fig. (6). SIMCA display generated from the proton NMR spectra of rat urine from rats given rosiglitazone. The open boxes are for pre-dose urine samples and the open circles, plus signs and filled triangles are from rats dosed with 30, 100 or 300 mg/kg of rosiglitazone, respectively. The filled boxes are from rats given vehicle (methylcellulose).
model is generated using the vehicle samples instead of the predose samples then there is some separation between dosed and vehicle data. This difference straddles the model cutoff line and again there is no dose dependent response observed. A single acute dose of rosiglitazone or amiodarone was not expected to produce a toxic situation. Amiodarone requires chronic dosing for toxicological problems to develop, although at the very high dose of 600 mg/kg a perturbation away from normal metabolism was observed after two days. These compounds (amiodarone and rosiglitazone) provide a measure of the metabonomic response that can be expected from marketed pharmaceutical drugs. The need for large amounts of sample (approximately one gram for acute dosing) affected the set of in-house compounds that we could choose to test metabonomics as a toxicity screen. Only compounds that had made it to preclinical evaluation could be tested. Compound 1 was a potent alpha-2 adrenoceptor agonist with activity in acute and neuropathic pain models. Fig. (7) shows the results from SIMCA. The vehicle data (filled boxes) fall within the model limit but with time (4 days) start to drift to the edge of the model confidence limits.
Fig. (7). SIMCA display generated from the proton NMR spectra of rat urine from rats given compound 1. The open boxes are for pre-dose urine samples and open circles, plus signs and filled triangles are from rats dosed with 50, 100 or 150 mg/kg of compound 1, respectively. The filled boxes are from rats given vehicle (methylcellulose).
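The compound requirement quoted for acute dosing (approximately one gram) can be checked with simple arithmetic from the dosing design in the methods: three dose groups of five rats each, dosed per kilogram of body weight. The sketch below is illustrative only; the function name and the mean body weights (taken as rough midpoints of the reported weight ranges) are assumptions, and no allowance is made for formulation losses or repeat experiments.

```python
def compound_needed_mg(doses_mg_per_kg, rats_per_group, mean_weight_kg):
    """Total compound consumed by one acute study, in milligrams."""
    return sum(dose * rats_per_group * mean_weight_kg for dose in doses_mg_per_kg)

# Amiodarone: 100, 300 and 600 mg/kg, five rats per group, assumed mean weight 0.34 kg.
amiodarone_mg = compound_needed_mg([100, 300, 600], 5, 0.34)   # about 1700 mg

# Compound 1: 50, 100 and 150 mg/kg, five rats per group, assumed mean weight 0.33 kg.
compound_1_mg = compound_needed_mg([50, 100, 150], 5, 0.33)    # about 495 mg
```

Under these assumptions the studies consume roughly half a gram to nearly two grams of material, which is in line with the order of magnitude stated in the text.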
The urine volumes for the 0-7 hour time point were 2-3 times the volume of the predose collection period and, relative to the reference, TSP, the metabolite levels were lowered. The 0-7 hour time period also reflected the presence of drug substance (and/or metabolites) in the urine as well as blood in some of the urine samples. At the doses
administered to the rats, most of them suffered from chromodacryorrhea (bloody tears) during the first urine collection period. A dose response can be seen for days one, two and three. Day four shows the points drifting away from the model limit for the rats given the highest (filled triangles) and middle doses (plus signs). The main reason for this is the decrease in hippurate and citrate and an increase in taurine for those rats. Since the urine collection was halted after the 96-103 hour collection period, it was not clear whether the movement away from the model for the animals given the higher doses was due to the development of a toxicological response or simply to animal variation. The metabonomics results for compound 1 indicate a dose dependent disruption of normal metabolism. This level of analysis does not account for the mechanism; it shows only that there is a deviation from normal metabolism upon administration of the drug substance. Warning signs were raised independent of the metabonomics results because of the chromodacryorrhea. The next compound presented was a calcium channel agonist (compound 2) and the SIMCA results are shown in Fig. (8). In general the points fall within the model cutoff. Small peaks attributable to the parent compound and/or metabolites are seen in the spectra at the 0-7 hour time period and at the 24-31 hour time period for the highest dose. In the 0-7 hour collection period the three points farthest outside the model limit have an elevated concentration of 2-oxoglutarate relative to the other samples in the group. These results do not indicate that compound 2 affects normal metabolic pathways.
Fig. (8). SIMCA display generated from the proton NMR spectra of rat urine from rats given compound 2. The open boxes are for pre-dose urine samples and the open circles, plus signs and filled triangles are from rats dosed with 10, 30 or 100 mg/kg of compound 2, respectively. The filled boxes are from rats given vehicle (sterile water).
A p-38 kinase inhibitor (compound 3) designed for possible use against inflammatory diseases was the next example and the SIMCA results are presented in Fig. (9). No clear trend is observed except that the points for the middle time periods are
Fig. (9). SIMCA display generated from the proton NMR spectra of rat urine from rats given compound 3. The open boxes are for pre-dose urine samples and the open circles, plus signs and filled triangles are from rats dosed with 30, 150 or 300 mg/kg of compound 3, respectively. The filled boxes are from rats given vehicle (methylcellulose).
more removed from the model for both the dosed and vehicle treated animals. The urine volumes were higher for the two lower doses for the 24-31 and 48-55 hour collection periods. The volume increases ranged from one to three-fold. The urine volumes for the rats given the highest dose increased slightly (1 – 1.5 fold) during the first collection period (0-7 hours). The urine volumes alerted the researcher to a change but were not used as variables in the statistical analysis. An anti-convulsant (compound 4) was tested and the analysis of the data using SIMCA is shown in Fig. (10). The 0-7 hour time points were outside the model limits due to the presence of drug metabolites and the decrease of citrate and 2-oxoglutarate. An inverse relation between the dose and the concentrations of citrate and 2-oxoglutarate was seen for the 0-7 hour collection period. This decrease in citrate for compound 4 may be related to its inhibition of carbonic anhydrase. By the 24-31 hour time point the metabolite patterns returned to the edge of the model limit established by the pre-dose samples. The remaining time points for the dosed and vehicle treated rats reflected a slight drift away from the model. One metabolite change that contributed to this drift was the decrease in hippurate during the course of the study. In Fig. (11) the SIMCA analysis shows the data points for compound 5 fit the normal rat metabolism model. Compound 5 is a vasopressin inhibitor. The urine volumes collected for the first time period after dosing were very large – up to 63 ml for one of the rats given a high dose of the drug substance. For comparison, pre-dose urine volumes were in the 5-7 ml range. The spike for the points in the first collection period is not due to the presence of drug substance or drug metabolites but rather to an increase in
Fig. (10). SIMCA display generated from the proton NMR spectra of rat urine from rats given compound 4. The open boxes are for pre-dose urine samples and the open circles, plus signs and filled triangles are from rats dosed with 50, 100 or 300 mg/kg of compound 4, respectively. The filled boxes are from rats given vehicle (sterile water, pH adjusted between 8 - 10).
Fig. (11). SIMCA display generated from the proton NMR spectra of rat urine from rats given compound 5. The open boxes are for pre-dose urine samples and the open circles, plus signs and filled triangles are from rats dosed with 3, 30 or 300 mg/kg of compound 5, respectively. The filled boxes are from rats given vehicle (methylcellulose).
hippurate. This was most pronounced for the highest dose administered. Rat 11 was particularly sensitive to this compound and required 4 days before returning within the model limits. The distinguishing features for rat 11 at the 24-31 and 48-55 hour collection periods were the decrease in citric acid cycle metabolites and the increase in phenylacetylglycine. The results indicate that compound 5 causes a temporary perturbation (the initial time period, 0-7 hours) in metabolism.
DISCUSSION
Data have been presented showing the results obtained from metabonomic analysis of rat urine in a pharmaceutical drug discovery environment. We took a simplistic approach and attempted to use metabonomics to classify compounds as altering or leaving unchanged the normal metabolic profile of rat urine, in order to detect compounds having undesired toxicity. The protocol that was chosen was limited by the necessity of having large quantities of drug substance to perform the tests. The in-house compounds that were tested had advanced out of drug discovery to pre-clinical testing and thus were available for testing metabonomics. Following a protocol using a single dose diminishes the chance of detecting compounds that require chronic administration (e.g., amiodarone). Other companies have taken a different strategy, employing a large metabonomics effort to generate toxicity models and then rank compounds relative to those models. This was one of the goals of the Consortium On Metabonomics Technology (COMET). Well-defined handling of the animals was used to avoid contamination of the urine and undue stress of the animals. The protocol adopted for handling the rats incorporated group housing with free access to food and water until the urine collection periods, and then the labeled rats were isolated in separate metabolism cages without access to food, thus ensuring that the urine was free of food contamination.
The group housing prior to the administration of compounds should have allowed for common hydration and fed state of the animals. Even under such conditions, there was sufficient variation in the urine metabolite pattern that the vehicle rats often fell outside the 95% confidence limit for the healthy animal model. Variation of metabolite patterns in rat urine has been documented previously [16]. Variation in the urine metabolite profile may appear unexpectedly, as in the rosiglitazone example, where the predose urine metabolite patterns displayed significant variation relative to the control group over the course of the study. What caused the alteration in the metabolite profile is not known, and that underscores the need for continued research in this field [21]. Other “-omics” technologies are also information rich, and a full understanding of all the information is not possible at the present time. Metabonomics can be used to differentiate toxic compounds from controls, but most of the test compounds in this study did not cause dramatic changes in metabolism and were not differentiated from the control group. Of the five in-house drug discovery compounds tested, only compound 1 produced a marked deviation from the pre-dose control urine, and this compound also caused chromodacryorrhea. Three of the compounds studied (amiodarone, and compounds 4 and 5) gave changes only at the 0-7 hour time point, and these changes were increases or decreases in the concentrations of metabolites that normally show the most variation [16]. Of these three, compound 5 caused a pronounced increase in the urine output. Some compounds (TA and compounds 1, 2 and 4) yielded drug metabolites in the urine at the initial, 0-7 hour, time point, resulting in a separation in the pattern recognition results. For the amiodarone study there was a
movement away from the control group that occurred two days after the rats were given the highest dose of amiodarone. The strength of pattern recognition is its ability to reduce a large amount of data to something that can be readily visualized, and we found it convenient to use SIMCA to visualize the dose response and the samples that fell outside the limits of the model group. Other statistical tools that can be, and have been, used effectively for metabonomics were not represented in this report [22-25]. As mentioned, this application of metabonomics requires relatively large amounts of compound, but if there were a willingness to use mice the amount of compound required would decrease approximately 10-fold. The reduced volumes of urine available from mice should not be a problem with the current generation of small-volume probes. Our application of metabonomics using an acute dosing protocol missed compounds that we knew to have target organ toxicity, and a chronic dosing protocol would be expected to work better. Chronic dosing would, however, add xenobiotic resonances to the urine spectra, adding significant complexity to the interpretation of the pattern recognition results; this might be circumvented by switching the analytical method to mass spectrometry. With mass spectrometry, mass filters can be used to select compounds with the desired molecular weights, and this type of metabonomics application is being performed commercially (e.g., Metabolon Inc., Research Triangle Park, NC).

CONCLUSION

Metabonomics as applied in this report was not found useful as a toxicological screen within our pharmaceutical drug discovery environment because of the false positives that were observed and the excessive amount of compound needed for rat studies.
This is not to discredit the technology, for there have been commercial applications utilizing metabonomics in clinical areas (e.g., Liposcience in Raleigh, NC) and biomarker identification or in conjunction with genomics and proteomics (e.g., Beyond Genomics in Waltham, MA). The technology would be expected to play an expanded role in the toxicological profiling of new compound entities after compounds have been handed over from drug discovery to development. An application that would seem appropriate for metabonomics in a typical pharmaceutical drug discovery environment may be the better classification of in vivo disease models. This is not unreasonable given that the technique has been shown to be able to differentiate genetic strain differences in mice [26] and to show the effect of different vehicles on metabolite patterns in rat urine [20].

ACKNOWLEDGEMENTS

Carlos Cotto, Jeffrey Hall and Diane Gauthier are acknowledged for their assistance at various stages of this project.

ABBREVIATIONS

1D-NOESY = One dimensional nuclear Overhauser effect spectroscopy
ANIT = α-naphthylisothiocyanate
D2O = Deuterium oxide
h = Hour
Hz = Hertz
IACUC = Institutional Animal Care and Use Committee
M = Molar
mg/kg = Milligram/kilogram
ml = Milliliter
mm = Millimeter
µl = Microliter
MHz = Megahertz
NMR = Nuclear Magnetic Resonance
PCA = Principal Component Analysis
ppm = Parts per million
rpm = Revolutions per minute
s = Seconds
SIMCA = Soft Independent Modeling of Class Analogy
TA = Thioacetamide
TSP = 3-trimethylsilylpropionic-(2,2,3,3-d4)-acid
REFERENCES

[1] Prentis, R. A.; Lis, Y.; Walker, S. R. British Journal of Clinical Pharmacology, 1988, 25, 387-396.
[2] Caldwell, G. W. Current Opinion in Drug Discovery & Development, 2000, 3, 30-41.
[3] Caldwell, G. W.; Ritchie, D. M.; Masucci, J. A.; Hageman, W.; Yan, Z. Current Topics in Medicinal Chemistry (Hilversum, Netherlands), 2001, 1, 353-366.
[4] Nicholson, J. K.; Lindon, J. C.; Holmes, E. Xenobiotica, 1999, 29, 1181-1189.
[5] Shockcor, J. P.; Holmes, E. Current Topics in Medicinal Chemistry (Hilversum, Netherlands), 2002, 2, 35-51.
[6] Robertson, D. G.; Reily, M. D.; Lindon, J. C.; Holmes, E.; Nicholson, J. K. In Comprehensive Toxicology; Sipes, I. G., McQueen, C. A., Gandolfi, A. J., Eds.; Elsevier Science B.V.: Amsterdam, 2002, pp. 583-610.
[7] Holmes, E.; Nicholls, A. W.; Lindon, J. C.; Connor, S. C.; Connelly, J. C.; Haselden, J. N.; Damment, S. J. P.; Spraul, M.; Neidig, P.; Nicholson, J. K. Chem. Res. Toxicol., 2000, 13, 471-478.
[8] Lindon, J. C.; Nicholson, J. K.; Holmes, E.; Everett, J. R. Concepts Magn. Reson., 2000, 12, 289-320.
[9] Goldman, I. S.; Winkler, M. L.; Raper, S. E.; Barker, M. E.; Keung, E.; Goldberg, H. I.; Boyer, T. D. American Journal of Roentgenology, 1985, 144, 541-546.
[10] Lebovitz, H. E. Diabetes/Metabolism Research and Reviews, 2002, 18, S23-S29.
[11] Waters, N. J.; Holmes, E.; Williams, A.; Waterfield, C. J.; Farrant, R. D.; Nicholson, J. K. Chem. Res. Toxicol., 2001, 14, 1401-1412.
[12] Spraul, M.; Hofmann, M.; Ackermann, M.; Nicholls, A. W.; Damment, S. J. P.; Haselden, J. N.; Shockcor, J. P.; Nicholson, J. K.; Lindon, J. C. Analyst (Cambridge, U.K.), 1997, 122, 339-341.
[13] Marion, D.; Ikura, M.; Bax, A. J. Magn. Reson., 1989, 84, 425-430.
[14] Lindon, J. C.; Nicholson, J. K.; Everett, J. R. Annual Reports on NMR Spectroscopy, 1999, 1-88.
[15] Holmes, E.; Shockcor, J. P. Current Opinion in Drug Discovery & Development, 2000, 3, 72-78.
[16] Bollard, M. E.; Holmes, E.; Lindon, J. C.; Mitchell, S. C.; Branstetter, D.; Zhang, W.; Nicholson, J. K. Analytical Biochemistry, 2001, 295, 194-202.
[17] Phipps, A. N.; Stewart, J.; Wright, B.; Wilson, I. D. Xenobiotica, 1998, 28, 527-537.
[18] Williams, R. E.; Eyton-Jones, H. W.; Farnworth, M. J.; Gallagher, R.; Provan, W. M. Xenobiotica, 2002, 32, 783-794.
[19] Poucell, S.; Ireton, J.; Valencia-Mayoral, P.; Downar, E.; Larratt, L.; Patterson, J.; Blendis, L.; Phillips, M. J. Gastroenterology, 1984, 86, 926-936.
[20] Beckwith-Hall, B. M.; Holmes, E.; Lindon, J. C.; Gounarides, J.; Vickers, A.; Shapiro, M. J.; Nicholson, J. K. Chem. Res. Toxicol., 2002, 15, 1136-1141.
[21] van der Greef, J.; Stroobant, P.; van der Heijden, R. Current Opinion in Chemical Biology, 2004, 8, 559-565.
[22] Holmes, E.; Foxall, P. J. D.; Nicholson, J. K.; Neild, G. H.; Brown, S. M.; Beddell, C. R.; Sweatman, B. C.; Rahr, E.; Lindon, J. C. Analytical Biochemistry, 1994, 220, 284-296.
[23] Azmi, J.; Griffin, J. L.; Antti, H.; Shore, R. F.; Johansson, E.; Nicholson, J. K.; Holmes, E. Analyst, 2002, 127, 271-276.
[24] Anthony, M. L.; Rose, V. S.; Nicholson, J. K.; Lindon, J. C. J. Pharm. Biomed. Anal., 1995, 13, 205-211.
[25] Brindle, J. T.; Antti, H.; Holmes, E.; Tranter, G.; Nicholson, J. K.; Bethell, H. W. L.; Clarke, S.; Schofield, P. M.; McKilligin, E.; Mosedale, D. E.; Grainger, D. J. Nature Medicine, 2002, 8, 1439-1444.
[26] Gavaghan, C. L.; Holmes, E.; Lenz, E.; Wilson, I. D.; Nicholson, J. K. FEBS Letters, 2000, 484, 169-174.
Frontiers in Drug Design & Discovery, 2005, 1, 231-266
Partition of Solvents and Co-Solvents of Nanotubes: Proteins and Cyclopyranoses

Francisco Torrens*

Institut Universitari de Ciència Molecular, Universitat de València, Dr. Moliner 50, E-46100 Burjassot (València), Spain

Abstract: The main contribution to the water-accessible surface area of the lysozyme helices is the hydrophobic term, while the hydrophilic part dominates in the sheet. This is related to the 1-octanol–, cyclohexane– and chloroform–water partition coefficients Po–ch–cf of the helices, which are greater than those of the sheet. The analysis of atom–group partial contributions to logPo–ch–cf allows local maps to be built. The molecular lipophilicity pattern differentiates among helices, sheet and binding site. For a given atom, logP is sensitive to the presence of other atoms. The contributions of C70 a–c atoms to logP are slightly greater than those of d–e atoms, which correlates with the distances from the nearest pentagon. (10,10) is the favourite single-wall carbon nanotube (SWNT), presenting consistency between a relatively small aqueous solubility and great Po–ch–cf. Efforts to use fullerenes–SWNTs in therapeutic applications are re-evaluated. There is a strong possibility of hydrophobic interactions between proteins and fullerenes–SWNTs in the biomilieu when the latter are used for biomedical applications, unless the molecule is delivered effectively at the intended site of action.
*Corresponding author: E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.

INTRODUCTION

Single-wall carbon nanotubes (SWNT) have not yet been fully integrated into biosystems because of the difficulty of rendering them soluble in aqueous solutions. There were some attempts to disperse fullerenes stably in water without stabilizer [1,2]. To study colloidal fullerene–SWNT bioactivities, it is essential to know the fundamental properties of the dispersion (colloidal-particle structure and size distribution) and the stabilization mechanism. Their colloid bioactivities might differ from those of water-soluble derivatives [3,4], or of their aqueous solutions obtained via complexation [5] or with surfactants. Fullerene–SWNT supramolecular structures, assemblies and arrays were applied in materials science and biosystems. While water-soluble polymers and surfactants can confer aqueous solubility on SWNTs, they may not be biocompatible. Interest in C60 chemistry is due to its unique physical–chemical properties: (1) optical limiting property due to electronic absorption in the entire ultraviolet–visible (UV–VIS) range, (2) efficient singlet-oxygen (1O2) generating ability, (3) radical scavenging character and (4) superconductivity on doping with alkali metals. New methods incorporated hydrophobic C60 with low solubility in organic solvents. Host molecules form supramolecular structures with C60–70, including calixarenes [6–12], cyclotriveratrylene [13], triptycenes [14], porphyrin [15], cyclodextrin8 (CD8) and CD7. Few can be used to solubilize water-insoluble molecules for drug-delivery systems [16]. Studies on deoxyribonucleic acid (DNA)–C60 interactions showed that C60 cleaves DNA and is base specific [17]. The C60–human immunodeficiency virus type 1 (HIV-1) protease (HIVP) interaction inhibited the activity of the latter [18]. C60 aqueous solubilization by nucleophilic addition of CD7-R-monoamines (R = iminoalkyl, iminoaryl) to C60, and studies involving host–guest characteristics, free-radical scavenging and DNA-cleaving properties, indicated that C60 has potential for biomedical applications [19]. A water-soluble polyfullerene was synthesized by nucleophilic reaction of a diamine supramolecularly shielded in a CD7 cavity with C60 [20]. SWNT effects were investigated on the polymerase chain reaction [21]. Solubilizing SWNTs in starch aqueous solutions is interesting [22]. CD6–10 are cyclic molecules consisting of 6–10 α-(1→4)-linked D-glucopyranose (D-Glcp) residues, Fig. (1a–c). CDs are able to form
Fig. (1). Structures of (a) D-glucopyranose and (b) cyclodextrin6 (CD6), (c) molecular dimensions of CD (h = 7.9 Å): CD6 (D = 14.6 Å, d = 4.9 Å), CD7 (D = 15.4 Å, d = 6.2 Å) and CD8 (D = 17.5 Å, d = 7.9 Å), and (d) intramolecular H-bonds in CD.
inclusion complexes with guest molecules ranging in character from purely hydrophobic to purely hydrophilic, provided that the guest be small enough to fit into CD annular aperture [23,24]; CDs were model compounds in the study of starch complexes, tyrosine-residues burying and enzyme action [25–27]. Jensen et al. reviewed fullerene bioapplications [28]. Beck et al. reconsidered fullerene solubility in various solvents [29]. Connors re-evaluated CD-complex stability [30]. D’Souza and Lipkowitz edited a special issue to review CDs [31]. Lipkowitz [32], and Castro and Barbiric [33] discussed computational chemistry studies of CDs and complexes. Bezmel’nitsyn et al. reviewed fullerenes in solutions [34]. Diederich and Gómez-López reconsidered C60–70 inclusion complexes by macrocycles [35]. Da Ros and Prato re-evaluated medicinal chemistry with fullerenes [36]. Bianco et al. discussed fullerene-based amino acids–peptides [37]. Guldi and Martín reviewed supramolecular fullerene architectures [38]. The Proceedings of the Lipophilicity Symposia covered drug action and toxicology [39,40]. Earlier works reconsidered structural–topological hypotheses [41], and fullerene–SWNT aqueous solubilization via CD–amylose (Amy)·iodine inclusion complexation [42], and reported calculated partitions of lysozyme [43] and C60–70 [44]. Fullerene [45,46] and SWNT [47,48] periodic tables were discussed. AQUAFAC model calculated SWNT aqueous coefficients [49]. A linear–cyclic-D-Glcpn comparative study provided overall conformations, contact surfaces and cavity proportions [50–56]. The aim of this review is to discuss SWNT solvation–partition in the hope of biopreparations. The following sections review racemic separations, partition of proteins, fullerenes and SWNTs via inclusion complexes, biomolecule immobilization, bioactivity, transport and remediation. 
RACEMIC-MIXTURE SEPARATIONS BY CYCLODEXTRINS

One application of CDs, particularly of CD polymers, that seems promising is the resolution of racemates by means of complex formation [57]. As CDs themselves are optically active molecules, they form a diastereomeric pair with each included racemate; the two components of the pair exhibit different physical properties:

Diastereomers:
D(+)-cyclodextrin⋅(+)-antipode
D(+)-cyclodextrin⋅(–)-antipode

Enantioselective separation is an acute problem in the pharmaceutical industry. Of two synthesized enantiomers, only one presents the required pharmacological properties; the other can be inactive or even toxic [58]. Notice that there are no precise rules for choosing the appropriate stationary phase for a given separation. To evaluate theoretically the different interactions that can cause the enantiomeric-pair separation, several attempts were made using molecular dynamics [59]. Considering the great size of the stationary phase–solute (PS–S) complexes, few studies were done using quantum chemistry methods [60]. It is of interest to use theoretical models to determine the different structures of PS–S complexes and the interactions that can be responsible for the enantioselective separation. CDs and their derivatives have received much attention in the field of chromatographic separations. The wide interest in the use of CDs as a separation medium arises from the fact that CDs can offer a selective system for chromatographic separation. CD complexation is highly selective and, moreover, stereoselective. Inclusion complex formation
is mainly affected by the hydrophobicity and shape of guest molecules. Steric factors are important for the formation and stability of CD inclusion complexes. The partitioning and binding of many hydrophobic and hydrophilic organic molecules to CD cavity can be more selective than the partitioning and binding to a single solvent or to a single traditional stationary phase. For this reason, CDs find their use in typically difficult separations of enantiomers, diastereomers, structural isomers and geometric isomers, in all current types of chromatography [61,62]. The particular properties of CDs can be employed for the chromatographic separation of complex mixtures of substances, even racemates, by molecular recognition [63]. The chirality inherent in cyclodextrins (CD) enables them to differentiate between stereoisomeric potential guest molecules [64], or vice versa, to induce conformational enantiomerism of encapsulated achiral molecules. Albeit some direct hints for these effects can be obtained from solid-state structural analysis [65,66], nuclear magnetic resonance (NMR) data [67,68] and chromatographic separations using CDs as asymmetric stationary phases [69], the chiral resolution of the unmodified CDs is generally poor [70–73]. It is the undirected nature of hydrophobic and van der Waals interactions that causes the significant lack of specificity during molecular recognition by CDs. Even the distinct directionality of hydrogen bond (H-bond) interactions is opposed by the pseudo n-fold symmetry of the various CDs, which provides n alternative, but equally potent H-bond donor and acceptor functions for stabilization of the molecular assembly. Chemically modified CDs [74,75] were considered as basic molecular models for enzymes [76–81] and transition-state binding [82], but the observed regio and stereospecificity, and thus, the differentiation between different substrates remain low [83]. 
Chemical modification can alter the physico-chemical properties of CDs in a pronounced fashion. An attractive stationary phase is CD7. The two CD7 faces have different diameters according to experiment [84]. The narrowest one (≈8Å) consists of the primary hydroxyl (OH) groups, each group forming an H-bond with the O atom of the neighbouring cycle. The widest opening (≈15Å) consists of two alternating families of secondary OH groups (C2OH and C3OH) bound, respectively, to the C2 and C3 atoms in each glucose unit, Fig. (1d). In view of the complex sizes, two glucose units (diglucose, DG) were used as a model compound in the study of CD7 structure. The DG–butanol pair was used as a model inclusion compound in the study of the CD7–solute complex.

ORGANIC SOLVENT–WATER PARTITION OF PROTEINS

Fleming (1922) discovered lysozyme [85], which was studied by bacteriologists because of its lytic activity against bacteria. Following the successful crystallization of lysozyme from hen egg white by Abraham and Robinson, lysozyme began to attract the attention of protein chemists [86]. There was considerable difficulty in studying the structure–function relationship of lysozyme, owing to the obscurity of its enzymatic activity. Berger and Weiser observed the β-N-acetylglucosaminidase activity of lysozyme [87]. Salton and Ghuysen prepared soluble oligosaccharide substrates from a hydrolyzate of bacterial cell walls [88]. Jollès et al. [89] and Canfield [90] determined the amino acid sequence of hen egg-white lysozyme; Phillips’ group (1965) reported its first X-ray crystallographic analysis [91–93]. Detailed information on the three-dimensional (3D) conformation of lysozyme and its substrate binding mode followed. Lysozyme is the first enzyme for which it was possible to understand the enzymatic activity on the basis of its 3D
fine structure, Fig. (2). Progress has been made in physicochemical and enzymatic studies of lysozyme in solution (cf. Table 1) [94].

Table 1. Some Physicochemical Properties of Lysozyme

| Property | Value |
|---|---|
| Partial specific volume (cm3·g-1) | 0.189 (546nm), 0.1955 (436nm), 0.195 (436nm), 0.188 (578nm) |
| Molecular weight | 14307Da |
| Size | 16Å, 15.3Å (radius); 32×30×28Å, 45×30×30Å |
Fig. (2). Structures of (a) lysozyme with a number of water molecules from the hydration sphere, (b) ribbon image, (c) binding site and (d) Cα skeleton.
Table 2. Parameters of Secondary Structure Regions in Lysozyme

| Structure | Region | Type | Residue | Length | Percentage |
|---|---|---|---|---|---|
| Helix | A | α | 5–15 | 11 | 8.5 |
| | B | α | 24–34 | 11 | 8.5 |
| | C | 3.0₁₀ | 80–85 | 6 | 5 |
| | D | α | 88–96 | 9 | 7 |
| Total α-helix | – | α | 5–15, 24–34, 88–96 | 31 | 24 |
| Total helix | – | – | 5–15, 24–34, 80–85, 88–96 | 37 | 29 |
| β-Sheet | E | Antiparallel | 41–54 | 14 | 11 |
| Total helix+sheet | – | – | 5–15, 24–34, 41–54, 80–85, 88–96 | 51 | 40 |
| Active site | BS | Substrate binding site | 34, 35, 37, 44, 57, 59, 62, 63, 101, 107, 114 | 11 | 8.5 |
| Total | – | Protein | 1–129 | 129 | 100 |
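The Length and Percentage columns of Table 2 follow from the inclusive residue ranges and the 129-residue chain. A minimal Python sketch of that arithmetic is given below; the dictionary layout and function names are illustrative choices, not part of the original work.

```python
# Inclusive residue ranges for the secondary-structure regions listed in Table 2.
REGIONS = {
    "A": (5, 15),
    "B": (24, 34),
    "C": (80, 85),
    "D": (88, 96),
    "E": (41, 54),
}
TOTAL_RESIDUES = 129  # whole protein, residues 1-129 (last row of Table 2)


def region_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range start..end."""
    return end - start + 1


def percentage(length: int, total: int = TOTAL_RESIDUES) -> float:
    """Share of the chain covered by `length` residues, in percent."""
    return 100.0 * length / total


lengths = {name: region_length(*span) for name, span in REGIONS.items()}
alpha_total = sum(lengths[name] for name in "ABD")   # alpha-helices only
helix_total = sum(lengths[name] for name in "ABCD")  # all four helices
helix_sheet_total = helix_total + lengths["E"]       # helices plus the sheet

for name, n in lengths.items():
    print(f"Region {name}: {n} residues, {percentage(n):.1f}%")
print(f"Total alpha-helix: {alpha_total} residues, {percentage(alpha_total):.0f}%")
print(f"Total helix: {helix_total} residues, {percentage(helix_total):.0f}%")
print(f"Total helix+sheet: {helix_sheet_total} residues, {percentage(helix_sheet_total):.0f}%")
```

The table appears to round some percentages to one decimal place and others to whole numbers; the sketch simply prints both styles.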
The regions of helices and sheet are given in Table 2.

Table 3. Free Energy of Solvation, Partition Coefficient and Hydrophobic Moment Results for Lysozyme and Its Secondary Structures

| Structure | ∆Gsolv,w (a) | ∆Gsolv,o (b) | ∆Gsolv,ch (c) | ∆Gsolv,cf (d) | log Po (e) | log Pch (f) | log Pcf (g) | µH (h) |
|---|---|---|---|---|---|---|---|---|
| A | -174.1 | -244.1 | -140.7 | -226.0 | 12.3 | -5.87 | 9.11 | 4.53 |
| B | -165.5 | -234.4 | -136.0 | -218.9 | 12.1 | -5.19 | 9.38 | 2.75 |
| C | -90.93 | -124.9 | -73.57 | -119.1 | 5.96 | -3.05 | 4.96 | 4.01 |
| D | -137.0 | -181.1 | -105.8 | -171.4 | 7.73 | -5.49 | 6.03 | 5.26 |
| Mean of α-helices | -158.9 | -219.9 | -127.5 | -205.4 | 10.7 | -5.52 | 8.17 | 4.18 |
| Mean of helices | -141.9 | -196.1 | -114.0 | -183.9 | 9.52 | -4.90 | 7.37 | 4.14 |
| Antiparallel β-sheet | -360.8 | -299.7 | -172.1 | -275.5 | -10.7 | -33.1 | -15.0 | 5.12 |
| Binding site | -401.6 | -372.2 | -215.7 | -343.9 | -5.16 | -32.7 | -10.1 | 6.50 |
| All molecule | -2189 | -2133 | -1290 | -2124 | -9.85 | -158 | -11.5 | 25.9 |

(a) Gibbs free energy of solvation in water (kJ·mol-1). (b) Gibbs free energy of solvation in 1-octanol (kJ·mol-1). (c) Gibbs free energy of solvation in cyclohexane (kJ·mol-1). (d) Gibbs free energy of solvation in chloroform (kJ·mol-1). (e) Po is the 1-octanol–water partition coefficient. (f) Pch is the cyclohexane–water partition coefficient. (g) Pcf is the chloroform–water partition coefficient. (h) µH is the hydrophobic moment.

The Gibbs free energy of solvation ∆Gsolv results for lysozyme (cf. Table 3) show that –∆Gsolv,water of the helices is smaller than that of the sheet. However, this trend is damped in the organic solvents 1-octanol, cyclohexane and chloroform (o, ch and cf = CHCl3). The logarithms of the partition coefficients logPo–ch–cf of the helices are much greater than those of the sheet. The greater logPo–ch–cf for the helices is related to their large hydrophobic solvent-accessible surface area (SASA), and the lower logPo–ch–cf for the sheet to its large hydrophilic SASA. The Phelix – Psheet differences amount to 20, 28 and 22 log units for Po, Pch and Pcf, respectively. For logP > 3 more than 99.9% of the solute is in the organic phase, and for logP < –3 more than 99.9% of the solute is in the aqueous phase, predicting a negligible quantity of solute in the corresponding alternate phase. Three –logP values are greater than the exponent of the Avogadro number (P < 10^–23), meaning that no solute molecules would be present in the organic phase to allow experimental validation. However, all logP figures are kept for the purpose of comparison between helices and sheet. The molecular lipophilicity pattern (MLP) is the normalized logPo map of lysozyme, Fig. (3). Lipophilicity is positive for the four helices, and negative for the binding site and, especially, the β-sheet. The hydrophobic moment µH (Table 3) cannot differentiate properly between helices and sheet. µH for the whole molecule is greater than those for the fragments: µH,helix ≈ µH,sheet < µH,binding site << µH,all. The ∆Gsolv, logP and µH for Fe4S4Cysn (0 ≤ n ≤ 4) models of high-potential iron–sulphur proteins [95] (cf. Table 4) show that Po, Pch and Pcf decrease monotonically by 4.41, 6.22 and 4.60 log units per Cys ligand, respectively. One value of –logPch is greater than the exponent of the Avogadro number. Po results are of the same order of magnitude as CDHI computations, which are based on a method developed by Kantola et al. [96].
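The 99.9% thresholds quoted above, and the remark about the Avogadro exponent, can be illustrated with a short Python sketch. It assumes equal volumes of the organic and aqueous phases and one mole of solute; both are assumptions made here for illustration, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def organic_fraction(log_p: float, volume_ratio: float = 1.0) -> float:
    """Equilibrium fraction of solute in the organic phase.

    P = c_organic / c_water; volume_ratio = V_organic / V_water.
    The default of 1.0 (equal volumes) is an assumption of this sketch.
    """
    p = 10.0 ** log_p
    return p * volume_ratio / (1.0 + p * volume_ratio)


def molecules_in_organic_phase(log_p: float, moles: float = 1.0) -> float:
    """Expected number of solute molecules found in the organic phase."""
    return organic_fraction(log_p) * moles * AVOGADRO


# logP thresholds from the text, then the 1-octanol and cyclohexane values
# for the antiparallel beta-sheet taken from Table 3.
for log_p in (3.0, -3.0, -10.7, -33.1):
    frac = organic_fraction(log_p)
    count = molecules_in_organic_phase(log_p)
    print(f"logP = {log_p:6.1f}: organic fraction = {frac:.3e}, "
          f"molecules per mole of solute = {count:.3e}")
```

Under these assumptions, a logP below about –23.8 corresponds to fewer than one molecule per mole of solute in the organic phase, which is the sense in which such values cannot be validated experimentally.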
Fig. (3). Molecular lipophilicity pattern of lysozyme.
Table 4. Free Energy of Solvation, Partition Coefficient and Hydrophobic Moment Results for High-Potential Iron–Sulphur Protein Models

| Cluster | ∆Gsolv,w (a) | ∆Gsolv,o (b) | ∆Gsolv,ch (c) | ∆Gsolv,cf (d) | log Po (e) | log Po Ref. (f) | log Pch (g) | log Pch Ref. (h) | log Pcf (i) | log Pcf Ref. (h) | µH (j) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fe4S4 | -2.56 | -2.76 | -1.39 | -2.01 | 0.04 | 0.28 | -0.21 | -0.70 | -0.10 | 0.22 | 0.25 |
| Fe4S4Cys | -61.74 | -36.64 | -24.86 | -34.73 | -4.41 | -3.66 | -6.48 | -5.42 | -4.74 | -5.46 | 4.42 |
| Fe4S4Cys2 | -216.4 | -166.8 | -144.6 | -163.6 | -8.72 | -7.21 | -12.6 | -10.0 | -9.28 | -11.0 | 3.49 |
| Fe4S4Cys3 | -454.9 | -380.5 | -348.0 | -376.1 | -13.1 | -11.2 | -18.8 | -14.6 | -13.8 | -16.5 | 0.77 |
| Fe4S4Cys4 | -648.4 | -548.1 | -505.5 | -542.9 | -17.6 | -16.2 | -25.1 | -19.5 | -18.5 | -22.3 | 2.62 |
| Cys | -50.31 | -26.25 | -15.55 | -25.02 | -4.23 | -0.87 | -6.11 | -5.23 | -4.44 | -5.22 | 5.97 |

(a) Gibbs free energy of solvation in water (kJ·mol-1). (b) Gibbs free energy of solvation in 1-octanol (kJ·mol-1). (c) Gibbs free energy of solvation in cyclohexane (kJ·mol-1). (d) Gibbs free energy of solvation in chloroform (kJ·mol-1). (e) Po is the 1-octanol–water partition coefficient. (f) CDHI: calculations carried out with a method developed by Kantola et al. (g) Pch is the cyclohexane–water partition coefficient. (h) Calculations carried out with a method developed by Leo et al. (i) Pcf is the chloroform–water partition coefficient. (j) µH is the hydrophobic moment.
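The logP columns of Table 4 are consistent with the free energies of solvation through the standard transfer relation logP = –(∆Gsolv,organic – ∆Gsolv,water)/(RT ln 10). The Python sketch below applies it to the 1-octanol column and recovers the per-Cys decrement quoted in the text. The temperature of 298.15 K is an assumption (the source does not state it), and the function and variable names are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)
T_ASSUMED = 298.15     # K; assumed here, not stated in the source


def log_p(dg_organic: float, dg_water: float, temperature: float = T_ASSUMED) -> float:
    """log P from solvation free energies (kJ/mol) via the transfer free energy."""
    return -(dg_organic - dg_water) / (R_KJ * temperature * math.log(10.0))


# (dG_water, dG_octanol) in kJ/mol for Fe4S4Cys_n, n = 0..4, copied from Table 4.
DG_WATER_OCTANOL = [
    (-2.56, -2.76),
    (-61.74, -36.64),
    (-216.4, -166.8),
    (-454.9, -380.5),
    (-648.4, -548.1),
]

log_po = [log_p(dg_o, dg_w) for dg_w, dg_o in DG_WATER_OCTANOL]

# Average decrease in log Po per added Cys ligand between n = 0 and n = 4.
slope_per_cys = (log_po[0] - log_po[-1]) / 4

for n, value in enumerate(log_po):
    print(f"Fe4S4Cys{n}: log Po = {value:.2f}")
print(f"Mean decrease per Cys ligand: {slope_per_cys:.2f} log units")
```

With the assumed temperature the computed values agree with the tabulated log Po column to within about 0.1 log unit; the small residual differences may reflect rounding of the tabulated energies or a slightly different temperature in the original calculations.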
Pch–cf results are of the same order of magnitude as calculations performed with a method by Leo and Hansch [97]. With these reference methods, Po, Pch and Pcf decrease monotonically by 4.12, 4.70 and 5.63 log units per Cys ligand, in that order. The logPo, logPch and logPcf mean relative errors (MRE) are 17%, 26% and -16%, correspondingly. Globally these represent an MRE of 9%. However, this result should be taken with care, because the signed errors partly cancel: the mean unsigned relative error (URE) is 19%. The error for CHCl3 is rather smaller than that for cyclohexane, owing to the greater similarity in relative dielectric permittivity between CHCl3 and 1-octanol. In particular, for Fe4S4 the logPo–ch–cf values are close to 0 log units. The Fe4S4Cys4 MLP, Fig. (4), shows that lipophilicity is near zero for the Fe4S4 cluster and negative close to the Cys ligands. µH quantitatively differentiates the Fe4S4Cysn structures. On varying the number of Cys units, the structures show µH values indicative of particularly amphipathic structures, such as Fe4S4Cys.

ORGANIC SOLVENT–WATER PARTITION OF FULLERENES

The existing methods for production and purification of fullerenes in macroscopic quantities are based on the use of solvents [98]. The most generally employed technology of further separation of fullerenes of different sorts, and their subsequent
Fig. (4). Molecular lipophilicity pattern of Fe4S4Cys4 iron–sulphur protein models.
purification is based on the concept of liquid chromatography [99,100]. Another generally employed method for separation and purification of fullerenes is based on the phenomenon of fullerene crystallization in solutions [101,102]. C60–70 and higher fullerenes can be isolated by extraction in an organic solvent [103-105]. It was not possible to extract C60–70 from a solution in toluene to water, nor from an aqueous dispersion (nC60–70) to toluene. Similar results were reported for other nC60–70 aqueous dispersions [106]. C60 dissolved in water via complexation with CD8 was extracted to toluene. C60 incorporated into artificial lipid membranes was not extracted to toluene, but extraction became possible after vesicle destruction by adding KCl [107]. Extraction of poly(N-vinylpyrrolidone) (PVP)-solubilized C60 to toluene required KCl addition. Deguchi et al. prepared stable aqueous nC60–70 by injecting a saturated solution of C60–70 in tetrahydrofuran (THF) into water, followed by THF removal by purging with N2(g) [108]. When they added NaCl to nC60–70, C60–70 were extracted into toluene and the toluene phase exhibited the faint magenta–orange colours characteristic of a solution of C60–70 in toluene, which permits the nC60–70 concentration in water to be measured by spectrophotometry. As they prepared aqueous nC60–70 from a saturated solution in THF, a more concentrated nC60–70 cannot be prepared directly. However, by injecting a saturated solution in THF into nC60–70 prepared beforehand, it is possible to concentrate aqueous nC60–70. When they concentrated aqueous nC60 twice, the particle size remained the same by dynamic light scattering (DLS) and transmission electron microscopy (TEM). nC60 does not grow on further C60 feed; instead, new nC60 is formed. They concentrated nC60 four times and formed polycrystalline aqueous nC60 [109,110]. Aqueous-nC60 stability is achieved by the
negatively charged cluster surface, which is one reason why nC60–70 cannot be extracted from aqueous dispersions to toluene, and why nC60 does not grow on further C60 addition. The origin of the negative charge on nC60–70 surfaces is not clear. Possible mechanisms are hydroxyl ion adsorption, clathrate formation and charge transfer. In order to test the formation of anionic endohedral metallofullerenes (EMF, M@C2n) during their extraction from EMF-containing soot [111], Kareev et al. studied different EMF extracts by electrospray-ionization mass spectrometry (ESI–MS) [112]. The aqueous solubility of C60–70–82, the van der Waals dimer (C60)2 and C60H60 was calculated with a program based on the AQUAFAC model [113–118]. C60-Ih is especially symmetric, with all atoms occupying equivalent sites in a truncated icosahedron [119]. The molecular structure contains 12 pentagons and 20 hexagons, constituting a roughly spherical molecule. The pentagons sit as far as possible from each other at the vertices of an icosahedron; they may be viewed as defects compared to the unstrained hexagons. All atoms are equivalent and occur at the vertex joining one pentagon and two hexagons [120]. C70-D5h is similar, with the 10 extra atoms inserted as a band of hexagons around the middle of the truncated icosahedron, producing a prolate ellipsoid. A substructure [121], where the five non-equivalent atoms are labelled a–e, Fig. (5a), shows that atoms a–d join one pentagon with two hexagons, while atom e joins three hexagons. On going from a to e the distance from the nearest pentagon increases. H atoms introduced by C60 chemical reduction would lie on the outside of the cluster [122,123]. The symmetric exohedral structure was predicted to be highly strained. Endo-C60H60 isomers with C–H bonds pointing inside the cavity were found to be more stable than their all-out counterparts. The calculations refer to the AM1 optimum number of inside H atoms and geometry, which has 10 endo-H [124]. The solvation descriptors (cf.
Table 5) show that –∆Gsolv,water of C60–82 increases from 16 to 21 kJ·mol-1, and –∆Gsolv,1-octanol from 129 to 172 kJ·mol-1. Po–ch–cf increase by seven orders of magnitude with the number of atoms. The (C60)2-C5h results show logPo ≈ 39, indicating that a negligible quantity remains in water. When the calculation was repeated imposing the condition that the aqueous phase be entirely assigned to the monomer, the organic phase was additionally favoured by 2.7 log units. The other organic solvents show a similar effect. The C60H60 results show that no important effect on logP is expected from the all-exo (Ih) or partially endo (C1) H-atom position. Both SCAP logP values indicate that all C60H60 would remain in the organic phase. A sharp discrepancy in the orders of magnitude is predicted by CDHI for logPo (≈3–4). The CDHI results indicate a preferential solubility in 1-octanol of only 10^3 times that in water, a prediction that seems unlikely for a system that can be regarded as fully saturated with cyclic tertiary C atoms. The spurious CDHI results illustrate the danger of using parameter-fitted methods outside the range of molecules used in the fitting. LogPch–cf results are of the same order of magnitude as reference calculations performed with a method by Leo and Hansch. SCAP analysis shows that the normalized-logPo–ch–cf contributions of C70 atoms a–c are slightly greater than those of d–e, which can be explained because the distance from the nearest pentagon, D, varies gradually from a to e. CDHI cannot differentiate atoms a–e. The SCAP C70 MLP (10×normalized logPo), Fig. (5b), shows that lipophilicity decreases as D increases, and logPo vs. D, Fig. (6), correlates as:

$$\log P_\mathrm{o} = 23.7 - 0.153\,D - 0.0643\,D^{2} \qquad r = 0.997,\; s = 0.068,\; F = 190.2 \qquad (1)$$
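As a small illustration of how the fitted Eq. (1) behaves, the Python sketch below evaluates the quadratic at a few distances. The D values are hypothetical placeholders chosen only to show the trend; the actual distances for atoms a–e, and their units, are not given in this passage.

```python
def log_po_fit(d: float) -> float:
    """Quadratic fit of Eq. (1): log Po = 23.7 - 0.153 D - 0.0643 D^2."""
    return 23.7 - 0.153 * d - 0.0643 * d * d


# Hypothetical distances from the nearest pentagon, for illustration only.
HYPOTHETICAL_D = (0.0, 0.5, 1.0, 1.5, 2.0)

for d in HYPOTHETICAL_D:
    print(f"D = {d:.1f}: fitted log Po = {log_po_fit(d):.2f}")
```

Because both fitted coefficients of D are negative, the fitted logPo decreases monotonically for D ≥ 0, in line with the statement that lipophilicity decreases as D increases.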
Fig. (5). Flat projection of ca. 3/10 of the structure of C70-D5h: (a) the five non-equivalent atom types (C5 axis is through the centre of the top and bottom pentagons), and (b) molecular lipophilicity pattern.
ORGANIC SOLVENT–WATER PARTITION OF NANOTUBES
SWNT chromatographic purifications relied on size-exclusion chromatography of surfactant-stabilized dispersions with water as the mobile phase [125,126]. Poly(phenylacetylene)-wrapped multiple-wall carbon nanotubes (MWNT) were analyzed by gel permeation chromatography [127]. Functionalized SWNTs were purified by chromatography [128]. The AQUAFAC aqueous solubility Sw of SWNTs (cf. Table 6) decreases monotonically with n and m. All values of logSw are < –3, meaning that less than 0.1% of SWNT is in solution. Indeed, all –logSw values are greater than the Avogadro exponent. The results are consistent with the fact that SWNTs are completely insoluble in water. SWNT solubility is hindered because SWNTs aggregate in bundles owing to large van der Waals interactions. Although individual van der Waals forces are weak, the total force is large because of the great number of atoms interacting between the surfaces of aligned SWNTs. Ways of increasing SWNT solubility include the use of anionic–non-ionic surfactants.
Table 5.
Solubility, Free Energy of Solvation, Partition and Atom-to-Atom Normalized Partition Coefficient Results for Fullerene Systems

| Fullerene | logSw^a | ∆Gsolv,w^b | ∆Gsolv,o^c | ∆Gsolv,ch^d | ∆Gsolv,cf^e | logPo^f | logPo Ref.^g | logPch^h | logPch Ref.^i | logPcf^j | logPcf Ref.^i |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C60 (Ih) | -23.9 | -15.60 | -128.7 | -78.48 | -103.2 | 19.9 | 13.8 | 11.1 | 11.6 | 15.4 | 21.0 |
| C60 dimer (C5h) | -43.8 | -30.82 | -250.0 | -153.1 | -202.4 | 38.5 | 27.6 | 21.5 | 24.2 | 30.1 | 42.0 |
| C60 dimer (C5h)^k | – | -15.60 | -250.0 | -153.1 | -202.4 | 41.2 | – | 24.2 | 26.0 | 32.8 | 45.0 |
| C60H60 (10 H inside, C1) | -25.4 | 35.43 | -122.3 | -72.09 | -91.44 | 27.7 | 2.76 | 18.9 | 16.9 | 22.3 | 29.8 |
| C60H60 all-out (Ih) | -24.0 | 38.04 | -124.5 | -75.04 | -97.67 | 28.6 | 4.07 | 19.9 | 17.4 | 23.8 | 30.8 |
| C70 (D5h) | -27.0 | -18.09 | -148.4 | -91.20 | -119.9 | 22.9 | 15.8 | 12.8 | 13.6 | 17.9 | 24.4 |
| C82 (C2) | -31.7 | -20.86 | -172.4 | -106.2 | -139.2 | 26.6 | 17.6 | 15.0 | 16.1 | 20.8 | 28.6 |

| Fullerene | Atom type | logPo^f | logPo Ref.^g | logPch^h | logPch Ref.^i | logPcf^j | logPcf Ref.^i |
|---|---|---|---|---|---|---|---|
| C60 (Ih) | a | 20.0 | 13.8 | 11.0 | 11.7 | 15.3 | 21.2 |
| C70 (D5h) | a | 23.7 | 15.8 | 13.3 | 14.2 | 18.2 | 25.3 |
| | b | 23.5 | 15.9 | 13.2 | 14.0 | 18.2 | 25.1 |
| | c | 23.2 | 15.8 | 12.9 | 13.8 | 17.9 | 24.8 |
| | d | 22.6 | 15.9 | 12.6 | 13.4 | 17.8 | 24.1 |
| | e | 22.1 | 15.9 | 12.5 | 13.1 | 17.4 | 23.5 |
| C60 dimer (C5h) | average | 39.6 | 27.6 | 22.0 | 24.9 | 30.7 | 43.2 |

a Sw is the solubility in water (mol·L–1).
b Gibbs free energy of solvation in water (kJ·mol–1).
c Gibbs free energy of solvation in 1-octanol (kJ·mol–1).
d Gibbs free energy of solvation in cyclohexane (kJ·mol–1).
e Gibbs free energy of solvation in chloroform (kJ·mol–1).
f Po is the 1-octanol–water partition coefficient.
g Calculations carried out with a method developed by Kantola et al.
h Pch is the cyclohexane–water partition coefficient.
i Calculations carried out with a method developed by Leo and Hansch.
j Pcf is the chloroform–water partition coefficient.
k Here, C60 is calculated as monomer in water and van der Waals dimer in the organic phase.
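The free energies of solvation and the partition coefficients in Table 5 can be cross-checked against each other. The sketch below assumes the standard transfer relation logP = (∆Gsolv,water – ∆Gsolv,organic)/(2.303RT) and a temperature of 298.15 K; neither the relation nor the temperature is stated in the text, and the function name is illustrative. Under these assumptions the tabulated SCAP logPo values are recovered to within about 0.1 log unit.

```python
import math

R_KJ = 8.314462618e-3   # gas constant in kJ/(mol K)
T_ASSUMED = 298.15      # K; assumption, the chapter does not state the temperature


def log_p_from_free_energies(dg_water, dg_organic, temperature=T_ASSUMED):
    """logP from solvation free energies (kJ/mol), assuming
    logP = (dG_water - dG_organic) / (ln(10) * R * T)."""
    return (dg_water - dg_organic) / (math.log(10.0) * R_KJ * temperature)


# (dG_solv,water, dG_solv,1-octanol, tabulated logPo) copied from Table 5
table5 = {
    "C60 (Ih)": (-15.60, -128.7, 19.9),
    "C60 dimer (C5h)": (-30.82, -250.0, 38.5),
    "C60H60 (10 H inside, C1)": (35.43, -122.3, 27.7),
    "C70 (D5h)": (-18.09, -148.4, 22.9),
    "C82 (C2)": (-20.86, -172.4, 26.6),
}

for name, (dg_w, dg_o, tabulated) in table5.items():
    estimate = log_p_from_free_energies(dg_w, dg_o)
    print(f"{name:26s} estimated logPo = {estimate:5.2f}   Table 5: {tabulated}")
```

The same function applies to cyclohexane and chloroform by passing the corresponding ∆Gsolv column as `dg_organic`.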
Fig. (6). Normalized logPo vs. distance from the nearest pentagon for C70-D5h.
LogSw(n,0) and logSw(n,n) correlate well with (n^2+nm+m^2)^(1/2). The variation of logSw(n,0) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log S_\mathrm{w}(n,0) = -4.76 - 3.53\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 10,\quad r = -0.999994,\quad s = 0.038,\quad F = 707859.2 \tag{2}$$

The variation of logSw(n,n) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log S_\mathrm{w}(n,n) = -5.31 - 4.09\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 6,\quad r = -0.999998,\quad s = 0.032,\quad F = 836663.0 \tag{3}$$
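As an illustration of how Eqs. (2) and (3) are used, the following sketch evaluates both fits for given chiral indices and converts the resulting solubility into a number of dissolved molecules per litre, which makes the comparison with the Avogadro exponent concrete. The function names are illustrative and not part of the original work; the coefficients are those of Eqs. (2) and (3).

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def chiral_term(n, m):
    """Return (n^2 + n*m + m^2)^(1/2) for chiral indices (n, m)."""
    return math.sqrt(n * n + n * m + m * m)


def log_sw_zigzag(n):
    """Eq. (2): logSw fitted for zigzag (n,0) nanotubes."""
    return -4.76 - 3.53 * chiral_term(n, 0)


def log_sw_armchair(n):
    """Eq. (3): logSw fitted for armchair (n,n) nanotubes."""
    return -5.31 - 4.09 * chiral_term(n, n)


def molecules_per_litre(log_sw):
    """Dissolved molecules per litre for a solubility of 10**log_sw mol/L."""
    return AVOGADRO * 10.0 ** log_sw


for label, value in (("(9,0)", log_sw_zigzag(9)), ("(10,10)", log_sw_armchair(10))):
    print(f"{label}: logSw = {value:.2f}, "
          f"molecules per litre = {molecules_per_litre(value):.1e}")
```

For (9,0) and (10,10) the fits give logSw close to the Table 6 entries (-36.6 and -76.1), and in both cases the computed number of dissolved molecules per litre is far below one.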
The absolute slope for (n,0) is smaller than that for (n,n). LogSw(n,0) is greater than logSw(n,n), especially for thicker SWNTs. LogSw(10,10) is the smallest of all the SWNTs. SCAP partition coefficients (cf. Table 6) show that Po increases with n and m. All logPo > 3, meaning that more than 99.9% of the solute is in the organic phase. Indeed, all logPo values are greater than the Avogadro exponent. LogPo results are of the same order of
Table 6.
Solubility, Free Energy of Solvation, Partition and Atom-to-Atom Normalized Partition Coefficient Results for Carbon Nanotubes

| Nanotube | logSw^a | ∆Gsolv,w^b | ∆Gsolv,o^c | ∆Gsolv,ch^d | ∆Gsolv,cf^e | logPo^f | logPo Ref.^g | logPch^h | logPch Ref.^i | logPcf^j | logPcf Ref.^i |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 90 (9,0) | -36.6 | -25.3 | -195.1 | -105.2 | -160.2 | 29.8 | 23.7 | 14.1 | 18.3 | 23.7 | 32.3 |
| 100 (10,0) | -40.1 | -28.3 | -222.0 | -118.2 | -179.8 | 34.0 | 28.0 | 15.8 | 21.1 | 26.6 | 37.0 |
| 110 (11,0) | -43.6 | -31.3 | -247.3 | -131.3 | -199.1 | 37.9 | 29.1 | 17.6 | 23.8 | 29.5 | 41.4 |
| 120 (12,0) | -47.1 | -34.3 | -271.7 | -144.0 | -218.0 | 41.7 | 33.0 | 19.3 | 26.3 | 32.3 | 45.6 |
| 130 (13,0) | -50.7 | -37.3 | -295.7 | -156.7 | -237.0 | 45.4 | 34.4 | 21.0 | 28.8 | 35.1 | 49.8 |
| 140 (14,0) | -54.2 | -40.3 | -319.5 | -169.1 | -256.2 | 49.1 | 38.1 | 22.6 | 31.3 | 37.9 | 53.9 |
| 150 (15,0) | -57.7 | -43.2 | -343.2 | -181.6 | -274.8 | 52.7 | 39.7 | 24.3 | 33.7 | 40.7 | 58.0 |
| 160 (16,0) | -61.3 | -46.1 | -366.8 | -194.1 | -293.6 | 56.3 | 43.3 | 26.0 | 36.2 | 43.5 | 62.1 |
| 170 (17,0) | -64.8 | -48.9 | -389.3 | -207.0 | -312.2 | 59.8 | 45.0 | 27.8 | 38.5 | 46.2 | 66.0 |
| 180 (18,0) | -68.4 | -52.0 | -413.8 | -218.7 | -330.6 | 63.6 | 48.5 | 29.3 | 41.1 | 48.9 | 70.2 |
| 100 (5,5) | -40.7 | -27.5 | -211.3 | -114.8 | -174.4 | 32.3 | 25.8 | 15.3 | 20.0 | 25.8 | 35.0 |
| 120 (6,6) | -47.8 | -33.5 | -264.0 | -140.8 | -212.6 | 40.5 | 31.0 | 18.8 | 25.5 | 31.5 | 44.2 |
| 140 (7,7) | -54.8 | -39.5 | -313.1 | -165.7 | -251.1 | 48.1 | 36.2 | 22.2 | 30.6 | 37.2 | 52.8 |
| 160 (8,8) | -61.9 | -45.2 | -359.8 | -190.4 | -288.2 | 55.3 | 41.6 | 25.5 | 35.5 | 42.7 | 60.9 |
| 180 (9,9) | -69.0 | -51.0 | -406.9 | -215.0 | -325.2 | 62.5 | 46.7 | 28.8 | 40.4 | 48.2 | 69.1 |
| 200 (10,10) | -76.1 | -56.8 | -453.3 | -239.5 | -362.1 | 69.6 | 51.9 | 32.1 | 45.2 | 53.6 | 77.1 |

| Nanotube | Atom type | logPo^f | logPo Ref.^g | logPch^h | logPch Ref.^i | logPcf^j | logPcf Ref.^i |
|---|---|---|---|---|---|---|---|
| 170 (17,0) | a (trivalent) | 62.0 | 45.0 | 29.0 | 40.0 | 49.7 | 68.5 |
| | b (divalent) | 63.1 | 45.0 | 29.5 | 40.8 | 50.7 | 69.7 |
| 200 (10,10) | a (trivalent) | 72.2 | 51.9 | 33.6 | 46.9 | 56.3 | 80.0 |
| | b (divalent) | 74.4 | 51.9 | 34.5 | 48.4 | 60.0 | 82.4 |

a Sw is the solubility in water (mol·L–1).
b Gibbs free energy of solvation in water (kJ·mol–1).
c Gibbs free energy of solvation in 1-octanol (kJ·mol–1).
d Gibbs free energy of solvation in cyclohexane (kJ·mol–1).
e Gibbs free energy of solvation in chloroform (kJ·mol–1).
f Po is the 1-octanol–water partition coefficient.
g Calculations carried out with a method by Kantola et al.
h Pch is the cyclohexane–water partition coefficient.
i Calculations carried out with a method by Leo and Hansch.
j Pcf is the chloroform–water partition coefficient.
magnitude as reference calculations performed with CDHI. The logPo error is 30%. SCAP logPo(n,0) and logPo(n,n) correlate well with (n^2+nm+m^2)^(1/2). The variation of logPo(n,0) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log P_\mathrm{o}(n,0) = -3.17 + 3.72\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 10,\quad r = 0.9998,\quad s = 0.257,\quad F = 17257.3 \tag{4}$$

The variation of logPo(n,n) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log P_\mathrm{o}(n,n) = -4.27 + 4.28\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 6,\quad r = 0.9996,\quad s = 0.413,\quad F = 5656.5 \tag{5}$$
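The statement that logPo > 3 leaves more than 99.9% of the solute in the organic phase can be made explicit with a short calculation. The sketch below evaluates Eqs. (4) and (5) and converts logPo into the equilibrium fraction of solute in 1-octanol; it assumes equal volumes of the two phases unless a different ratio is supplied, and both the volume-ratio parameter and the function names are illustrative additions.

```python
import math


def log_po_zigzag(n):
    """Eq. (4): logPo fitted for zigzag (n,0) nanotubes."""
    return -3.17 + 3.72 * math.sqrt(n * n)


def log_po_armchair(n):
    """Eq. (5): logPo fitted for armchair (n,n) nanotubes."""
    return -4.27 + 4.28 * math.sqrt(3 * n * n)


def organic_fraction(log_p, volume_ratio=1.0):
    """Fraction of solute in the organic phase at equilibrium.

    volume_ratio is V_organic / V_water (assumed 1.0, i.e. equal volumes).
    """
    weighted = (10.0 ** log_p) * volume_ratio
    return weighted / (1.0 + weighted)


print(f"logP = 3       -> organic fraction = {organic_fraction(3.0):.6f}")
for label, value in (("(9,0)", log_po_zigzag(9)), ("(10,10)", log_po_armchair(10))):
    print(f"{label}: logPo = {value:.2f}, "
          f"aqueous fraction = {1.0 / (1.0 + 10.0 ** value):.1e}")
```

For the nanotubes the aqueous fraction is printed instead of the organic one, because the latter is indistinguishable from 1 in double precision.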
The slope of the (n,0) curve is smaller than that of the (n,n) curve. LogPo(n,0) is smaller than logPo(n,n), especially for thicker SWNTs. Pch and Pcf increase with n and m. Most logP values are greater than the Avogadro exponent. LogPch and logPcf results are of the same order of magnitude as reference calculations performed with the method by Leo and Hansch. The logPch–cf error is –28%. The variation of logPch(n,0) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log P_\mathrm{ch}(n,0) = -1.08 + 1.69\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 10,\quad r = 0.99992,\quad s = 0.068,\quad F = 50691.1 \tag{6}$$

The variation of logPch(n,n) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log P_\mathrm{ch}(n,n) = -1.35 + 1.93\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 6,\quad r = 0.99993,\quad s = 0.082,\quad F = 29068.9 \tag{7}$$

The variation of logPcf(n,0) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log P_\mathrm{cf}(n,0) = -1.36 + 2.80\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 10,\quad r = 0.99995,\quad s = 0.089,\quad F = 80850.0 \tag{8}$$

The variation of logPcf(n,n) with (n^2+nm+m^2)^(1/2) turns out to be

$$\log P_\mathrm{cf}(n,n) = -1.87 + 3.21\,(n^{2}+nm+m^{2})^{1/2} \qquad N = 6,\quad r = 0.99994,\quad s = 0.128,\quad F = 33121.7 \tag{9}$$
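The linear correlations above can be re-derived from the tabulated data. As a minimal sketch, the code below applies ordinary least squares to the rounded SCAP logPch values of the ten (n,0) nanotubes in Table 6, using (n^2+nm+m^2)^(1/2) = n as the abscissa. It is assumed here that the published fits were obtained by ordinary least squares; because the inputs are rounded to one decimal, only the slope and intercept of Eq. (6) are expected to be reproduced closely.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the correlation coefficient.
    """
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / math.sqrt(sxx * syy)


# Zigzag (n,0) nanotubes: (n^2 + n*m + m^2)^(1/2) reduces to n when m = 0.
n_values = list(range(9, 19))
# SCAP logPch column of Table 6 for (9,0) ... (18,0)
log_pch = [14.1, 15.8, 17.6, 19.3, 21.0, 22.6, 24.3, 26.0, 27.8, 29.3]

slope, intercept, r = linear_fit(n_values, log_pch)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.5f}")
```

Replacing `log_pch` with another column of Table 6, and `n_values` with 3^(1/2)·n for the armchair tubes, gives the corresponding checks for the other fits.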
LogPo, logPch and logPcf of (10,10) are the greatest of all the SWNTs. The partition of SCAP logP for (17,0) and (10,10) shows that the contribution of the trivalent atoms a is smaller than that of the divalent atoms b. However, CDHI does not differentiate atoms a and b. The (17,0) and (10,10) MLPs, Fig. (7), show that the divalent atoms b present the greatest lipophilicity.
Fig. (7). (17,0) Nanotube: (a) projection (fragment) showing non-equivalent atoms, and (b) molecular lipophilicity pattern (MLP); (10,10): (c) projection and (d) MLP.
ORGANIC SOLVENT–WATER PARTITION OF FULLERENES VIA INCLUSION COMPLEX FORMATION
Wennerström’s group studied a water-soluble (CD8)2·C60 inclusion complex, in which hydrophobic interactions are responsible for complexation, by molecular modelling [129], photophysics [130] and spectroscopy [131,132], Fig. (8) [133–143]. One problem with highly symmetric C60 is that ring currents in hexagons are offset by those in neighbouring pentagons, giving rise to unpredictable NMR chemical shifts. The recognition of CDs binding to these systems is difficult to study because of multiple association states, in which 2:1 and 1:1 complexes are in equilibrium with uncomplexed states, and because the equilibria are affected by subtle changes in environmental conditions. Molecular modelling provided insight into the nature of the equilibria and the structures of the complexes. A charge-transfer complex C60δ––CD8δ+ was reported, with ether and alcoholic O atoms acting as electron donors [144]. AMBER force-field calculations, performed to understand the inclusion kinetics of the 2:1 C60 complex, showed that dispersion forces dominate the association [145]. Kadish’s group solubilized C60 in water and some polar organic solvents by CD6–8 inclusion chemistry [146]. The electrochemically [147] and photochemically reduced CD8–C60– complex was studied [148]. C70 binding to CD8 was computed [149]. Cycloisomaltoheptaose and -octaose (CI7–8) formed water-soluble 2:1 inclusion complexes with C60–70 [150]. C60–70 were solubilized by CD8 complexation, and C60 and its dimer by sulfocalix[8]arene [151]. The accessible C60 surface in the CD7 complex is increased 8-fold compared with the CD8 complex, which is important for bioapplications [152]. CD6 does not form an inclusion complex. C60 was incorporated into artificial lipid membranes. Natural and substituted CDs and calixarenes were investigated for C60 aqueous solubilization [153]. That study showed that underivatized CD8 has unique solubilizing power, as it produces stable solutions and solids containing real 1:2 complexes; explained the differences in solution appearance (magenta–brown colour) by C60 hydration within the supramolecule; and established the composition and structure of the complex by chemical analyses, X-ray diffraction, AFM and solid-state NMR.
Fig. (8). Schematic description of cyclodextrin8–C60 complexes derived from molecular modelling: (a) 1:1 and (b) 2:1.
Colloidal nC60 results when C60, from either the solid state or an organic solution, is placed in contact with neutral water [154,155]. Rather than precipitating completely, some C60 forms suspended, water-stable aggregates at up to 100 ppm [156–158]. Hwang and Mauzerall [159,160], and Bensasson et al. [161] solubilized C60–70 in lipid bilayers. They solubilized C60 in various solvents and systems of biological interest (octanols, micelles and liposomes), using UV–VIS absorption spectroscopy as a diagnostic tool [162]. They monitored the incorporation state of C60 in micellar and colloidal liposomal solutions using a number of spectroscopic criteria for solute–solvent and solute–solute interactions, based on comparison with spectra obtained in alkane and octanol solvents and from C60 thin films. They discussed the spectral red shifts and intensity modifications of C60 absorption and aggregation in terms of environment-dependent physical parameters. Their results indicate that C60 can be dispersed in Triton X-100 (TX) and TX R-S micellar solutions, with C60 localized in the inner hydrophobic part of the micelle. They showed that C60 is incorporated, as nC60, into phosphatidylcholine liposomal colloidal solutions, and concluded that micellar and liposomal solutions can be prepared and used to transfer C60–nC60 to cells. Beeby et al. solubilized C60 in aqueous micellar solution, and showed that there are two C60 states in stable dispersions: C60 and nC60 [163]. Klenin’s group studied water-soluble C60–PVP complexes [164], and studied PVP–C60 [165] and PVP–C70 [166,167] by DLS. Ratnikova et al. obtained water-soluble composites based on PVP with a C60 content of up to 5% by weight [168], and achieved a higher C60 content by introducing tetraphenylporphyrin (TPP) and KBr into the composites. Their synthesis includes C60–TPP complexation and its further interaction with PVP. They confirmed C60–TPP complexation by 13C NMR, small-angle neutron scattering (SANS) and translational diffusion. Their hydrodynamic and electrooptical studies of the C60–TPP–PVP complex indicated a higher symmetry of the PVP coil in the complex than in free PVP. They obtained C60–PVP–KBr composites by solid-state interaction under vacuum, with KBr promoting the destruction of nC60.
ORGANIC SOLVENT–WATER PARTITION OF NANOTUBES VIA INCLUSION COMPLEX FORMATION
O’Connell et al. solubilized SWNTs in water by associating them with linear polymers (PVP and polystyrene sulfonate) [169]. Liu et al. converted SWNTs from nearly endless, highly tangled ropes into short, open-ended pipes that behave as individual macromolecules [170]; they purified SWNTs in large batches and cut the ropes into 100–300 nm lengths. The resulting pieces formed a stable aqueous suspension with surfactants. Ausman et al. performed a systematic study to find an appropriate medium for SWNT solubilization–dispersion [171]. Five solvents with high electron-pair donicity β and low H-bond parameter α formed stable dispersions. They characterized the dispersions by UV–VIS–near-infrared (IR) spectra, electron spin resonance spectra and atomic-force microscopy (AFM). Yudasaka et al. studied the effect of polymethylmethacrylate on SWNT purification and cutting in chlorobenzene [172]. Chen et al. described a non-wrapping approach to the non-covalent engineering of SWNT surfaces by poly(aryleneethynylene), which led to a >20-fold enhancement of the solubility of thin SWNTs [173]. Bavastrello et al. synthesized a composite of conducting poly(o-toluidine) with MWNTs [174]. McCarthy et al. developed a composite material based on SWNTs and poly{(m-phenylenevinylene)-co-[(2,5-dioctyloxy-p-phenylene)vinylene]} (PmPV) [175]. Stoddart’s group investigated the chemical interactions between SWNTs and two polymers, PmPV and poly{(2,6-pyridinylenevinylene)-co-[(2,5-dioctyloxy-p-phenylene)vinylene]} (PPyPV) [176]. The difference between these two polymers is that PPyPV is a base and is readily protonated on addition of HCl. Both polymers promote the solubilization of SWNTs in CHCl3. They found that the SWNT–PPyPV interaction lowers the pKa of PPyPV. Optoelectronic devices fabricated from single polymer-wrapped SWNTs revealed a photogating effect on charge transport that can rectify or amplify current flow through SWNTs. For PmPV-wrapped SWNTs, the wavelength dependence of this effect correlated with the absorption spectrum of PmPV. For PPyPV, the wavelength dependence correlated with the absorption spectrum of protonated PPyPV, indicating that SWNTs assist in charge stabilization. Compared with polymers, rigid macromolecules, e.g., stilbene dendrimers, possess more two-dimensional shapes and contain pockets of well-defined dimensions [177]. They showed by molecular modelling of the third-generation stilbenoid dendrimer that only a single (10,10) SWNT could fit inside the pockets of the dendrimer, thus providing the possibility of a more efficient dispersion of SWNT bundles during solubilization. They synthesized a hyperbranched polymer that exhibited an appropriate degree of branching and was found to be more efficient at breaking SWNT bundles, provided it was employed at higher polymer-to-SWNT ratios than the parent PmPV. Introducing a certain degree of branching into PmPV makes it more rigid and less efficient at wrapping bundles of SWNTs, and so more polymeric material is required to achieve sufficient coverage for SWNT dispersion–solubilization. However, the pockets provided by the hyperbranched polymer offer a better fit for SWNTs. More single strands of SWNTs were observed on a mica wafer by AFM than was the case when PmPV was used [178]. Their success in employing a synthetic hyperbranched polymer to disperse and solubilize SWNTs in organic solvents is reminiscent of the use of gum arabic [179] and starch (gray Amy–SWNT complex, Figs. (9,10)) [180] to dissolve SWNTs in aqueous solutions. They concluded that: (1) common starch, activated toward complexation by wrapping itself helically around small molecules, transports SWNTs competitively into aqueous solutions; (2) the process is sufficiently reversible at high temperatures to permit the separation of SWNTs, in their supramolecular starch-wrapped form, from amorphous carbon by a series of physical manipulations; and (3) the addition of glucosidases to these starched SWNTs results in the precipitation of SWNTs from aqueous solution.
Fig. (9). Schematic description of (a) channel-type crystal structures formed by cyclodextrin (CD) inclusion complex, and (b) proposed CD6–polyethylene glycol inclusion compound.
Fig. (10). Schematic description of (a) a left-handed, single-stranded helix of VH-amylose (Amy), (b) Amy–iodine complex and (c) Amy–carbon nanotube complex.
IMMOBILIZATION OF BIOMOLECULES ON FULLERENES
Hungerbühler et al. prepared C60 vesicular solutions in dioctadecyldimethylammonium bromide, dihexadecyl hydrogen phosphate and lecithin that show yellow-brownish colours, which indicate C60 incorporation into the membrane [181]. They observed that: (1) the yellow-brownish colour emerges with vesicle formation during ultrasonication; (2) gel exclusion chromatography shows elution of the coloured vesicle fraction within the dead-time, while the non-incorporated and insoluble C60 remains at the top of the gel column; (3) attempts to extract membrane-incorporated C60 from the coloured vesicle fraction into toluene failed; and (4) extraction became possible after vesicle destruction by adding KCl, with the toluene phase showing the characteristic features of C60 in toluene (purple colour and a distinct narrow band at 408 nm). Some C60 derivatives are neuroprotective agents [182–187], which suggested that fullerenes could be effective neuroprotectors for the treatment of human neurohereditary diseases and the prevention of radiation damage [188]. Administration of the C60–PVP adduct in the hippocampus prevents the disturbances of long-term memory consolidation induced by cycloheximide [189–191]. A mechanism of action was proposed [192]. Aromatic primary amines (aniline [193], p-phenylenediamine [194] and benzidine [195]) act as donor components in C60 composites. However, tertiary amines (N,N-dialkyl-substituted anilines [196] and N,N,N’,N’-tetraalkyl-substituted p-phenylenediamines) do not form C60 complexes. Prabha et al. interpreted C60–lysozyme association [197]. Although native lysozyme does not bind hydrophobic molecules like 8-anilino-1-naphthalenesulfonic acid, denatured lysozyme interacts with the latter via exposed and hydrated hydrophobic amino-acid residues [198]. In solvents, there would be a strong tendency for C60 either to react with the terminal amino groups or to occupy sites within the hydrophobic lysozyme core. When they studied C60–lysozyme interactions using a mixed solvent that solubilized both, an adduct was formed, which could be isolated and proved water-soluble, giving a clear amber-coloured solution. Fluorescence energy transfer is a versatile technique to study interactions between two molecules, provided the emission
wavelength of the first overlaps with the excitation wavelength of the second. If the intermolecular distance is <70 Å, the emission peak of the first either disappears or shows a decreased intensity, with the appearance of the emission peak of the second. Lysozyme aromatic amino acids excited at 280 nm show a peak between 300 and 400 nm, corresponding to aromatic-residue emission. The C60–lysozyme adduct did not show the typical lysozyme emission peak; they observed an intense peak between 400 and 450 nm resulting from the energy transfer between the excited aromatic residues and C60, implying C60–lysozyme interaction. Confirmation was obtained from UV–VIS absorbance spectroscopy. Water-insoluble C60 shows an absorbance maximum at λmax ≈ 330 nm in toluene, whereas aqueous lysozyme shows λmax ≈ 280 nm. The aqueous spectrum of the adduct showed the typical lysozyme absorbance peak and a small peak corresponding to C60. Both peaks are red-shifted by a few nanometers, confirming C60–lysozyme interaction. There is an absorbance broadening beyond 400 nm typical of C60 in organic solvents. Lysozyme has a free N-terminal amino group, and some amino groups along the side chains of the basic amino acids, which can form covalent bonds with C60. They shook an aqueous solution of the adduct with toluene for a few minutes, and recorded the UV absorbances of the aqueous and organic phases. The aqueous phase showed the typical lysozyme absorbance, while the absorbance peak corresponding to C60 observed in the aqueous adduct solution disappeared. This suggests that there is no covalent-bond formation and that the interaction observed by spectrofluorimetry is hydrophobic in nature, which was confirmed by denaturing polyacrylamide gel electrophoresis (GEP), wherein proteins are boiled in a solution containing sodium dodecyl sulfate (SDS). SDS binds the protein backbone in a fixed ratio of 1:2 SDS:amino acid, and envelops the original charges present on the protein. SDS binding denatures proteins, turning them into linear rods. When SDS-bound proteins are subjected to GEP, they are resolved into separate bands according to molecular weight. After GEP of the C60–lysozyme adduct with lysozyme as a control, they observed that the adduct showed a band parallel to that of the lysozyme control, ruling out any change in its molecular weight after C60 binding. This supports their observation that the C60–lysozyme interaction is purely hydrophobic and was disrupted by SDS.
IMMOBILIZATION OF BIOMOLECULES ON SINGLE-WALL CARBON NANOTUBES
Sadler’s group immobilized biomolecules in and on MWNTs [199–202], motivated by the prospects of using MWNTs as new types of biosensor materials [203]. They studied by high-resolution TEM the morphologies of the small proteins Zn2Cd5-metallothionein, cytochrome c3 and β-lactamase I immobilized inside MWNTs. They clearly observed single protein molecules and their associated forms inside the central cavity, and a significant amount remained catalytically active, indicating that no drastic conformational change had taken place. They immobilized platinated and iodinated oligonucleotides on MWNTs. The electronic properties of MWNTs coupled with the specific recognition properties of the immobilized biosystems would indeed make for an ideal miniaturized sensor. A prerequisite for research in this area is the development of chemical methods to immobilize biomolecules onto MWNTs in a reliable manner. Thus far, only limited work has been carried out with MWNTs. Metallothionein proteins were trapped inside and placed onto the outer surfaces of open-ended MWNTs. Streptavidin and HupR show helical crystallization when adsorbed on
MWNTs, Fig. (11), presumably via hydrophobic interactions between MWNTs and hydrophobic domains of the proteins. DNA molecules were adsorbed on MWNTs via non-specific interactions. A significant amount of the enzyme β-lactamase I immobilized in MWNTs remains catalytically active [204]. Chen et al. reported a simple and general approach to the non-covalent functionalization of the sidewalls of SWNTs, and the subsequent immobilization of various biomolecules onto SWNTs with a high degree of control and specificity [205]. The non-covalent functionalization involves a bifunctional molecule, 1-pyrenebutanoic acid, succinimidyl ester (PBASE), irreversibly adsorbed onto the inherently hydrophobic surfaces of SWNTs in an organic solvent, N,N-dimethylformamide (DMF) or methanol. The pyrenyl group, being highly aromatic in nature, is known to interact strongly with the basal plane of graphite via π-stacking [206,207], and was also found to interact strongly with the sidewalls of SWNTs in a similar manner, thus providing a fixation point for PBASE on SWNTs. The PBASE molecules anchored on SWNTs are highly stable against desorption in aqueous solutions. This leads to the functionalization of SWNTs with succinimidyl ester groups, which are highly reactive to nucleophilic substitution by the primary and secondary amines that exist in abundance on the surface of most proteins. The mechanism of protein immobilization on SWNTs, then, involves the nucleophilic substitution of N-hydroxysuccinimide by an amine group on the protein, resulting in the formation of an amide bond. This technique enables the immobilization of a wide range of biomolecules on the sidewalls of SWNTs with high specificity and efficiency, as these authors demonstrated with ferritin, streptavidin and biotinyl-3,6-dioxaoctanediamine (biotin-PEO-amine).
Fig. (11). Schematic description of the helical crystallization of proteins on the outer surface of a carbon nanotube.
ANTIVIRAL ACTIVITIES OF FULLERENE-CONTAINING PREPARATIONS
The best way to study the biological properties of fullerenes is to use water-soluble non-covalent complexes with macromolecular carriers, which can change the pharmacokinetic properties of the preparation but in most cases do not change the basic pharmacodynamic properties of fullerenes. C60–PVP is relatively fragile and can be easily destroyed, even when KCl is added to the aqueous solution [208] and in organic media [209]. In the model of influenza virus (IV) type A (a ribonucleic acid, RNA-containing virus), the C60–PVP complex showed an antiviral activity comparable to that of rimantadine [210,211]. Studies of the action of the C60–PVP complex showed [212]: (1) The complexes are active against IV of both types (A and B). (2) The complexes are active against RNA- and DNA-containing viruses (IV and herpes simplex virus, respectively). (3) The antiviral activity depends on the dose of the active principle, C60. (4) The complexes have low toxicity, and do not change the metabolic and morphological properties of the tissue culture. (5) The action of the complexes on IV reproduction does not depend on the replication-cycle stage, and has a continuous effect during all stages of the viral replication cycle; the inhibition of the synthesis of individual viral proteins does not correlate with antiviral activity. There are many examples of the antiviral activity of fullerene-containing preparations: (1) inactivation of Semliki Forest and vesicular stomatitis virus in biofluids by 1O2, generated by C60 under irradiation; (2) HIVP inhibition by 4,4’-bis(N-succinyl-2-aminoethane)diphenylmethanofullerene [213–216]; and (3) the studies by Piotrovsky and Kiselev of the inhibition of the membrane-dependent stages of infectious-particle formation. Connecting these data with the virus life cycle, it is apparent that fullerenes can be used for the intercellular destruction of viruses, the intracellular inhibition of the synthetic processes of the virus replication cycle, and interference in the virion assembly stages. In medicinal chemistry terms, this means that three main mechanisms of drug action can be obtained with fullerene-containing compositions: unspecific, specific (ligand–receptor) interaction and membranotropic. Fullerenes, as compounds with broad biological potential, are promising not only for the design of antiviral compounds, but also for the design of various drug types. Scrivens et al. reported the synthesis of 14C-labelled C60, its suspension in water and its uptake by human keratinocytes [217]. C60 rapidly became cell-associated, though it did not affect the proliferation of human keratinocytes or fibroblasts, indicating that the rapid accumulation of C60 in human cells does not result in acute toxicity. The interest in using C60 derivatives in biosystems raised the possibility of their assay by immunological procedures, which, in turn, leads to the question of the ability of C60 to induce the production of specific antibodies. Erlanger’s group demonstrated that the mouse immune repertoire is diverse enough to recognize and produce antibodies specific for fullerenes [218]. They solved the X-ray crystal structure of an anti-C60 antibody Fab fragment [219]. Bensasson and co-workers studied the chromatographic separation of a highly water-soluble dendritic monoadduct methano[60]fullerene octadecaacid (DF) [220] with octadecylsilica-bonded phases [221]. They found that the reversed-phase high-performance liquid chromatographic (RP–HPLC) behaviour of DF obeys the general rules of stationary- and mobile-phase selection for controlling the separation of usually acidic compounds. An RP–HPLC–ESI–MS analysis confirms the identity of DF, and allows the molecular-weight characterization of the main impurities contained in the sample. RP–HPLC–ESI–MS can be used to control the synthesis and efficiently purify DF, which was shown to be active against mutant infectious clones of HIV-1 [222], which are resistant
to 3’-azido-3’-deoxythymidine (AZT) and lamivudine (3TC), drugs that are widely used in acquired-immunodeficiency-syndrome (AIDS) therapy.
TRANSLOCATION OF PROTEINS ACROSS CELL MEMBRANES BY SINGLE-WALL CARBON NANOTUBES
Fullerenes are a new class of compounds with potential uses in biology and medicine, and many insights have been gained into their interaction with various biological systems. However, their interaction with organized living systems, as well as the site of their potential action, remains unclear. Foley et al. demonstrated that a fullerene derivative could cross the external cellular membrane, and that it localizes preferentially to the mitochondria [223]. They proposed that their finding supports the potential use of fullerenes as drug-delivery agents, as their structure mimics that of clathrin [224], which is known to mediate endocytosis [225]. This observation strengthens the proposed use of fullerenes to mediate the delivery of other drugs through membranes and the targeting of tissue [226]. Pantarotto et al. [227] demonstrated that functionalized SWNTs are able to cross the cell membrane. Their study showed that SWNTs are a promising carrier system for future applications in drug delivery and targeting therapy. They explored the use of SWNTs as nanovehicles, and evaluated the biological functions of the covalently linked molecules after cellular uptake. The absence of immunogenicity of SWNTs [228], in comparison with the common protein carriers, will increase the efficacy of the therapeutics delivered in this manner. Kam et al. prepared modified SWNTs and showed that these can be derivatized to enable the attachment of small molecules and proteins [229]. The functionalized SWNTs enter non-adherent as well as adherent cell lines (Chinese-hamster ovary Cricetulus griseus, CHO, and adipocyte 3T3) and by themselves are not toxic. While the fluoresceinated protein streptavidin cannot enter cells by itself, it readily enters cells when complexed to a SWNT–biotin transporter, exhibiting dose-dependent cytotoxicity. The uptake pathway is consistent with endocytosis. SWNTs could be exploited as molecular transporters for various cargos. The biocompatibility and the unique physical, electrical, optical and mechanical properties of SWNTs provide the basis for new classes of materials for drug-, protein- and gene-delivery applications.
ENVIRONMENTAL REMEDIATION, THERAPEUTICS AND TOXICITY OF WATER-SOLUBLE FULLERENES
Damaging effects in vivo were found in mouse midbrain cells, since C60 decreases cell differentiation and proliferation [230]. Upon excitation, fullerenes form long-living triplet states 3C60 with lifetimes of 50–100 µs, which are considered cytotoxic agents in photodynamic therapy [231,232]. Cheng et al. characterized the adsorption–desorption interactions of naphthalene, a model environmental organic pollutant, with C60 [233]. They used C60 as a model adsorbent for carbonaceous nanoparticles, and typical batch reactors to perform the adsorption–desorption experiments. They studied naphthalene adsorption to and desorption from C60 solids in different aggregation forms, in which C60 was used as purchased, deposited as a thin film or dispersed in water by magnetic mixing. They also studied naphthalene adsorption to and desorption from activated carbon, a common sorbent, for comparison with C60. They found that the enhanced
256 Frontiers in Drug Design & Discovery, 2005, Vol. 1
Francisco Torrens
C60 dispersal could affect naphthalene adsorption by several orders of magnitude, and obtained a solid–water distribution coefficient of 102.4mL·g–1 for naphthalene adsorption to poorly dispersed C60, whereas obtained 104.2–4.3mL·g–1 coefficients for well-dispersed C60 samples. They found that naphthalene desorption from dispersed C60 samples into aqueous solutions exhibits strong hysteresis. For the desorption over a period of 60 days, only ≈11% of total naphthalene was desorbed from C60. They discussed possible mechanisms. Shungites are black Precambrian rocks of Karelia (Russia) where natural occurrence of C60–70 was found [234]. The tests of natural C60–70-based water-soluble complex bioactivity showed that they are harmless and promising for therapeutics [235]. Although nanotechnology has vast potential in uses such as fuel cells, microreactors, drug-delivery devices and personal-care products, it is prudent to determine possible toxicity of nanotechnology-derived products before widespread use. It is likely that nanomaterials can affect wildlife if they are accidentally released into the environment. The fullerenes are one type of manufactured nanoparticle that is being produced by tons each year, and initially uncoated fullerenes can be modified with biocompatible coatings. Fullerenes are lipophilic and localize into lipid-rich regions such as cell membranes in vitro, and they are redox active. Other nano-sized particles and soluble metals were shown to selectively translocate into the brain via the olfactory bulb in mammals and fish. C60 can form aqueous suspended colloids nC60; the question arises of whether a redox-active, lipophilic molecule could cause oxidative damage in an aquatic species. Oberdörster investigated oxyradical-induced lipid and protein damage, as well as impacts on total glutathione (GSH) levels, in largemouth bass exposed to nC60 [236]. 
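Returning to the naphthalene sorption data of Cheng et al. [233] quoted above, the practical meaning of the two distribution coefficients can be illustrated with a short calculation. The sketch below assumes a simple linear sorption isotherm, in which the solid–water distribution coefficient Kd (mL·g–1) is the ratio of the sorbed concentration to the aqueous concentration at equilibrium. The function name, the solid mass and the water volume are hypothetical choices made for illustration and are not taken from the original study.

```python
# Illustrative sketch only: linear sorption model, Kd = q / Cw (mL/g).
# The solid mass and water volume used below are hypothetical values,
# not experimental conditions from ref. [233].


def fraction_sorbed(log_kd: float, solid_g: float, water_ml: float) -> float:
    """Equilibrium fraction of solute held on the solid in a closed batch reactor.

    Mass balance: total = Cw * V + Kd * Cw * m, so the sorbed fraction is
    Kd * m / (V + Kd * m).
    """
    kd = 10.0 ** log_kd                 # distribution coefficient, mL/g
    solid_equivalent_ml = kd * solid_g  # water volume holding the same mass, mL
    return solid_equivalent_ml / (water_ml + solid_equivalent_ml)


# Distribution coefficients quoted in the text (log10 of mL/g).
LOG_KD_POOR = 2.4  # poorly dispersed C60
LOG_KD_WELL = 4.2  # lower value quoted for well-dispersed C60

# Ratio of the two coefficients, 10**(4.2 - 2.4).
kd_ratio = 10.0 ** (LOG_KD_WELL - LOG_KD_POOR)

if __name__ == "__main__":
    for label, log_kd in (("poorly dispersed", LOG_KD_POOR),
                          ("well dispersed", LOG_KD_WELL)):
        f = fraction_sorbed(log_kd, solid_g=0.01, water_ml=10.0)
        print(f"{label}: log Kd = {log_kd}, fraction sorbed = {f:.2f}")
    print(f"Kd ratio (well/poor) = {kd_ratio:.0f}")
```

Under these assumed reactor conditions, the well-dispersed sample retains most of the naphthalene at equilibrium, whereas the poorly dispersed sample retains only a minor fraction.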
Oberdörster found significant lipid peroxidation in the brains of largemouth bass after 48 h of exposure to 0.5 ppm uncoated nC60. GSH was also marginally depleted in the gills of the fish, and nC60 increased water clarity, possibly owing to bactericidal activity. This was the first study showing that uncoated fullerenes can cause oxidative damage and depletion of GSH in vivo in an aquatic species. Further research is needed to evaluate the potential toxicity of manufactured nanomaterials, especially with respect to translocation into the brain. Sayes et al. showed that the cytotoxicity of water-soluble fullerene species is a sensitive function of surface derivatization; in two different human cell lines, the fullerene lethal dose changed over seven orders of magnitude with relatively minor alterations in fullerene structure [237]. In particular, an aggregated form of C60, the least derivatized of the four materials, was substantially more toxic than highly soluble derivatives, e.g., C3, Na+2–3[C60O7–9(OH)12–15](2–3)– and C60(OH)24. They observed oxidative damage to the cell membranes in all cases where fullerene exposure led to cell death, showed that fullerenes in water under ambient conditions can generate the superoxide anion O2–, and postulated that O2– is responsible for the membrane damage and subsequent cell death. They demonstrated both a strategy for enhancing fullerene toxicity for certain applications, e.g., cancer therapeutics or bactericides, and a remediation for the possible unwarranted biological effects of C60.

PERSPECTIVES

A great deal of information is still lacking, especially regarding the structure of the complexes at the molecular and macromolecular levels. This can be attributed to the difficulty of using experimental methods that are sufficiently sensitive to discriminate the structures of the two molecules without affecting the whole structure of the complex.
Powerful structural analysis (IR, circular dichroism, 13C NMR or SANS) will be used extensively in the future. Using fullerenes–SWNTs and molecules that are labelled differently could make the structural characterization of the complexes easier. At the macromolecular level, non-destructive methods (environmental scanning electron microscopy, ESEM, and confocal laser scanning microscopy, CLSM) should be preferred to more conventional methods that require more or less destructive sample preparation (TEM, SEM). Combined microscopic–spectroscopic methods are also available. On the basis of MLPs, CDs can be used as models for starch only at a crude level. Their asymmetry and finite torus height must lead to considerable end effects that are not present in polymer structures and that will certainly influence the structure of the inclusion complexes in a pronounced fashion; this applies not only to polyiodide complexes but also, e.g., to the inclusion adducts of Amy–CDs with surfactants. As for the common trend of considering chemically modified CDs as enzyme models, besides the rate-enhancing effect on a particular chemical reaction, almost no similarities between enzymes and CDs can be found. Both the entirely different chemical structures of the two classes of compounds and their strikingly different molecular-recognition mechanisms seriously call this concept into question. A large variety of chemical groups enables enzymes to recognize their substrates specifically via strong interactions of distinct directionality, which can be highly sensitive to even minor changes in the stereochemical features of the ligand. Chiral CDs are also able to discriminate between enantiomeric guests, or to induce conformational enantiomerism upon inclusion, but the molecular interactions are generally weak and lack any directionality.
In addition to the Cn-symmetry axis of the empty CD cavities, the absence of structural features that create strong guest-orienting forces indicates that CDs can by no means reach enzymatic specificity and effectiveness. Consequently, CDs are better seen as simple template-effect-based catalysts, which affect reaction kinetics by bringing substrate and reagent into close spatial proximity. The terminology of considering CDs as enzyme models only leads to an unjustified devaluation of nature’s sophisticated tools. Relating the unspecific CD8–C60 interactions to specific biological functions of C60 seems far-fetched. The main contribution to the lysozyme-helix SASA is the hydrophobic term, while the hydrophilic part dominates in the sheet; this is related to the logPo–ch–cf,helix values being greater than the logPo–ch–cf,sheet values. MLP differentiates among helices, sheet and binding site. The difference in logP is 23 log units. MLP shows that, for a given atom, logPo is sensitive to the presence of other atoms in the molecule, e.g., in C70, where the logPa–c contributions are greater than the logPd–e ones. LogPo correlates with the distance from the nearest pentagon. CDHI does not differentiate non-equivalent atoms. The (10,10) nanotube, the favourite SWNT, shows consistency between a relatively small Sw and a large Po–ch–cf. Other quantities related to logPo can be calculated, such as the molar concentration of organic compound necessary to produce a 1:1 complex with BSA via equilibrium dialysis (C), the number of hydrophilic groups assuming one lipophilic group (n) and the hydrophile–lipophile balance (HLB). Atomic analyses of logPo–ch–cf and other properties can be performed; they allow local logPo–ch–cf, log 1/C, n and HLB maps to be built. Research efforts are underway to use fullerenes–SWNTs in therapeutic applications. There is a strong possibility for hydrophobic protein–fullerene and –SWNT
interactions in the biomilieu when the latter are used for biomedical applications, unless the molecule is delivered effectively to the intended site of action.

ACKNOWLEDGMENTS

The author thanks Dr. N. Mizoguchi for providing calculation results, before publication, on the optimum number of inside H atoms and the geometry of C60H60. The author acknowledges financial support from the Spanish MCT (Plan Nacional I+D+i, Project No. BQU2001-2935-C02-01) and Generalitat Valenciana (DGEUI INF01-051 and INFRA03-047 and OCYT GRUPOS03-173).

ABBREVIATIONS

1O2 = Singlet Oxygen
3D = Three Dimensional
3T3 = Adipocyte Cell Line
3TC = Lamivudine
AFM = Atomic-Force Microscopy
AIDS = Acquired Immunodeficiency Syndrome
Amy = Amylose
AZT = 3’-Azido-3’-deoxythymidine
biotin-PEO-amine = Biotinyl-3,6-dioxaoctanediamine
CD = Cyclodextrin
cf = Chloroform
ch = Cyclohexane
CHCl3 = Chloroform
CHO = Chinese-Hamster Ovary
CI7 = Cycloisomaltoheptaose
CI8 = Cycloisomaltooctaose
CLSM = Confocal Laser Scanning Microscopy
D = Distance from the Nearest Pentagon
D-Glcp = D-Glucopyranose
DF = Dendritic Monoadduct Methano[60]Fullerene Octadecaacid
DG = Diglucose
DLS = Dynamic Light Scattering
DMF = N,N-Dimethylformamide
DNA = Deoxyribonucleic Acid
EMF = Endohedral Metallofullerene
ESEM = Environmental Scanning Electron Microscopy
ESI–MS = Electrospray-Ionization Mass Spectrometry
GEP = Gel Electrophoresis
GSH = Glutathione
H-bond = Hydrogen Bond
HIV-1 = Human Immunodeficiency Virus Type 1
HIVP = Human Immunodeficiency Virus Protease
HLB = Hydrophile–Lipophile Balance
IR = Infrared
IV = Influenza Virus
m = Second Component of the Graphene Lattice Vector
M@C2n = Endohedral Metallofullerene
MLP = Molecular Lipophilicity Pattern
MRE = Mean Relative Error
MWNT = Multiple-Wall Carbon Nanotube
n = First Component of the Graphene Lattice Vector
N2(g) = Nitrogen Gas
nC60 = Colloidal Dispersion of Buckminsterfullerene
NMR = Nuclear Magnetic Resonance
o = 1-Octanol
OH = Hydroxyl Group
P = Partition Coefficient
PBASE = 1-Pyrenebutanoic Acid, Succinimidyl Ester
PmPV = Poly{(m-Phenylenevinylene)-co-[(2,5-dioctyloxy-p-phenylene)Vinylene]}
PPyPV = Poly{(2,6-Pyridinylenevinylene)-co-[(2,5-dioctyloxy-p-Phenylene)Vinylene]}
PS–S = Stationary Phase–Solute
PVP = Poly(N-Vinylpyrrolidone)
R = Iminoalkyl, Iminoaryl
RNA = Ribonucleic Acid
RP–HPLC = Reversed-Phase High-Performance Liquid Chromatography
SANS = Small-Angle Neutron Scattering
SASA = Solvent-Accessible Surface Area
SDS = Sodium Dodecyl Sulfate
Sw = Aqueous Solubility
SWNT = Single-Wall Carbon Nanotube
TEM = Transmission Electron Microscopy
THF = Tetrahydrofuran
TPP = Tetraphenylporphyrine
TX = Triton X-100
URE = Unsigned Relative Error
UV–VIS = Ultraviolet–Visible
w = Water
(C60)2 = Buckminsterfullerene van der Waals Dimer
∆Gsolv = Gibbs Free Energy of Solvation
µH = Hydrophobic Moment
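
Several of the symbols listed above (P, ∆Gsolv, o, w, ch, cf) are linked by a standard thermodynamic relation: the logarithm of an organic solvent–water partition coefficient is proportional to the difference between the solvation free energies of the solute in the two phases, logP = –(∆Gsolv,org – ∆Gsolv,w)/(RT ln 10). The following minimal sketch evaluates that relation. It is not the parametrization used in this chapter, and the function name and the numerical free energies are hypothetical values chosen only to show the arithmetic.

```python
import math

# Gas constant in kJ/(mol K).
R_KJ = 8.314462618e-3


def log_p(dg_solv_org_kj: float, dg_solv_water_kj: float,
          temp_k: float = 298.15) -> float:
    """log10 of an organic solvent-water partition coefficient.

    Arguments are Gibbs free energies of solvation (kJ/mol) of the same
    solute in the organic phase (e.g. 1-octanol, cyclohexane or chloroform)
    and in water. A more negative value in the organic phase gives logP > 0.
    """
    transfer_kj = dg_solv_org_kj - dg_solv_water_kj  # water -> organic transfer
    return -transfer_kj / (R_KJ * temp_k * math.log(10.0))


if __name__ == "__main__":
    # Hypothetical solvation free energies, for illustration only.
    example = log_p(dg_solv_org_kj=-20.0, dg_solv_water_kj=-10.0)
    print(f"logP = {example:.2f}")
```

At 298.15 K the factor RT ln 10 is about 5.7 kJ·mol–1, so each such increment in the transfer free energy shifts logP by one unit.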
REFERENCES

[1] Mchedlov-Petrossyan, N. O.; Klochkov, V. K.; Andrievsky, G. V. J. Chem. Soc., Faraday Trans., 1997, 93, 4343-4346.
[2] Prilutski, Y.; Durov, S.; Bulavin, L.; Valerij, P.; Astashkin, Y.; Yashchuk, V.; Ogul’Chanski, T.; Buzaneva, E.; Andrievsky, G. Mol. Cryst. Liq. Cryst., 1998, 324, 65-65.
[3] Tokuyama, H.; Yamago, S.; Nakamura, E. J. Am. Chem. Soc., 1993, 115, 7918-7919.
[4] Tabata, Y.; Murakami, Y.; Ikada, Y. Jpn. J. Cancer Res., 1997, 88, 1108-1108.
[5] Lai, D. T.; Neumann, M. A.; Matsumoto, M.; Sunamoto, J. Chem. Lett., 2000, 29, 64-65.
[6] Atwood, J. L.; Koutsantonis, G. A.; Raston, C. L. Nature (London), 1994, 368, 229-231.
[7] Shinkai, S.; Ikeda, A. Gazz. Chim. Ital., 1997, 127, 657-657.
[8] Issacs, N. S.; Nichols, P. J.; Raston, C. L.; Sandoval, C. A.; Young, D. J. Chem. Commun., 1997, 1839-1840.
[9] Bourdelande, J. L.; Font, J.; González-Moreno, R.; Nonell, S. J. Photochem. Photobiol. A, 1998, 115, 69-71.
[10] Ikeda, A.; Yoshimura, M.; Udzu, H.; Fukuhara, C.; Shinkai, S. J. Am. Chem. Soc., 1999, 121, 4296-4297.
[11] Liu, T.; Li, M.; Li, N.; Shi, Z.; Go, Z.; Zhou, X. Electroanalysis (N. Y.), 1999, 11, 1227-1732.
[12] Georghiou, P. E.; Mizyed, S.; Chowdhury, S. Tetrahedron Lett., 1999, 40, 611-614.
[13] Matsubara, H.; Hasegawa, A.; Shiwaku, K.; Asano, K.; Uno, M.; Takahashi, S.; Yamamoto, K. Chem. Lett., 1998, 27, 923-924.
[14] Veen, E. M.; Postma, P. M.; Jonkman, H. T.; Speck, A. L.; Feringa, B. L. Chem. Commun., 1999, 1709-1710.
[15] Tashiro, K.; Aida, T.; Zheng, J.-Y.; Kinbara, K.; Saigo, K.; Sakamoto, S.; Yamaguchi, K. J. Am. Chem. Soc., 1999, 121, 9477-9478.
[16] Atwood, J. L.; Davies, J. E. D.; MacNicol, D. D.; Vogtle, F. In: Comprehensive Supramolecular Chemistry; Szejtli, J., Osa, T. Eds.; Elsevier: New York, 1996; Vol. 3, pp. 1-1.
[17] Tokuyama, H.; Yamago, S.; Nakamura, E.; Shiraki, T.; Sugiura, Y. J. Am. Chem. Soc., 1993, 115, 7918-7919.
[18] Nakamura, E.; Tokuyama, H.; Yamago, S.; Shiraki, T.; Sugiura, Y. Bull. Chem. Soc. Jpn., 1996, 69, 2143-2151.
[19] Samal, S.; Geckeler, K. E. Chem. Commun., 2000, 1101-1102.
[20] Samal, S.; Choi, B.-J.; Geckeler, K. E. Chem. Commun., 2000, 1373-1374.
[21] Cui, D.; Tian, F.; Kong, Y.; Titushikin, I.; Gao, H. Nanotechnology, 2004, 15, 154-157.
[22] Thompson, D. B. Carbohydr. Polym., 2000, 43, 223-239.
[23] French, D. Adv. Carbohydr. Chem., 1957, 12, 189-260.
[24] Wenz, G. Angew. Chem., Int. Ed. Engl., 1994, 33, 803-822.
[25] Thoma, J. A.; Stewart, L. In: Starch: Chemistry and Technology; Whistler, R. L., Paschall, E. F. Eds.; Academic Press: New York, 1965; Vol. 1, pp. 209-209.
[26] Griffiths, D. W.; Bender, M. L. Adv. Catal., 1973, 23, 209-209.
[27] Bender, M. L.; Komiyama, M. Cyclodextrin Chemistry; Springer: Berlin, 1978.
[28] Jensen, A. W.; Wilson, S. R.; Schuster, D. I. Bioorg. Med. Chem., 1996, 4, 767-779.
[29] Beck, M. T.; Mándi, G. Fullerene Sci. Technol., 1997, 5, 291-310.
[30] Connors, K. A. Chem. Rev., 1997, 97, 1325-1357.
[31] D’Souza, V. T.; Lipkowitz, K. B. Chem. Rev., 1998, 98, 1741-1742.
[32] Lipkowitz, K. B. Chem. Rev., 1998, 98, 1829-1873.
[33] Castro, E. A.; Barbiric, D. A. J. J. Arg. Chem. Soc., 2002, 90, 1-44.
[34] Bezmel’nitsyn, V. N.; Eletskii, A. V.; Okun, M. V. Physics–Uspekhi (Moscow), 1998, 41, 1091-1114.
[35] Diederich, F.; Gómez-López, M. Chem. Soc. Rev., 1999, 28, 263-277.
[36] Da Ros, T.; Prato, M. J. Chem. Soc., Chem. Commun., 1999, 663-669.
[37] Bianco, A.; Da Ros, T.; Prato, M.; Toniolo, C. J. Peptide Sci., 2001, 7, 208-219.
[38] Guldi, D. M.; Martín, N. J. Mater. Chem., 2002, 12, 1978-1992.
[39] Pliška, V.; Testa, B.; van de Waterbeemd, H. Eds.; Lipophilicity in Drug Action and Toxicology, Proceedings of the 1st Lipophilicity Symposium, Lausanne, 2000, Methods and Principles in Medicinal Chemistry No. 4, VCH: Weinheim, 1996.
[40] Testa, B.; van de Waterbeemd, H.; Folkers, G.; Guy, R. Eds.; Pharmacokinetic Optimization in Drug Research: Biological, Physicochemical, and Computational Strategies, Proceedings of the 2nd Lipophilicity Symposium, Lausanne, 2000, Wiley–VCH–Helvetica Chimica Acta: Zurich, 2001.
[41] Torrens, F. Comb. Chem. High Throughput Screen., 2003, 6, 801-809.
[42] Torrens, F. In: Functional Nanomaterials, Geckeler, K. E. Ed.; American Scientific: Stevenson Ranch (CA), in press.
[43] Torrens, F. J. Chromatogr. A, 2001, 908, 215-221.
[44] Torrens, F.; Sánchez-Marín, J.; Nebot-Gil, I. J. Chromatogr. A, 1998, 827, 345-358.
[45] Torrens, F. J. Chem. Inf. Comput. Sci., 2004, 44, 60-67.
[46] Torrens, F. J. Mol. Struct. (Theochem), in press.
[47] Torrens, F. Internet Electron. J. Mol. Des., 2004, 3, 514-527.
[48] Torrens, F. Internet Electron. J. Mol. Des., in press.
[49] Torrens, F. Int. J. Quantum Chem., in press.
[50] Torrens, F. Mol. Simul., 2005, 31, 107-114.
[51] Torrens, F. Proc. – Electrochem. Soc., submitted for publication.
[52] Torrens, F. J. Mol. Struct. (Theochem), in press.
[53] Torrens, F. Nanotechnology, in press.
[54] Torrens, F. Probl. Nonlin. Anal. Eng. Syst., in press.
[55] Torrens, F. In: Synthetic Organic Chemistry, Seijas, J. A., Vázquez Tato, M. P., Lin, S.-K. Eds.; Universidad de Santiago de Compostela: Santiago de Compostela, 2004; Vol. 8, pp. 1-14.
[56] Torrens, F. Mater. Sci. Eng., C., submitted for publication.
[57] Cramer, F.; Dietsche, W. Chem. Ber., 1959, 92, 378-378.
[58] Bishop, R.; Hermansson, I.; Jaderlund, B.; Lindgen, G.; Pernow, C. Am. Lab. (Shelton, Conn.), 1986, 18, 138-138.
[59] Özdemir Can, H. Turk. J. Chem., 2003, 27, 553-564.
[60] Lee, O.-S.; Jang, Y. H.; Cho, Y. G.; Hyun, M. H.; Kim, H.-J.; Chung, D. S. Chem. Lett., 2001, 30, 232-233.
[61] Hinze, W. L., Armstrong, D. W. Eds.; Ordered Media in Chemical Separations, American Chemical Society: Washington, 1987.
[62] Szejtli, J. Cyclodextrin Technology, Kluwer: Boston, 1988.
[63] Li, S.; Purdy, W. C. Chem. Rev., 1992, 92, 1457-1470.
[64] Arad-Yellin, R.; Green, B. S.; Knossov, M.; Tsoucaris, G. In: Inclusion Compounds (Physical Properties and Applications), Atwood, J. L., Davies, J. E. D., MacNicol, D. D. Eds.; Academic: London, 1984; Vol. 3, pp. 263-263.
[65] Uekama, K.; Hirayama, F.; Imai, T.; Otagiri, M.; Harata, K. Chem. Pharm. Bull., 1983, 31, 3363-3363.
[66] Hamilton, J. A.; Chen, L. J. Am. Chem. Soc., 1988, 110, 5833-5841.
[67] Kano, K.; Tatsumi, M.; Hashimoto, S. J. Org. Chem., 1991, 56, 6579-6585.
[68] Dodziuk, H.; Sitkowski, J.; Stefaniak, L.; Jurczak, J.; Sybilska, D. J. Chem. Soc., Chem. Commun., 1992, 207-208.
[69] Armstrong, D. W.; Ward, T. J.; Armstrong, R. D.; Beesley, T. E. Science, 1986, 232, 1132-1135.
[70] Harata, K. Bull. Chem. Soc. Jpn., 1982, 55, 1367-1371.
[71] Harata, K.; Uekama, K.; Otagiri, M.; Hirayama, F. J. Inclusion Phenom., 1984, 2, 583-583.
[72] Harata, K. In: Inclusion Compounds, Atwood, J. L., Davies, J. E. D., MacNicol, D. D. Eds.; Oxford University: Oxford, 1991; Vol. 5, pp. 311-311.
[73] Harata, K. Rep. Natl. Inst. Biosci. Human-Technol., 1993, 1(2), 1-1.
[74] Croft, A. P.; Bartsch, R. A. Tetrahedron, 1983, 39, 1417-1474.
[75] Koji, K. Bioorg. Chem. Front., 1993, 3, 1-1.
[76] Breslow, R. Science, 1982, 218, 532-537.
[77] Tabushi, I. Acc. Chem. Res., 1982, 15, 66-72.
[78] Tabushi, I. Tetrahedron, 1984, 40, 269-292.
[79] Tabushi, I. In: Inclusion Compounds (Physical Properties and Applications), Atwood, J. L.; Davies, J. E. D.; MacNicol, D. D. Eds.; Academic: London, 1984; Vol. 3, pp. 445-445.
[80] Breslow, R. In: Inclusion Compounds (Physical Properties and Applications), Atwood, J. L.; Davies, J. E. D.; MacNicol, D. D. Eds.; Academic: London, 1984; Vol. 3, pp. 473-501.
[81] D’Souza, V. T.; Bender, M. L. Acc. Chem. Res., 1987, 20, 146-152.
[82] Tee, O. S. Carbohydr. Res., 1989, 192, 181-195.
[83] Breslow, R. Acc. Chem. Res., 1995, 28, 146-153.
[84] Betzel, C.; Saenger, W.; Hingerty, B. E.; Brown, G. M. J. Am. Chem. Soc., 1984, 106, 7545-7557.
[85] Fleming, A. Proc. R. Soc. London, B, 1922, 93, 306-317.
[86] Abraham, E. P.; Robinson, R. Nature (London), 1937, 140, 24-24.
[87] Berger, L. R.; Weiser, R. S. Biochim. Biophys. Acta, 1957, 26, 517-521.
[88] Salton, M. R. J.; Ghuysen, J. M. Biochim. Biophys. Acta, 1959, 36, 552-554.
[89] Jollès, J.; Jauregui-Adell, J.; Bernier, I.; Jollès, P. Biochim. Biophys. Acta, 1963, 78, 668-689.
[90] Canfield, R. E. J. Biol. Chem., 1963, 238, 2698-2707.
[91] Blake, C. C. F.; Koenig, D. F.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma, V. R. Nature (London), 1965, 206, 757-761.
[92] Blake, C. C. F.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma, V. R. Proc. R. Soc. London, B, 1967, 167, 365-377.
[93] Blake, C. C. F.; Johnson, L. N.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma, V. R. Proc. R. Soc. London, B, 1967, 167, 378-388.
[94] Hamaguchi, K.; Hayashi, K. In: Proteins Structure and Function, Funatsu, M., Hiromi, K., Imahori, K., Murachi, T., Narita, K. Eds.; Kodansha: Tokyo, 1972; Vol. 1, pp. 85-222.
[95] Torrens, F. Polyhedron, 2002, 21, 1357-1361.
[96] Kantola, A.; Villar, H. O.; Loew, G. H. J. Comput. Chem., 1991, 12, 681-689.
[97] Leo, A.; Hansch, C. J. Org. Chem., 1971, 36, 1539-1544.
[98] Krätschmer, W.; Lamb, L. D.; Fostiropoulos, K.; Huffman, D. R. Nature (London), 1990, 347, 354-358.
[99] Taylor, R.; Hare, J. P.; Abdul-Sada, A. K.; Kroto, H. W. J. Chem. Soc., Chem. Commun., 1990, 1423-1425.
[100] Diederich, F.; Whetten, R. L.; Thilgen, C.; Ettl, R.; Chao, I.; Alvarez, M. M. Science, 1992, 254, 1768-1770.
[101] Fleming, R. M. MRS Symp. Proc., 1991, 206, 691-691.
[102] Meng, R. L.; Ramirez, D.; Jiang, X.; Chow, P. C.; Diaz, C.; Matsuishi, K.; Moss, S. C.; Hor, P. H.; Chu, C. W. Appl. Phys. Lett., 1991, 59, 3402-3403.
[103] Diederich, F.; Whetten, R. L. Acc. Chem. Res., 1992, 25, 119-126.
[104] Parker, D. H.; Chatterjee, K.; Wurz, P.; Lykke, K. R.; Pellin, M. J.; Stock, L.; Hemminger, J. C. Carbon, 1992, 30, 1167-1182.
[105] Smart, C.; Eldridge, B.; Reuter, W.; Zimmerman, J. A.; Creasy, W. R.; Rivera, N.; Ruoff, R. S. Chem. Phys. Lett., 1992, 188, 171-176.
[106] Andrievsky, G. V.; Kosevich, M. V.; Vovk, O. M.; Shelkovsky, V. S.; Vashchenko, L. A. J. Chem. Soc., Chem. Commun., 1995, 1281-1282.
[107] Hungerbühler, H.; Guldi, D. M.; Asmus, K.-D. J. Am. Chem. Soc., 1993, 115, 3386-3387.
[108] Deguchi, S.; Alargova, R. G.; Tsujii, K. Langmuir, 2001, 17, 6013-6017.
[109] Kasai, H.; Nalwa, H. S.; Oikawa, H.; Okada, S.; Matsuda, H.; Minami, N.; Kakuta, A.; Ono, K.; Mukoh, A.; Nakanishi, H. Jpn. J. Appl. Phys., 1992, 31, L1132-L1134.
[110] Kasai, H.; Oikawa, H.; Okada, S.; Nakanishi, H. Bull. Chem. Soc. Jpn., 1998, 71, 2597-2601.
[111] Bubnov, V. P.; Laukhina, E. E.; Kareev, I. E.; Koltover, V. K.; Prokhorova, T. G.; Yagubskii, E. B.; Kozmin, Y. P. Chem. Mater., 2002, 14, 1004-1008.
[112] Kareev, I. E.; Bubnov, V. P.; Laukhina, E. E.; Dodonov, A. F.; Kozlovski, V. I.; Yagubskii, E. B. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 65-69.
[113] Myrdal, P.; Ward, G. H.; Dannenfelser, R.-M.; Mishra, D.; Yalkowsky, S. H. Chemosphere, 1992, 24, 1047-1061.
[114] Myrdal, P.; Ward, G. H.; Simamora, P.; Yalkowsky, S. H. SAR QSAR Environ. Res., 1993, 1, 53-61.
[115] Myrdal, P. B.; Manka, A. M.; Yalkowsky, S. H. Chemosphere, 1995, 30, 1619-1637.
[116] Lee, Y.-C.; Myrdal, P. B.; Yalkowsky, S. H. Chemosphere, 1996, 33, 2129-2144.
[117] Pinsuwan, S.; Myrdal, P. B.; Lee, Y.-C.; Yalkowsky, S. H. Chemosphere, 1997, 35, 2503-2513.
[118] Torrens, F. Toxicol. Environ. Chem., 1999, 73, 177-189.
[119] Kroto, H. W.; Heath, J. R.; O’Brien, S. C.; Curl, R. F.; Smalley, R. E. Nature (London), 1985, 318, 162-163.
[120] Ternansky, R. J.; Balogh, D. W.; Paquette, L. A. J. Am. Chem. Soc., 1982, 104, 4503-4504.
[121] Heath, J. R.; O’Brien, S. C.; Zhang, Q.; Liu, Y.; Curl, R. F.; Kroto, H. W.; Tittel, F. K.; Smalley, R. E. J. Am. Chem. Soc., 1985, 107, 7779-7780.
[122] Saunders, M. Science, 1991, 253, 330-331.
[123] Dodziuk, H.; Nowinski, K. Chem. Phys. Lett., 1996, 249, 406-412.
[124] Mizoguchi, N. VII International Conference on Mathematical Chemistry, Girona, 1997.
[125] Duesberg, G. S.; Muster, J.; Krstic, V.; Burghard, M.; Roth, S. Appl. Phys. A, 1998, 67, 117-119.
[126] Duesberg, G. S.; Blau, W.; Byrne, H. J.; Muster, J.; Burghard, M.; Roth, S. Synth. Met., 1999, 103, 2484-2485.
[127] Tang, B. Z.; Xu, H. Macromolecules, 1999, 32, 2569-2576.
[128] Niyogi, S.; Hu, H.; Hamon, M. A.; Bhowmik, P.; Zhao, B.; Rozenzhak, S. M.; Chen, J.; Itkis, M. E.; Meier, M. S.; Haddon, R. C. J. Am. Chem. Soc., 2001, 123, 733-734.
[129] Andersson, T.; Nilsson, K.; Sundahl, M.; Westman, G.; Wennerström, O. J. Chem. Soc., Chem. Commun., 1992, 604-606.
[130] Sundahl, M.; Andersson, T.; Nilsson, K.; Wennerström, O.; Westman, G. Synth. Met., 1993, 55–57, 3252-3257.
[131] Andersson, T.; Westman, G.; Wennerström, O.; Sundahl, M. J. Chem. Soc., Perkin Trans. 2, 1994, 1097-1101.
[132] Andersson, T.; Westman, G.; Stenhagen, G.; Sundahl, M.; Wennerström, O. Tetrahedron Lett., 1995, 36, 597-600.
[133] Williams, R. M.; Verhoeven, J. W. Recl. Trav. Chim. Pays-Bas Belg., 1992, 111, 531-531.
[134] Guldi, D. M.; Hungerbühler, H.; Janata, E.; Asmus, K.-D. J. Chem. Soc., Chem. Commun., 1993, 84-86.
[135] Braun, T.; Buvári-Barcza, Á.; Barcza, L.; Konkoly-Thege, I.; Fodor, M.; Migali, B. Solid State Ionics, 1994, 74, 47-51.
[136] Buvári-Barcza, Á.; Braun, T.; Barcza, L. Supramol. Chem., 1994, 4, 131-133.
[137] Priyadarsini, K. I.; Mohan, H.; Tyagi, A. K.; Mittal, J. P. J. Phys. Chem., 1994, 98, 4756-4759.
[138] Zhang, D.-D.; Liang, Q.; Chen, J.-W.; Li, M.-K.; Wu, S.-H. Supramol. Chem., 1994, 3, 235-239.
[139] Priyadarsini, K. I.; Mohan, H.; Mittal, J. P.; Guldi, D. M.; Asmus, K.-D. J. Phys. Chem., 1994, 98, 9565-9569.
[140] Kanazawa, K.; Nakanishi, H.; Ishizuka, Y.; Nakamura, T.; Matsumoto, M. Fullerene Sci. Technol., 1994, 2, 189-194.
[141] Guldi, D. M.; Huie, R. E.; Neta, P.; Hungerbühler, H.; Asmus, K.-D. Chem. Phys. Lett., 1994, 223, 511-516.
[142] Kuroda, Y.; Nozawa, H.; Ogoshi, H. Chem. Lett., 1995, 24, 47-48.
[143] Priyadarsini, K. I.; Mohan, H.; Mittal, J. P. Fullerene Sci. Technol., 1995, 3, 479-493.
[144] Yoshida, Z.; Takekuma, H.; Takekuma, S.; Matsubara, Y. Angew. Chem., Int. Ed. Engl., 1994, 33, 1597-1599.
[145] Kim, H.-S.; Jeon, S.-J. Chem. Commun., 1996, 817-818.
[146] Kutner, W.; Boulas, P.; Kadish, K. M. J. Electrochem. Soc., 1992, 139, 243C-243C.
[147] Boulas, P.; Kutner, W.; Jones, M. T.; Kadish, K. M. J. Phys. Chem., 1994, 98, 1282-1287.
[148] Stasko, A.; Brezová, V.; Rapta, P.; Asmus, K.-D.; Guldi, D. M. Chem. Phys. Lett., 1996, 262, 233-240.
[149] Andersson, T.; Sundahl, M.; Westman, G.; Wennerström, O. Tetrahedron Lett., 1994, 35, 7103-7106.
[150] Jin, C. Y.; Zhang, D. D.; Oguma, T.; Qian, S. X. J. Inclusion Phenom. Mol. Recognit. Chem., 1996, 24, 301-310.
[151] Komatsu, K.; Fujiwara, K.; Murata, Y.; Braun, T. J. Chem. Soc., Perkin Trans. 1, 1999, 2963-2966.
[152] Murthy, C. N.; Geckeler, K. E. Chem. Commun., 2001, 1194-1195.
[153] Buvári-Barcza, A.; Rohonczy, J.; Rozlosnik, N.; Gilányi, T.; Szabó, B.; Lovas, G.; Braun, T.; Samu, J.; Barcza, L. J. Chem. Soc., Perkin Trans. 2, 2001, 191-196.
[154] Ying, Q.; Marecek, J.; Chu, B. J. Chem. Phys., 1994, 101, 2665-2672.
[155] Simonin, J.-P. J. Phys. Chem. B, 2001, 105, 5262-5270.
[156] Andrievsky, G. V.; Klochkov, V. K.; Karyakina, E. L.; Mchedlov-Petrossyan, N. O. Chem. Phys. Lett., 1999, 300, 392-396.
[157] Andrievsky, G. V.; Klochkov, V. K.; Bordyuh, A. B.; Dovbeshko, G. I. Chem. Phys. Lett., 2002, 364, 8-17.
[158] Schuster, D. I.; Cheng, P.; Jarowski, P. D.; Guldi, D. M.; Luo, C.; Echegoyen, L.; Pyo, S.; Holzwarth, A. R.; Braslavsky, S. E.; Williams, R. M.; Klihm, G. J. Am. Chem. Soc., 2004, 126, 7257-7270.
[159] Hwang, K. C.; Mauzerall, D. J. Am. Chem. Soc., 1992, 114, 9705-9706.
[160] Hwang, K. C.; Mauzerall, D. Nature (London), 1993, 361, 138-140.
[161] Bensasson, R. V.; Garaud, J. L.; Leach, S.; Miquel, G.; Seta, P. Chem. Phys. Lett., 1993, 210, 141-148.
[162] Bensasson, R. V.; Bienvenue, E.; Dellinger, M.; Leach, S.; Seta, P. J. Phys. Chem., 1994, 98, 3492-3500.
[163] Beeby, A.; Eastoe, J.; Heenan, R. K. J. Chem. Soc., Chem. Commun., 1994, 173-175.
[164] Vinogradova, L. V.; Melenevskaya, E. Y.; Khachaturov, A. S.; Kever, E. E.; Litvinova, L. S.; Novokreshchenova, A. V.; Sushko, M. A.; Klenin, S. I.; Zgonnik, V. N. Polym. Sci., Ser. A, 1998, 40, 1152-1159.
[165] Sushko, M. L.; Tenhu, H.; Klenin, S. I. Polymer, 2002, 43, 2769-2775.
[166] Tarassova, E. V.; Aseyev, V. O.; Tenhu, H. J.; Baranovskaya, I. A.; Klenin, S. I. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 349-352.
[167] Tarassova, E. V.; Aseyev, V. O.; Tenhu, H. J.; Baranovskaya, I. A.; Klenin, S. I. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 365-368.
[168] Ratnikova, O. V.; Melenevskaya, E. Y.; Yevlampieva, N. P.; Tarassova, E. V.; Zgonnik, V. N. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 361-364.
[169] O’Connell, M. J.; Boul, P.; Ericson, L. M.; Huffman, C.; Wang, Y.; Haroz, E.; Kuper, C.; Tour, J.; Ausman, K. D.; Smalley, R. E. Chem. Phys. Lett., 2001, 342, 265-271.
[170] Liu, J.; Rinzler, A. G.; Dai, H.; Hafner, J. H.; Bradley, R. K.; Boul, P. J.; Lu, A.; Iverson, T.; Shelimov, K.; Huffman, C. B.; Rodriguez-Macias, F.; Shon, Y.-S.; Lee, T. R.; Colbert, D. T.; Smalley, R. E. Science, 1998, 280, 1253-1256.
[171] Ausman, K. D.; Piner, R.; Lourie, O.; Ruoff, R. S.; Korobov, M. J. Phys. Chem. B, 2000, 104, 8911-8915.
[172] Yudasaka, M.; Zhang, M.; Jabs, C.; Iijima, S. Appl. Phys. A, 2000, 71, 449-451.
[173] Chen, J.; Liu, H.; Weimer, W. A.; Halls, M. D.; Waldeck, D. H.; Walker, G. C. J. Am. Chem. Soc., 2002, 124, 9034-9035.
[174] Bavastrello, V.; Carrara, S.; Ram, M. K.; Nicolini, C. Langmuir, 2004, 20, 969-973.
[175] McCarthy, B.; Coleman, J. N.; Czerw, R.; Dalton, A. B.; Carroll, D. L.; Blau, W. J. Synth. Met., 2001, 121, 1225-1226.
[176] Steuerman, D. W.; Star, A.; Narizzano, R.; Choi, H.; Ries, R. S.; Nicolini, C.; Stoddart, J. F.; Heath, J. R. J. Phys. Chem. B, 2002, 106, 3124-3130.
[177] Star, A.; Stoddart, J. F. Macromolecules, 2002, 35, 7516-7520.
[178] Star, A.; Stoddart, J. F.; Steuerman, D.; Diehl, M.; Boukai, A.; Wong, E. W.; Yang, X.; Chung, S.-W.; Choi, H.; Heath, J. R. Angew. Chem., Int. Ed., 2001, 40, 1721-1725.
[179] Bandyopadhyaya, R.; Nativ-Roth, E.; Regev, O.; Yerushalmi-Rozen, R. Nano Lett., 2002, 2, 25-28.
[180] Star, A.; Steuerman, D. W.; Heath, J. R.; Stoddart, J. F. Angew. Chem., Int. Ed., 2002, 41, 2508-2512.
[181] Hungerbühler, H.; Guldi, D. M.; Asmus, K.-D. J. Am. Chem. Soc., 1993, 115, 3386-3387.
[182] Dugan, L. L.; Gabrielsen, J. K.; Yu, S. P.; Lin, T.-S.; Choi, D. W. Neurobiol. Dis., 1996, 3, 129-135.
[183] Dugan, L. L.; Turetsky, D. M.; Du, C.; Lobner, D.; Wheeler, M.; Almli, C. R.; Shen, C. K.-F.; Luh, T.-Y.; Choi, D. W.; Lin, T.-S. Proc. Natl. Acad. Sci. USA, 1997, 94, 9434-9439.
[184] Lin, A. M. Y.; Chyi, B. Y.; Wang, S. D.; Yu, H.-H.; Kanakamma, P. P.; Luh, T.-Y.; Chow, C. K.; Ho, L. T. J. Neurochem., 1999, 72, 1634-1640.
[185] Wang, I. C.; Tai, L. A.; Lee, D. D.; Kanakamma, P. P.; Shen, C. K.-F.; Luh, T.-Y.; Cheng, C. H.; Hwang, K. C. J. Med. Chem., 1999, 42, 4614-4620.
[186] Dugan, L. L.; Lovett, E. G.; Quick, K. L.; Lotharius, J.; Lin, T. T.; O’Malley, K. L. Parkinsonism Relat. Disord., 2001, 7, 243-246.
[187] Lin, A. M.-Y.; Fang, S.-F.; Lin, S.-Z.; Chou, C.-K.; Luh, T.-Y.; Ho, L.-T. Neurosci. Res., 2002, 43, 317-321.
[188] Lin, H.-S.; Lin, T.-S.; Lai, R.-S.; D’Rosario, T.; Luh, T.-Y. Int. J. Radiat. Biol., 2001, 77, 235-239.
[189] Shcheglov, I. V.; Kondratieva, E. V.; Podolsky, I. Y. Neurochymia, 2001, 18, 200-200.
[190] Podolsky, I. Y.; Kondratieva, E. V.; Shcheglov, I. V.; Dumpis, M. A.; Piotrovsky, L. B. Phys. Solid State, 2002, 44, 578-580.
[191] Podolsky, I. Y.; Kondratieva, E. V.; Gurin, S. S.; Dumpis, M. A.; Piotrovsky, L. B. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 421-424.
[192] Zaporotskova, I. V.; Chernozatonskii, L. A. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 381-386.
[193] Brusatin, G.; Signorini, R. J. Mater. Chem., 2002, 12, 1964-1977.
[194] Bazhenov, A. B.; Maksimuk, M. Y.; Fursova, T. N.; Moravski, A. P.; Nadtochenko, V. A. Russ. Chem. Bull., Int. Ed., 1996, 45, 1459-1459.
[195] Brezová, V.; Dvoranová, D.; Rapta, P.; Staško, A. Spectrochim. Acta, Part A, 2000, 56, 2729-2739.
[196] Borovkov, N.; Blokhina, S.; Kutepov, A.; Lebedeva, N.; Olkhovich, M.; Pavlycheva, N.; Sharapova, A. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 583-592.
[197] Prabha, C. R.; Patel, R.; Murthy, C. N. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 405-412.
[198] Ratnaprabha, C.; Sasidhar, Y. U. J. Chem. Soc., Faraday Trans., 1998, 94, 3631-3637.
[199] Tsang, S. C.; Davis, J. J.; Green, M. L. H.; Hill, H. A. O.; Leung, Y. C.; Sadler, P. J. J. Chem. Soc., Chem. Commun., 1995, 1803-1804.
[200] Tsang, S. C.; Davis, J. J.; Green, M. L. H.; Hill, H. A. O.; Leung, Y. C.; Sadler, P. J.; Boutonnet, F.; Zablocka, M.; Igau, A.; Jaud, J.; Majoral, J.-P.; Schamberger, J.; Erker, G.; Werner, S.; Krüger, C. J. Chem. Soc., Chem. Commun., 1995, 2579-2579.
[201] Tsang, S. C.; Guo, Z.; Chen, Y. K.; Green, M. L. H.; Hill, H. A. O.; Hambley, T. W.; Sadler, P. J. Angew. Chem., Int. Ed., 1997, 36, 2198-2200.
[202] Guo, Z.; Sadler, P. J.; Tsang, S. C. Adv. Mater., 1998, 10, 701-703.
[203] Balavoine, F.; Schultz, P.; Richard, C.; Mallouh, V.; Ebbesen, T. W.; Mioskowski, C. Angew. Chem., Int. Ed., 1999, 38, 1912-1915.
[204] Davis, J. J.; Green, M. L. H.; Hill, H. A. O.; Leung, Y. C.; Sadler, P. J.; Sloan, J.; Xavier, A. V.; Tsang, S. C. Inorg. Chim. Acta, 1998, 272, 261-266.
[205] Chen, R. J.; Zhang, Y.; Wang, D.; Dai, H. J. Am. Chem. Soc., 2001, 123, 3838-3839.
[206] Katz, E. J. Electroanal. Chem., 1994, 365, 157-164.
[207] Jaegfeldt, H.; Kuwana, T.; Johansson, G. J. Am. Chem. Soc., 1983, 105, 1805-1814.
[208] Yamakoshi, Y. N.; Yagami, T.; Fukuhara, K.; Sueyoshi, S.; Miyata, N. J. Chem. Soc., Chem. Commun., 1994, 517-518.
[209] Piotrovsky, L.; Kiselev, O.; Kozeletskaya, K.; Melenevskaya, E.; Vinogradova, E.; Kever, L.; Klenin, S.; Zgonnik, V.; Dumpis, M. Dokl. Akad. Nauk, 1998, 361, 547-547.
[210] Piotrovsky, L. B.; Kiselev, O.; Kozeletskaya, K.; Melenevskaya, E.; Vinogradova, E.; Kever, L.; Klenin, S.; Zgonnik, V.; Dumpis, M. Mol. Mater., 1998, 11, 121-121.
[211] Käsermann, F.; Kempf, C. Rev. Med. Virol., 1998, 8, 143-151.
[212] Piotrovsky, L. B.; Kiselev, O. I. Fullerenes Nanotubes Carbon Nanostruct., 2004, 12, 397-403.
[213] Friedman, S. H.; DeCamp, D. L.; Sijbesma, R. P.; Srdanov, G.; Wudl, F.; Kenyon, G. L. J. Am. Chem. Soc., 1993, 115, 6506-6509.
[214] Sijbesma, R.; Srdanov, G.; Wudl, F.; Castoro, J. A.; Wilkins, C.; Friedman, S. H.; DeCamp, D. L.; Kenyon, G. L. J. Am. Chem. Soc., 1993, 115, 6510-6512.
[215] Schinazi, R. F.; Sijbesma, R.; Srdanov, G.; Hill, C. L.; Wudl, F. Antimicrob. Agents Chemother., 1993, 37, 1707-1710.
[216] Toniolo, C.; Bianco, A.; Maggini, M.; Scorrano, G.; Prato, M.; Marastoni, M.; Tomatis, R.; Spisani, S.; Palú, G.; Blair, E. D. J. Med. Chem., 1994, 37, 4558-4562.
266 Frontiers in Drug Design & Discovery, 2005, Vol. 1 [217] [218] [219] [220] [221] [222] [223] [224] [225] [226] [227] [228] [229] [230] [231] [232] [233] [234] [235] [236] [237]
Francisco Torrens
Scrivens, W. A.; Tour, J. M.; Creek, K. E.; Pirisi, L. J. Am. Chem. Soc., 1994, 116, 4517-4518. Chen, B.-X.; Wilson, S. R.; Das, M.; Coughlin, D. J.; Erlanger, B. F. Proc. Natl. Acad. Sci. USA, 1998, 95, 10809-10813. Braden, B. C.; Goldbaum, F. A.; Chen, B.-X.; Kirschner, A. N.; Wilson, S. R.; Erlanger, B. F. Proc. Natl. Acad. Sci. USA, 2000, 97, 12193-12197. Texier, I.; Berberan-Santos, M. N.; Fedorov, A.; Brettreich, M.; Schönberger, H.; Hirsch, A.; Leach, S.; Bensasson, R. V. J. Phys. Chem. A, 2001, 105, 10278-10285. Gharbi, N.; Burghardt, S.; Brettreich, M.; Herrenknecht, C.; Tamisier-Karolak, S.; Bensasson, R. V.; Szwarc, H.; Hirsch, A.; Wilson, S. R.; Moussa, F. Anal. Chem., 2003, 75, 4217-4222. Schuster, D. I.; Wilson, S. R.; Kirschner, A. N.; Schinazi, R. F.; Schlueter-Wirtz, S.; Tharnish, P.; Barnett, T.; Ermolieff, J.; Tang, J.; Brettreich, M.; Hirsch, A. Proc. – Electrochem. Soc., 2000, (11) 267-270. Foley, S.; Crowley, C.; Smaihi, M.; Bonfils, C.; Erlanger, B. F.; Seta, P.; Larroque, C. Biochem. Biophys. Res. Commun., 2002, 294, 116-119. Shraiman, B. I. Biophys. J., 1997, 72, 953-957. Takei, K.; Haucke, V. Trends Cell Biol., 2001, 11, 385-391. Cagle, D. W.; Kennel, S. J.; Mirzadeh, S.; Alford, J. M.; Wilson, L. J. Proc. Natl. Acad. Sci. USA, 1999, 96, 5182-5187. Pantarotto, D.; Briand, J.-P.; Prato, M.; Bianco, A. Chem. Commun., 2004, 16-17. Pantarotto, D.; Partidos, C. D.; Hoebeke, J.; Brown, F.; Kramer, E.; Briand, J.-P.; Muller, S.; Prato, M.; Bianco, A. Chem. Biol., 2003, 10, 961-966. Kam, N. W. S.; Jessop, T. C.; Wender, P. A.; Dai, H. J. Am. Chem. Soc., 2004, 126, 6850-6851. Tsuchiya, T.; Oguri, I.; Yamakoshi, Y. N.; Miyata, N. FEBS Lett., 1996, 393, 139-145. Prat, F.; Hou, C.-C.; Foote, C. S. J. Am. Chem. Soc., 1997, 119, 5051-5052. Bernstein, R.; Prat, F.; Foote, C. S. J. Am. Chem. Soc., 1999, 121, 464-465. Cheng, X.; Kan, A. T.; Tomson, M. B. J. Chem. Eng. Data, 2004, 49, 675-683. Osipov, E. V.; Reznikov, V. A. Carbon, 2002, 40, 961-965. 
Osipov, E.; Kondratavicius, H.; Osipov, S. E. The 1st Advanced Nanotechnology Conference, Washington (DC), 2004. Oberdörster, E. Environ. Health Perspect., 2004, 112, 1058-1062. Sayes, C. M.; Fortner, J. D.; Guo, W.; Lyon, D.; Boyd, A. M.; Ausman, K. D.; Tao, Y. J.; Sitharaman, B.; Wilson, L. J.; Hughes, J. B.; West, J. L.; Colvin, V. L. Nano Lett., 2004, 4, 1881-1887.
Frontiers in Drug Design & Discovery, 2005, 1, 267-286
Automating Literature-Based Lead Discovery

Jonathan D. Wren*

The University of Oklahoma, Department of Botany and Microbiology, Advanced Center for Genome Technology, Norman, OK, 73019, USA

Abstract: The past decade has seen rapid growth in biomedical data from many fields: genetics, chemistry, pharmacology and medicine, among others. Structured data integration within and among these fields exists to varying degrees, but unstructured data integration has existed since the dawn of science in the form of text-based published reports. The biomedical literature is vast, with Medline having approximately 15 million records and adding new records at a rate greater than one per minute. Medline contains a wealth of information about chemical compounds, interactions, side effects, phenotypes, genetic interactions and disease studies. Computational methods are being designed to mine these large bodies of unstructured text and infer what is not yet known based upon what is. Applied to drug discovery, this approach has become a potential means of shortcutting the traditional drug discovery “pipeline”, which has been estimated to take up to 15 years and cost approximately US$1 billion from target selection to FDA approval. These literature-based methods of knowledge discovery provide a means to identify candidate compounds to treat diseases and to identify genes that may play a role in rare but extremely adverse reactions to promising new drugs, reactions that subsequently force their removal from the pipeline. This chapter discusses the use of literature-based sources of knowledge as a means of discovering novel connections between pharmacological entities such as diseases, drugs and genes.
*Corresponding author: E-mail: [email protected]

Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.

INTRODUCTION

It is widely recognized that the amount of publicly available data in most biomedical fields is increasing exponentially. This is the modern paradigm of biomedical research. For many fields, the most difficult task has shifted from gathering data to making sense of large amounts of it within the context of what is already known. The drug development “pipeline” goes from identifying promising molecular leads to testing efficacy in humans. Many compounds, however, have already undergone clinical tests and are known to be safe. Finding new uses for these drugs, or “repurposing” them, has a solid market. Perhaps the most striking recent example is Viagra (sildenafil citrate), a drug originally developed to treat hypertension and then angina, which was discovered during tolerance trials to produce erections [1]. Drugs rarely have isolated, pinpoint effects on one target organ or biological function but rather affect systems, through cell
signaling, membrane permeability, ion transfer or alteration of protein activity. It is this very “sloppiness” that can be exploited to identify other potential uses. Viagra’s original side effect is now the main selling point of the drug.

Data, Information and Knowledge

The terms data, information and knowledge are sometimes misused as synonyms, but they are distinct. Data consists of empirical measurements, whether quantitative or qualitative. Information is obtained by understanding relationships and context between data elements of interest. Knowledge is a structuring of information such that cause and effect can be modeled or a best course of action can be estimated. As an example, a Geiger counter reading of 100 microrems per hour is data. If this reading came from a ham sandwich, this would be information. Knowledge comes from putting together the various pieces of information, both explicit and implied: Geiger counters detect radiation, 100 microrems/hour is a high radiation level, ham sandwiches are ingested for nutrition, ingesting radiation is harmful, and therefore you should not eat the ham sandwich just analyzed.

In the preceding example, humans would most likely take most of the informational connections for granted, the most appropriate course of action being intuitive because of the close associations ingrained in our minds (e.g., radiation = bad for humans, food = necessary for survival and sometimes pleasurable to the senses). All these relationships seem implicit to us, yet at one time we did not know what radiation was until someone explicitly specified its nature and effects. If we were missing that one critical informational component (radiation = bad) when the radiation measurements were taken, we might not have thought twice about eating the sandwich.
If we were missing a different observation (e.g., there’s radiation somewhere in the room, but the source is unknown), it would prompt us to gather more information before deciding on a course of action. Observation and informational relationships are closely ingrained in any scientific approach: one must understand the implications of new events, data and observations within the context of other existing ones before knowledge can be gained. Yet, when data and observations are both abundant and scattered, it is not apparent what bits of data elsewhere might be relevant to our specific circumstance. So how do we go about gathering relevant data if we don’t know exactly what it is we’re looking for?

Databases Provide Structure for Data Mining Methods

At the core of knowledge are empirical data and the informational connections between them. As a means of relating and analyzing gathered data, researchers are increasingly turning to the use of databases (Fig. 1). The well-defined structure of databases enables fairly well established statistical analyses and data-mining techniques to be employed to identify and visualize patterns.

Fig. (1). Total number of unique database names published in MEDLINE abstracts over time.

However, much information regarding living organisms, including clinical and phenotypic observations, is reported in the literature rather than in databases, for several reasons. First, the literature is a time-honored means of reporting full experimental details, including background, rationale, experimental design, results, observations and speculation. Second, relatively speaking, databases are a more recent phenomenon and exist mostly for specialized fields. Third, databases are rigid in their structure and requirements, and the main purpose of reporting science is the communication of results to parties interested either in education or further
experimentation, which is easier to achieve via writing, a format that offers great flexibility. This flexibility, however, proves to be problematic when it comes to data mining.

The Scientific Literature is the Richest Source of Information, Yet is Unstructured

Most biomedical information is encoded the traditional, unstructured way: in writing. Writing offers maximum flexibility in the expression of ideas but is ill suited to computational analysis without some prior processing and manipulation. It is largely in writing that scientific advances are reported, and writing is the medium by which we understand the state of the art in any field. Yet the amount of scientific text has been growing rapidly*, consequently narrowing the relative perspective of individual researchers. The amount of scientific literature is increasing exponentially, along with most other databases in biomedicine, and there are far more papers published than any individual could ever hope to read. Furthermore, within this vast literature are many areas of research interest, more than any individual could ever hope to be aware of, leading to increasing specialization of research focus. This narrowing of relative awareness has not been a barrier to progress, but one could argue that it limits the rate of progress.

* MEDLINE, for example, contained approximately 15 million records at the beginning of 2004, and is adding approximately 550,000 new records per year.

Empirical science is built upon cycles of observation, hypothesis formation and testing. Modern biomedical research has increased the capacity by which it can gather observations (data) through high-throughput technologies such as sequencing, microarrays, compound screening libraries, and proteomics gels. This increased capacity leads to more observations on a collective basis but not an individual one. That is, researchers are only human and limited by time, education, interest and the speed by
which they can both read and understand new information. They are only aware of a small fraction of what is reported. In an age where data is generated faster than knowledge [2], it becomes increasingly important to be able to compile diverse sets of facts to identify high-impact hypotheses [3, 4].

Computers Play a Role in the Automation of Knowledge Discovery

Intuitively, the value of the scientific literature in guiding and offering insight into our own research is obvious. Not only do we use it to establish what is known and unknown in our field of study, but we also draw upon trends, general principles and observations from other fields and use them to inspire or guide our future directions. A broad perspective can be extremely valuable. Yet, as mentioned, humans are simply limited in the scope of what they know and can learn. Computers, conversely, are perfectly suited to read large amounts of literature, catalog hundreds of thousands of names and synonyms, and simultaneously manipulate, track and analyze hundreds of relevant variables. It seems reasonable to stipulate that, for many areas of research with a significant body of associated literature, only a computer could gain the broadest possible perspective. Computers, however, do not currently understand meaning, implications, significance and relations the way humans do. Work is currently underway to enable computers to play a greater role in the analysis of information and the implications of research discoveries. This structuring of data and information (facts & relations) to obtain knowledge (cause & effect) has been referred to as knowledge discovery (KD). By enabling a computer to identify relationships within the scientific literature, it becomes possible to infer or deduce what is not known based upon what is.
Beyond the technical challenges associated with effective information retrieval (IR), the main challenges to the discovery of new knowledge are enabling a computer to identify what is of interest, why it is of interest and how the information will be conveyed to a human user. The intent of KD-based research is not to bypass the human researcher [5], but to provide a powerful supplement in assisting observation, analysis and inference on a large scale.

LITERATURE-BASED KNOWLEDGE DISCOVERY: HISTORY AND PRESENT
In 1986 a researcher named Don Swanson hypothesized that two areas of research could be functionally non-interactive, such that discoveries in one field could be relevant to studies in another, yet nonetheless remain unknown to researchers in either field because the fields have little or no overlap. For example, one might search a body of literature for documents in which two concepts, A and C, appear together. Fig. (2a) illustrates schematically the lack of a documented relationship between A and C. At this point, the search is a dead end: without any documented relationship, the searcher has nothing more to learn by traditional Boolean queries. However, by searching for other terms (B1-6) found associated with both the A and C concepts (Fig. 2b), the searcher can gain a greater understanding of how A and C are related by examining their relationship to these intermediates. This approach can be useful even if A and C do have a documented relationship, as it may identify additional commonalities that do appear specifically within this intersection of topics.

Fig. (2). Concepts A and C have no direct relationship within the literature (a), but are related to common intermediates (b).

To permit this type of analysis, Swanson used a basic
approach involving the pairing of keywords between these different bodies of literature (A and C), and demonstrated that regions of overlap (B) could be identified and novel discoveries made [6-10].

Building Upon Swanson’s ABC Model

Swanson’s model permitted a type of analysis not possible by traditional literature search methods, but suffered from a number of drawbacks. Key words, for example, may have less to do with a concept than key phrases. General concepts derived from single-word analysis such as “cancer”, “disease” or “gene” are less informative than more specific concepts such as “breast cancer”, “Hodgkin’s disease” and “phosphofructokinase gene”. One way to solve these problems is to use controlled vocabularies. The Unified Medical Language System (UMLS), for example, is a repository of biomedical vocabularies developed by the National Library of Medicine, integrating over 2 million names for around 900,000 concepts [11], including Medical Subject Headings (MeSH) fields. In MEDLINE, each record is annotated with MeSH fields some time after its initial entry, reflecting the primary concepts found within the abstract. Improvements on Swanson’s method have been made by looking for the co-occurrences of major MeSH descriptors, by mapping text to UMLS concepts [12, 13], or by grouping recognized words and phrases into conceptual “objects” [14]. These methods greatly alleviate the words-versus-phrases problem by transferring the analysis to concepts.

Principally, Swanson’s early approach enabled hypothesis verification more than it did hypothesis identification. That is, one must begin the process by postulating a link between A and C, which presupposes the user already knows or suspects an answer and
is looking for verification. But what about finding new, potentially high-impact hypotheses without knowing where to look?

The Open-Discovery Model

Swanson’s early approach has been termed the “closed discovery” model. The open-discovery model centers around the analysis of a single concept of interest. Starting with this concept, the literature is searched for related concepts (Fig. 3a), which are then in turn searched for other concepts related to them (Fig. 3b). This enables one to infer relationships that are not known, yet are potentially implicit in the relationships shared by two objects (Fig. 3b). These shared relationships provide a means to both research and justify the existence of a potentially novel relationship not explicitly contained within the literature. For drug discovery, this opens up the possibility of searching for potential lead compounds by using known relationships to identify unknown relationships. Traditionally, the focus of analysis has been a disease, with a search being conducted for drugs to treat it. The open-discovery approach enables this to be turned on its head by searching for diseases a drug could be used to treat [15].
Fig. (3). Using literature-based relationships to engage in the discovery of new knowledge. a) Beginning with an object of interest (black node), tentative relationships are assigned to other objects (gray nodes) when they are co-mentioned within MEDLINE abstracts. b) Each related object is then analyzed for its relationships with other objects (white nodes). These nodes are not directly related to the primary node, thus they are implicitly related.
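The closed and open models can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used by Arrowsmith or any other published system: the toy "abstracts", the object names and the function names (`build_network`, `closed_discovery`, `open_discovery`) are all hypothetical, and a relationship is assumed to exist whenever two objects are co-mentioned in the same abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy "abstracts": each is the set of objects it mentions.
ABSTRACTS = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"Raynaud's disease", "blood viscosity"},
    {"Raynaud's disease", "platelet aggregation"},
    {"Raynaud's disease", "cold exposure"},
]


def build_network(abstracts):
    """Relate two objects whenever they are co-mentioned in one abstract."""
    network = defaultdict(set)
    for mentioned in abstracts:
        for x, y in combinations(sorted(mentioned), 2):
            network[x].add(y)
            network[y].add(x)
    return network


def closed_discovery(network, a, c):
    """Closed model: A and C are both given; return the shared B nodes."""
    return network[a] & network[c]


def open_discovery(network, a):
    """Open model: only A is given; map each implicit C to its shared B nodes."""
    direct = network[a]
    implicit = defaultdict(set)
    for b in direct:
        for c in network[b]:
            # A "C" node is reachable through B but not directly related to A.
            if c != a and c not in direct:
                implicit[c].add(b)
    return dict(implicit)


network = build_network(ABSTRACTS)
shared = closed_discovery(network, "fish oil", "Raynaud's disease")
implicit = open_discovery(network, "fish oil")
print(sorted(shared))
print({c: sorted(bs) for c, bs in implicit.items()})
```

In this toy data the closed query must be told to compare fish oil with Raynaud's disease, whereas the open query starts from fish oil alone and surfaces Raynaud's disease as an implicit node together with the intermediates that justify it.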
The approach outlined in Fig. (3) is what has become known as the “open discovery” model [16, 17]. It is also sometimes referred to as “Swanson’s ABC discovery model”, named because the first input node (black) is referred to as the “A” node, the direct relationships (gray) are referred to as the “B” nodes and the implicit relationships (white) are referred to as the “C” nodes. These implicit relationships have also been referred to as “indirect” and “transitive” relationships. Similarly, the relationships themselves have also been referred to as “associations” and “connections”. Swanson outlined the open-discovery approach conceptually [7], but apparently did not employ it for most of his research because of the problems it posed. Because the number of relationships per object follows a scale-free distribution (Fig. 4a), the number
of implicit connections found by an unbounded search increases rapidly for every direct connection [14]. Fig. (4b) shows how the number of implicit connections rapidly approaches the maximum number possible (the upper asymptote) given a relatively small number of direct connections. Thus, everything in the database quickly becomes related to the query object and the problem quickly shifts from finding implicit connections to evaluating the potential relevance of each one.
Fig. (4). Structure of the literature-based network. A) The objects in a literature-based network have a disproportionate number of relationships, following a scale-free distribution. B) In the case of the scientific literature, this leads to “extremely small world” network behavior by which most objects in the network are related by at least one intermediate.
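The saturation effect can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: it grows a synthetic network by preferential attachment (a standard way of producing a scale-free-like degree distribution, not the MEDLINE network itself), and the function names and parameters (`preferential_attachment`, `two_hop_reach`, 2,000 nodes, 3 links per new node) are hypothetical choices.

```python
import random


def preferential_attachment(n_nodes, links_per_node, seed=0):
    """Grow a synthetic network in which well-connected nodes attract new links."""
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(n_nodes)}
    # One entry per link end, so sampling from this list favors hubs.
    endpoints = []
    core = links_per_node + 1
    for i in range(core):  # start from a small fully connected core
        for j in range(i + 1, core):
            neighbors[i].add(j)
            neighbors[j].add(i)
            endpoints += [i, j]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for t in targets:
            neighbors[new].add(t)
            neighbors[t].add(new)
            endpoints += [new, t]
    return neighbors


def two_hop_reach(neighbors, node):
    """Return (number of direct relationships, number of implicit relationships)."""
    direct = neighbors[node]
    implicit = set()
    for b in direct:
        implicit |= neighbors[b]
    implicit -= direct
    implicit.discard(node)
    return len(direct), len(implicit)


network = preferential_attachment(n_nodes=2000, links_per_node=3)
by_degree = sorted(network, key=lambda n: len(network[n]), reverse=True)
for node in by_degree[:3] + by_degree[-3:]:
    direct, implicit = two_hop_reach(network, node)
    share = (direct + implicit) / (len(network) - 1)
    print(f"node {node}: {direct} direct, {implicit} implicit, "
          f"{share:.0%} of the network within two steps")
```

Printing the share of the network reachable within two steps for the most and least connected nodes gives a feel for how quickly an unbounded search stops discriminating between candidates.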
One method of dealing with this explosion is to restrict the analysis to intermediates known to be of interest, which might more accurately be described as “guided open discovery” because it does permit an open-ended query but is restricted in its scope. Selecting immune-related intermediates, for example, is one way of guiding this process and has been used successfully [18], but it does forego other types of observations that may connect two objects. It is useful, however, if the user does have a specific research focus in mind (e.g., an immunologist wanting to discover novel connections that exist specifically through immune-related activity).

For guided open-discovery, it is still very much a problem to identify relevant implicit relationships from within a very large set of implicit relationships. For unbounded open-discovery, it is even more of a problem. Theoretically, a computer might be able to generate a logical description of how A is related to each C. However, not only is this beyond the current state of the art, but even a list of such descriptions would need to be ranked or somehow prioritized by their potential importance.

Ranking Implicit Relationships

By comparing shared relationship sets identified within the MEDLINE relationship network against what could be expected from a random network model with the same properties, a statistical significance value can be assigned to any given grouping of relationships (Fig. 5), taking into account the connectivity of each object in the set. In Fig. (5), a hypothetical network with 1,000 nodes is analyzed. The node with the most
shared relationships (four) is itself a highly connected node (connected to 95% of the network), and thus is less noteworthy from a statistical perspective than another node that shares three relationships and is connected to only 5% of the network (marked with an asterisk). A score is thus assigned on the basis of the observed to expected (Obs/Exp) ratio. Because A is given, this only needs to be calculated for each C, and can be done using equation (1).
Fig. (5). The significance of finding an implicit link between A and C is proportional to the probability one could be found by chance.

\[
\mathrm{Expect}(C \leftrightarrow B_i) = \sum_{i=1}^{n} \left[ 1 - \left(1 - \frac{K_C}{N_t}\right)\left(1 - \frac{K_{B_i}}{N_t}\right) \right] \tag{1}
\]
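As a rough numerical illustration of the Obs/Exp score, the sketch below reads equation (1) as a sum, over the n shared intermediates, of 1 − (1 − K_C/N_t)(1 − K_Bi/N_t). The text does not define the symbols, so it is assumed here that K_C and K_Bi are the numbers of relationships held by C and by each intermediate B_i, and that N_t is the total number of objects in the network. The function names and the connectivity assigned to each intermediate (50 relationships) are hypothetical; only the 1,000-node network and the 95% and 5% connectivities come from the example of Fig. (5).

```python
def expected_shared(k_c, k_bs, n_total):
    """Expected shared relationships, under the assumed reading of equation (1)."""
    return sum(
        1 - (1 - k_c / n_total) * (1 - k_b / n_total)
        for k_b in k_bs
    )


def obs_exp_score(k_c, k_bs, n_total):
    """Observed number of shared intermediates divided by the expected number."""
    observed = len(k_bs)
    return observed / expected_shared(k_c, k_bs, n_total)


N_TOTAL = 1000  # hypothetical network size, as in the Fig. (5) example

# Highly connected node: linked to 95% of the network, shares four intermediates.
hub_score = obs_exp_score(k_c=950, k_bs=[50, 50, 50, 50], n_total=N_TOTAL)

# Sparsely connected node (the asterisk in Fig. 5): linked to 5%, shares three.
starred_score = obs_exp_score(k_c=50, k_bs=[50, 50, 50], n_total=N_TOTAL)

print(f"hub Obs/Exp = {hub_score:.2f}, starred Obs/Exp = {starred_score:.2f}")
```

Under these assumptions the sparsely connected node receives a much higher score despite sharing fewer intermediates, which matches the qualitative argument made for Fig. (5).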
One means of quantifying performance when ranking implicit relationships is to score known relationships as if they were not known. In Fig. (2b), for example, the A and C nodes are shown as unconnected and sharing a set of intermediate connections (B1-6). However, if a new relationship were discovered between A and C, these intermediate connections would still exist. If only the intermediates are used to judge the relevance of a potential connection between A and C, then they can be evaluated independently of whether or not there is a connection between A and C. Ranking schemes can then be compared to see how well they score known relationships via their evaluation of shared relationships. Perhaps the first attempt at doing this was by comparing the number of observed relationships to the number of expected relationships within a network [14, 19]. This network connectivity ranking has been shown to correlate with the probability a relationship is known as well as with the strength of the relationship (Fig. 6).

Fig. (6). The object “cardiac hypertrophy” is analyzed to identify all other objects in the database that share literature relationships with it. When a relationship is known (i.e., it has appeared in a MEDLINE title/abstract), a line is plotted on the y-axis, which corresponds to how many times the relationship was mentioned in MEDLINE. When the relationship is not known, there is a gap (not all gaps are visible due to x-axis compression). Note that frequently mentioned relationships tend to receive high scores when comparing the number of observed relationships shared by two objects to the number of relationships expected by chance (Obs/Exp).

Other means of ranking implicit relationships have been tested, most recently by extending mutual information measure (MIM) calculations from direct relationships to implicit relationships [20] and also by using fuzzy set theory (FST) to identify conceptual domains shared by two objects [21]. Each approach had its strengths and weaknesses in ranking inferences. For example, the FST approach was superior at identifying general concepts (e.g., migraines are associated with pain) whereas the MIM
approach was superior at identifying more specific, informative relationships (e.g., migraines are associated with sumatriptan, a medication used to treat migraines).

APPLICATIONS TO DISCOVER NEW KNOWLEDGE

Literature-based knowledge discovery technology could be used in a variety of different research areas, but it arguably holds the most promise for biomedical research in identifying health-related discoveries. Thus far, this is where most applications have been made, generating testable biomedical hypotheses. Some of the predicted hypotheses have been verified while others have not. It is evident from their application that literature-based KD methods hold promise in identifying potential lead compounds from within the literature to affect disease etiology, pathogenesis and/or progression.

Swanson’s Discoveries

Using the method described earlier, which was embedded in a program called Arrowsmith [10, 22] and is now freely available on the web [23], Swanson postulated a
relationship between Raynaud’s Disease (A) and fish oil (C) by the associated blood and vascular changes related to both phenomena (B) [6]. Raynaud’s Disease is a circulatory disorder that affects certain areas of the body such as fingers, toes, ears, and tip of the nose, causing them to feel numb and cool in response to cold temperatures or stress. Fish oil, conversely, increases a number of circulatory variables that are decreased in Raynaud’s, enabling Swanson to hypothesize that fish oil might have a positive effect on Raynaud’s patients. Swanson’s hypothesis was eventually shown to be correct by another group [24], although this discovery may not apply to Raynaud’s phenomenon when it is caused as part of another underlying disease such as mixed cryoglobulinemia [25]. The novelty of the approach in combination with the validation of the discovery led other researchers to use the fish oil-Raynaud’s example as a benchmark for replication [26, 27].

Arrowsmith was later applied to identify other potentially undiscovered relationships, such as between magnesium levels and migraine headaches [8], between arginine intake and blood levels of somatomedins [9], and between calcium-independent phospholipase A2 and schizophrenia [28]. It was also used to identify intermediates in known relationships without much existing literature, such as between Alzheimer’s Disease (AD) and both estrogen [29] and indomethacin [30]. The latter two studies were published in 1996, and the link between AD and estrogen has since become well established [31]; for example, 37 papers containing both search terms were published up to and including 1996, compared to 555 papers between 1997 and 2004. The link between AD and indomethacin revolves around the inflammatory process, which was established prior to 1996 [32] and continues to be explored to a greater degree in modern literature [33].
Therapeutic Uses of Thalidomide

A concept-based guided open-discovery approach using the UMLS was employed to examine potential therapeutic uses for thalidomide, a drug formerly banned as unsafe due to the birth defects associated with its use [18]. Intermediate B concepts were restricted to immune-related factors. Connections were then drawn between thalidomide’s immune inhibitory activity and several diseases: myasthenia gravis, sialadenitis, acute pancreatitis, chronic hepatitis C and Helicobacter pylori induced gastritis. Examining the connecting relationships suggested that thalidomide might be therapeutic for patients with these diseases.

Analysis of Shared Relationships – Inferring Ontological Classifications

This approach did not involve the classical knowledge discovery methods described thus far, which are restricted solely to literature-based analysis, but rather provided a slight twist on them by integrating data types [34]. A gene ontology [35] category (A) was used to compile a list of genes (B) annotated as belonging to a given ontological classification. This list was then analyzed to identify implied literature relationships (C) with the gene ontology category. When the C list was restricted to genes only, the implied relationship was assumed to be that the gene belonged to or was associated with the ontology category. Analysis of the average observed to expected ratio (see the section on ranking implicit relationships) for gene groups showed a divergence between randomly assembled gene groups and gene groups assembled from gene ontology categories. This suggested that statistical analysis of literature-based associations could be employed to evaluate how cohesive a collection of items was within the literature. The approach identified genes that
could reasonably be considered part of the ontology category analyzed, with about 50% accuracy.

Cardiac Hypertrophy

An observed-to-expected object-based open-discovery model was built into a program called IRIDESCENT (Implicit Relationship IDEntification by in-Silico Construction of an Entity-based Network from Text). IRIDESCENT was used to identify compounds implicitly associated with cardiac hypertrophy, a clinically important disease that can develop in response to stress and high blood pressure. Chlorpromazine was singled out for experimental testing and was anticipated to reduce or blunt the development or severity of cardiac hypertrophy. A rodent model was used for testing, in which isoproterenol was given to induce cardiac hypertrophy. One group received saline injections and the other chlorpromazine. Subsequent experiments suggested that chlorpromazine did indeed reduce the amount of cardiac hypertrophy induced by isoproterenol [14].

Type 2 Diabetes

An observed-to-expected object-based open-discovery model was used to analyze Type 2 diabetes [36], also known as Non-Insulin Dependent Diabetes Mellitus (NIDDM), and revealed a line of literature relationships suggesting that the pathogenesis of NIDDM is epigenetic (Fig. 7). The analysis also suggested that dysregulation of immune-related factors may be responsible for the onset of NIDDM, which would explain two clinical observations: the apparent up-regulation of immune-related factors in NIDDM, and the fact that compounds that interfere with immune system signaling also affect the NIDDM phenotype. Further analysis also suggested that adipocytes (fat cells) may be the tissue from which these immune-related compounds are released, due to the effects of short chain fatty acids on DNA methylation and the well documented relationship between obesity and NIDDM.
Fig. (7). Critical relationships shared by loss of DNA methylation and NIDDM (not all relationships shown), suggesting a relationship between the two.
Health Benefits of Curcumin
A concept-based guided open-discovery approach using MeSH terms was used to analyze a dietary substance, curcumin (found in turmeric). Three diseases were implicitly linked to curcumin through genetic and biochemical evidence: retinal diseases, Crohn’s disease and spinal cord related problems. The intermediate links suggested that curcumin may be of benefit to patients suffering from these disorders [37].
CURRENT AND FUTURE RESEARCH ISSUES
Conceptually, the groundwork has been laid for literature-based knowledge discovery through a number of different approaches. Some of the problems that remain to be worked out are more philosophical in nature, such as how knowledge is best obtained, represented, filtered, processed and presented to a human user. Other problems are more technical in nature and are outlined below.
Setting a Focus for Analysis
Any attempt at literature analysis needs some means of identifying structure within the unstructured text being processed. Names, phrases, and especially concepts (which include synonyms) need to be identified so that relations can be detected and analyzed. Thus far, several methods of anchoring literature-based analysis have been employed: MeSH terms [38-40], UMLS concepts [12, 13, 16], words [6, 22], phrases [41], and the mapping of names, synonyms and spelling variants to “objects” [14, 20, 21, 34, 42]. Efforts to identify named entities within text are also promising, but will need to be constrained so that the analysis only entails relationships between relevant concepts. Author names or colors, for example, might easily relate two concepts, but for biomedical purposes these associations would likely be trivial.
Avoiding Trivial Connections
As mentioned in the previous section, it is possible for two concepts to be “related” by one or more trivial connections.
For example, many are familiar with the famous “coincidences” that the assassinations of US Presidents Abraham Lincoln and John Kennedy reportedly have in common (Fig. 8). This example illustrates two of the problems faced in literature-based KD. First, individual relationships must have logical implications. This is why most efforts are restricted to analysis of topical entities such as genes, proteins, diseases, ontological categories, or annotated fields. When a gene and a disease co-occur within the same abstract or sentence, for example, the probability that there is a biologically relevant relationship is higher than if a gene and an author name were to co-occur. Genes can cause diseases or play a role in their development and etiology when their expression patterns are altered in time or space, or when their molecular structure is altered (e.g., mutations and splice variants). However, it is not apparent how a gene and an author name could have biological implications. Each method previously described is limited by how informative any detected relationship is. Most of them rely upon co-occurrence of terms, the potential relevance of which has been studied in several fields, including biomedicine [14, 43, 44]. Going back to the Lincoln-Kennedy example, it is not really apparent what any of the relationships implies. The fact that Lincoln’s last name has 7 letters is not pertinent to any overriding theme, purpose or rationale. So when coupled with the fact that Kennedy’s last name also has 7 letters, this does not offer any specific implications.
Fig. (8). “Amazing” coincidences connecting the assassinations of Lincoln and Kennedy.
The second problem concerns evaluating sets of observations. An arbitrarily long list of commonalities can be compiled between any two entities if the search space is unrestricted. The long list of coincidences in Fig. (8), of which only a small subset is shown, seems to be statistically exceptional as a group. That is, any one of these coincidences is not likely to impress anyone, but collectively they seem to imply that a statistically exceptional event has occurred. However, the list is quite deceptive in this sense because only the similarities are shown, not the entirety of the search space. If we accept that name length is somehow relevant, we must ask two questions about the discovery that the names Lincoln and Kennedy both have 7 letters. First, what are the odds this could happen by chance alone given a sampling of last names of presidential candidates? Second, what are the odds that one of their 3 names, or any combination thereof, has the same number of letters (e.g., note that the assassins’ names have 15 letters in total among all 3 names)? For literature-based KD, similar considerations must be taken into account when using a large list of shared relationships to postulate an implied relationship. This was the goal of the observed-to-expected ratio [14, 34, 42] mentioned in “Ranking implicit relationships”: if two genes happen to both be related within MEDLINE abstracts to the word “cancer”, this must be placed in its appropriate context before it can be considered any further. To do so requires evaluating the odds that any gene might be mentioned with the word “cancer”. Weighting each relationship by its mutual information achieves a similar effect: vague or general relationships are penalized because they have low mutual information, while statistically exceptional relationships receive high weights. Then, when the mutual information from one set of relationships is compared to that of another, a rank order emerges in which the statistically exceptional ranks above that which could more readily be attributed to coincidence.
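The weighting idea above can be illustrated with a small sketch. It is not the scoring used in [14, 34, 42], which ranks shared implicit relationships; it only shows, for direct co-occurrence, how an observed count can be compared with the count expected by chance and how pointwise mutual information is the logarithm of that same ratio. The function name, the record format (one set of terms per abstract) and the toy records are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(records):
    """Score every co-occurring term pair in `records`.

    `records` is a list of sets of terms, one set per abstract (hypothetical
    input; real term extraction is outside the scope of this sketch).
    Returns {(term_a, term_b): (observed, expected, obs_exp_ratio, pmi_bits)}.
    """
    n_records = len(records)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))

    scores = {}
    for (a, b), observed in pair_counts.items():
        # Number of co-occurrences expected if a and b were independent.
        expected = term_counts[a] * term_counts[b] / n_records
        ratio = observed / expected
        # Pointwise mutual information is the log of the same ratio.
        scores[(a, b)] = (observed, expected, ratio, math.log2(ratio))
    return scores


# Toy records (invented): GENE_X appears once with the very common term
# "cancer" and twice with an uncommon term.
records = [
    {"GENE_A", "cancer"}, {"GENE_A", "cancer"},
    {"GENE_B", "cancer"}, {"GENE_B", "cancer"},
    {"GENE_C", "cancer"}, {"GENE_X", "cancer"},
    {"GENE_X", "rare_syndrome"}, {"GENE_X", "rare_syndrome"},
]

scores = cooccurrence_scores(records)
ranked = sorted(scores.items(), key=lambda item: item[1][2], reverse=True)
for pair, (observed, expected, ratio, pmi) in ranked:
    print(pair, observed, round(expected, 2), round(ratio, 2), round(pmi, 2))
```

In this toy set the pairing of GENE_X with the general term "cancer" scores below 1 (fewer co-occurrences than chance would predict), while its pairing with the uncommon term scores above 1, which is the penalty on vague relationships described in the text.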
Detecting Relationships Within Text
Most of the approaches mentioned so far define a “relation” between two entities when they are observed within a single unit of text, whether sentence, abstract, phrase, or annotation field such as MeSH. Each choice involves a trade-off between precision and recall [43]. The question also arises of where the most pertinent information is located within a body of text. Studies suggest that although more information is found within the full text of an article, abstracts nonetheless have the highest information content per word [45]. Confidence values can be assigned to relationships that are determined by the co-citation of entities within text, either by choosing a cutoff for significance [44] or by weighting the relationship according to a statistical confidence measure [14]. Confidence measures are based upon the probability that a co-citation of two entities within a given unit of text reflects a non-trivial relationship. For biology, approximate estimates put abstract co-citations at about 50% and sentence co-citations at about 80% probability of a non-trivial relationship, although the precise number varies by study [14, 43, 44]. Such estimates, however, are subjective. One attempt at reducing subjectivity examined co-citation patterns within the first half of MEDLINE (records up until approximately Nov. 1991) and compared them with co-citations for the same entities in the latter half of MEDLINE. By first establishing that the two entities existed in both halves of the literature, this approach presumed that co-citation in the early literature only could be due to one of several possibilities: the co-mention was the result of two unrelated topics being discussed together; the objects were once studied for a relationship but none was found or it was in error; or a relationship was established but was not of sufficient interest to warrant further study or mention. Regardless of the exact reason, these non-persistent “relationships” are probably the closest statistical approximation to trivial or erroneous relationships. Fig. (9) shows that the frequency of co-citation for non-persistent relationships is very similar to the abstract error rate estimated by manual evaluation [14], suggesting that these subjective approximations correlate with statistical observations.
This use of co-citations has been adopted in a number of experiments in which an automated attempt is made at constructing networks of potential interactions or relationships, mostly between genes or proteins [44, 46-50]. The best known is probably the creation of the PubGene genetic network via co-citation of gene names within MEDLINE [44]. Traditionally, two entities are declared potentially related if they are co-mentioned at least once within a unit of text (e.g. an abstract) and unrelated if they are never mentioned together. This is somewhat of an over-simplification, but a necessary one. Realistically, several co-mentions between terms could be observed without a definitive relationship being present. However, if one uses a cutoff above zero to define when a relationship exists, a problem arises: some co-mentions below the cutoff will constitute a real relationship, one that is apparent to readers. Using zero co-mentions as a cutoff is a convenience to avoid this problem, even if the end result is that some relationships are declared “known” when they really are not. Similarly, if there are no co-mentions between two objects within one body of literature (e.g. MEDLINE abstracts), that does not necessarily mean there are no co-mentions elsewhere (e.g. the full text of MEDLINE-indexed articles, patent filings, etc.). Thus, as much as possible, the domain of analysis needs to be sufficiently representative of the current state of knowledge in the field(s) being analyzed.
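A minimal sketch of co-citation counting with a cutoff and a confidence weight is given below. The per-abstract probability of about 50% is the approximate figure quoted above; the formula that combines repeated co-mentions, 1 - (1 - p)^n, assumes that each co-mention is an independent piece of evidence. That is a simplifying assumption made here for illustration and is not the uncertainty function of [14], which was derived from sample-based error rates. All function names and the input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def comention_counts(abstracts):
    """Count, for each entity pair, how many abstracts mention both.

    `abstracts` is a list of entity lists, one per abstract (hypothetical
    input; entity recognition is assumed to have been done already).
    """
    counts = Counter()
    for entities in abstracts:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def relationship_confidence(n_comentions, p_single=0.5):
    """Chance that at least one of n co-mentions is non-trivial.

    Assumes independent co-mentions, each non-trivial with probability
    `p_single` (about 0.5 for abstracts, about 0.8 for sentences).
    """
    if n_comentions <= 0:
        return 0.0
    return 1.0 - (1.0 - p_single) ** n_comentions


def build_network(abstracts, min_comentions=1, p_single=0.5):
    """Return {(entity_a, entity_b): confidence} for pairs at or above the cutoff."""
    network = {}
    for pair, n in comention_counts(abstracts).items():
        if n >= min_comentions:
            network[pair] = relationship_confidence(n, p_single)
    return network


# Toy input: three abstracts with already-recognized entities.
abstracts = [["GENE_A", "GENE_B"], ["GENE_A", "GENE_B"], ["GENE_B", "GENE_C"]]
for pair, confidence in build_network(abstracts).items():
    print(pair, confidence)
```

Raising `min_comentions` above 1 removes weakly supported edges at the cost of discarding some real relationships, which is the trade-off described above.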
Fig. (9). Analysis of the uncertainty function in assigning tentative “relationships” based upon co-citation. Top line represents co-cited objects found within the first half of the 12 million MEDLINE records, but not the second half. Immediately below is the probability the uncertainty function (derived from sample-based error rates) assigns to co-cited relationships based upon the number of co-citations observed. For comparison, the overall distribution in the number of co-citations is shown at bottom.
Homonymy and Synonymy
Homonyms are identical words with different meanings, and synonyms are different words with identical meanings. Both present a challenge to text mining [51]. Gene symbols, for example, can be hard to identify within text due to their ambiguity [52-54]. Unfortunately, databases are not always complete in listing synonyms for named entities, and some gene synonyms conflict with the primary names of other genes. For example, TPO is the accepted abbreviation for the Thyroid Peroxidase gene but an alias for the Thrombopoietin gene, whose primary abbreviation is THPO. Some approaches have attempted to identify synonyms de novo. This is usually done by analysis of sentence context in machine-learning approaches [55].
Ambiguous Acronyms
Closely related to the problem of homonymy, acronyms pose a unique challenge in text mining. Ambiguous acronyms are common to all fields, but in MEDLINE an estimated 36% of acronyms have more than one definition. Conversely, approximately 10% of the phrases made into an acronym are made into at least two different acronyms [56]. Pairing short and long forms of acronyms is a topic of recent research activity, especially in biomedicine. Several different approaches have been taken by different groups to identify acronym-definition pairs within MEDLINE [57-63]. At least four of these efforts have made their databases publicly available online [64].
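To make the acronym-definition pairing problem concrete, the sketch below applies a deliberately crude first-letter heuristic: when an all-capital token appears in parentheses, the words immediately preceding it are accepted as its definition only if their initials spell the acronym. This is far simpler than the published methods [57-63] and is not a reimplementation of any of them; the function names and example sentences are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# An all-capital token of 2 to 10 letters inside parentheses, e.g. "(GAS)".
PAREN_ACRONYM = re.compile(r"\(([A-Z]{2,10})\)")


def find_acronym_definitions(text):
    """Return (acronym, definition) pairs found by the first-letter heuristic."""
    pairs = []
    for match in PAREN_ACRONYM.finditer(text):
        acronym = match.group(1)
        words_before = re.findall(r"[A-Za-z]+", text[:match.start()])
        # Candidate definition: as many preceding words as acronym letters.
        candidate = words_before[-len(acronym):]
        if len(candidate) < len(acronym):
            continue
        initials = "".join(word[0].upper() for word in candidate)
        if initials == acronym:
            pairs.append((acronym, " ".join(candidate).lower()))
    return pairs


def tally_definitions(texts):
    """Map each acronym to a Counter of the definitions observed for it."""
    tally = defaultdict(Counter)
    for text in texts:
        for acronym, definition in find_acronym_definitions(text):
            tally[acronym][definition] += 1
    return tally


# Invented sentences that reuse two ambiguous acronyms.
texts = [
    "Group A streptococci (GAS) were isolated from throat swabs.",
    "The Global Assessment Scale (GAS) was scored at intake.",
    "Values are reported with the standard deviation (SD).",
    "Male Sprague-Dawley (SD) rats were used.",
]

for acronym, definitions in tally_definitions(texts).items():
    print(acronym, dict(definitions))
```

The heuristic misses definitions whose initials do not line up with the acronym, such as a single-word long form for a two-letter acronym, which is one reason the approaches cited above rely on more flexible matching.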
Table 1. Acronyms Frequently have Different Meanings Within the Literature. How Frequently an Acronym Refers to a Unique Definition Within a Body of Literature (e.g. MEDLINE) can be Estimated by Various Acronym Resolution Methods. Shown Here is the Percent of References to the Given Acronym Within MEDLINE Whose Definition was the Gene Name Being Sought [42]
Gene acronym | Gene full name | Most popular alternate definitions for acronym | % that stand for gene
GAS | Gastrin | Group A Streptococci; Global Assessment Scale | 3%
NM | Neutrophil Migration gene | Nuclear Matrix; Nodular Melanoma | 1%
SD | Segregation Distortion gene | Standard Deviation; Sprague-Dawley | <1%
CT | Cytidylyltransferase 1 | Computed Tomography; Calcitonin | <1%
ACT | Activator of CREM in Testis | Activated Clotting Time; Antichymotrypsin | <1%
Orthographical Variation and Term Mapping
For approaches that attempt to recognize words within text, variations in spelling, hyphenation and phrasing (orthographical variation) all affect the recognition of named entities. Particularly problematic in biology and medicine are the evolving nomenclature for genes, gene name variance between and among species [65], and the non-standard style in which gene names appear in the literature [66]. Table 2, for example, documents the different orthographical variations for the gene TNF-alpha. Notice that some forms are more frequent than others and that some of the variation can be attributed to region-specific spellings (British versus American). This table only shows variants of mapping TNF-alpha to its full name and does not show the different ways that Tumor Necrosis Factor-alpha is abbreviated, which include TNF-alpha (the most common mapping), TNF, TNF-a and TNF-α.
Table 2. Orthographical Variations for the Full form of the Gene TNF-alpha Along with the Frequency by which Each Variant was Observed in MEDLINE (January 2004 Statistics)
Gene name | Spellings | Frequency | Relative Frequency
TNF-alpha | Tumor necrosis factor-alpha | 5,885 | 53.3%
TNF-alpha | Tumor necrosis factor-alpha | 2,657 | 24%
TNF-alpha | Tumor necrosis factor-alpha | 1,718 | 15.5%
TNF-alpha | Tumor necrosis factor-alpha | 623 | 5.6%
TNF-alpha | Tumor-necrosis factor-alpha | 21 | 0.1%
TNF-alpha | Tumor-necrosis-factor-alpha | 17 | 0.1%
TNF-alpha | Tumor-necrosis factor-alpha | 14 | 0.1%
Also problematic is that some variation in biological nomenclature is irrelevant to meaning while other variation is critical. In Table 3, for example, some term variants contain additional descriptive words or symbols. Aligning their letters is one way of measuring similarity. Where the terms fail to align is useful in determining whether or not they should be considered identical. Plural endings or additional descriptors, such as in the IL-2 example in Table 3, do not represent the existence of a conceptually different entity. Prefixes such as the one in the DMH example are sometimes unimportant as well, but sometimes are quite critical to certain chemical aspects of activity (e.g. L-alanine versus D-alanine).
Table 3.
Term Similarity for Variations Between the Entity Name as Given in a Database and as It is Observed Written Within MEDLINE Abstracts. The Top Three are Identical in Meaning Despite the Textual Variation. The Bottom Three, However, have Small Blocks of Dissimilarity Comparable in Size to the Top Three. The Meanings, However, are Completely Different
Acronym | Definitions | Alignment | Similarity
DMH | Dimethylhydrazine / 1,2-dimethylhydrazine | ----++++++++++++++ | 81%
IL-2 | Interleukin-2 / Interleukin-2 gene | ++++++++++----- | 76%
12-HETE | 12-hydroxy eicosatetraenoic acid / 12-hydroxy-5, 8, 10, 14-eicosatetraenoic acid | +++++++++------------++++++++++++++++ | 73%
ABP | Androgen binding protein / Auxin binding protein | -------++++++++++++++ | 71%
AD | Alzheimer’s disease gene / Aujeszky’s disease gene | ---------+++++++++++++ | 63%
ACG | Acetylgalactosamine / Acetylglucosamine | +++++++-----+++++ | 74%
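The letter-alignment idea behind Table 3 can be sketched with Python's standard `difflib` module. This is a swapped-in technique chosen for convenience, not the alignment method that produced the percentages in Table 3, so the scores it returns will generally differ from the tabulated ones. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def term_similarity(a, b):
    """Case-insensitive similarity in [0, 1] from difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower(), autojunk=False).ratio()


def differing_blocks(a, b):
    """Return the (text_in_a, text_in_b) spans where the two terms fail to align."""
    matcher = SequenceMatcher(None, a.lower(), b.lower(), autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Term pairs taken from Table 3.
term_pairs = [
    ("Interleukin-2", "Interleukin-2 gene"),
    ("Dimethylhydrazine", "1,2-dimethylhydrazine"),
    ("Androgen binding protein", "Auxin binding protein"),
]
for database_name, observed_name in term_pairs:
    print(
        round(term_similarity(database_name, observed_name), 2),
        differing_blocks(database_name, observed_name),
    )
```

A similarity score alone cannot separate the harmless variants from the distinct entities, which is the point made by Table 3. Inspecting the non-aligned blocks is more informative: a block consisting only of a generic descriptor such as " gene" could be treated as ignorable, whereas a block that changes the root of the name signals a different entity. Any such list of ignorable descriptors would have to be curated and is not specified in the source.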
FUTURE DIRECTIONS
The general idea behind most literature-based KD methods is to assist researchers in identifying both reasonable and interesting hypotheses to pursue experimentally. Thinking and prioritizing have long been the domain of humans, and it is not likely that computers will take over in the near future. But the benefits of having the broadest possible perspective and utilizing all available information in forming hypotheses and making decisions are undeniable; the only question is how well computers can accomplish these goals in reality. Eventually, as technology advances and more of the current research problems listed above become historical research problems, more and more aspects of observation-based hypothesis generation can be automated, either in part or entirely. Each attempt at automation will no doubt meet with varying degrees of success, but if history is a guide to the future then most technical problems will eventually be overcome. As the level of automation increases, we will in essence be creating the in-silico scientist [67], a computer program that can utilize the information we have gathered collectively and employ a set of logical analyses and prioritization to identify highly reasonable and important hypotheses to be empirically tested.
ABBREVIATIONS
FDA = Food and Drug Administration
KD = Knowledge discovery
IR = Information retrieval
UMLS = Unified Medical Language System
MeSH = Medical Subject Headings
Obs/Exp = Observed to expected ratio
MIM = Mutual information measure
FST = Fuzzy set theory
IRIDESCENT = Implicit Relationship IDEntification by in-Silico Construction of an Entity-based Network from Text
NIDDM = Non-Insulin Dependent Diabetes Mellitus
REFERENCES
[1] Kling, J. Modern Drug Discovery, 1998, 1(2), 33-34.
[2] Valencia, A. EMBO Rep., 2002, 3(5), 396-400.
[3] Blagosklonny, M.V. and Pardee, A.B. Nature, 2002, 416(6879), 373.
[4] Bray, D. Nature, 2001, 412(6850), 863.
[5] Smalheiser, N.R. EMBO Rep., 2002, 3(8), 702.
[6] Swanson, D.R. Perspect. Biol. Med., 1986, 30(1), 7-18.
[7] Swanson, D.R. Library Q., 1986, 56, 103-118.
[8] Swanson, D.R. Perspect. Biol. Med., 1988, 31(4), 526-557.
[9] Swanson, D.R. Perspect. Biol. Med., 1990, 33(2), 157-186.
[10] Swanson, D.R. and Smalheiser, N.R. Artificial Intelligence, 1997, 91, 183-203.
[11] Bodenreider, O. Nucleic Acids Res., 2004, 32(Database issue), D267-270.
[12] Hristovski, D., Stare, J., Peterlin, B. and Dzeroski, S. Medinfo, 2001, 10(Pt 2), 1344-1348.
[13] Weeber, M., Klein, H., Aronson, A.R., Mork, J.G., de Jong-van den Berg, L.T. and Vos, R. (2000). Proc. AMIA Symp. AMIA, Los Angeles, California, pp. 903-907.
[14] Wren, J.D., Bekeredjian, R., Stewart, J.A., Shohet, R.V. and Garner, H.R. Bioinformatics, 2004, 20(3), 389-398.
[15] Vos, R. (1991) Drugs looking for diseases: Innovative Drug Research and the Development of the Beta Blockers and the Calcium Antagonists. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[16] Pratt, W. and Yetisgen-Yildiz, M. (2003). Proceedings of the International Conference on Knowledge Capture (K-Cap'03), Florida, pp. 105-112.
[17] Srinivasan, P. JASIST, 2004, 55(5), 396-413.
[18] Weeber, M., Vos, R., Klein, H., De Jong-Van Den Berg, L.T., Aronson, A.R. and Molema, G. J. Am. Med. Inform. Assoc., 2003, 10(3), 252-259.
[19] Wren, J.D. and Garner, H.R. Bioinformatics, 2004, 20(2), 191-198.
[20] Wren, J.D. BMC Bioinformatics, 2004, 5(1), 145.
[21] Wren, J.D. Soft Computing (in press), 2005.
[22] Smalheiser, N.R. and Swanson, D.R. Comput. Methods Programs Biomed., 1998, 57(3), 149-153.
[23] http://kiwi.uchicago.edu/
[24] DiGiacomo, R.A., Kremer, J.M. and Shah, D.M. Am. J. Med., 1989, 86(2), 158-164.
[25] Candela, M., Cherubini, G., Chelli, F., Danieli, G. and Gabrielli, A. Clin. Exp. Rheumatol., 1994, 12(5), 509-513.
[26] Gordon, M. and Lindsay, R. JASIS, 1996, 47(2), 116-128.
[27] Weeber, M., Vos, R., Klein, H. and de Jong-van den Berg, L. JASIST, 2001, 52(7), 548-557.
[28] Smalheiser, N.R. and Swanson, D.R. Arch. Gen. Psychiatry, 1998, 55(8), 752-753.
[29] Smalheiser, N.R. and Swanson, D.R. Neurology, 1996, 47(3), 809-810.
[30] Smalheiser, N.R. and Swanson, D.R. Neurology, 1996, 46(2), 583.
[31] Xu, H., Gouras, G.K., Greenfield, J.P., Vincent, B., Naslund, J., Mazzarelli, L., Fried, G., Jovanovic, J.N., Seeger, M., Relkin, N.R. et al. Nat. Med., 1998, 4(4), 447-451.
[32] McGeer, P.L. and McGeer, E.G. Brain Res. Brain Res. Rev., 1995, 21(2), 195-218.
[33] Casserly, I. and Topol, E. Lancet, 2004, 363(9415), 1139-1146.
[34] Wren, J.D. and Garner, H.R. Bioinformatics, 2004, 20(2), 191-198.
[35] Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M. and Sherlock, G. Nat. Genet., 2000, 25(1), 25-29.
[36] Wren, J.D. and Garner, H.R. J. Biomed. Biotechnol., 2005, (in press).
[37] Srinivasan, P. and Libbus, B. Bioinformatics, 2004, 20 Suppl 1, I290-I296.
[38] Srinivasan, P. Proc. AMIA Symp., 2001, 642-646.
[39] Srinivasan, P. and Rindflesch, T. Proc. AMIA Symp., 2002, 722-726.
[40] Srinivasan, P. and Hristovski, D. MEDINFO 2004, San Francisco, California.
[41] Srinivasan, P. Proc. Annu. Symp. Comput. Appl. Med. Care, 1994, 983.
[42] Wren, J.D. (2003). Ph.D. Dissertation, University of Texas Southwestern Medical Center, Dallas.
[43] Ding, J., Berleant, D., Nettleton, D. and Wurtele, E. Pac. Symp. Biocomput., 2002, 326-337.
[44] Jenssen, T.K., Laegreid, A., Komorowski, J. and Hovig, E. Nat. Genet., 2001, 28(1), 21-28.
[45] Schuemie, M.J., Weeber, M., Schijvenaars, B.J., Van Mulligen, E.M., Van Der Eijk, C.C., Jelier, R., Mons, B. and Kors, J.A. Bioinformatics, 2004, 20(16), 2597-2604.
[46] Rindflesch, T.C., Tanabe, L., Weinstein, J.N. and Hunter, L. Pac. Symp. Biocomput., 2000, 517-528.
[47] Stapley, B.J. and Benoit, G. Pac. Symp. Biocomput., 2000, 529-540.
[48] Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M. and Eisenberg, D. Nucleic Acids Res., 2002, 30(1), 303-305.
[49] Andrade, M.A. and Bork, P. FEBS Lett., 2000, 476(1-2), 12-17.
[50] Blaschke, C., Andrade, M.A., Ouzounis, C. and Valencia, A. ISMB, 1999, 60-67.
[51] Weeber, M., Mork, J.G. and Aronson, A.R. Proc. AMIA Symp., 2001, 746-750.
[52] Chen, L., Liu, H. and Friedman, C. Bioinformatics, 2004, 21(2), 248-256.
[53] Weeber, M., Schijvenaars, B.J., Van Mulligen, E.M., Mons, B., Jelier, R., Van Der Eijk, C.C. and Kors, J.A. AMIA Annu. Symp. Proc., 2003, 704-708.
[54] Sehgal, A.K., Srinivasan, P. and Bodenreider, O. 2004, SIGIR 2004 Workshop on Search and Discovery for Bioinformatics.
[55] Yu, H. and Agichtein, E. Bioinformatics, 2003, 19 Suppl 1, I340-I349.
[56] Wren, J.D. and Garner, H.R. Methods of Information in Medicine, 2002, 41(5), 426-434.
[57] Adar, E. Bioinformatics, 2004, 20(4), 527-533.
[58] Larkey, L., Ogilvie, P., Price, A. and Tamilio, B. 2000, Proceedings of the ACM Digital Libraries Conference, pp. 205-214.
[59] Liu, H., Lussier, Y. and Friedman, C. 2001, AMIA Annual Symposium.
[60] Pustejovsky, J., Castano, J., Cochran, B., Kotecki, M. and Morrell, M. Medinfo, 2001, 10(Pt 1), 371-375.
[61] Schwartz, A.S. and Hearst, M.A. Pac. Symp. Biocomput., 2003, 451-462.
[62] Yu, H., Hripcsak, G. and Friedman, C. J. Am. Med. Inform. Assoc., 2002, 9(3), 262-272.
[63] Park, Y. and Byrd, R. 2001, Empirical Methods in Natural Language Processing.
[64] Wren, J.D., Chang, J.T., Pustejovsky, J., Adar, E., Garner, H.R. and Altman, R.B. Nucleic Acids Res., 2005, 33(Database issue), D289-293.
[65] Pearson, H. Nature, 2001, 411(6838), 631-632.
[66] Chiang, J.H. and Yu, H.C. Bioinformatics, 2003, 19(11), 1417-1422.
[67] Wren, J.D. IEEE Eng. Med. Biol. Mag., 2004, 23(2), 87-93.
Frontiers in Drug Design & Discovery, 2005, 1, 287-296
Structural Biology in Early Phase Drug Discovery
Richard Alexander* and John Spurlino
Department of Structural Biology, Johnson and Johnson Pharmaceutical Research and Development, Exton, Pennsylvania 19341
Abstract: The role of protein crystallography in drug discovery has changed dramatically during the last 20 years. Based on advances in molecular biology, X-ray techniques, and computational methods, novel structures can be solved in dramatically shorter timescales than were possible previously. These advances have led to the use of structural biology in the earliest stages of drug discovery: lead identification and optimization. Advances have also decreased the time needed to solve protein-ligand structures from weeks to hours. In the past, typically only a handful of co-crystal structures for an individual project were determined during the discovery phase, and frequently these structures were available too late to greatly impact chemistry efforts. Today, hundreds of compounds can be screened crystallographically to determine if they bind. The reduction in turnaround time for crystal structures has led to a more significant contribution of Structure-Based Drug Design during the earliest phases of drug discovery. This review will examine the ways in which the new technologies, fragment-based design, structure-based combinatorial chemistry, and the structure of macrocyclic inhibitor complexes have impacted early phase drug discovery.
*Corresponding author: E-mail: [email protected]
Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
INTRODUCTION
Protein crystallography is the current gold standard in the determination of protein 3-dimensional structures at atomic resolution. The importance of protein crystallography to the scientific community has been exemplified by the number of Nobel Prizes awarded to crystallographers over the decades. The speed at which this information becomes available has greatly increased since the structure of myoglobin was solved in the late 1950s. There has been a steady decrease in the time it takes to determine a crystal structure. A structure that would have taken years to determine even in the 1980s can now be determined in a much shorter time span. Today, it is not unusual for a protein to be expressed, crystallized, and the structure determined within a few months.
The earliest phase of discovery in pharmaceutical projects (Lead Generation) is typically a 12 to 18 month process. To use Structure-Based Drug Design as an enabling technique during lead generation, a protein crystal structure must be available soon after screening hits have been verified to be relevant in drug discovery [1]. In this timeframe, a protein needs to be identified, assays developed, and proof of principle studies conducted to ensure that the protein is a viable drug target. Screening of chemical libraries is then initiated to find a proprietary lead compound, which can then be optimized. The absence of a structure during this time will significantly decrease the impact that crystallography will have on the project. One of the most often heard criticisms concerning crystallography is that crystal structures appear too late in a project life cycle. Ideally, a crystal structure should be available for target-compound co-crystal structure determination at the same time that lead molecules are identified. In order to have a protein structure available at the appropriate time, gene cloning, protein expression and purification, and successful crystallization must all be accomplished. Although gene synthesis and high throughput approaches to expression, purification, and crystallization are becoming routine methods, it can still be a 3-9 month process to achieve a protein structure that is suitable for Structure-Based Drug Design.
Once a protein structure is available, fragment-based lead generation and structure-based combinatorial synthesis can be used to dramatically increase the speed at which projects can be moved along. This can result in either the production of an inhibitor with more drug-like properties or the decision that a series cannot be improved without major changes, and that other chemical series should be given a greater priority. While other techniques are currently being used for fragment-based design, such as mass spectrometry, NMR [2], and virtual screening by computational methods, only crystallography reveals the binding of these fragments at high resolution, which in turn leads to knowledge of the correct orientation and linking mechanism required for the design of a more potent lead [3]. An analogous situation also exists in combinatorial chemistry, where the correct length and chemical characteristics must be found prior to designing a library. The proper use of crystallography can aid in these techniques, resulting in the discovery of a lead molecule much sooner in the drug design process.
Crystallography has been involved in drug discovery since the early 1980s, and there are drugs approved by the FDA that have been directly impacted by crystallography (Agenerase and Viracept) [4]. Rational drug design has also been utilized in the development of Captopril, Dorzolamide, and Zanamivir [5]. The structures of many currently approved drugs were solved crystallographically during the discovery process. Crystallography has been instrumental in the design of inhibitors of nuclear receptors, kinases, viruses (both capsids and enzymes), and all four classes of proteases [5-7]. Crystallography of protein-inhibitor complexes has been employed for a wide range of proteins targeted in early drug discovery. Proteins relevant to many of today’s most important therapeutic areas, including cancer [8], AIDS [9], hepatitis C [10], Alzheimer’s disease [11], drug-resistant bacteria [12], and cardiovascular disease [13], as well as previously unknown targets (SARS) [11], have had their structures determined in order to aid drug discovery.
IMPROVEMENT IN CRYSTALLOGRAPHIC TECHNIQUES
Improvements in technology have combined to make structural biology more useful in drug discovery. Advances in molecular biology have allowed the straightforward production of many soluble proteins. The use of purification tags and the ability to use constructs of different lengths lead to a large number of possible proteins for a single target. The parallel use of multiple constructs allows the choice of systems that provide sufficient quantities of protein to employ high throughput crystallization trials. Since different constructs can have very different solubility and polydispersity characteristics, a larger number of constructs should increase the chances of success in crystallizing a target.
Protein engineering can also be used to change the properties of proteins that do not crystallize in their native form. Amino acids at the outer surface of the proteins can be altered to increase attraction between the protein molecules or to prevent non-specific interactions that can poison crystal formation and growth. The use of previously crystallized proteins as the basis for mutations has proven useful for discovering areas for improving the crystallizability of target proteins. The constructs are designed using structure-based sequence alignment and homology modeling of representative enzymes that have already had their structures determined. This may allow previously uncrystallizable proteins to be crystallized for the first time. It may also improve the crystal quality of previously crystallized proteins. In pharmaceutical crystallography, the ability to crystallize the target of interest from the intended species is of critical importance. Another strategy to improve the crystallization of proteins is the use of amide hydrogen/deuterium exchange to map areas of flexibility, which can then be modified to improve crystallization properties [14]. The deletion of large insert domains in proteins has also been utilized successfully to crystallize recalcitrant proteins; most notably, a number of kinase structures have been obtained using this strategy [15].
A major bottleneck in determining a protein structure has been obtaining diffraction quality crystals. The large number of parameters that influence the crystallization of proteins include the purity of the protein, the intrinsic biophysical properties of the protein, the multitude of potential interactions due to surface features (shape, electrostatic properties, flexibility and solvation), and conformational properties of the protein. An additional complexity arises because the nucleation of crystals and the growth of the nucleus to a crystal of diffraction quality do not necessarily occur in the same environment.
The use of robotics in crystallization has greatly increased the speed of screening crystallization conditions [16-18]. Several person-weeks of activity can be accomplished by robotics in one hour, with the ability to screen over 1000 crystallization conditions and give a general sense of the crystallizability of a particular construct. The widespread availability of synchrotron radiation has allowed smaller crystals to be used for data collection, which has reduced the time needed to refine crystallization conditions. The combination of third-generation synchrotron sources and crystal-mounting robots has also increased the speed and quality of data collection. Native data can typically be collected in less than 1 hour. IMCA-CAT and SGX-CAT at the Advanced Photon Source at Argonne National Laboratory are two examples of robotically enabled synchrotron beamlines. The ability to screen crystals quickly not only allows for a more efficient use of synchrotron beam time, but also permits the use of the best available crystal for data collection, instead of the first crystal that diffracted to an acceptable resolution. Many of the advances first pioneered at synchrotron sources are also available for use in a home lab setting. The increase in power and intensity of home-source X-ray generators and optics has allowed a similar increase in productivity to be achieved for small crystals and large unit cells. The decrease in cost and increase in sensitivity of area detectors has also resulted in their appearance outside of synchrotrons. Data sets that once required multiple crystals and days to collect are now routinely collected on a single crystal in hours. The next step in streamlining the structure determination process is electron density map generation. Improvements in phasing techniques on multiple fronts have shortened the time frame for structure determination.
Previously, the search for multiple heavy atom derivatives could take months to successfully phase a data set. The use of
multiwavelength anomalous dispersion (MAD) methods has led to a straightforward way of solving the phase problem [19]. MAD phasing of selenomethionine-labeled protein, made possible by the ability to replace Met with SeMet, has replaced multiple isomorphous replacement (MIR) as the standard method for structure solution. There has been recent work using the relatively weak sulfur anomalous signal that is present in most proteins as a source of phase information, eliminating the need to produce SeMet-labeled protein [20, 21]. The combined increase in the resolution of X-ray data and in computing power has enabled the use of direct methods for phasing small protein structures [22-24]. The use of computational methods for structural refinement started the revolution in crystallographic methods. The introduction of simulated annealing [25, 26] to increase the radius of convergence of structural fitting was the first step in the automation of protein structure determination. This was quickly followed by the introduction of methods for quality control [27, 28] and additional improvements in the target function for simulated annealing [29, 30], which improved the performance and accuracy of structure determination. Recently, new methods for phasing proteins using Bayesian statistics and maximum entropy refinement have been employed to increase the initial quality of maps [31]. Automated fitting (electron density map interpretation) was another computational advance that has significantly shortened the time needed to determine protein structures [32-35]. Automatic phasing and structure fitting have been incorporated in RESOLVE to further the automation of structure determination [36]. The success of these programs still depends on both the resolution and the quality of the starting maps, although the resolution required to achieve successful results has decreased in recent years. The CCP4 project was one of the first attempts to assemble relevant crystallographic computer programs in a single cohesive package [37, 38].
One of the most exciting developments in the automation of crystallography is the program ELVES [39]. This program has been used to index crystals, process data, locate heavy atoms, build models, and refine the structure without human intervention. The results of this automated refinement compare favorably with those requiring human intervention, and the program can be run with varying levels of human intervention. When structural refinement tools are combined with ligand placement and refinement tools, such as Discovery Studio (Accelrys), the crystallographer is free to examine a large number of structures in a much shorter amount of time.
UNEXPECTED BINDING MODES
While great advances have been made in modeling, it remains clear that compounds do not always bind as predicted. When only the structure of an enzyme, and not that of the enzyme-inhibitor complex, is known, an assumption about the binding mode is required before any knowledge-based approach to inhibitor design can be used. There are many examples in the literature of unexpected binding modes being discovered in crystal structures. These binding modes can even show unexpected interactions with the catalytic machinery of the enzyme. A non-traditional binding mode of a β-secretase inhibitor was disclosed recently by Merck [40]. The complex structure revealed that the inhibitor bound on the non-prime side of the active site, with a larger than expected group located in the S3 position. Most unexpected was an indirect interaction with the catalytic aspartic acid residues, mediated by a solvent molecule. When unusual binding
modes are observed with inhibitors, the course of synthetic work will need to be redirected to take advantage of the experimental evidence. Even when a parent compound has been solved in complex with the target enzyme, a chemically similar compound can bind in a different way. In elastase, a series of trifluoroacetyl-dipeptide-anilides showed different binding modes depending on their substituents [41]. While these compounds varied by only an order of magnitude in their affinity for the enzyme, their binding modes changed a great deal, without any major changes in the structure of the enzyme. Further modeling work based on an incorrect binding mode would not be expected to lead to improved molecules.
FRAGMENT BASED DESIGN
There has been great interest in fragment-based design since the publication of the SAR by NMR method [2]. The method consists of screening a large number of very small fragments experimentally. These fragments typically have low binding affinities (in the millimolar range) and several binding sites per protein. After a number of compounds have been screened, several different fragments may be found to have binding sites near one another. If the correct linker can be found, these millimolar fragments can be tethered to one another to create a micromolar inhibitor. While other techniques are available to perform this function (NMR, mass spectrometry, ThermoFluor, Biacore, and in silico experiments), only X-ray crystallography is able to reveal the orientation of, and exact distance between, these fragments. X-ray crystallography can also be applied to proteins of a wide range of sizes, from a few kDa to an entire ribosomal subunit. Although X-ray crystallography has the disadvantages of needing a system that crystallizes, having a relatively low throughput, and requiring a larger amount of protein, the detailed information provided by an X-ray structure is unparalleled by any other technique [42].
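The step from millimolar fragments to a micromolar inhibitor can be illustrated with a short calculation. The sketch below assumes, as an idealization that is not taken from the studies cited here, that the binding free energies of two linked fragments simply add, with an optional penalty for an imperfect linker. The function names and the penalty parameter are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant, 1 M standard state."""
    return R_KCAL * temperature_k * math.log(kd_molar)

def linked_kd(kd_a, kd_b, linker_penalty_kcal=0.0, temperature_k=298.15):
    """Kd of a linked compound, ASSUMING the fragment free energies are additive.

    linker_penalty_kcal is a hypothetical positive term for strain or lost
    contacts introduced by the linker; zero corresponds to an ideal linker.
    """
    dg_total = (binding_free_energy(kd_a, temperature_k)
                + binding_free_energy(kd_b, temperature_k)
                + linker_penalty_kcal)
    return math.exp(dg_total / (R_KCAL * temperature_k))

# Two 1 mM fragments joined by an ideal linker
ideal = linked_kd(1e-3, 1e-3)
# The same fragments with a 2 kcal/mol linker penalty (illustrative value)
strained = linked_kd(1e-3, 1e-3, linker_penalty_kcal=2.0)
print(f"ideal linker:    Kd = {ideal:.2e} M")
print(f"strained linker: Kd = {strained:.2e} M")
```

Under these assumptions two 1 mM fragments give a linked compound with a Kd of about 1 micromolar, and any linker penalty weakens that value, which is why knowing the orientation of and distance between the fragments matters when the linker is chosen.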
Several companies, including Astex, Plexxikon, and Structural GenomiX, use fragment-based design in the development of novel lead compounds. Studies from Abbott Laboratories show some of the promise and limitations of crystallographic fragment-based design [43]. Earlier studies described the CrystaLEAD system for fragment-based design. This system involves screening 10,000 possible ligands against a protein in cassettes of 100 compounds. The compounds in each cassette are sufficiently different in shape that it is possible to infer which of the ligands is binding. The approach requires collecting 100 data sets, and therefore a protein that crystallizes easily in a high-symmetry space group (to reduce data collection time) and diffracts well enough to determine the shapes of the ligands unambiguously. Robotics is necessary for the routine mounting of crystals, as well as for data collection. Automated processing of raw data images, crystallographic refinement, and ligand placement greatly increases the utility of this process. A successful application of this method was described using urokinase as a test case. A crystallographic screen identified a 56 micromolar lead, which was improved to produce a 0.37 micromolar optimized lead. The optimization proceeded by examining the binding modes of several leads and combining leads that had desirable PK characteristics with known inhibitors. A more recent study from the same laboratory showed some of the limitations of this technique [44]. In a search for dihydroneopterin aldolase inhibitors, a crystallographic screen
was conducted. The search was hindered because the active site was at a subunit interface of a tetramer. This made it difficult to examine an inhibitor that induced an allosteric transition, and could lead to two misleading outcomes: the crystal could crack on inhibitor binding and so fail to reveal the binding mode of the inhibitor, or the allosteric binding site could be inaccessible because of crystal packing limitations, leading to a false negative. The authors commented that several more potent compounds could not be soaked into the current crystal form, suggesting that this crystal form did not allow the binding of all ligands. These leads were later co-crystallized with the protein, but the structures did not show any significant differences that would explain why the soak was not successful. Improvements in the program were made by traditional structure-directed design.
Structure Based Combinatorial Design
During the late 1980s, when structure-based drug design was starting to be implemented in a large number of pharmaceutical companies, combinatorial chemistry was also starting to be applied to problems of medical interest. In contrast to traditional medicinal chemistry, where target compounds are made one at a time, combinatorial chemistry produces libraries in which many compounds are made at once. For example, using only 20 amino acids, a total of 8,000 tripeptides and 160,000 tetrapeptides can be made quite easily. The difficulty with this technique, as was observed with fragment-based design, is the lack of a correct orientation from which to start a combinatorial search. As the number of compounds that can currently be synthesized far exceeds the number that can be tested, a directed combinatorial approach can examine a focused region of chemical space with a diverse set of compounds. With a directed approach, it should be possible to limit the size of the molecule, which can enhance the drug-like properties of the compounds being synthesized.
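The tripeptide and tetrapeptide counts quoted above follow from raising the number of building blocks to the power of the peptide length. The short sketch below reproduces those counts and, as an illustration only, shows how restricting the residues allowed at each position shrinks a library. The particular residue choices in the focused example are hypothetical and are not taken from any study cited in this chapter.

```python
from itertools import product
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 standard amino acids

def library_size(n_building_blocks, length):
    """Number of linear sequences of a given length from n building blocks."""
    return n_building_blocks ** length

def enumerate_peptides(length, building_blocks=AMINO_ACIDS):
    """Lazily generate every peptide sequence of the given length."""
    return ("".join(p) for p in product(building_blocks, repeat=length))

def focused_library_size(allowed_per_position):
    """Library size when each position is limited to its own residue set."""
    return prod(len(choices) for choices in allowed_per_position)

print(library_size(20, 3))   # tripeptides
print(library_size(20, 4))   # tetrapeptides

# Hypothetical directed library: position 1 fixed, position 2 restricted to
# a few lipophilic residues, position 3 left fully open.
focused = ["R", "AVLIF", AMINO_ACIDS]
print(focused_library_size(focused))
```

In this made-up case the directed library contains 100 members instead of 8,000, which is the kind of reduction a structural constraint is intended to provide.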
Analysis of a structure should indicate not only where libraries are worth making, but also which libraries would interfere with direct interactions with the protein or cause steric clashes. Such libraries should not be made, regardless of their synthetic ease. The methods used in this technique are exactly the same as those of traditional combinatorial chemistry, with the addition of a structural parameter, which leads to a much more focused library. An example of this methodology is the development of phenylglycine-based factor Xa inhibitors [45]. After benzamidine-3 carboxamide was docked into the crystal structure of factor Xa, a hypothesis was generated that replacement of the central glycine residue with 30 lipophilic D-amino acids could result in increased affinity. A cyclohexylglycine residue showed a 24-fold increase in activity. Another library was made to explore the S4 pocket of the protein with the optimized cyclohexylglycine residue. A total of 200 additional compounds were made, with a substituted piperazine ring compound showing a 1000-fold increase in activity compared to the original benzamidine-3 carboxamide starting point. As factor Xa-inhibitor structures can be difficult to generate, a trypsin structure of the final compound was solved and confirmed the binding mode of the inhibitor. A side-by-side comparison of structure-based and diversity-based library design was carried out on a series of dihydrofolate reductase inhibitors [46]. A greater number of hits, and more active hits, were found using the structure-based approach, suggesting that active compounds could be synthesized more quickly.
MACROCYCLE DESIGN
The formation of a correctly designed macrocycle can increase the potency of an inhibitor. If the connection points and size of the macrocycle can be optimized, a tighter-binding inhibitor can be formed. Reducing the degrees of freedom of an inhibitor can decrease the entropic cost of going from a free solution state to a single bound conformation. The advantage of using crystallography in this process is knowing the exact 3-dimensional structure of the bound state, which makes it straightforward to determine which areas of the molecule are close together in space. Choosing a linker of the correct size between two groups, and confirming that the connection would not cause any steric clash between the protein and the inhibitor, can lead to a more potent inhibitor. The structure of human neutrophil collagenase complexed to batimastat was used by Steinman and colleagues to design a potent analogue by the addition of an ether linkage between adjacent residues [47]. These groups were pointing towards solvent, and the ring size did not have a great effect on potency. Inhibitors can often mimic a peptide substrate by forming a β-sheet interaction with a protein. If this is the case, the n and n+2 residues of the inhibitor are situated on the same side of the protein. After analyzing a protein-inhibitor complex, it should be possible to determine which pairs of positions could be connected without losing important contacts with the protein and without causing unfavorable interactions with it. In examining inhibitors of peptide deformylase [48], Hu and co-workers discovered that the P1’ and P3’ groups could be connected with a nonyl linker to increase activity 10-fold. Cyclization is not a panacea for weak inhibitor activity. Scheidt and co-workers designed a number of cyclic inhibitors of cruzain, a cysteine protease, based on the crystal structure of a peptide-based inhibitor.
A number of conformationally restricted pyrrolidinone and vinyl sulfones were prepared, but all showed reduced activity towards the protein, suggesting that these compounds did not correctly mimic the aldehyde in the parent structure [49]. Reducing the flexibility of the inhibitor backbone has also been investigated with inhibitors of second mitochondria-derived activator of caspase [50]. A crystal structure revealed that sequential valine and proline residues could be cyclized, leading to a 2-fold increase in activity. As this portion of the inhibitor makes interactions with the protein, the stereochemistry of this cyclization was important for maintaining potency. Cyclic inhibitors of β-secretase, based on the structures of enzyme-inhibitor complexes, have also been synthesized [51]. A 16-membered ring was synthesized, leading to a 2-fold increase in activity. As these rings are close to the protein, rather than exposed to solvent, ring size and saturation were important in determining the activity of the inhibitors. Cyclization typically adds weight to an inhibitor and, depending on the group, may change its pharmacokinetic properties, but an increase in potency can be observed if the correct position and length can be chosen for the linking group. With all of these techniques for deriving more potent molecules, an iterative approach must be taken. The literature is full of examples where seemingly similar inhibitors do not bind in the expected way. These alternative binding modes may be used as a new starting point for connecting fragments or building a new combinatorial library. The ability to solve many more structures than was previously possible is a tremendous advantage for confirming binding modes within a series of inhibitors, or for discovering a new binding mode that results from a seemingly small change in inhibitor structure.
FUTURE DIRECTIONS
While protein crystallography will continue to support drug discovery in the examination of traditional targets, new areas are now being examined by crystallographic methods. Large complexes, such as the proteasome and the ribosome, are now being investigated. A greater understanding of how membrane proteins crystallize will lead to a much larger number of proteins of pharmaceutical interest that can be studied at the atomic level. Advances in structural genomics also suggest that many new targets of drug discovery may have their structures determined prior to their selection as drug targets. With the increase in the number of new structures being determined, a greater understanding of selectivity can be achieved more easily. This can be accomplished by determining the structure of a single inhibitor bound to any protein for which affinity has been observed, and using protein alignment and homology modeling to extrapolate the binding mode to the target of interest. This will lead to synthetic suggestions as to where selectivity can be achieved by modification of the compound while maintaining affinity for the intended protein target. Structural data are now being used in the more advanced stages of early drug development. Even after a potent and selective compound has been identified, there are still major hurdles a compound must face prior to drug development. Drug metabolism and protein binding are two areas where new structures are being determined that may aid in the understanding of the pharmacokinetic properties of inhibitors. Depending on the desired characteristics of a compound, it may be desirable to change the level of protein binding in the serum. Several companies are having success in this field. New Century Pharmaceuticals has the ability to determine the co-crystal structures of compounds complexed to albumin.
The ability to change the affinity of a compound towards albumin (without changing its affinity towards the target, of course) would make it possible to change the level of drug available in the body. In addition, scientists at Astex have determined the structures of several P450 molecules [7]. Knowledge of the structures of these molecules, which are responsible for the metabolism of many drugs, could also lead to compounds with more desirable half-life properties. Additionally, knowledge of the exact binding mode of an inhibitor can suggest which portions of the inhibitor can be eliminated to reduce its molecular weight. A reduction in the weight of an inhibitor can also increase its likelihood of being absorbed into the body. All of these applications will lead to a greater utility of crystallography across a wider range of drug targets, and will also provide structural insights into a wider range of the drug development lifespan. These new directions could change the role of crystallography from a discipline that works on specific targets to a technique that can be generally applied to all projects that face challenges in making inhibitors more drug-like.
REFERENCES
[1] Marx, V. Chemical and Engineering News, 2004, 82, 22-30.
[2] Shuker, S. B.; Hajduk, P. J.; Meadows, R. P.; Fesik, S. W. Science, 1996, 274, 1531-4.
[3] Erlanson, D. A.; McDowell, R. S.; O'Brien, T. J. Med. Chem., 2004, 47, 3463-82.
[4] Henry. Chemical and Engineering News, 2001, 79, 69-74.
[5] Babine, R. E.; Abdel-Meguid, S. S. Protein Crystallography in Drug Discovery, Wiley-VCH: Weinheim, 2004.
[6] Hol, W. G. J.; Verlinde, C. L. M. J. In International Tables for Crystallography, E. Arnold, ed.; Kluwer Academic Publishers: Dordrecht, 2001; Vol. F, pp. 10-25.
[7] Mountain, V. Chem. Biol., 2003, 10, 95-8.
[8] Hubbard, S. R.; Till, J. H. Annu. Rev. Biochem., 2000, 69, 373-98.
[9] Arnold, E.; Das, K.; Ding, J.; Yadav, P. N.; Hsiou, Y.; Boyer, P. L.; Hughes, S. H. Drug Des. Discov., 1996, 13, 29-47.
[10] Bartenschlager, R. J. Viral. Hepat., 1999, 6, 165-81.
[11] Yang, H.; Yang, M.; Ding, Y.; Liu, Y.; Lou, Z.; Zhou, Z.; Sun, L.; Mo, L.; Ye, S.; Pang, H.; Gao, G. F.; Anand, K.; Bartlam, M.; Hilgenfeld, R.; Rao, Z. Proc. Natl. Acad. Sci. USA, 2003, 100, 13190-5.
[12] Bussiere, D. E.; Pratt, S. D.; Katz, L.; Severin, J. M.; Holzman, T.; Park, C. H. Mol. Cell, 1998, 2, 75-84.
[13] Maignan, S.; Mikol, V. Curr. Top. Med. Chem., 2001, 1, 161-74.
[14] Pantazatos, D.; Kim, J. S.; Klock, H. E.; Stevens, R. C.; Wilson, I. A.; Lesley, S. A.; Woods, V. L., Jr. Proc. Natl. Acad. Sci. USA, 2004, 101, 751-6.
[15] McTigue, M. A.; Wickersham, J. A.; Pinko, C.; Showalter, R. E.; Parast, C. V.; Tempczyk-Russell, A.; Gehring, M. R.; Mroczkowski, B.; Kan, C. C.; Villafranca, J. E.; Appelt, K. Structure Fold Des., 1999, 7, 319-30.
[16] Jurisica, I.; Rogers, P.; Glasgow, J. I.; Fortier, S.; Luft, J. R.; Wolfey, J. R.; Bianca, M. A.; Weeks, D. R.; DeTitta, G. T. IBM Syst. J., 2001, 40, 394-409.
[17] Abola, E.; Kuhn, P.; Earnest, T.; Stevens, R. C. Nat. Struct. Biol., 2000, 7 Suppl., 973-7.
[18] Stewart, L.; Clark, R.; Behnke, C. Drug Discov. Today, 2002, 7, 187-96.
[19] Hendrickson, W. A.; Horton, J. R.; LeMaster, D. M. EMBO J., 1990, 9, 1665-72.
[20] Ren, H.; Wang, L.; Bennett, M.; Liang, Y.; Zheng, X.; Lu, F.; Li, L.; Nan, J.; Luo, M.; Eriksson, S.; Zhang, C.; Su, X. D. Proc. Natl. Acad. Sci. USA, 2005, 102, 303-8.
[21] Wang, J. W.; Chen, J. R.; Gu, Y. X.; Zheng, C. D.; Fan, H. F. Acta Crystallogr. D. Biol. Crystallogr., 2004, 60, 1991-6.
[22] Weeks, C. M.; Miller, R. Acta Crystallogr. D. Biol. Crystallogr., 1999, 55 (Pt 2), 492-500.
[23] Bricogne, G. Acta Crystallogr. D. Biol. Crystallogr., 1993, 49, 37-60.
[24] Xu, H.; Hauptman, H. A.; Weeks, C. M. Acta Crystallogr. D. Biol. Crystallogr., 2002, 58, 90-6.
[25] Brunger, A. T.; Kuriyan, J.; Karplus, M. Science, 1987, 235, 458-60.
[26] Kuriyan, J.; Brunger, A. T.; Karplus, M.; Hendrickson, W. A. Acta Crystallogr. A, 1989, 45 (Pt 6), 396-409.
[27] Brunger, A. T.; Clore, G. M.; Gronenborn, A. M.; Saffrich, R.; Nilges, M. Science, 1993, 261, 328-31.
[28] Kleywegt, G. J.; Brunger, A. T. Structure, 1996, 4, 897-904.
[29] Adams, P. D.; Pannu, N. S.; Read, R. J.; Brunger, A. T. Proc. Natl. Acad. Sci. USA, 1997, 94, 5018-23.
[30] Blanc, E.; Roversi, P.; Vonrhein, C.; Flensburg, C.; Lea, S. M.; Bricogne, G. Acta Crystallogr. D. Biol. Crystallogr., 2004, 60, 2210-21.
[31] Gilmore, C.; Dong, W.; Bricogne, G. Acta Crystallogr. A, 1999, 55, 70-83.
[32] Perrakis, A.; Morris, R.; Lamzin, V. S. Nat. Struct. Biol., 1999, 6, 458-63.
[33] Ioerger, T. R.; Holton, T.; Christopher, J. A.; Sacchettini, J. C. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1999, 130-7.
[34] Oldfield, T. J. Acta Crystallogr. D. Biol. Crystallogr., 2001, 57, 696-705.
[35] Cohen, S. X.; Morris, R. J.; Fernandez, F. J.; Ben Jelloul, M.; Kakaris, M.; Parthasarathy, V.; Lamzin, V. S.; Kleywegt, G. J.; Perrakis, A. Acta Crystallogr. D. Biol. Crystallogr., 2004, 60, 2222-9.
[36] Terwilliger, T. J. Synchrotron Radiat., 2004, 11, 49-52.
[37] Collaborative Computational Project, Number 4. Acta Crystallogr. D. Biol. Crystallogr., 1994, 50, 760-3.
[38] Potterton, E.; Briggs, P.; Turkenburg, M.; Dodson, E. Acta Crystallogr. D. Biol. Crystallogr., 2003, 59, 1131-7.
[39] Holton, J.; Alber, T. Proc. Natl. Acad. Sci. USA, 2004, 101, 1537-42.
[40] Coburn, C. A.; Stachel, S. J.; Li, Y. M.; Rush, D. M.; Steele, T. G.; Chen-Dodson, E.; Holloway, M. K.; Xu, M.; Huang, Q.; Lai, M. T.; DiMuzio, J.; Crouthamel, M. C.; Shi, X. P.; Sardana, V.; Chen, Z.; Munshi, S.; Kuo, L.; Makara, G. M.; Annis, D. A.; Tadikonda, P. K.; Nash, H. M.; Vacca, J. P.; Wang, T. J. Med. Chem., 2004, 47, 6117-9.
[41] Mattos, C.; Rasmussen, B.; Ding, X.; Petsko, G. A.; Ringe, D. Nat. Struct. Biol., 1994, 1, 55-8.
[42] Beavers, M. P.; Chen, X. J. Mol. Graph Model., 2002, 20, 463-8.
[43] Nienaber, V. L.; Richardson, P. L.; Klighofer, V.; Bouska, J. J.; Giranda, V. L.; Greer, J. Nat. Biotechnol., 2000, 18, 1105-8.
[44] Sanders, W. J.; Nienaber, V. L.; Lerner, C. G.; McCall, J. O.; Merrick, S. M.; Swanson, S. J.; Harlan, J. E.; Stoll, V. S.; Stamper, G. F.; Betz, S. F.; Condroski, K. R.; Meadows, R. P.; Severin, J. M.; Walter, K. A.; Magdalinos, P.; Jakob, C. G.; Wagner, R.; Beutel, B. A. J. Med. Chem., 2004, 47, 1709-18.
[45] Jones, S. D.; Liebeschuetz, J. W.; Morgan, P. J.; Murray, C. W.; Rimmer, A. D.; Roscoe, J. M.; Waszkowycz, B.; Welsh, P. M.; Wylie, W. A.; Young, S. C.; Martin, H.; Mahler, J.; Brady, L.; Wilkinson, K. Bioorg. Med. Chem. Lett., 2001, 11, 733-6.
[46] Wyss, P. C.; Gerber, P.; Hartman, P. G.; Hubschwerlen, C.; Locher, H.; Marty, H. P.; Stahl, M. J. Med. Chem., 2003, 46, 2304-12.
[47] Steinman, D. H.; Curtin, M. L.; Garland, R. B.; Davidsen, S. K.; Heyman, H. R.; Holms, J. H.; Albert, D. H.; Magoc, T. J.; Nagy, I. B.; Marcotte, P. A.; Li, J.; Morgan, D. W.; Hutchins, C.; Summers, J. B. Bioorg. Med. Chem. Lett., 1998, 8, 2087-92.
[48] Hu, X.; Nguyen, K. T.; Verlinde, C. L.; Hol, W. G.; Pei, D. J. Med. Chem., 2003, 46, 3771-4.
[49] Scheidt, K. A.; Roush, W. R.; McKerrow, J. H.; Selzer, P. M.; Hansell, E.; Rosenthal, P. J. Bioorg. Med. Chem., 1998, 6, 2477-94.
[50] Sun, H.; Nikolovska-Coleska, Z.; Yang, C. Y.; Xu, L.; Tomita, Y.; Krajewski, K.; Roller, P. P.; Wang, S. J. Med. Chem., 2004, 47, 4147-50.
[51] Ghosh, A. K.; Devasamudram, T.; Hong, L.; Dezutter, C.; Xu, X.; Weerasena, V.; Koelsch, G.; Bilcer, G.; Tang, J. Bioorg. Med. Chem. Lett., 2005, 15, 15-20.
Frontiers in Drug Design & Discovery, 2005, 1, 297-341
Whole Gene Synthesis: A Gene-O-Matic Future
Lance Stewart* and Alex B. Burgin
deCODE biostructures, Inc., 7869 N.E. Day Rd. West, Bainbridge Is., WA 98110, USA
Abstract: Whole gene synthesis is rapidly becoming a powerful technology that allows researchers to distill a growing body of genetic and structural information into improved nucleic acid sequences that would otherwise be impossible to obtain by traditional cloning and mutagenesis methods. Recent advances in the efficient small-scale manufacture of long and accurate oligodeoxyribonucleotides have resulted in a low-cost source of building blocks for the assembly of larger DNA molecules by polymerase chain reaction methods. Proof-of-concept experiments have yielded synthetic replication-competent viral genomes, as well as synthetic multi-gene clusters of greater than 30 kilobase pairs in size encoding multi-enzyme systems that catalyze efficient biosynthesis of small drug molecules. These advances have placed whole gene synthesis on a cost trajectory that will lead to unprecedented advances in synthetic biology, ranging from the engineering of protein crystals to the production of re-engineered translation machineries that can produce totally novel protein-like materials. The possible advances in synthetic biology, enabled by whole gene synthesis, will be limited only by the imagination of the applied life sciences research community.
INTRODUCTION
The Genetic Code
The 1953 elucidation of the structure of double-helical B-form DNA by Watson and Crick, with the aid of X-ray diffraction photographs produced by Wilkins and Franklin [1, 2], led to an intense and accelerated desire by researchers to master nucleic acid biochemistry. Both the organic synthesis and the enzyme-mediated synthesis of nucleic acids proved essential to unlocking the mysteries of the genetic code. In their now famous 1961 experiment [3], Nirenberg and Matthaei added polyuridylic acid to a crude in vitro translation system prepared from E. coli extracts, and systematically looked for the incorporation of radioactive 14C-labeled amino acids into protein. The addition of radioactive phenylalanine, together with the 19 other unlabeled amino acids, allowed the cell-free translation system to produce radioactive protein at significantly higher levels than in control experiments. This result, together with insights from Gamow, Crick, Brenner and colleagues on the triplet-base (codon) nature of the genetic code [4], led to the conclusion that the UUU triplet encodes phenylalanine. In this regard, it should be noted that the poly-U used by Nirenberg and Matthaei was supplied by Singer and Heppel, who generated the RNA material from ribonucleotide
*Corresponding author: E-mail:
[email protected] Garry W. Caldwell / Atta-ur-Rahman / Barry A. Springer (Eds.) All rights reserved – © 2005 Bentham Science Publishers.
diphosphates using polynucleotide phosphorylase, an enzyme that was discovered in 1955 by Ochoa [5-7]. These insights triggered an international race to decode the other 63 triplets. In this effort, Khorana’s synthesis of poly-deoxyribonucleotides (oligos) from protected nucleosides and mononucleotides proved to be a powerful force of discovery [8]. Using brute-force solution-phase organic chemistry, Khorana’s lab was able to generate a variety of two- and three-base oligos with defined sequences. These could be used as templates by crude preparations of RNA polymerase to synthesize long RNA messages with alternating triplet repeats. Translation of these mRNAs in vitro with radiolabeled amino acids allowed the deciphering of codons with non-repeating base sequence (e.g., UCUCUC… encoding serine-leucine-serine-leucine). The amino acid assignments for most of the codons were cross-validated by both Nirenberg and Khorana, through the use of synthetic three-base poly-ribonucleotides that were found to promote the trapping of radiolabeled amino-acylated tRNAs in ribosome complexes [9, 10]. Together with the unraveling of the base sequence of yeast alanine tRNA by Holley [11], these synthetic and enzymatic methods led to the complete and unambiguous deciphering of all codons of the genetic code, as shown in Table 1 [12]. By 1967, the degeneracy of the genetic code had been fully defined. Each of the 20 universal amino acids is encoded by between one (methionine and tryptophan) and six (arginine, leucine, and serine) codons.
The four bases in duplex DNA (guanine, adenine, thymine, and cytosine) serve as the template for the transcription of complementary messenger RNA (mRNA) molecules containing cytosine, uracil, adenine, and guanine. These mRNAs are translated into proteins by the ribosomal machinery, which uses amino-acylated tRNAs to read 61 of the 64 possible codons for the 20 amino acids, while the 3 other triplets (UAG, UAA, and UGA) encode signals for termination of translation (reviewed in [13]). In 1968, Nirenberg [3, 14, 15], Khorana [16, 17], and Holley [11] shared the Nobel Prize in Physiology or Medicine for their combined work in deciphering the genetic code [2, 18], which is arguably one of the most important human discoveries ever made.
The First Synthetic Gene
In 1968, Khorana and co-workers took the first step towards producing a synthetic gene [19]. They chose to synthesize the yeast alanine tRNA gene, since its coding sequence had been fully characterized by Holley [11]. Moreover, the tRNA product of in vitro transcription from the synthetic gene could be readily assayed for its activity using methods developed during the efforts to decipher the genetic code. By 1968, Khorana’s lab had achieved the routine solution-phase synthesis of short oligos of 6 to 8 bases. This achievement opened the door to assembling a specific gene sequence by exploiting the inherent complementarity of DNA strands, which form duplexes by base pairing. This was also made possible by the discoveries of T4 polynucleotide kinase in 1965 [20] and T4 DNA ligase in 1967 (originally called the T4-joining enzyme) [21-23]. Less than a year after the discovery of the T4-joining enzyme, Khorana’s group was able to ligate several oligo building blocks encoding parts of the yeast alanine tRNA that had been assembled into ordered duplexes through base pairing [19].
By 1970, Khorana’s team had completed the synthesis of the yeast alanine tRNA gene [24] and had demonstrated its functionality by the ability of RNA polymerase to produce a functional alanine tRNA from the synthetic gene [25].
Whole Gene Synthesis
Frontiers in Drug Design & Discovery, 2005, Vol. 1 299
Table 1. The Universal Genetic Code

1st     2nd base                    3rd
base    U      C      A      G      base
U       Phe    Ser    Tyr    Cys    U
U       Phe    Ser    Tyr    Cys    C
U       Leu    Ser    Stop   Stop   A
U       Leu    Ser    Stop   Trp    G
C       Leu    Pro    His    Arg    U
C       Leu    Pro    His    Arg    C
C       Leu    Pro    Gln    Arg    A
C       Leu    Pro    Gln    Arg    G
A       Ile    Thr    Asn    Ser    U
A       Ile    Thr    Asn    Ser    C
A       Ile    Thr    Lys    Arg    A
A       Met    Thr    Lys    Arg    G
G       Val    Ala    Asp    Gly    U
G       Val    Ala    Asp    Gly    C
G       Val    Ala    Glu    Gly    A
G       Val    Ala    Glu    Gly    G

The four standard bases of RNA (uracil, cytosine, adenine, and guanine) are shown as U, C, A, and G. The first base of any triplet codon is indicated on the left, the second base along the top, and the third base down the right, within the box defined by the first two.
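The layout of Table 1 can also be written down programmatically. The following is a minimal Python sketch and not part of the original chapter; the names CODON_TABLE, translate, DEGENERACY and STOP_CODONS are illustrative assumptions. It builds the standard code in the same U, C, A, G order used in the table, translates an alternating UCUCUC… message, and counts the number of codons per amino acid.

```python
from collections import Counter
from itertools import product

# Bases in the order used by Table 1 (first base slowest, third base fastest).
BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in that order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    protein = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Number of codons assigned to each amino acid (stops excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
STOP_CODONS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
```

In this sketch, translate("UCUCUCUCUCUC") returns "SLSL" (serine-leucine-serine-leucine), the DEGENERACY values run from 1 (M and W) to 6 (L, R and S), and STOP_CODONS contains UAA, UAG and UGA, leaving 61 sense codons.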
The Expanding Non-Universal Genetic Code
Through the mid 1980s, the genetic code was thought to be “universal” for all life forms, but in 1986, mammalian glutathione peroxidase and bacterial formate dehydrogenase were each found to contain an in-frame UGA stop codon (reviewed in [26]). Subsequent research demonstrated that this codon directs the incorporation of selenocysteine, the 21st amino acid. In Archaea, a 22nd amino acid, L-pyrrolysine, has been found to be encoded by a UAG codon. In addition, L-pyrrolysine can be incorporated into proteins in E. coli strains that co-express the archaeal pylT gene product, a tRNA(CUA), and the archaeal PylS-encoded class II aminoacyl-tRNA synthetase [27-29]. This allows the genetic code of E. coli to be expanded to include UAG-directed pyrrolysine incorporation into proteins. It is also important to note that the original codon table described above is not “frozen”, since there are several examples of non-canonical genetic codes. For example, in 1979, it was found that the code in vertebrate mitochondria differed from the universal code by using AUA for Met and UGA for Trp [30]. Mycoplasma capricolum also uses the UGA stop codon to code for Trp [31]. In one of the most well-characterized alternative codes, the universal CUG leucine codon is translated as serine in Candida albicans [32, 33]. In Tetrahymena spp., two stop codons, UAA and UAG, code for Gln in addition to CAA and CAG [34]. Because of these and several other examples, it has been proposed that the code is still evolving in both mitochondrial and nuclear genomes
300 Frontiers in Drug Design & Discovery, 2005, Vol. 1
Stewart and Burgin
[35]. Although not usually appreciated, this feature clearly demonstrates that it should be possible to alter or create a unique genetic code.
The Role of Codon Bias in Gene Expression
The redundancy of the genetic code allows any given protein to be encoded by a very large number of distinct nucleic acid sequences. On average, each amino acid can be encoded by approximately three different codons. For a typical 100 amino acid protein, there would be 3^100 (~5 x 10^47) different possible synonymous coding sequences. Since there are numerous known DNA sequence elements and folded RNA structural elements that can control gene expression, the information content allowed by the genetic code can go far beyond protein sequence. The degeneracy of the genetic code therefore allows the pressures of natural selection to simultaneously optimize both DNA and RNA sequence features in addition to protein coding function. In principle, the design parameters for synthetic genes may encompass all of our knowledge in molecular biology and genetics. Codon bias in genomes can be defined as the unequal usage of synonymous codons in known or predicted open reading frames (ORFs). It is well-known that codon utilization is highly biased and varies considerably among various organisms. Codon-usage patterns are related to the relative abundance of tRNA isoacceptors, and genes encoding highly expressed proteins show differences in their codon usage frequencies [13]. It is generally thought that codon usage alters peptide elongation rates; however, codon-usage patterns can also improve the fidelity and kinetic efficiency of translation [36]. Since the levels of isoaccepting tRNAs in E. coli are correlated with codon bias, it is generally believed that the presence of rare codons has a negative impact on expression of heterologous genes in E. coli. Overexpression of rare tRNAs in E. coli [37-39] can improve the production of recombinant proteins containing the cognate rare codons.
However, the overproduction of tRNAs is not necessarily an optimal strategy for improving heterologous protein expression, since production of fully functional tRNAs may be difficult to achieve if other cellular factors are limiting. For example, tRNA molecules are often modified at one or more bases, including anticodon loop modifications which have been shown to improve reading frame maintenance [40-42]. Although the overexpression of rare tRNAs in E. coli can stimulate the production of the associated tRNA synthetases [37], it is not known if the other tRNA modifying enzymes are similarly regulated. Extensive research by Karlin and Sharp has demonstrated that highly expressed genes in E. coli and other bacteria have a major bias towards subsets of codons [43-45]. Indeed, bacteria and unicellular eukaryotic organisms such as yeast [44, 46, 47] seem to have codon biases that are highly correlated with measured isoaccepting tRNA levels. In general, where it has been measured, higher eukaryotes also appear to have codon biases that complement the abundance of isoaccepting tRNAs. This appears to be generally true for Drosophila melanogaster [48] and Caenorhabditis elegans [49]. There is a growing body of evidence that codon bias may be an important and highly evolved source of gene regulation. For example, it has recently been shown that the codon bias within human ORFs can be tissue specific [50]. Furthermore, Carlini has engineered strains of Drosophila melanogaster that contain an alcohol dehydrogenase (ADH) gene with between 1 and 10 non-preferred leucine codons [51], and found that
phenotypic sensitivity to ethanol was well-correlated with reduced levels of ADH production. More importantly, when the flies with 10 rare leucine codons in the ADH gene were subjected to mutagenesis and selected for ethanol tolerance, the survivors were found to have between 1 and 10 synonymous reversions to preferred leucine codons, resulting in higher levels of ADH production [52]. For the Autographa californica Nucleopolyhedrovirus (AcNPV), it has been shown that proteins expressed shortly after cellular infection can have significantly different codon biases in comparison with the viral proteins that are expressed late in the viral replication cycle [53, 54]. Of course, it is well-known that viruses also have numerous biochemical strategies for modifying the cellular translation machinery to promote viral replication. It should therefore be no surprise that viruses would also have different codon biases for early versus late gene products. In bacteria, it has been suggested that codon usage may differ based on the sub-cellular localization of proteins, such as those found in the outer membrane compartment [55]. Apart from the non-random use of codons, it is also apparent that codon/anticodon recognition is influenced by sequences outside the codon itself, a phenomenon termed codon context. For example, there is an occurrence bias between specific adjacent codon pairs, and these biases are different for highly expressed vs. low expressed proteins in E. coli [56-58]. Clearly, there are still a number of mysteries in the subject of codon bias and codon context. Taken together, these results also emphasize how difficult it is to create “codon optimized” genes for expressing proteins; the rules are complex and not completely understood.
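Three of the quantities discussed in this section are simple to compute, and the following Python sketch (an illustration only, not the authors' software) shows one way to do so. All function names are assumptions introduced here. It checks the rough 3^100 estimate of synonymous coding sequences, counts the exact number of synonymous sequences for a given protein from the per-residue degeneracy, and tallies synonymous codon fractions and adjacent codon pairs in an ORF as a starting point for looking at codon bias and codon context.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Codons per amino acid in the standard code (stops excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def rough_estimate(length, codons_per_residue=3):
    """Rough count of synonymous sequences, assuming ~3 codons per residue."""
    return codons_per_residue ** length


def count_synonymous_sequences(protein):
    """Exact count: product of the degeneracy of every residue."""
    return prod(DEGENERACY[aa] for aa in protein)


def split_codons(orf):
    """Split an in-frame RNA sequence into codons."""
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of three")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def synonymous_fractions(orf):
    """Fraction of each observed codon among the observed codons for its amino acid."""
    counts = Counter(split_codons(orf))
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def codon_pair_counts(orf):
    """Count adjacent codon pairs, the raw data behind codon-context analyses."""
    codons = split_codons(orf)
    return Counter(zip(codons, codons[1:]))
```

Here rough_estimate(100) equals 3**100, which is about 5 x 10^47. The exact count depends on composition: a peptide made only of Met and Trp has a single coding sequence, whereas every Leu, Arg or Ser multiplies the count by six. Real analyses of codon bias compare such tallies across many genes and against tRNA abundance, which this sketch does not attempt.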
Gene Synthesis for a Post-Genomics Era
The past 50 years of research in genetics and molecular biology have shown that the information encoded in DNA goes far beyond that of an ORF encoding the primary amino acid sequence of proteins. This is illustrated in Fig. (1), where a codon triplet in a simplified eukaryotic gene is shown to embody information that is relevant at all steps of gene expression, from DNA to RNA to protein. For example, a guanine residue not only defines which amino acid will be present in the protein (Glu or Lys); the residue can also affect the quality of the mRNA and, therefore, the total amount of protein expressed. Guanine is favored at the 3’ end of exons, and a guanine residue at the wobble position for Lys therefore helps to define the 5’ splice site. In addition, all residues can also play a role in stabilizing or destabilizing mRNA secondary structures that can affect translation efficiency. The concept of a gene has grown to include an amazing wealth of information on the transactions between protein molecules and nucleic acid sequence elements, both DNA and RNA. Such sequence elements include promoters, enhancers, suppressors, chromatin attachment sites, splicing signals, recombination sites, silencers, terminators, restriction endonuclease cutting sites, methylation sites, centromeres, telomeres, etc. The currently known set of such sequence elements remains poorly understood, and the international research community has recently launched a Project Consortium aptly named the ENCyclopedia of DNA Elements (ENCODE) [59], which aims to thoroughly annotate functional sequence elements of the human genome. The increasing availability of complete genome sequence information, together with a variety of probe array technologies, is having a major impact on our understanding of the depth of information contained within gene sequences. For example, it is estimated
Fig. (1). Information Content of a Single Nucleotide in DNA. A single nucleotide carries information from gene to RNA to protein. The positions of different DNA elements (promoter, exons, intron, polyadenylation signal) are indicated, and the positions of two codons encoding Lysine (Lys) and Glutamic acid (Glu) are highlighted. A single guanine residue (G) is indicated in bold in both codons. The transcribed mRNA is shown (RNA) and the positions of the Lys and Glu codons are indicated. The Lys codon lies at the first intron/exon border, and the G at the wobble position helps to define the 5’ splice site. The Glu codon lies within a potential mRNA hairpin, and the G at the first position destabilizes this structure. The translated protein is shown at the bottom. The positions of the Lys and Glu amino acids are indicated; these residues form a hydrogen bond and stabilize the folded protein.
that alternative pre-mRNA splicing may occur in 40-50% of all multi-exon genes in multicellular organisms [60]. Furthermore, pre-mRNA editing by selective adenosine deamination is known to occur in neural cells of humans and other multicellular organisms [61]. It has also become clear that gene expression can be regulated by RNA interference; mRNAs are silenced by complementary short interfering RNA (siRNA) molecules bound within a silencing complex [62]. Finally, careful studies on the transcriptional activity of the human genome have revealed that much more of the human genome is transcribed than was previously recognized, suggesting that the coding potential of the genome may be 50% more than previously estimated [63, 64]. All of this information must be considered when designing synthetic genes. As information from large scale genomic sequencing and functionalization projects becomes available, there will be an increasing demand for sophisticated software packages that can identify, quantify, transcribe, translate, and otherwise manipulate (introduce or remove) sequences that are stored on computers. Whole gene synthesis represents an enabling tool where researchers can distill a growing body of genetic information into highly optimized gene sequences that are designed to contain desired features, and/or lack undesired features. For this reason, we anticipate that the fields of genetics and molecular biology will soon find it difficult to imagine conducting research without the ready supply of synthetic genes. With this in mind, it is worthwhile to consider the known and possible future applications of synthetic gene sequences.
THE UTILITY OF GENE SYNTHESIS
In the 10 years that followed Genentech’s 1977 report of the first protein encoded by a synthetic gene [65], the number of reported synthetic genes multiplied approximately 10-fold [66]. In 1986, Khorana and co-workers synthesized the gene for bovine rhodopsin [67].
In 1988, Sung-Hou Kim and colleagues synthesized the gene for human Ras p21, and solved the X-ray crystal structure of the recombinant p21. This represents the first crystal structure of a protein encoded by a synthetic gene [68, 69]. The Kim lab also used gene synthesis to produce a fused dimeric form of the sweet protein Monellin [70, 71]. In 1986 and 1987, the Sligar lab used synthetic genes to produce recombinant forms of rat hepatic cytochrome b5 and sperm whale myoglobin in E. coli [72, 73]. Researchers in Sligar’s lab went on to solve the X-ray crystal structure of their recombinant sperm whale myoglobin, demonstrating that the structure of the myoglobin produced by the synthetic gene is essentially identical to the natural sperm whale myoglobin [74]. Clearly gene synthesis does not represent a new technology; however, the decreasing cost of gene synthesis has led to many new utilities and a new appreciation for the power of gene synthesis.
Expression Optimization of Engineered Proteins
One of the most important applications of total gene synthesis is to create expression-optimized genes with improved codon bias for production of recombinant proteins in heterologous systems. In 1977, when the biotechnology industry was still young, researchers at Genentech reported the first synthetic gene designed for the production of a functional protein. Itakura and co-workers at Genentech synthesized a tandem repeat sequence encoding multiple copies of the 14 amino acid hormone somatostatin which
was produced in E. coli as a polyprotein fusion to β-galactosidase [65]. Functional somatostatin peptides were released from purified fusion poly-protein by chemical cleavage with cyanogen bromide at engineered methionine residues. At the time, very little E. coli genome sequence information was available, and so these researchers engineered the somatostatin sequence using preferred codons from phage MS2, which had recently been sequenced [75, 76], as a proxy for highly expressed E. coli genes. This represents the first example where total gene synthesis was used to manipulate codon bias for the benefit of improved expression of recombinant proteins in heterologous systems. Since rare codons have been well-documented to have a negative impact on protein expression in E. coli and other organisms [77], one of the first applications of complete gene synthesis is directed at engineering genes with optimal codon usage for the desired expression system. This approach is particularly useful for expressing genes from unusual genomes like the A+T rich genomes of parasitic protozoans [78]. Rare codons can cause in-frame translational hopping and dramatically reduce protein yield, as has been observed for bovine placental lactogen expressed in E. coli [79]. Translational frameshifting and misincorporation of amino acids have also been shown for the p27 protease domain from Herpes Simplex Virus 2 [80]. Perhaps more importantly, rare codon clusters can lead to ribosome stalling [39], which in turn can trigger premature termination of translation. In this process, the 10Sa transfer messenger RNA (tmRNA) takes over as the message and is translated into a short C-terminal peptide before terminating translation [81, 82]. Many of the recombinant proteins whose cDNAs are heterologously expressed in E.
coli show significant amounts of non-full-length polypeptide, and when carefully examined, many of these shorter polypeptides turn out to be rare codon-induced tmRNA-mediated termination products, rather than proteolytic degradation products [83, 84].
Engineering Proteins for X-ray Crystallography
Gene synthesis enables a much more sophisticated level of protein engineering and can significantly improve the likelihood of obtaining X-ray crystal structures. For example, one of the limitations of obtaining a protein crystal is producing enough highly purified protein to screen a large number of different crystallization conditions. As described above, gene synthesis offers the possibility of optimizing genes for expression in heterologous systems (e.g. E. coli). However, more importantly, gene synthesis enables complex protein engineering that would be extremely tedious and time consuming using conventional cloning and mutagenesis methods. An example of engineering proteins [85] for crystallization is illustrated in Fig. (2). In this example, a theoretical 3-dimensional homology model of a protein kinase is illustrated in the left panel and the amino acid sequence of this protein is shown below. Even if a protein has never been crystallized, it is possible to generate 3-dimensional homology models from crystal structures of homologous proteins or even by ab initio methods [86]. In this example, the target protein has not been previously crystallized, but a 3-dimensional homology model could be created from an existing crystal structure. In the existing crystal structure, there were residues on both the N- and C-terminus of the protein that were present during crystallization, but could not be visualized in the final electron density maps (regions indicated in orange). This result suggests that the regions
Fig. (2). Protein Design and Whole Gene Synthesis. A homology model of a protein is shown on the left and the corresponding amino acid sequence is shown below. Phylogenetically conserved and non-conserved residues are shown in blue and green, respectively. Regions of the protein that can not be visualized in homologous crystal structures are shown in orange, and a variable domain that is not present in homologous proteins is shown in red. Finally, non-conserved surface residues that can affect solubility and protein behavior are highlighted in yellow. Gene Composer software compiles this information and creates synthetic genes to express proteins lacking disordered or variable regions, and containing surface mutations that promote solubility and/or crystallization.
highlighted in orange do not form stable structures and may exist in multiple conformations within the crystal lattice. These alternative conformations could hinder crystallization of the target protein and should therefore be removed from the engineered construct (see right hand panel), since they are not contained within the “crystallizable domain”. Similar regions may also occur internal to the protein. In this example, an internal domain that is not phylogenetically conserved and is predicted to be disordered [87] is present in the target protein (highlighted in red). In the engineered protein, this region is removed and replaced by a short linker. Finally, recent examples have also shown that rational mutagenesis of surface residues can dramatically improve crystallization of proteins [88-91]. Unconserved residues that may inhibit crystallization are highlighted in yellow in Fig. (2). In the engineered construct, these positions are mutated to facilitate crystallization. Because the final engineered protein contains so many different deletions and mutations, it can be very time consuming to create the “crystallization optimized” construct by conventional cloning and/or mutagenesis methods. However, the optimized gene can be synthesized directly by whole gene synthesis and be simultaneously optimized for protein expression and crystallization.
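The construct design steps described for Fig. (2) (trimming disordered termini, replacing an internal variable region with a short linker, and mutating selected surface residues) can be expressed as a small sequence operation. The Python sketch below is purely illustrative and is not the Gene Composer implementation; the function design_construct, its parameters, and the toy sequence are all assumptions made here. Positions are 1-based and refer to the numbering of the original protein.

```python
def design_construct(sequence, core, loop, linker, mutations):
    """Return an engineered protein sequence (hypothetical helper).

    sequence  : original amino acid sequence (one-letter codes)
    core      : (start, end) of the region to keep, 1-based inclusive;
                residues outside it (e.g. disordered termini) are dropped
    loop      : (start, end) of an internal region replaced by `linker`
    linker    : short replacement sequence, e.g. "GS"
    mutations : {position: (expected_residue, new_residue)} surface mutations
    """
    residues = list(sequence)

    # Apply point mutations first, while the original numbering is still valid.
    for pos, (expected, new) in mutations.items():
        if residues[pos - 1] != expected:
            raise ValueError(f"expected {expected} at position {pos}")
        residues[pos - 1] = new

    core_start, core_end = core
    loop_start, loop_end = loop
    if not (core_start <= loop_start <= loop_end <= core_end):
        raise ValueError("loop must lie inside the retained core")

    # Keep core residues before the loop, insert the linker, keep the rest.
    before = "".join(residues[core_start - 1:loop_start - 1])
    after = "".join(residues[loop_end:core_end])
    return before + linker + after


# Toy example: 18-residue sequence, keep residues 4-15, replace 8-11 with
# a GS linker, and mutate Lys14 to Ala.
toy = "AAACDEFGGGGHIKLMMM"
engineered = design_construct(toy, core=(4, 15), loop=(8, 11),
                              linker="GS", mutations={14: ("K", "A")})
```

With these toy inputs, engineered is "CDEFGSHIAL". In practice the resulting protein sequence would then be back-translated into a gene for synthesis, so that all deletions and mutations are introduced in a single step.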
Obtaining Genes
One of the most time consuming activities in molecular biology and protein research is the identification, isolation, and validation of gene sequences. Gene constructs must typically be acquired from disparate sources such as other researchers, vendors of complementary DNA (cDNA) or expressed sequence tags (EST), or cloned from cDNA libraries. In many cases, these genes may not be full-length; they may have mutations; or they may be of an undesired variant sequence. Gene synthesis can be extremely useful for standardizing the way genes are obtained, and can dramatically decrease the time required to obtain suitable expression constructs. In addition, resurrection of primordial or ancient DNA is also enabled by total gene synthesis, and may be of considerable utility in deciphering genome and species evolution [92-94]. As the cost of, and time required for, gene synthesis decrease, obtaining genes will become just as routine as ordering oligonucleotides.
Avoiding Potential Gene Patent Infringement
In many cases, gene sequences may be patented. Research organizations that wish to use patented genes for the development of new commercial products will likely need to obtain the appropriate license from the patent holder, allowing the licensee to use the gene. Use of a patented gene for commercial purposes without appropriate licensing could have serious legal and financial ramifications. However, if licenses are not available or are unreasonably priced, then researchers in industry will look for ways to pursue their research while minimizing liability that could result from research using a patented gene. One potential way is to obtain a synthetic gene whose sequence is perhaps only 80% identical to the native gene. Many gene patent claims are written to reach beyond the native gene sequence to include “similar” sequences (doctrine of equivalents), within defined thresholds of sequence identity.
Hence, obtaining a synthetic gene that encodes a protein of interest, but whose nucleic acid sequence is only 70% identical to the native gene, may provide legal comfort in avoiding infringement.
Synthetic Gene Clusters for Small Molecule Engineering, and Synthetic Gene Networks as Biological Switches, Oscillators, Sensors, etc.
Gene synthesis also allows researchers to assemble different genes to create unique pathways. For example, synthetic gene sequences were used to engineer a ~31 kbp polyketide synthase gene cluster that encodes a set of enzymes which assemble into a mega-enzyme complex in E. coli and can catalyze the efficient production of polyketide molecules [95]. Gene synthesis has also been used to engineer a mevalonate pathway in E. coli for the efficient production of terpenoids [96]; Jay Keasling’s group assembled eleven different genes from three different organisms into engineered operons. One of these was an expression-optimized amorphadiene synthase gene that was synthesized with its codon bias set to the preferred codons of E. coli. Together, these engineering tasks produced a new metabolic pathway in E. coli, capable of producing amorphadiene, a precursor of the anti-malarial drug artemisinin, from a low cost carbon source. In order to begin testing theoretical models for concerted gene activity, researchers have used gene synthesis to create synthetic gene networks that behave in live cells like simple electronic circuits. The first such synthetic gene networks were reported in 2000 [97, 98]. Gardner, Cantor and Collins used gene synthesis to generate a bi-stable gene
regulatory network in E. coli, composed of two different promoters that drive expression of repressor proteins that repress expression from the opposite promoter. Addition of one of two different inducer molecules to live cells promoted the permanent toggling of the circuit into one state, thereby converting the bacterium into an addressable cellular memory unit [98]. Similarly, Elowitz and Leibler produced a live cell clock system by engineering a synthetic “repressilator” gene circuit in E. coli, where the genes for three different transcriptional repressors were engineered on a plasmid so that each represses the promoter of one of the other two. Oscillatory circuit activity was visualized in dividing cells as the regular increase and decrease of green fluorescent protein (GFP) production from a gene that was under the control of one of the repressors in the circuit [97]. Kobayashi et al. have interfaced natural and synthetic gene networks to produce strains of E. coli that activate protein synthesis from a reporter gene when bacterial density reaches a critical threshold, as sensed by the production of the quorum signaling molecule acyl-homoserine lactone [99, 100]. This represents a novel auto-induction sensor system that may be extremely useful for industrial-scale parallelized protein production. Clearly, whole gene synthesis and these design strategies afford the opportunity to generate a wide variety of live cell biomolecular sensors.
Viral and Cellular Genome Engineering
In 2002, Cello and colleagues in the Wimmer lab shocked the research community when they reported the synthesis of a cDNA that could encode a functional poliovirus genome when transfected into live cell cultures [101]. Their report called for renewed vigilance in worldwide vaccination to prevent any possible recurrence of polio, since terrorists could synthesize the virus. In 2003, Smith et al. reported the generation of the second synthetic viral genome, bacteriophage ΦΧ174.
The methods used to arrive at the synthetic versions of the viral genomes of polio and ΦΧ174 are practically available to any research lab with access to synthetic oligonucleotides and PCR reagents. As gene synthesis methods advance (described further below), it becomes possible to consider the generation of completely new cellular genomes. For example, Venter and colleagues have proposed the generation of a minimal cellular genome that could be engineered from the ~300 E. coli genes that are known to be essential. Such a reengineering effort could produce a compact genome with all of the predicted nonessential sequences removed [102, 103]. However, it is difficult to know how to define “non-essential” DNA, and it may take years of adding back sequences to the minimal genome to figure out what elements are required to arrive at a minimal functional genome. Nevertheless, the power of selection for growth has already shown that even if only a very few synthetic cellular genome molecules are viable, these molecules can be easily isolated since the non-viable molecules cannot replicate [102]. Hence, it is quite possible that the production of new life forms could happen within the next 5 years. Venter has proposed that the name of such an organism should be “Model A”, in reference to Henry Ford’s first assembly line production model car.
Proving a Gene or Genome Sequence is Correct
In the small molecule world, the isolation of natural products with biological activity is most often validated by the complete synthesis of the identified molecule, followed by demonstration that its activity is identical to the purified natural product. The same logic
is also useful for understanding biological molecules. For example, the study of prions by Prusiner and colleagues led to the hypothesis that the pathogenic version of a prion protein can drive the templated conversion of native prion precursor protein to the pathogenic form of the protein and thereby replicate itself [104]. The definitive proof of this controversial hypothesis was eventually obtained only by the demonstration that a recombinant version of the prion protein displayed the pathogenic effects of natural infectious prion protein. This result ended the ongoing controversy over whether nucleic acids were somehow also required for prion replication [105]. As proposed by Venter and colleagues, there is the potential need to demonstrate the correctness of gene sequences by synthesizing the genes and demonstrating that the synthetic genes have the same activity as the natural genes [102]. This concept has even been extended to include whole genomes. As noted above, Venter and colleagues have shown that totally synthetic versions of replication competent ΦΧ174 bacteriophage have sequences that differ from the known virus by only 1 base [102]. Similarly, definitive proof of the regulatory activities of complicated gene networks may require that they be re-assembled from synthetic genes that are engineered to give quantitative activity readouts, such as the production of GFP [97, 98, 106].
In vitro Evolution and New Materials Through Expansion of the Genetic Code
Total gene synthesis has enabled a new level of sophistication for the creation of combinatorial libraries of precisely designed protein variants. For example, gene synthesis allows the engineering of unique restriction sites that allow rapid and simple cassette replacement of various protein domains [107].
In addition, the production of synthetic genes encoding different protein sequences, but with nucleic acid sequences built to match as much as possible (while still maintaining a desired codon usage), can allow improved efficiency in DNA shuffling methods for directed evolution [108]. As noted above, gene synthesis technology in principle may allow the construction of totally novel cellular genomes for self-replicating bacteria. If this is possible, then it follows that such a cellular genome could be engineered to have all of its protein coding sequences encoded by only one type of codon for each of the 20 natural amino acids. This could free up 43 triplets {43 = 64 – (20 codons + 1 terminator)} to be used for encoding other “unnatural” polypeptide residues. For example, researchers in the Schultz lab at The Scripps Research Institute have used a combination of smart chemistry, combinatorial mutagenesis, and structure based design principles to engineer “orthogonal” tRNA-synthetase pairs into both prokaryotic (E. coli) and eukaryotic (S. cerevisiae) model organisms that can encode a growing array of unnatural amino acids, with side chains that can be photoreactive, fluorescent, redox-active, glycosylated [109], or contain keto, azido, acetylenic, or heavy atom functional groups (reviewed in [110]). The term “orthogonal” means that the engineered tRNA and synthetase will not cross-react with any endogenous tRNA or synthetases present in the model organism (e.g. mutant forms of E. coli tyrosyl-tRNA and tyrosyl-tRNA synthetase introduced into S. cerevisiae). The Schultz lab has also generated an E. coli strain with a 21 amino acid genetic code which carries all the necessary biosynthetic machinery to encode p-aminophenylalanine in response to the amber codon TAG [111]. In this case, a mutant form of the orthogonal Methanococcus jannaschii tyrosyl-tRNA synthetase was engineered to charge mutant tyrosine amber suppressor tRNA with p-aminophenylalanine in E. coli that harbored additional genes from Streptomyces venezuelae required for the biosynthesis of p-aminophenylalanine. These approaches have also been leveraged to engineer four-base codon systems, adding a whole new dimension to the genetic code [112, 113]. Yokoyama and colleagues have recently engineered a novel two-unnatural base-pair system for use in coupled in vitro transcription-translation systems [114]. These researchers have synthesized DNA templates containing unnatural synthetic bases (denoted “s base” or “z base”) that respectively could be read by T7 RNA polymerase to generate mRNA with a complementary unnatural base (called “y base”), as well as novel tRNA molecules with the “s base” at a defined position in the anticodon loop. In addition, they used engineered versions of tRNA synthetases to charge the unnatural tRNA with an unnatural amino acid, which could be incorporated into protein by readout between codons containing “y base” and tRNA anti-codons containing “s base”. Whole gene synthesis technology may soon be applied in conjunction with methods for in vitro evolution by ribosome display or mRNA display [115, 116]. In this approach, mRNAs that encode proteins of particular properties are selected using molecular techniques that connect the protein to its genetic encoding information. Considering this, it becomes possible to imagine using whole gene synthesis in combination with gene shuffling and in vitro evolution technologies to re-engineer virtually every component of the translation machinery to produce a fantastic array of polymeric materials. One major step in this direction has been taken by Gao, Church and colleagues [117], who recently reported the use of gene synthesis to re-engineer codon-altered versions of all 21 genes encoding protein constituents of the E. coli 30S ribosomal subunit.
This approach may soon allow the efficient in vitro synthesis of ribosomes from coupled transcription-translation reactions with completely defined molecular components. It seems reasonable to expect that such systems can also be set up for in vitro evolution of improved in vitro transcription-translation systems with expanded genetic codes for unnatural amino acids. Given the complexities, environmental risks, and regulatory oversight involved in generating live organisms with engineered genomes, it seems much more likely that synthetic in vitro transcription and translation systems will be the future source of novel polymeric materials. In general, whole gene synthesis and synthetic biology are now on the verge of completely revolutionizing the way biomaterials are produced.

THE DESIGN OF SYNTHETIC GENES

Fantastic advances in large-scale biochemical data collection, such as genomic sequencing, structural proteomics, and microarray analysis, are fueling a growing demand for computational tools that facilitate the formulation and testing of numerous hypotheses in systems biology. Likewise, the de novo synthesis of genes and genomes requires the thoughtful distillation of volumes of information ranging from codon bias to predicted mRNA secondary structure, and beyond. We therefore devote the next section of this review to the subject of software tools that convert a protein sequence to an expression-optimized gene sequence, and a gene sequence into a set of oligonucleotides that can be used for the assembly of the desired synthetic gene.
Design Software for Converting a Protein Amino Acid Sequence to a Novel Gene Sequence by Exploiting the Degeneracy of the Genetic Code

There are numerous reports of improved protein expression from synthetic genes that are engineered to contain the preferred codon usage of the heterologous host expression system [77, 107, 118-123]. Therefore, in order to facilitate the design of “expression optimized” synthetic genes for a variety of heterologous expression systems, we and others (see Table 2) have developed software tools (ours is called Gene Composer) that allow users to completely re-engineer codon usage in a given gene [77, 78, 95, 108, 124-127]. Starting with either an amino acid sequence or a nucleic acid sequence, these so-called “Protein-to-Gene” (Prot2Gene) algorithms are set to reference selected codon usage tables (CUTs) that typically define the average codon frequencies present in ORFs of the target organism in which heterologous protein expression is intended (codon usage tables for sequenced genes and genomes for different species can be found at http://www.kazusa.or.jp/codon/ [128]). In this way, the software can reverse-translate an amino acid sequence into a novel nucleic acid sequence by random selection of synonymous codons according to the frequency with which they are found in the intended host organism. If high level protein expression is desired, then the back-translation by Prot2Gene software can be performed using modified CUTs for highly expressed (HX) proteins such as ribosomal proteins, transcription/translation processing factors, and chaperone/degradation proteins [43, 129, 130]. Computational methods for predicting HX-CUTs from sequenced genomes are becoming increasingly sophisticated, with the aid of robust mathematical tools for predicting codon biases of highly expressed protein families and vice versa [131-133].
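The reverse-translation step described above can be illustrated with a minimal Python sketch. It is not the Gene Composer implementation: the table `TOY_CUT`, its frequencies, and the function names `reverse_translate` and `translate` are hypothetical placeholders chosen for illustration, and a real CUT would cover all amino acids for the chosen host.

```python
import random

# Hypothetical, heavily truncated codon usage table (CUT):
# amino acid -> {codon: relative frequency among synonymous codons}.
# The numbers are illustrative placeholders, not taken from a published CUT.
TOY_CUT = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "E": {"GAA": 0.70, "GAG": 0.30},
    "F": {"TTT": 0.55, "TTC": 0.45},
    "*": {"TAA": 0.65, "TGA": 0.28, "TAG": 0.07},
}


def reverse_translate(protein, cut, rng=None):
    """Pick one codon per residue at random, weighted by its CUT frequency."""
    rng = rng or random.Random()
    codons = []
    for residue in protein:
        table = cut[residue]
        codon = rng.choices(list(table), weights=list(table.values()), k=1)[0]
        codons.append(codon)
    return "".join(codons)


def translate(dna, cut):
    """Translate back with the same table to confirm the protein is unchanged."""
    codon_to_residue = {c: aa for aa, table in cut.items() for c in table}
    return "".join(codon_to_residue[dna[i:i + 3]] for i in range(0, len(dna), 3))


# A seeded generator makes the random design reproducible.
gene = reverse_translate("MKEFK*", TOY_CUT, random.Random(2005))
print(gene, translate(gene, TOY_CUT))
```

Swapping `TOY_CUT` for an HX CUT changes only the weights, which is why the same back-translation routine can serve both average and highly expressed codon preferences.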
For example, the concept of the “codon adaptation index” (CAI) has been introduced as a normalized measure of bias in synonymous codon usage for genes or genomes within or between species [134]. The index measures the extent to which codon usage has been selected during evolution, and can serve as a useful tool for predicting the level of heterologous protein expression. We have used such tools to assess the expected improvement in the expression level of a synthetic “expression optimized” gene relative to that of an available cDNA. Sometimes it is more economical to use site directed mutagenesis to eliminate one or two rare codons in a cDNA than to create a new synthetic gene. These tools can also be used in large scale structural genomics efforts to help prioritize which protein targets are likely to benefit the most from codon re-engineering by gene synthesis. To facilitate the management and formulation of CUTs for different heterologous expression systems (e.g. average and HX CUTs for E. coli, baculovirus-insect cell, yeast etc.), we have generated a CUT database (CUT-DB) and graphical user interface (GUI) that allows the user to construct a variety of novel CUTs. For example, we have generated new HXHetCUT tables for E. coli, calculated from a set of heterologous proteins whose native cDNAs (mostly human) are known to be highly expressed in E. coli. We have also produced weighted hybrid CUTs (hyCUTs) [77], where codon frequencies of two or more organisms are combined in a weighted fashion. A hyCUT that combines E. coli and Drosophila melanogaster CUTs [137, 138] is particularly useful for engineering synthetic genes with a balanced, high probability of good protein expression in both E. coli and the baculovirus-insect cell expression system [139]. The insect cells used for baculovirus expression of recombinant proteins are
Table 2. Gene Design Software

Software Name   Prot2Gene                                  Gene2Oligo              Year Reported  Reference
PINCERS         CUT, RE                                    Assisted                1991           [124]
CalcGene        HX CUT                                     Size, PCA               1998           [125]
COD OP          CUT, RE                                    Size, Tm, PCA           1999           [78]
EcodonOpt       CUT, LP                                    Not applicable          2002           [108]
DNAWorks        HX CUT                                     Size, Tm, PCA, HL       2002           [126]
DicodonShuffle  CUT, Dinucleotide                          Not applicable          2003           [135]
Gene2Oligo      Not applicable                             Size, Tm, PCA           2004           [136]
DNA2.0          CUT, RE, GC, CP, HL                        Size, Tm, PCA           2004           [77]
GeMS            CUT, RE                                    Size, Tm, HL, PCA       2004           [95]
CAD PAM         Not yet fully reported                     Not yet fully reported  2004           [117]
Gene Composer   HX CUT-DB, RE, GC, Ambush, Repeat, HL, SD  Size, Tm, HL, PCA       2005           [127]

CUT indicates that one or more codon usage tables are referenced during reverse translation of a protein sequence. CUT-DB is a database of CUTs. HX refers to a CUT for highly expressed proteins. RE indicates that restriction endonuclease sites can be identified and silently removed or inserted. Assisted indicates that oligo design is assisted but not totally automated. Size indicates that the size of oligo building blocks can be varied. LP is a linear programming algorithm that calculates the most homologous nucleic acid sequences for two related proteins under the constraints of a defined CUT (useful for DNA shuffling of synthetic genes). CP indicates that codon pairs can be identified and silently altered. HL indicates that hairpin-loop structures in a nucleic acid sequence can be silently removed. Ambush indicates that stop codons can be silently inserted or removed from the second or third reading frames. Repeat indicates that strings of repeated bases can be identified and silently removed. SD indicates that Shine-Dalgarno elements can be identified and silently eliminated. Dinucleotide and GC refer to the ability of the software to maintain either a specified dinucleotide or GC content. Tm indicates that the melting temperature of overlap regions of top and bottom strand oligos can be optimized. PCA refers to the ability of the software to plan out overlapping oligos for gene synthesis by polymerase cycling assembly methods.
typically from either Spodoptera frugiperda (Sf9 or Sf21 cells) [140] or Trichoplusia ni (Tn5 cells) [141]. However, the complete genomes of S. frugiperda and T. ni have not yet been sequenced, and the D. melanogaster CUT has been chosen as the closest evolutionary proxy to these genomes. Finally, since it is known that baculoviruses affect the host cell translation machinery, it is often desirable to use CUTs that are derived exclusively from baculovirus ORFs [53, 54]. As more data is generated on expression of heterologous proteins in various expression systems, including in vitro translation systems [142, 143], the CUT tools will become increasingly useful for the engineering of proteins for optimal expression. Furthermore, as the role of codon bias in gene regulation becomes better understood, software tools for the creation of modified CUTs will likely need to exploit any codon biases that are found to be correlated with tissue- or development-specific gene activity [50].
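Two of the CUT-derived quantities discussed in this section, the weighted hybrid CUT and the codon adaptation index, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the CUT-DB code: `HOST_A` and `HOST_B` are made-up two-amino-acid tables, `hybrid_cut` and `cai` are hypothetical names, and the CAI is computed in its usual form as the geometric mean of each codon's frequency relative to the most frequent synonymous codon, skipping amino acids that have only one codon.

```python
import math

# Hypothetical toy CUTs for two hosts (placeholder frequencies).
HOST_A = {"K": {"AAA": 0.8, "AAG": 0.2}, "E": {"GAA": 0.7, "GAG": 0.3}}
HOST_B = {"K": {"AAA": 0.3, "AAG": 0.7}, "E": {"GAA": 0.3, "GAG": 0.7}}


def hybrid_cut(cut_a, cut_b, weight_a=0.5):
    """Weighted average of two CUTs, renormalised within each amino acid.

    Assumes both tables list the same amino acids.
    """
    hybrid = {}
    for aa in cut_a:
        codons = sorted(set(cut_a[aa]) | set(cut_b[aa]))
        mixed = {
            c: weight_a * cut_a[aa].get(c, 0.0)
            + (1.0 - weight_a) * cut_b[aa].get(c, 0.0)
            for c in codons
        }
        total = sum(mixed.values())
        hybrid[aa] = {c: f / total for c, f in mixed.items()}
    return hybrid


def cai(dna, cut):
    """Geometric mean of relative adaptiveness w = f(codon) / max f(synonyms)."""
    weights = {}
    for table in cut.values():
        if len(table) < 2:  # single-codon amino acids carry no information
            continue
        best = max(table.values())
        for codon, freq in table.items():
            weights[codon] = freq / best
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0.0) > 0.0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


hy = hybrid_cut(HOST_A, HOST_B, weight_a=0.5)
print(hy["K"], cai("AAGGAG", HOST_A), cai("AAGGAG", hy))
```

A sequence that scores poorly against one host table can score much better against the hybrid, which is the sense in which a hyCUT balances expression prospects across two systems.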
Exploiting the Degeneracy of the Genetic Code to Eliminate or Introduce Nucleic Acid Sequence Elements

When back-translation of proteins is carried out under HX-CUT constraints, we typically find that there is an average of approximately two possible codons for each amino acid. Hence, even under stringent codon bias constraints, the degeneracy of the genetic code dictates an enormous number of possible sequences for reverse-translated gene products. This realization is quite liberating, since it allows gene engineers to consider a wide array of possible modifications to the nucleic acid sequence, without affecting either amino acid sequence or HX codon bias. For example, it is often desirable to constrain GC content to be within a defined range and variance across the gene [127], or to maintain a specific dinucleotide composition [135]. Maintaining GC content can aid in the downstream manufacturing of the synthetic gene from oligo building blocks, and can also help to match the GC content of a synthetic gene to that of the host organism. This can be achieved by computing a large number of possible gene sequences using specified CUT constraints, followed by ranking of the possible sequences according to how close they come to achieving a specified GC content. The variance of GC content across the gene sequence can be calculated from a pool of GC content data points calculated, for example, for a 50 base window that is marched along the DNA sequence in steps of 10 bases. With this in mind, the most sophisticated Prot2Gene software packages are able to exploit degeneracy in the genetic code, under CUT constraints and GC content constraints, to automatically compute the use of synonymous nucleotide changes that silently introduce or remove DNA sequence elements or encoded RNA elements. Listed below, and illustrated in Fig. (3), are some of the most important nucleic acid sequence elements that sophisticated Prot2Gene design software packages are able to identify, quantify, and then either “silently” introduce or remove, based on user preferences, while maintaining a specified codon bias and overall GC content for the gene.

(i) Elimination of cryptic Shine-Dalgarno (SD) sequences. If proteins are intended for expression in bacterial species such as E. coli or B. subtilis, it is desirable to identify and eliminate unwanted Shine-Dalgarno sequences from all three reading frames within the body of the gene. This helps to ensure that the translation machinery only uses the mRNA transcripts for translation initiation at the desired start codon. The Shine-Dalgarno sequence is somewhat degenerate, which makes its recognition by comparative sequence analysis somewhat difficult [144, 145]. However, this also allows multiple opportunities to silently eliminate a Shine-Dalgarno sequence. For eukaryotic expression systems, a similar approach can be applied to eliminate cryptic unwanted Kozak consensus sequences [146], or other possible internal translation initiation sites.
(ii) Ensure that strings of repeated nucleotide sequences are shorter than a specified length (typically at least 5 nucleotides). This helps to minimize potential ribosomal slippage during translation [41].
(iii) Introduction or removal of restriction endonuclease (RE) cleavage sites. This has powerful utility for the subsequent cloning of the synthetic gene into various expression vectors and also for generating synthetic multi-gene clusters [95, 96], gene networks [97, 147], or sequence swapped mutants for directed evolution.
(iv) Introduction or removal of “hidden” stop codons in the second and third reading frames. The translation machinery is not perfect, and depending on the circumstances, it can shift out of the primary reading frame and into either the second (-1) or third (+1) possible reading frame. Once a shift in reading frame occurs, the translation machinery will end up spending precious time and energy translating defective polypeptides that contain out of frame encoded amino acid sequences at their C-termini. It has been proposed that the presence of hidden “ambush” stop codons may be an adaptation in species with slippage prone ribosomes [148]. If the alternative reading frames are open for long distances, and the translation machinery happens to slip into an alternate reading frame, then it will spend even more time on average translating defective proteins. Based on this concept, it can be useful to silently introduce as many hidden stop codons as possible into the alternate reading frames. This serves the purpose of rapidly terminating translation if frame-shifts do occur, so that less of the cell metabolism is wasted on translation of defective proteins. Interestingly, research in our laboratories using Gene Composer software has demonstrated that when gene sequences are constructed by random selection of synonymous preferred codons (i.e. those with >30% usage frequency) from E. coli or insect cell CUTs, it is possible to increase the number of hidden “ambush” stop codons in alternate reading frames by 2- to 4-fold. Hence, the degeneracy of the genetic code is highly accommodative of a wide range of frequencies of hidden stop codons.
(v) Removal of potential RNase cleavage sites. It has been shown that hairpins at the 5’ and 3’ ends of mRNAs can alter the stability of the mRNA to exonucleases and improve protein expression in E. coli. Although endonucleases that may degrade mRNAs in E. coli do not have well-defined recognition sequences, RNase E has been shown to prefer A/U rich sequences immediately downstream of short hairpins. The degeneracy of the genetic code can allow for even these ambiguous sequences to be removed.
(vi) Ensure that the predicted mRNA structure has a minimal number of hairpin-loop structures, with calculated free energies of folding (∆G) below a specific threshold (a calculation that depends on the anticipated salt concentration and temperature) [149]. Reducing local mRNA secondary structure may help to minimize the chance that ribosomes will be unable to initiate translation near an ATG start codon that is trapped in a hairpin [150], or that they will stall during translation on encountering a stable RNA structure that is difficult to read through [151]. However, it should be noted that the role of mRNA secondary structure in gene expression is a complicated subject [152], and it may not be advantageous to eliminate all local secondary structure, since there is evidence that both bacterial and eukaryotic gene sequences are biased to encode mRNAs that actually have significant local RNA secondary structure [135]. Hence, the elimination of mRNA structure might not always be desirable.
(vii) Introduction or removal of methylation sites. Methylation is known to have a variety of effects on gene expression and is a factor that needs to be considered in the design of synthetic genes [153, 154]. Furthermore, the unintended introduction of a methylation sequence into a gene could present problems for downstream cloning, since replication of DNA may become restricted by hemi-methylation [155].

(viii) Introduction of encoded tags for small interfering RNA (siRNA) mediated mRNA suppression. The use of transfected siRNA to suppress or eliminate mRNAs in cell cultures has become a powerful tool for studying the loss of function effects on cell activity [62]. Synthetic gene constructs with encoded siRNA tags can allow researchers to target a synthetic gene construct for siRNA suppression, while sparing the native gene sequence, or vice versa. This has application for studying the effects of functionally replacing a wild type gene product with a mutant version.
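The GC-content calculation described before this list (a 50 base window marched along the sequence in steps of 10 bases, followed by ranking of candidate sequences) can be sketched as follows. The function names and the ranking rule (distance of the mean window GC from the target, with variance as a tie-breaker) are assumptions made for illustration and are not taken from any of the packages in Table 2.

```python
from statistics import mean, pvariance


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def gc_window_profile(seq, window=50, step=10):
    """GC fraction of each window as it is marched along the sequence."""
    seq = seq.upper()
    if len(seq) < window:
        return [gc_fraction(seq)]
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def rank_by_gc(candidates, target_gc):
    """Sort candidate genes: closest mean GC first, then lowest GC variance."""
    def score(seq):
        profile = gc_window_profile(seq)
        return (abs(mean(profile) - target_gc), pvariance(profile))
    return sorted(candidates, key=score)


# Three artificial 100-base "candidates" with very different GC content.
candidates = ["GC" * 50, "AT" * 50, "ATGC" * 25]
best = rank_by_gc(candidates, target_gc=0.5)[0]
print(best[:12], gc_window_profile(best))
```

In a real design run the candidates would be synonymous encodings of the same protein produced under a CUT constraint, so that ranking by GC profile leaves the amino acid sequence untouched.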
Fig. (3). Using the degeneracy of the genetic code to alter gene sequences without introducing mutations. The DNA sequence encoding an expressed protein is diagrammed as a solid horizontal line. DNA/RNA sequence elements that can be altered without altering the expressed amino acid sequence are shown above the line. Silent mutations introduced into the DNA/RNA elements are shown in lower case below the line. Each triplet codon is underlined, and the DNA/RNA element that is modified is shown in italics. DNA/RNA elements removed from the gene are highlighted in red. DNA/RNA elements introduced into the gene are shown in green.
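Item (iv) of the list above lends itself to a small worked example. The sketch below counts stop codons in the two alternate forward reading frames and compares two synonymous encodings of the same dipeptide (Leu-Lys). The function name `hidden_stops` is hypothetical, and the frames are labelled simply by their base offset from the primary frame rather than by the (-1)/(+1) convention used in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def hidden_stops(dna):
    """Count stop codons in the reading frames offset by 1 and 2 bases."""
    dna = dna.upper()
    counts = {}
    for offset in (1, 2):
        counts[offset] = sum(
            dna[i:i + 3] in STOP_CODONS
            for i in range(offset, len(dna) - 2, 3)
        )
    return counts


# CTG and CTC both encode Leu; AAA encodes Lys.
with_ambush = "CTGAAA"     # offset-1 frame reads TGA, a hidden stop
without_ambush = "CTCAAA"  # offset-1 frame reads TCA, no stop

print(hidden_stops(with_ambush), hidden_stops(without_ambush))
```

The single synonymous change from CTC to CTG introduces an out-of-frame stop without altering the encoded protein, which is the kind of silent edit a design package can search for across a whole gene.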
Possible Undesired Effects on Protein Expression

Codon optimization should not be considered a universal panacea for poor protein expression. It certainly helps in many reported cases; however, codon optimization can sometimes also have negative effects on protein expression. For example, it has been suggested that codon bias may be related to the domain organization of the translated protein, and that translational pauses at un-preferred codons at protein domain boundaries may be beneficial for time dependent protein folding events and total protein production [156, 157]. Another factor to keep in mind when designing genes with optimized codon preference is that, if the resulting mRNA transcript is unstable due to other factors (e.g. the inadvertent introduction of a nuclease cleavage site), then the optimized codon bias will have little or no effect on the overall protein expression. In well studied cases where codon optimized genes have failed to yield higher protein expression levels when compared to the native gene sequence, the poor expression has been found to be attributable to an inadvertent destabilization of the mRNA transcripts [158]. Furthermore, there is a growing appreciation for the role of riboswitches in mRNAs, which serve as master control switches for numerous metabolic pathways (mostly in prokaryotes). Riboswitches are RNA structures that undergo metabolite induced allosteric interconversion between structural states that modulate the level of gene expression through steric control of translation initiation [159, 160]. Over time, as the biological research community continues to decipher sequence elements that dictate mRNA stability, Prot2Gene software algorithms will need to evolve to incorporate this information into the design strategy for expression optimized gene sequences. This will also require the incorporation of an organism specific knowledge base that will guide the design of genes for protein production in heterologous expression systems.

Converting a Gene Sequence into Oligonucleotide Building Blocks

Once a codon optimized gene sequence has been formulated with the aid of Protein2Gene software tools, the next step is to divide the gene sequence into a set of oligonucleotides that can be synthesized and used as building blocks in the assembly of the final synthetic gene. To automate this process, we and others (Table 2) have developed “Gene-to-Oligonucleotides” (Gene2Oligos) software tools that convert a duplex DNA sequence into a computed set of overlapping top and bottom strand oligonucleotides [77, 95, 108, 124-127, 136].
In addition to designing the oligo building blocks for assembly, Gene2Oligos programs usually have additional tools for adding any desired upstream or downstream flanking sequences that may facilitate gene assembly (e.g. PCR amplification tag sequences) or downstream cloning (e.g. unique restriction sites), or that add gene control elements (e.g. promoters, translation start signals, transcription terminators etc.). Numerous studies have shown that the assembly of synthetic genes from large pools of overlapping oligonucleotides requires that the pooled oligos have a uniform melting temperature (Tm), and that the oligo sequences be as unique as possible so as to avoid mis-assembly [102, 117, 118, 126, 136, 161]. Likewise, the oligos must also be relatively free of potential hairpin structures. As such, the most advanced Gene2Oligos software packages [117, 126, 127, 136] are designed to compute oligo sets with a uniform Tm, i.e. a low statistical variance of Tm, for all overlapping segments. The Gene2Oligos algorithm of our Gene Composer software begins this process by randomly dividing the duplex sequence into an enormous number of possible overlapping sets of oligos. These oligo sets are then checked to see whether they meet defined criteria for oligo length limits (usually between 55 and 65 bases) and overlap length limits (usually between 20 and 30 bases). The first 1,000 oligo sets that pass this initial filter are then rank ordered, based on their average overlap Tm and statistical variance in overlap Tm. Most Gene2Oligo algorithms use the nearest-neighbor model [162] to calculate Tm at a specified salt concentration (usually the same salt concentration used during gene assembly) [126, 127, 136, 149]. From this analysis, we usually submit the top 5% of the oligo sets with the lowest variance in overlap Tm to additional analyses to identify which of the oligo sets have the fewest “outlier” oligos, i.e. oligos that have either (i) an unreasonably low overlap Tm compared to the average Tm for all overlaps, (ii) a sequence that is capable of forming a stable hairpin, or (iii) a sequence that has high potential to mis-pair with one of the other oligos in the set. The best oligo set is then selected for further refinement, wherein the outlier oligos are improved. This can be achieved either by shifting the endpoints of the oligo by one or two bases (and simultaneously also the endpoints of its adjacent oligo), or alternatively by the introduction of a point base change in the top and bottom strand oligos such that they complement each other. In the second approach, a base change may be found to be in conflict with one of the sequence features that were designed with the Protein2Gene algorithm. For this reason, most Gene2Oligos software packages operate in concert with Protein2Gene algorithms so that any such conflicts can be readily resolved [117, 126, 127, 136]. For example, we have built a variety of “conflict resolution” algorithms into Gene Composer that allow researchers to make an informed choice between a base sequence change that may improve the accuracy of gene assembly and one that may improve protein expression [127].

Software Visualization Tools for Gene Design

Software tools that allow researchers to design synthetic genes must necessarily have a variety of visualization tools that present distilled data to the user so that they may make informed decisions to resolve conflicts when several sequence parameters are being optimized simultaneously. To this end, we have built a number of visualization tools into our Gene Composer software, so that our scientists can more effectively make decisions about the design features in synthetic genes [127].
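Much of the data such tools present, for instance the Tm of each oligo overlap, comes from the Gene2Oligos step described above. A minimal Python sketch of that step is given here under simplifying assumptions that differ from the Gene Composer method: the sequence is tiled uniformly rather than divided at random, and the overlap Tm is estimated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C) as a stand-in for the nearest-neighbor model. All function names are hypothetical.

```python
from statistics import pvariance

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule); a stand-in for nearest-neighbor Tm."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tile(seq, oligo_len, overlap_len):
    """Split a top-strand sequence into alternating top/bottom strand oligos.

    Returns (oligos, overlaps); overlaps are given as top-strand segments.
    """
    if not 0 < overlap_len < oligo_len:
        raise ValueError("overlap must be shorter than the oligo")
    step = oligo_len - overlap_len
    oligos, overlaps = [], []
    start, index = 0, 0
    while True:
        end = min(start + oligo_len, len(seq))
        segment = seq[start:end]
        # Even-numbered oligos are top strand, odd-numbered are bottom strand.
        oligos.append(segment if index % 2 == 0 else reverse_complement(segment))
        if end == len(seq):
            break
        overlaps.append(seq[start + step:end])  # region shared with next oligo
        start += step
        index += 1
    return oligos, overlaps


def rank_designs(seq, oligo_lengths, overlap_lengths):
    """Rank uniform tilings by the variance of their overlap Tm values."""
    designs = []
    for oligo_len in oligo_lengths:
        for overlap_len in overlap_lengths:
            _, overlaps = tile(seq, oligo_len, overlap_len)
            if overlaps:
                tms = [wallace_tm(o) for o in overlaps]
                designs.append((pvariance(tms), oligo_len, overlap_len))
    return sorted(designs)


seq = "ATGC" * 50  # artificial 200-base test sequence
oligos, overlaps = tile(seq, oligo_len=60, overlap_len=20)
print(len(oligos), [wallace_tm(o) for o in overlaps])
print(rank_designs(seq, range(55, 66, 5), range(20, 31, 5))[0])
```

The per-overlap Tm list printed here is the kind of series a Tm histogram viewer would plot, and the refinement steps described above (shifting endpoints, resolving hairpins and mis-pairing) would operate on the top-ranked designs.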
For example, we have created a rare codon histogram viewer that allows the researcher to quickly visualize the density of rare codons in a given gene sequence. Predicted stable mRNA structures are also presented along the nucleic acid sequence as a color coded histogram of calculated local ∆G of folding. This allows researchers to see the relative contribution of stable mRNA structures as a function of calculated ∆G of folding at regular intervals along the mRNA. Similar visualizations of the melting temperature (Tm) and GC content of overlapping assembly oligos have also been developed. These visualization tools are especially useful when used in combination with conflict resolution tools, which allow researchers to make an informed choice, for example between keeping a desired restriction site in the final gene sequence and eliminating a stable predicted mRNA hairpin that results from the presence of the engineered restriction site. As gene design methods become more sophisticated, we anticipate a continuing need to develop additional visualization tools.

MANUFACTURING SYNTHETIC GENES

As described above, Protein2Gene and Gene2Oligo software algorithms can be used in combination to derive a set of oligonucleotide building blocks representing both DNA strands of a synthetic gene. In this section, we review how oligonucleotides are synthesized and then assembled into a final gene sequence. We begin this discussion with a description of phosphoramidite methods for oligonucleotide synthesis and the “N-minus mer” problem, which is the source of most of the errors in synthetic genes. We then describe several of the most important strategies and methods for producing synthetic DNA sequences larger than 1 kbp. These gene synthesis protocols are presented in chronological order starting from ~1985, coincident with the advent of the polymerase chain reaction by Mullis [163-165].

Commercial Oligonucleotide Synthesis and the N-minus-mer Problem

Most of the current gene synthesis protocols make use of oligodeoxynucleotides that are synthesized on a solid-phase matrix such as controlled-pore glass (CPG) beads by automated DNA synthesizers that deploy phosphite triester and phosphoramidite chemistries pioneered by Letsinger [166-168] and Caruthers [169, 170]. Automated oligo synthesis is a fantastic technology. It is now quite easy to submit an electronic order for hundreds of oligonucleotides (30-70 bases long) and receive these within a couple of days by overnight delivery service from any number of competing vendors. The oligos arrive in 96 well plates with a pre-defined array, wherein each well has been accurately quantified with respect to total synthetic material (optical density measurements). Typically, these oligos have also been subjected to partial purification by automated reverse phase chromatography, which separates the failed synthesis products from the so-called “full-length” material by virtue of the presence of a hydrophobic 5’-O-dimethoxytrityl (DMT) group, which serves as a 5’ terminal tag for purification. However, reverse phase purification does not eliminate the so-called “N-minus mers” because they, like full-length oligos, also have a 5’ DMT group. N-minus mers are a series of defective oligos that are missing one or more base(s) somewhere within the sequence [171]. These N-minus mers are the result of unavoidable inefficiencies (roughly 1.0% or less) in each round of oligo extension. We and others have found that the incorporation of N-minus 1 mers (missing one base) during gene assembly is the predominant source of mistakes (single base deletions) found in synthetic gene constructs [78, 101, 102, 117, 118, 126, 136, 161, 172, 173].
For this reason, it is worthwhile to discuss in some detail the origin of the N-minus mer problem and strategies for handling or avoiding mistakes during gene synthesis that are caused by N-minus mers. As illustrated in Fig. (4), one way that N-minus mers are formed is when the 5’ DMT blocking group is not completely removed (DMT Removal) during one round of elongation, but is then removed in the next or a subsequent round of elongation. As oligos grow in length, a greater quantity of the total starting material ends up having one or more nucleotides missing, essentially skipped during synthesis. Similarly, minor inefficiencies in the Synthon Addition and Capping steps in oligo synthesis will also influence the production of N-minus mers. For example, poor efficiency in phosphoramidite addition to the 5’ OH end of the growing oligo, coupled with poor efficiency in the capping of such 5’ OH ends, will also result in the production of unwanted N-minus mers. In order to analyze the combined effects of inefficiencies in each of the synthetic reactions (DMT Removal, Synthon Addition, and Capping) performed during synthetic oligo elongation, we and others have generated spreadsheet models to calculate the accumulation of N-minus 1 mers as a function of chain length [118, 127]. As shown in Fig. (5), if we assume that DMT Removal, Synthon Addition, and Capping are all 99.5% efficient, we see that the synthesis of a 40 mer will result in ~13% of the total material being N-minus 1 mers. When the chemical reaction efficiency of DMT removal is 99.0%, only slightly less efficient than 99.5%, we see that ~21% of the
Fig. (4). The Generation of N-Minus Mer Contaminants During Automated Solid-Phase Phosphoramidite Synthesis of Oligodeoxynucleotides. Oligodeoxynucleotide chains are synthesized by sequential elongation at their 5’ end, while their 3’-OH group is chemically anchored to a solid support matrix such as a controlled-pore glass (CPG) bead. Each deoxynucleotide elongation involves a four-step chemical process starting with the “DMT Removal” step, where the dimethoxytrityl (DMT) group on the 5’ end of the growing oligo chain is removed using a large excess of trichloroacetic acid (a weak protic acid) carried in an organic solvent phase. This results in the reversible formation of dimethoxytrityl carbocation, which, if not completely flushed away from the surface of the CPG, will react with the 5’ OH to reprotect the 5’ end of the oligo. Total removal of the DMT is not feasible on micro-scale planar glass surfaces, since the volume of protic acid required would produce unwanted acid depurination events. The illustration portrays the DMT Removal efficiency at 99%. The second step of elongation is called “Synthon Addition”, wherein a 5’-O-dimethoxytrityl (DMT) protected deoxynucleotide (N) phosphoramidite (pmdt-N-dmt) is flowed over the CPG matrix, where it reacts with the free 5’ OH groups of the anchored chains. This results in the ejection of the pmdt group and simultaneous formation of a new phosphite triester internucleotide bond (not shown), thereby lengthening the oligo by a single nucleotide. The illustration portrays the Synthon Addition efficiency at 99%. In the third and fourth steps of an elongation cycle, a “Capping” reaction is carried out, wherein unreacted 5’ OH ends or other free hydroxyls are capped by acetylation, thereby preventing any further synthesis of erroneous oligonucleotides, followed by “Oxidation” of the phosphite triester internucleotide bond to a phosphotriester (the efficiencies of capping and oxidation are not illustrated). For the sake of simplicity, we have not shown the various protecting groups that are attached to the base components of the synthons (N = A, G, C, or T).
Fig. (5). Effects of Chain Length and Chemical Reaction Efficiency in Oligonucleotide Synthesis. The percentage of full-length and N-minus 1 mer synthesis products are shown as a function of oligonucleotide chain length during automated solid-phase phosphoramidite oligo synthesis, wherein the DMT Removal reaction efficiency is set at either 99.5% or 99.0%, while Synthon Addition, and Capping efficiencies are at 99.5% efficient.
oligos accumulate as N-minus 1 mers (with Synthon Addition and Capping held at 99.5% efficiency). Analyses such as these demonstrate that the quality and quantity of synthesized oligos are extremely sensitive to even small inefficiencies in the chemical reactions of oligo synthesis. The cumulative effects of minor reaction inefficiencies mean that the production of oligos greater than 100 bases in length is quite difficult to achieve. These considerations place practical limitations on the size of oligonucleotides that can be used in gene assembly protocols. Furthermore, the presence of contaminating N-minus 1 mers can have a major impact on the probability of successfully manufacturing an error free synthetic gene. In turn, this influences the overall time and cost to produce a sequence validated synthetic gene. In the next sections of this review, we describe gene manufacturing methods and how each of these procedures uses one or more strategies to reduce or eliminate the N-minus mer problem.
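The sensitivity to step efficiency can be explored with a simplified probability model. The sketch below is an assumption-laden stand-in for the spreadsheet models mentioned above, not a reproduction of them: it treats each elongation cycle independently, counts a chain as "skipping" a base either when DMT removal fails or when coupling fails and the chain also escapes capping, and assumes the first nucleoside is pre-attached to the support so that an N-mer needs N − 1 cycles. The function name and parameter names are hypothetical.

```python
def product_fractions(length, dmt=0.995, coupling=0.995, capping=0.995):
    """Return (full_length, n_minus_1) fractions of total starting chains.

    Simplified model: each cycle either extends the chain, skips one base
    while leaving the chain able to grow later, or caps (terminates) it.
    """
    cycles = length - 1  # assumption: first nucleoside is already on the support
    extend = dmt * coupling
    # Skip = DMT not removed this cycle, or coupling failed and capping missed it.
    skip = (1.0 - dmt) + dmt * (1.0 - coupling) * (1.0 - capping)
    full_length = extend ** cycles
    # Exactly one skipped cycle (any position) and all other cycles successful.
    n_minus_1 = cycles * skip * extend ** (cycles - 1)
    return full_length, n_minus_1


for dmt_efficiency in (0.995, 0.990):
    full, nm1 = product_fractions(40, dmt=dmt_efficiency)
    print(f"DMT removal {dmt_efficiency:.1%}: "
          f"full-length {full:.1%}, N-minus 1 {nm1:.1%}")
```

Under these assumptions a 40 mer gives roughly 13% N-minus 1 mers at 99.5% DMT removal and roughly 22% at 99.0%, in the same range as the values quoted from Fig. (5); the exact figures depend on modeling details that the text does not specify.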
Fig. (6). Solid-Phase Gene Assembly From Synthetic Oligonucleotides.
Gene Synthesis by Sequential Assembly of Complementary Oligonucleotides Anchored to a Solid-Phase Support

In 1988, Beattie et al. described a solid-phase method for the assembly of synthetic genes from oligonucleotides [174-176]. This method involves the sequential hybridization of complementary oligonucleotides to a “starting” oligonucleotide that is chemically coupled to an inert support (e.g. Teflon®) as illustrated in Fig. (6). This solid-phase method starts with the synthetic production of overlapping ~30-mers, representing the top and bottom strands of the entire gene. These building block oligos are individually fractionated on denaturing polyacrylamide gels by electrophoresis, and the full-length bands are then physically extracted (after being visualized by ultraviolet light transmittance or a dye staining procedure). Gel purification of the full-length oligos is one of the most common strategies for dealing with the N-minus mer problem. Once purified, the full-length oligos are phosphorylated at their 5’ ends using T4 polynucleotide kinase and ATP. These oligos are then sequentially hybridized to the growing gene on the solid-phase support. Careful selection of the hybridization temperature and buffer conditions represents a second strategy for dealing with the N-minus mer problem. Under carefully controlled conditions for hybridization, only the proper full-length oligos will stably hybridize to the anchored complementary strand, while those with base changes or deletions will not, and will be washed away [117, 177]. In this way, sequential hybridization represents perhaps the most powerful strategy for handling the N-minus mer problem in oligo synthesis. After each of the required hybridization assembly steps has taken place, the final synthetic gene is often captured into linearized plasmid vector DNA by enzymatic ligation of the strands, with T4 DNA ligase and ATP. 
In this case, the ends of the linearized vector DNA are configured with unique restriction site overhangs, one of which is complementary to the planned end of the synthetic DNA. After ligation, the assembled gene duplex with its vector DNA attached is then released from the solid phase by restriction enzyme digestion, using a pre-defined rare-cutting endonuclease that will not cut either the vector or the synthetic DNA [175, 176]. Automation of this entire solid-phase gene synthesis protocol by custom liquid handling devices is likely to become one of the leading commercial methods for gene synthesis.

The Fok I Method for Gene Synthesis

In 1990, the first totally synthetic plasmid of 2.1 kbp was reported by Mandecki et al. [178, 179]. As illustrated in Fig. (7), these researchers made clever use of in vivo DNA repair synthesis in E. coli to carry out the cloning of carefully designed synthetic oligonucleotides, representing individual portions of the anticipated plasmid [178, 180, 181]. The oligos, 40 to 90 bases long, were designed with terminal sequences that are complementary to sequences at the ends of a linearized recipient plasmid, both of which contained a Fok I restriction site. Fok I is a class IIs endonuclease which recognizes and binds to the 5 base sequence 5’-GGATG-3’ and then introduces a staggered double-stranded break in the DNA, 9 bases downstream on the top strand and 13 bases downstream on the bottom strand (4 base overhang) [182]. Efficient strand cleavage by Fok I occurs regardless of the specific downstream sequence. In this way, Mandecki et al. were able to use Fok I to extract unadulterated duplex segments of synthetic DNA from plasmid preps of the cloned oligos. The released fragments were designed so that they had unique 4-base Fok I overhang sequences that could serve as the cohesive ends for subsequent
assembly of even larger fragments of synthetic DNA. Careful planning allowed these researchers to assemble 25 different Fok I fragments with T4 DNA ligase and ATP into plasmid molecules that were cloned by virtue of their ability to confer stable antibiotic resistance to E. coli cells that had been transformed with the synthetic DNA material. This represents one of the first uses of in vivo selection to identify properly synthesized DNA molecules, and forms the third basic strategy for eliminating defective DNAs that might result from mis-assembly due to the presence of N-minus mers (or any other defective oligo types). As described below, Smith et al. have used a similar in vivo selection strategy to isolate synthetic ΦΧ174 bacteriophage genomes [102]. It is noteworthy that the Fok I method of Mandecki et al. is one of the first gene synthesis methods to avoid synthesis of a bottom strand oligo (or top strand, depending on your perspective) [178, 181], thereby reducing by half the number of oligo bases that need to be synthesized, as compared to the aforementioned solid-phase gene synthesis method. In the early 1990s, the high cost of oligonucleotides seriously limited the ability of most labs to make use of gene synthesis. As such, there was a drive to reduce the cost of gene synthesis by avoiding the purchase of a second oligo strand when the first would contain all the information required for a synthetic gene. Indeed, this was an important consideration for Kalman et al., who described an in vitro enzymatic DNA synthesis method that also avoided the need to synthesize second strand oligos [121]. They used the Klenow DNA polymerase [183, 184] to convert the single strand oligos into duplexes using linearized vector DNA as primer. In this case, synthetic oligonucleotides of 65 to 80 bases in length were designed to encode a yeast codon-optimized version of human serum albumin. 
The oligos carried unique 3’ and 5’ adapter sequences so that they could be unidirectionally ligated into a linearized plasmid vector that was then subjected to Klenow polymerase-mediated fill-in to complete the duplex. The sequential unidirectional cloning of single-stranded oligonucleotides in this manner allowed the full-length “yeastized” gene for human serum albumin to be constructed.

Gene Synthesis by Polymerase Cycling Assembly (PCA)

The synthetic power of the polymerase chain reaction (PCR) for the amplification of nucleic acids is one of the key technological advances on which many of the current whole gene synthesis procedures are founded [163-165]. Starting in ~1988, the commercial availability of reasonably priced oligonucleotides and thermostable Taq DNA polymerase formed a powerful mix that eventually led researchers to think about ways to use PCR for the construction of synthetic genes. In 1990, Dillon and Rosen reported the first use of PCR to generate a synthetic gene (HIV rev) from oligonucleotide templates [185]. Until this time, most synthetic genes were produced by the assembly methods described above, involving the ligation of overlapping synthetic oligonucleotides in one form or another. Through the 1990s, there were several reports of incremental improvements in PCR-based synthetic gene assembly (Polymerase Cycling Assembly, or PCA) from oligonucleotide templates [107, 186-188]. In general, the improvements in PCA during this time were focused on reducing the number of steps required to produce the largest possible synthetic gene. Specifically, researchers focused their attention on the development of reproducible PCA methods for the assembly of synthetic genes from a single pool of a large number of oligonucleotides. For example, in 1995 Stemmer et al. reported the PCR assembly of a synthetic beta-lactamase gene from 56 overlapping top and bottom strand 40-mer oligos as illustrated
Fig. (7). Fok I Method of Gene Synthesis. The sequence of steps in the preparation of small synthetic gene fragments of ~400 base pairs is shown schematically. The subsequent ligation-mediated assembly of multiple such ~400 base pair fragments allowed Mandecki et al. to assemble the first synthetic plasmid molecule that was selected for replication competency by its ability to confer antibiotic resistance to E. coli (plasmid assembly not shown) [178, 179].
in Fig. (8) [188]. In this method, overlapping top and bottom strand oligos are pooled and then subjected to several cycles of denaturation, renaturation, and polymerization (also known as Overlap Extension PCR, OE-PCR). In the second step of PCA, an excess of outside flanking primers is added to the OE-PCR products and the mix is then subjected to several rounds of true PCR amplification. Typically, the flanking primers contain unique restriction endonuclease sites that facilitate subsequent cloning of the PCR products. The use of a single large pool of oligos for PCA production of synthetic genes looks easy on paper. However, we and others have observed that PCA methods involving pools of large numbers of oligonucleotides often fail to produce the desired full-length gene product [78, 107, 126, 127, 161, 172]. Failure of the PCA method can usually be attributed to the production of mis-primed OE-PCR products, which are more likely to form as the number of pooled oligos increases and as the variance in oligo concentration increases [102]. Mis-primed OE-PCR products are also more likely to form as the variance in the Tm of overlaps between top and bottom strands increases. This variable is a function of both the number of oligos in the pool and the average overlap length. To improve the success rate of gene synthesis by PCA, several modifications to the basic PCA method have been described [78, 95, 107, 172]. For example, the success rate for assembly of full-length genes can be considerably improved by conducting PCR assembly of several smaller overlapping gene fragments from sub-pools of oligos (<10 oligos per pool). Once the successful production of each overlapping gene fragment has been confirmed by gel electrophoresis, the fragments can all be pooled and then amplified by PCR, with an excess of flanking primers covering the 5’ and 3’ ends of the desired gene. 
In addition, we and others have generated Gene2Oligo software tools that design the oligo building blocks so that their overlap regions are thermodynamically balanced with low variance in Tm [78, 126, 127, 161]. Together, these strategies greatly improve the success rate for gene assembly by PCA.

Improved Gene Synthesis by PCA in Combination With Base Mis-Match Surveillance

Most PCA-based gene synthesis methods suffer from a mutation rate of ~2 to 4 point mutations per 1 kbp of synthetic gene product [78, 126, 127, 172]. Most often these mutations are found to be single point deletions or, to a much lesser extent, base changes. The single base deletions are undoubtedly the result of the N-minus 1 mer problem in oligo synthesis described above, while base changes are thought to result from polymerase mistakes. In the face of this mutation rate, there are two general approaches for obtaining a final sequence-validated synthetic gene. The first approach is to sequence a large number of individual clones until one is found to be completely accurate. The second approach is to repair the mistakes using site-directed mutagenesis with QuikChange® (www.stratagene.com) [189, 190], and/or by reconstructing the gene from sub-cloned gene fragments that are free of mutations. This second approach is often facilitated by the engineering of unique restriction sites spaced every ~200 base pairs, so that desired fragments can be easily pieced together by unidirectional subcloning, a strategy that also allows the facile generation of complicated gene variants by cassette mutagenesis [191].
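The sequencing burden implied by this mutation rate can be estimated with a small calculation. The sketch below assumes, as a simplification that is not stated in the text, that mutations occur independently and follow a Poisson distribution along the gene, so the probability that a clone is error free is exp(-rate × length). The function names are illustrative.

```python
import math


def error_free_probability(errors_per_kbp, gene_length_bp):
    """Probability that a single clone carries no mutation (Poisson model)."""
    expected_errors = errors_per_kbp * gene_length_bp / 1000.0
    return math.exp(-expected_errors)


def clones_to_sequence(errors_per_kbp, gene_length_bp, confidence=0.95):
    """Smallest number of clones giving the requested chance of at least
    one perfect clone (assumes a non-zero error rate)."""
    p = error_free_probability(errors_per_kbp, gene_length_bp)
    # Solve 1 - (1 - p)**k >= confidence for the integer k.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for rate in (2.0, 4.0):
        p = error_free_probability(rate, 1000)
        k = clones_to_sequence(rate, 1000)
        print(f"{rate:.0f} errors/kbp: P(error free) = {p:.3f}, "
              f"clones needed for 95% confidence = {k}")
```

Because the error-free probability falls exponentially with gene length, the number of clones to sequence grows quickly for longer genes, which is why repair and error-filtering strategies become attractive.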
Fig. (8). Gene Synthesis by Polymerase Cycling Assembly (PCA) of Overlapping Oligonucleotides. Described in the body text, the generalized PCA method illustrated in this figure is the most common laboratory method for gene synthesis from overlapping oligos.
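The oligo layout used in PCA, and the overlap Tm variance discussed in the text, can be sketched as follows. This is a minimal illustration under stated assumptions: the gene length is a multiple of the oligo length, bottom-strand oligos are offset by half an oligo, and Tm is estimated with the simple Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a rough approximation for short sequences. The demonstration sequence is random and hypothetical, and none of the function names come from the cited software.

```python
import random
from statistics import mean, pstdev

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene, oligo_len=40):
    """Split a gene into top-strand oligos and half-offset bottom-strand oligos."""
    half = oligo_len // 2
    top = [gene[i:i + oligo_len] for i in range(0, len(gene), oligo_len)]
    # Each bottom-strand oligo is shifted by half an oligo so that it
    # bridges two neighbouring top-strand oligos.
    bottom = [reverse_complement(gene[i:i + oligo_len])
              for i in range(half, len(gene) - half, oligo_len)]
    return top, bottom


def overlap_regions(gene, oligo_len=40):
    """Top-strand sequence of every top/bottom overlap (half an oligo each)."""
    half = oligo_len // 2
    return [gene[i:i + half] for i in range(half, len(gene) - half, half)]


def wallace_tm(seq):
    """Rough melting temperature in deg C from the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    rng = random.Random(0)
    gene = "".join(rng.choice("ACGT") for _ in range(240))  # hypothetical gene
    top, bottom = tile_oligos(gene)
    tms = [wallace_tm(region) for region in overlap_regions(gene)]
    print(f"{len(top)} top-strand and {len(bottom)} bottom-strand oligos")
    print(f"overlap Tm: mean {mean(tms):.1f}, standard deviation {pstdev(tms):.1f}")
```

A design tool would adjust the overlap boundaries (or the codon choices) until the standard deviation of the overlap Tm values is small; the sketch only reports that spread for a fixed tiling.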
However, repairing mistakes in synthetic genes and/or sequencing large numbers of clones is an undesirable mode of operation, since it adds considerable time and cost to obtaining the desired synthetic gene. For this reason, we and others have used mis-match-specific endonucleases to effect the elimination of synthetic DNA strands that have mutations [117, 127, 173]. As illustrated in Fig. (9), Young and Dong reported a much reduced error rate for PCA gene synthesis through the use of T7 endonuclease I in the final stages of PCA-based gene synthesis [173]. Specifically, their procedure involves the melting and re-annealing of full-length synthetic PCR gene products to produce heteroduplexes, some of which will have base mis-matches where the top and bottom strands do not perfectly match. T7 endonuclease I (a junction resolvase [192-194]) is then used to carry out double-strand breakage of the heteroduplexes at mis-match sites.
Fig. (9). Gene Synthesis By Dual-Asymmetric PCR of Short Oligo Building Blocks and Overlap-Extension PCA in Combination with T7 Endonuclease I-Mediated Elimination of Mutant Strands. In 2004, Young and Dong reported a highly accurate method for gene synthesis, involving the use of short 25 base oligonucleotide building blocks with 10-15 base pair overlaps between top and bottom strands, and the use of T7 endonuclease I to reduce the number of mutant gene synthesis products as described in the body text.
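The melt-and-re-anneal step can be mimicked in a few lines. The sketch below only illustrates the logic and is not a model of T7 endonuclease I activity: it assumes a hypothetical product sequence, models each error as a single-base deletion, re-pairs top and bottom strands at random, and counts a duplex as intact only when its two strands are perfectly complementary.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_perfect_duplex(top, bottom):
    """True when the bottom strand is the exact reverse complement of the top."""
    return bottom == reverse_complement(top)


def surviving_fraction(tops, bottoms, rng):
    """Shuffle the bottom strands (melt and re-anneal) and return the
    fraction of duplexes that contain no mis-match."""
    shuffled = bottoms[:]
    rng.shuffle(shuffled)
    intact = sum(is_perfect_duplex(t, b) for t, b in zip(tops, shuffled))
    return intact / len(tops)


if __name__ == "__main__":
    rng = random.Random(1)
    correct = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCC"  # hypothetical product
    # 70 accurate duplexes plus 30 duplexes that each carry one deleted base.
    tops = [correct] * 70
    tops += [correct[:i] + correct[i + 1:] for i in range(5, 35)]
    bottoms = [reverse_complement(t) for t in tops]
    fraction = surviving_fraction(tops, bottoms, rng)
    print(f"perfect duplexes after re-annealing: {fraction:.0%}")
```

If a fraction f of the strands is accurate and mutant strands rarely meet their exact complement, roughly f × f of the re-annealed duplexes are expected to be mis-match free, so the cleavage step trades some yield for a population enriched in accurate molecules.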
In this way, hetero-duplexation exploits the duplex nature of DNA to carry out a simple parity check for accurate gene assembly. Any heteroduplexes that lack parity due to a mistake in either the top or bottom strands will have a mis-match. After digestion of the mis-matched heteroduplexes, the accurately synthesized full-length DNA molecules can be isolated away from the cleaved mutants by simple gel purification. Other mis-match-specific endonucleases include S1 nuclease [195-197], T4 endonuclease VII [198], CEL 1 [199-201], and the MutHLS proteins [202]. These enzymes represent efficient tools for greatly improving the accuracy of gene synthesis. Another approach to improving the accuracy of PCA-based gene assembly is to reduce the size of oligo overlap regions to between 10 and 15 bases by using 25-mer oligo building blocks as illustrated in Figs. (6 and 9). In this manner, defective oligos with internal deletions are less likely to participate in PCA, because there is a greater relative Tm difference between accurate annealing events and inaccurate annealing events involving a mutant oligo [173]. However, this approach has the disadvantage that, in order to achieve accurate PCA of such short oligos, the number of oligos per sub-pool must be limited to ~4 oligos, and dual-asymmetric PCR (DA-PCR) methods must be deployed [187]. For a 1 kbp gene, the number of required overlapping sub-pools of oligos is ~40, representing a large number of pipetting events. Furthermore, each of the DA-PCR fragments must be visualized (e.g. by gel electrophoresis) before proceeding further with PCR amplification of overlapping DA-PCR products.

Synthesis of a Bacteriophage Genome by Modified PCA Methods

In 2003, Smith, Venter, and their colleagues reported the synthesis of the ~5.4 kbp double-stranded circular replicative form of the ΦΧ174 bacteriophage genome [102]. This was the second successful synthetic viral genome to be produced. 
The first was that of the poliovirus reported by Cello et al. in 2002 [101]. As illustrated in Fig. (10), the gene synthesis methods used by Smith et al. were similar to the PCA methods described above. However, in order to achieve the successful assembly of the phage genome from a single pool of ~260 overlapping oligos, these researchers made two subtle improvements to PCA that are worth noting. First, they designed the overlapping top and bottom strands to be 42 bases long, which facilitated the one-step gel purification of all top strand oligos and all bottom strand oligos (as separate pools) to partially eliminate N-minus mers. Gel purification does not eliminate all N-minus 1 mers, because their electrophoretic separation pattern on denaturing gels always overlaps, at least partially, that of the full-length oligos. Second, the gel-purified oligos were phosphorylated at their 5’ ends with T4 polynucleotide kinase, so that they could be ligated by Taq ligase at 55 degrees Celsius after being pooled and annealed into duplexes representing partial genome fragments. This high-temperature ligation step is important for such a large pool of oligos, since it effectively lengthens the overlaps between top and bottom strands, which thereby reduces the chance of mis-priming in the subsequent genome assembly by PCA. The final step of ΦΧ174 genome synthesis involves converting gel-purified linear genomes into circular molecules by intra-molecular ligation of cohesive ends that had previously been generated by restriction endonuclease digestion at engineered end sequences. Other than gel purification of oligos, the Smith et al. procedure makes no other attempt to solve the N-minus mer mutation problem that is the source of mutation in
Fig. (10). Smith et al. Method for the Synthesis of the Replicative form of the ΦΧ174 Bacteriophage Genome. Described in the body text, this PCA method for genome assembly is prone to very high error rates. However, error rate is practically irrelevant when replication competency is used to select for viable genomes.
most current gene synthesis methods. However, no additional effort was required in this regard, since there is no need to eliminate the 1057 expected mutant versions of the ΦΧ174 genomes, when only a handful of replication competent genomes is required for a successful genome synthesis. Moreover, ΦΧ174 is notoriously error prone in its replication, and it seems reasonable to expect that even if no fully viable ΦΧ174 replicative forms were produced in the test tube, once the mutant pool of circular genomes was electroporated into E. coli hosts, there would likely have been a large number of possible recombinations that could occur to produce a viable genome. Such viable genomes were actually very easy to pick by virtue of their halo of destroyed bacterial hosts. In this way, replication competency represents an extreme natural selection strategy for circumventing the error rate problem that is associated with the assembly of large DNA molecules from synthetic oligos. The Smith & Venter team anticipates using the same approach to generate a minimal cellular genome containing the ~300 essential genes from E. coli [102]. This raises the specter that gene synthesis may enable the generation of new life forms, an area that needs to be carefully considered before attempting.

Gene Assembly from Chip-Eluted Oligonucleotides

One of the biggest cost factors in whole gene synthesis is related to the cost of oligonucleotides, and the inefficiencies embedded in the efforts that are required to identify and eliminate or repair mistakes introduced by improperly synthesized oligonucleotides. Gene synthesis methods that utilize PCR for amplification of templates have the benefit of requiring relatively little total oligonucleotide as starting material; however, traditional methods of oligonucleotide synthesis on CPG beads result in relatively large quantities of material. 
There are a variety of recent technologies that allow oligonucleotides to be synthesized at relatively small scale. As of late 2004, efficient photo-programmable microfluidic picoarray technology, enabled by photo-activated acid chemistry, can produce ~4,000 different 70 base oligos on silicon wafers in only a few hours [118, 177]. On similar time scales, high-speed ink-jet printer technology can produce ~25,000 different 60 base oligos on a single glass slide using standard chemistries [203]. Furthermore, rapid photo-lithographic oligo synthesis, enabled by photolabile 5’ blocking groups, can produce up to ~350,000 different 60 base oligos on a glass slide for as little as $700 [204]. Clearly, these technologies have the power to make gene synthesis a relatively inexpensive endeavor. However, despite these impressive numbers, it is worth noting that none of the chip-based approaches for oligo synthesis are immune to the N-minus mer problem. One potential improvement with respect to the N-minus mer problem can be achieved by reducing the number of chemical steps in oligo synthesis. Indeed, Caruthers and his team at the University of Colorado have recently developed a novel two-step cycle solid-phase oligodeoxynucleotide synthesis procedure that employs peroxy anion deprotection of a 5’ aryloxycarbonyl group, which is substituted for DMT in the classic four-step solid-phase synthesis procedure [205]. If this chemistry can be applied to the chip-based synthesis approaches, then we may see considerable improvements in the quality of oligo synthesis. Another potentially very powerful approach for eliminating the N-minus mer problem was reported by Gao and Church in the twilight of 2004 [117, 118]. As illustrated in Fig. (11), these researchers used photo-programmable microfluidic
picoarray technology to synthesize ~1,000 different 70 base oligonucleotides in addressable chambers on a silicon wafer. Each oligo was prepared with a 3’ uracil residue that allowed elution of the oligos from the chip into a single pool by ammonium hydroxide treatment (used for standard deprotection). Moreover, each oligo was synthesized with generic adaptor sequences at its 5’ and 3’ ends, each 10 bases long, that allowed the chip-eluted oligo pool to be amplified by PCR using common 20-mer amplification primers to yield longer 90 base pair duplexes. Embedded within the generic flanking sequences were type IIs restriction endonuclease cleavage sites that allowed the internal unadulterated 50 base sequences to be liberated by type IIs endonuclease digestion. Meanwhile, because the microfluidic picoarray method produces oligonucleotides at such low cost, the authors also used this technology to synthesize pools of thousands of complementary “selection” oligonucleotides that were
Fig. (11). Preparation and Purification of Chip-Eluted Oligo Pools for Multiplex Gene Synthesis. Described in the body text, the procedure illustrated below generates a pool of 1,000s of unique 50-mer oligos that can be used as a single pool for the multiplex PCA assembly of numerous synthetic genes encoded therein.
designed with sequences that are complementary to either the “left” or “right” halves of the desired 50-mers. Immobilization of the left and right half “selection” oligo pools to magnetic beads allowed the use of sequential hybridization under carefully controlled annealing temperatures to selectively isolate the accurate 50-mers away from the IIs enzyme-released ends and from any defective N-minus mer strands that may have been produced during synthesis. With this approach, thousands of accurately prepared 50-mers, each with a different sequence, were isolated into a single pool for subsequent PCR-based assembly of the numerous genes encoded therein. Using this technology, expression-optimized versions of all 21 genes that encode the proteins of the Escherichia coli 30S ribosomal subunit were synthesized by multiplex PCA methods from a single pool of chip-eluted oligos [117]. Because the oligos were purified by sequential hybridization with left and right half selection sequences, the gene synthesis error rate was reduced to 1 mistake per ~7,000 bp, almost an order of magnitude improvement over other gene synthesis methods. Richmond et al. have also described the amplification and assembly of chip-eluted DNA (AACED) as a method for high-throughput gene synthesis [204]. These approaches dramatically change the cost structure of gene synthesis. Instead of genes costing 1-2 dollars per base pair (current January 2005 prices), genes can in theory be synthesized at a cost of one dollar per 1-2 kilobase pairs, or less.

Further Possible Improvements for Gene Synthesis: Microfluidics, Templated Chemical Ligation, and Sequence Validation

It should also be noted that the solid-phase support gene assembly method originally described by Beattie et al. [176] and illustrated in Fig. 
(6) still has continuing application, especially when one considers the possible use of microfluidic technologies to establish geometrical constraints that direct sequential small-scale fluid flows after oligo elution from photo-programmable picoarrays. In essence, this would be equivalent to a microfluidic version of the Beattie et al. method, using chip-eluted oligos. Another potential technological improvement in gene synthesis methods may be found in the development of 5’ and 3’ reaction chemistries that will allow two oligonucleotides to chemically react [206, 207] to form a true phosphodiester backbone “if and only if” those two oligonucleotides are properly annealed adjacent to each other on a complementary strand [208]. Ideally, this “templated chemical ligation” procedure would only operate when oligonucleotides were perfectly annealed to complementary strands, and would perhaps only be triggered with light, using photo-activated 5’ and 3’ chemistries. Similar “fragment self-assembly” methods (also called Dynamic Combinatorial Chemistry) have been applied to the discovery of novel small molecule inhibitors of proteins from pools of reactive small molecules that couple to each other when at least two fragments bind to the protein next to each other [209, 210]. Synthetic genes must usually be validated through DNA sequencing, which is an indirect but nevertheless real cost associated with the production of synthetic genes. Tian et al. have anticipated that improvements in the error rates of gene synthesis could possibly eliminate the need to sequence synthetic DNA molecules [117]. However, until such time, the cost of gene synthesis will eventually run into limits imposed by the cost of sequencing reactions. For this reason, there is a need to develop ultra-low cost DNA sequencing technologies [211].
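For orientation, the price figures quoted in this section can be put on a common footing. The sketch below is plain arithmetic on those quoted numbers. It assumes the mid-point of each quoted range and ignores sequencing, cloning, and labor costs, so the results are rough indications only.

```python
def chip_cost_per_base(chip_price_usd, n_oligos, oligo_len):
    """Raw synthesis cost per oligo base on one chip."""
    return chip_price_usd / (n_oligos * oligo_len)


def gene_cost(gene_length_bp, usd_per_bp):
    """Cost of a gene at a flat per-base-pair price."""
    return gene_length_bp * usd_per_bp


if __name__ == "__main__":
    # ~350,000 different 60 base oligos on a glass slide for about $700
    per_base = chip_cost_per_base(700, 350_000, 60)
    print(f"chip oligo cost: ${per_base:.6f} per base")
    # January 2005 vendor price: 1-2 dollars per base pair (mid-point 1.5)
    print(f"1 kbp gene at vendor price:    ${gene_cost(1000, 1.5):,.2f}")
    # Projected: 1-2 kilobase pairs per dollar (mid-point 1.5 kbp per dollar)
    print(f"1 kbp gene at projected price: ${gene_cost(1000, 1 / 1500):,.2f}")
```

On these figures the projected price is about three orders of magnitude below the January 2005 vendor price, which is why the cost of sequence validation becomes the limiting term.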
CONCLUSION

It would be difficult to imagine conducting gene research without the ready supply of tailor-made synthetic oligodeoxynucleotides. Likewise, it is rapidly becoming difficult to imagine conducting gene research without a ready supply of synthetic genes. All of the modern gene synthesis methods in practice today still deploy one or more of the chemical and/or enzymatic strategies worked out by Khorana over 30 years ago to produce the first synthetic tRNA genes. Most importantly, all synthetic genes are assembled from synthetic oligonucleotide building blocks that are designed to form ordered double-helical complexes through complementary inter-molecular base-pairing. Though this may appear to be an obvious requirement of gene synthesis, it is worth noting that, in theory, base-pairing would not be required if there were efficient synthetic methods that could chemically ligate the 5’ and 3’ ends of single strands to each other. Perhaps some future gene synthesis methods may work this way (chemical or enzymatic), but so far we have not seen or heard of any synthetic gene manufacturing method that does not exploit in some way the inherent self-complementary nature of duplex DNA. Since Khorana’s first gene synthesis, considerable improvements in solid-phase oligonucleotide synthesis have been made, PCR was invented, and high-fidelity small-scale synthesis of oligonucleotides on silicon wafers or glass chips has been developed. Innovative combinations of these technologies have produced cost-effective methods for gene synthesis. Soon, genes and genomes will be relatively inexpensive. Coupled with manipulations of the genetic code, both in vitro and in vivo, that allow un-natural amino acids to be used in protein biosynthesis, this will generate enormous opportunities for the production of new polymeric materials including new textiles and enzymes for industrial applications. 
As such, the availability of gene design software packages that can distill complicated biological information into a highly engineered gene sequence will grow in type and sophistication. For example, since rare codons have been well documented to have a negative impact on protein expression in E. coli and other organisms, one of the first and most important applications of complete gene synthesis is directed at exploiting the degeneracy of the genetic code to improve the heterologous expression of recombinant proteins. The availability of genome sequence information has enabled the compilation of codon bias databases that can be used with increasing sophistication to design optimized open reading frame sequences for proteins. The availability of such information has led to a demand for new software programs that can simultaneously exploit the degeneracy of the genetic code, while being mindful of the biophysical properties of nucleic acids, to design optimal DNA coding sequences that can be readily synthesized by PCA or solid-phase assembly methods from synthetic oligonucleotides. The synthesis of 1 kilobase pair genes is now routine in many labs, and several biotechnology companies have dedicated their business activities to the synthesis of genes. To exploit the huge potential of whole-genome sequence information, the ability to efficiently and inexpensively synthesize long, accurate DNA sequences is becoming increasingly important. Over the last two years, DNA chip technology has been applied to whole gene synthesis, and has clearly demonstrated that large synthetic genes can be made cost effectively. Over the next few years, complete gene synthesis will likely become the primary method for the construction of protein expression constructs,
thereby eliminating the need to isolate cDNA clones for purposes other than identifying splice variants. These technologies should aid drug discovery by enabling a more cost-effective supply of protein targets for biochemical characterization, ligand screening, and structural studies. Furthermore, gene synthesis will enable the total engineering of cost-effective metabolic pathways for the production of drug molecules that would otherwise be very difficult to produce synthetically [95, 96, 212].

The Current Economics of Gene Synthesis and Potential for Further Improvements

Due to the dropping prices for synthetic oligonucleotides and sequencing reactions over the past 10 years, the economics of gene synthesis have reached a point where it is easier, more reliable, and less expensive to simply order a gene from one of a number of competing gene synthesis vendors than it is to source, purchase, and validate a cDNA clone (let alone isolate the cDNA yourself). The drop in oligonucleotide prices over the last 15 years has been driven by the huge demand for oligonucleotides in virtually all aspects of molecular biology, from priming PCR reactions to probing tissue sections for mRNA distribution. This demand has created an opportunity for profit for those companies that could produce, purify, and validate oligonucleotides, together with the suppliers of chemical reagents and instruments that enable oligonucleotide manufacturing. The opportunity for profit has spawned the formation of numerous oligonucleotide suppliers and massive investment in innovation and instrumentation to improve synthesis efficiencies across the supply chain, from raw materials to final product. This classic market-driven economy for oligonucleotides has driven prices down to near commodity levels. However, the potential demand for synthetic genes and their oligonucleotide building blocks is anticipated to create even greater pressure for low-cost synthetic genes. 
We believe that those companies that can figure out how to consistently and profitably produce high-quality, pure, full-length oligonucleotides at prices lower than those of their competitors will be the long-run survivors in the gene synthesis marketplace.

One of the less appreciated features of whole gene synthesis is the ability to accurately design synthetic gene sequences with desired features. As the bioinformatics knowledge base grows, there will be an increasing demand for synthetic genes that incorporate multiple overlapping information content elements. The human mind is unable to hold this level of information in its decision-making process, and therefore computer algorithms that can encompass known bioinformatics constraints will be increasingly in demand. Those companies that can provide the tools to design synthetic gene sequences will be equally important in the whole gene synthesis marketplace. The demand for synthetic genes will be closely correlated with the availability of increasingly sophisticated software tools that can distill massive amounts of biological information, and multiple layers of genetic coding information, into a single nucleotide sequence or group of sequences.

Biotechnology is a maturing industry, and the standardization of design and engineering methods such as whole gene synthesis will likely supplant many of the more traditional hands-on, multi-step molecular biology procedures and kits. For this reason, gene synthesis technology has caught the eye of computer scientists and mechanical engineers who see genes as a new type of Tinkertoy® that can be used to construct new gene circuits or even new life forms. Software programs for the modeling of gene activity have proved to be useful tools for testing ideas about the
possible results expected from engineered gene circuits [213-217]. As gene activity modeling software becomes more sophisticated, and the prices for synthetic genes continue to drop, it seems highly likely that engineers will be able to generate new life forms.

Creating New Life Forms?

Finally, it should be noted that the possibilities of synthetic biology that are opened up by efficient and cost-effective gene synthesis have major ethical ramifications. What will happen when we open the Wall Street Journal or New York Times to see front-page articles announcing the creation of a new life form from a synthetic genome? We anticipate that a public outcry will trigger a series of congressional hearings, followed by new legislation on how such synthetic biology should be managed. Such political and ethical considerations are not within the scope of this review. However, the ethics of gene synthesis is a very important topic that both academic and industrial gene engineers need to consider carefully as they push the envelope of discovery. We look forward to participating in open conversations on the subject. In that regard, it is encouraging to see researchers in scientific circles [218] and writers in the popular press [219] openly proposing a community-wide discussion on synthetic biology.

Before leaving his Caltech office for the last time in 1988, the famous Nobel Prize-winning physicist Richard Feynman wrote on his chalkboard the statement “What I cannot create I do not understand.” Since all life forms as we know them use DNA as their genetic material, we can conclude from Feynman’s statement that whole gene synthesis technology represents the tool by which researchers can begin to fully understand life forms, by actually creating them [220].

ACKNOWLEDGEMENTS

We wish to thank the entire staff at deCODE biostructures for their contributions to the development of our Gene Composer software and methods for manufacturing synthetic genes.
Special thanks to John Walchli and Kathryn Hjerrild, who have made Gene Composer and gene synthesis a reality in our lab. We also wish to thank Jeffrey Gulcher for sharing his thoughts on genomic information content, Barry Springer and Sung-Hou Kim for historical information, and John Mulligan for his input and thoughts on the past, present, and future of gene synthesis.

ABBREVIATIONS

DNA = Deoxyribonucleic acid
RNA = Ribonucleic acid
Poly-U = Polyuridylic acid
oligo = Poly-deoxyribonucleic acid or oligodeoxynucleotide or oligonucleotide
G = Guanosine
A = Adenosine
C = Cytosine
T = Thymidine
U = Uracil
tRNA = Transfer RNA
ORF = Open reading frame
ADH = Alcohol dehydrogenase
AcNPV = Autographa californica nucleopolyhedrovirus
ENCODE = ENCyclopedia of DNA Elements Project Consortium
siRNA = Short interfering RNA
CPU = Central processing unit
tmRNA = Transfer messenger RNA
cDNA = Duplex copy DNA (copy of mRNA)
EST = Expressed sequence tag
kbp = Kilobase pair
GFP = Green fluorescent protein
Prot2Gene = Software to convert a protein sequence to a new gene sequence
CUT = Codon usage table
HX = Highly expressed
HX CUT = Codon usage table for highly expressed proteins
HyCUT = Hybrid codon usage table for more than one organism
HetCUT = Hybrid codon usage table for heterologously expressed proteins
CAI = Codon adaptation index
GUI = Graphical user interface
SD = Shine-Dalgarno sequence
RE = Restriction endonuclease
∆G = Delta G, free energy of folding
Gene2Oligo = Software to convert a gene sequence to a set of overlapping oligos
PCR = Polymerase chain reaction
Tm = Melting temperature
CPG = Controlled-pore glass
DMT = 5′-O-dimethoxytrityl group
Teflon® = A registered trademark of Dupont, Inc.
ATP = Adenosine triphosphate
HIV = Human immunodeficiency virus
PCA = Polymerase cycling assembly
OE-PCR = Overlap extension PCR
QuickChange® = A registered trademark of Stratagene, Inc.
DA-PCR = Dual-asymmetric PCR
AACED = Amplification and assembly of chip-eluted DNA
Tinkertoy® = A registered trademark of Hasbro, Inc.
REFERENCES

[1] Watson, J. D.; Crick, F. H. Nature, 1953, 171, 737-8.
[2] Judson, H. F. The Eighth Day of Creation: Makers of the Revolution in Biology; Touchstone, Simon and Schuster: New York, 1979.
[3] Nirenberg, M. W.; Matthaei, J. H. Proc. Natl. Acad. Sci. USA, 1961, 47, 1588-602.
[4] Crick, F. H.; Barnett, L.; Brenner, S.; Watts-Tobin, R. J. Nature, 1961, 192, 1227-32.
[5] Grunberg-Manago, M.; Ortiz, P. J.; Ochoa, S. Science, 1955, 122, 907-10.
[6] Mii, S.; Ochoa, S. Biochim. Biophys. Acta, 1957, 26, 445-6.
[7] Singer, M. F.; Heppel, L. A.; Hilmoe, R. J. Biochim. Biophys. Acta, 1957, 26, 447-8.
[8] Nishimura, S.; Jacob, T. M.; Khorana, H. G. Proc. Natl. Acad. Sci. USA, 1964, 52, 1494-501.
[9] Leder, P.; Nirenberg, M. Proc. Natl. Acad. Sci. USA, 1964, 52, 420-7.
[10] Soll, D.; Ohtsuka, E.; Jones, D. S.; Lohrmann, R.; Hayatsu, H.; Nishimura, S.; Khorana, H. G. Proc. Natl. Acad. Sci. USA, 1965, 54, 1378-85.
[11] Holley, R. W.; Apgar, J.; Everett, G. A.; Madison, J. T.; Marquisee, M.; Merrill, S. H.; Penswick, J. R.; Zamir, A. Science, 1965, 147, 1462-5.
[12] Crick, F. H. J. Mol. Biol., 1966, 19, 548-55.
[13] Osawa, S. Evolution of the Genetic Code; Oxford University Press: Oxford, 1995.
[14] Martin, R. G.; Matthaei, J. H.; Jones, O. W.; Nirenberg, M. W. Biochem. Biophys. Res. Commun., 1962, 6, 410-4.
[15] Matthaei, J. H.; Jones, O. W.; Martin, R. G.; Nirenberg, M. W. Proc. Natl. Acad. Sci. USA, 1962, 48, 666-77.
[16] Khorana, H. G.; Buchi, H.; Ghosh, H.; Gupta, N.; Jacob, T. M.; Kossel, H.; Morgan, R.; Narang, S. A.; Ohtsuka, E.; Wells, R. D. Cold Spring Harb. Symp. Quant. Biol., 1966, 31, 39-49.
[17] Khorana, H. G. Harvey Lect., 1966, 62, 79-105.
[18] Nirenberg, M. Trends Biochem. Sci., 2004, 29, 46-54.
[19] Gupta, N. K.; Ohtsuka, E.; Sgaramella, V.; Buchi, H.; Kumar, A.; Weber, H.; Khorana, H. G. Proc. Natl. Acad. Sci. USA, 1968, 60, 1338-44.
[20] Richardson, C. C. Proc. Natl. Acad. Sci. USA, 1965, 54, 158-65.
[21] Cozzarelli, N. R.; Melechen, N. E. Biochem. Biophys. Res. Commun., 1967, 28, 578-86.
[22] Gellert, M. Proc. Natl. Acad. Sci. USA, 1967, 57, 148-55.
[23] Zimmerman, S. B.; Little, J. W.; Oshinsky, C. K.; Gellert, M. Proc. Natl. Acad. Sci. USA, 1967, 57, 1841-8.
[24] Agarwal, K. L.; Buchi, H.; Caruthers, M. H.; Gupta, N.; Khorana, H. G.; Kleppe, K.; Kumar, A.; Ohtsuka, E.; Rajbhandary, U. L.; Van de Sande, J. H.; Sgaramella, V.; Weber, H.; Yamada, T. Nature, 1970, 227, 27-34.
[25] Khorana, H. G.; Agarwal, K. L.; Buchi, H.; Caruthers, M. H.; Gupta, N. K.; Kleppe, K.; Kumar, A.; Otsuka, E.; RajBhandary, U. L.; Van de Sande, J. H.; Sgaramella, V.; Terao, T.; Weber, H.; Yamada, T. J. Mol. Biol., 1972, 72, 209-17.
[26] Stadtman, T. C. Annu. Rev. Biochem., 1996, 65, 83-100.
[27] Blight, S. K.; Larue, R. C.; Mahapatra, A.; Longstaff, D. G.; Chang, E.; Zhao, G.; Kang, P. T.; Green-Church, K. B.; Chan, M. K.; Krzycki, J. A. Nature, 2004, 431, 333-5. Epub 2004 Aug 25.
[28] Hao, B.; Gong, W.; Ferguson, T. K.; James, C. M.; Krzycki, J. A.; Chan, M. K. Science, 2002, 296, 1462-6.
[29] Srinivasan, G.; James, C. M.; Krzycki, J. A. Science, 2002, 296, 1459-62.
[30] Barrell, B. G.; Bankier, A. T.; Drouin, J. Nature, 1979, 282, 189-94.
[31] Yamao, F.; Muto, A.; Kawauchi, Y.; Iwami, M.; Iwagami, S.; Azumi, Y.; Osawa, S. Proc. Natl. Acad. Sci. USA, 1985, 82, 2306-9.
[32] Santos, M. A.; Perreau, V. M.; Tuite, M. F. EMBO J., 1996, 15, 5060-8.
[33] Kawaguchi, Y.; Honda, H.; Taniguchi-Morimura, J.; Iwasaki, S. Nature, 1989, 341, 164-6.
[34] Horowitz, S.; Gorovsky, M. A. Proc. Natl. Acad. Sci. USA, 1985, 82, 2452-5.
[35] Osawa, S.; Jukes, T. H.; Watanabe, K.; Muto, A. Microbiol. Rev., 1992, 56, 229-64.
[36] Bulmer, M. Genetics, 1991, 129, 897-907.
[37] Del Tito, B. J., Jr.; Ward, J. M.; Hodgson, J.; Gershater, C. J.; Edwards, H.; Wysocki, L. A.; Watson, F. A.; Sathe, G.; Kane, J. F. J. Bacteriol., 1995, 177, 7086-91.
[38] Baca, A. M.; Hol, W. G. Int. J. Parasitol., 2000, 30, 113-8.
[39] Kane, J. F. Curr. Opin. Biotechnol., 1995, 6, 494-500.
[40] Bjork, G. R.; Durand, J. M.; Hagervall, T. G.; Leipuviene, R.; Lundgren, H. K.; Nilsson, K.; Chen, P.; Qian, Q.; Urbonavicius, J. FEBS Lett., 1999, 452, 47-51.
[41] Bregeon, D.; Colot, V.; Radman, M.; Taddei, F. Genes Dev., 2001, 15, 2295-306.
[42] Urbonavicius, J.; Qian, Q.; Durand, J. M.; Hagervall, T. G.; Bjork, G. R. EMBO J., 2001, 20, 4863-73.
[43] Karlin, S.; Mrazek, J.; Campbell, A.; Kaiser, D. J. Bacteriol., 2001, 183, 5025-40.
[44] Sharp, P. M.; Tuohy, T. M.; Mosurski, K. R. Nucleic Acids Res., 1986, 14, 5125-43.
[45] Sharp, P. M.; Devine, K. M. Nucleic Acids Res., 1989, 17, 5029-39.
[46] Bennetzen, J. L.; Hall, B. D. J. Biol. Chem., 1982, 257, 3026-31.
[47] Ikemura, T. Mol. Biol. Evol., 1985, 2, 13-34.
[48] Moriyama, E. N.; Powell, J. R. J. Mol. Evol., 1997, 45, 514-23.
[49] Duret, L. Trends Genet., 2000, 16, 287-9.
[50] Plotkin, J. B.; Robins, H.; Levine, A. J. Proc. Natl. Acad. Sci. USA, 2004, 101, 12588-91. Epub 2004 Aug 16.
[51] Carlini, D. B.; Stephan, W. Genetics, 2003, 163, 239-43.
[52] Carlini, D. B. J. Evol. Biol., 2004, 17, 779-85.
[53] Ranjan, A.; Hasnain, S. E. Indian J. Biochem. Biophys., 1995, 32, 424-8.
[54] Levin, D. B.; Whittome, B. J. Gen. Virol., 2000, 81, 2313-25.
[55] Guerdoux-Jamet, P.; Henaut, A.; Nitschke, P.; Risler, J. L.; Danchin, A. DNA Res., 1997, 4, 257-65.
[56] Gutman, G. A.; Hatfield, G. W. Proc. Natl. Acad. Sci. USA, 1989, 86, 3699-703.
[57] Boycheva, S. S.; Bachvarov, B. I.; Berzal-Heranz, A.; Ivanov, I. G. Curr. Microbiol., 2004, 48, 97-101.
[58] Boycheva, S.; Chkodrov, G.; Ivanov, I. Bioinformatics, 2003, 19, 987-98.
[59] The ENCODE (Encyclopedia of DNA Elements) Project. Science, 2004, 306, 636-40.
[60] Ast, G. Nat. Rev. Genet., 2004, 5, 773-82.
[61] Seeburg, P. H. Neuron, 2002, 35, 17-20.
[62] Hannon, G. J.; Rossi, J. J. Nature, 2004, 431, 371-8.
[63] Rinn, J. L.; Euskirchen, G.; Bertone, P.; Martone, R.; Luscombe, N. M.; Hartman, S.; Harrison, P. M.; Nelson, F. K.; Miller, P.; Gerstein, M.; Weissman, S.; Snyder, M. Genes Dev., 2003, 17, 529-40.
[64] Bertone, P.; Stolc, V.; Royce, T. E.; Rozowsky, J. S.; Urban, A. E.; Zhu, X.; Rinn, J. L.; Tongprasit, W.; Samanta, M.; Weissman, S.; Gerstein, M.; Snyder, M. Science, 2004, 11, 11.
[65] Itakura, K.; Hirose, T.; Crea, R.; Riggs, A. D.; Heyneker, H. L.; Bolivar, F.; Boyer, H. W. Science, 1977, 198, 1056-63.
[66] Groger, G.; Ramalho-Ortigao, F.; Steil, H.; Seliger, H. Nucleic Acids Res., 1988, 16, 7763-71.
[67] Ferretti, L.; Karnik, S. S.; Khorana, H. G.; Nassal, M.; Oprian, D. D. Proc. Natl. Acad. Sci. USA, 1986, 83, 599-603.
[68] de Vos, A. M.; Tong, L.; Milburn, M. V.; Matias, P. M.; Jancarik, J.; Noguchi, S.; Nishimura, S.; Miura, K.; Ohtsuka, E.; Kim, S. H. Science, 1988, 239, 888-93.
[69] Tong, L.; Milburn, M. V.; de Vos, A. M.; Kim, S. H. Science, 1989, 245, 244.
[70] Kim, S. H.; Kang, C. H.; Kim, R.; Cho, J. M.; Lee, Y. B.; Lee, T. K. Protein Eng., 1989, 2, 571-5.
[71] Somoza, J. R.; Jiang, F.; Tong, L.; Kang, C. H.; Cho, J. M.; Kim, S. H. J. Mol. Biol., 1993, 234, 390-404.
[72] Beck von Bodman, S.; Schuler, M. A.; Jollie, D. R.; Sligar, S. G. Proc. Natl. Acad. Sci. USA, 1986, 83, 9443-7.
[73] Springer, B. A.; Sligar, S. G. Proc. Natl. Acad. Sci. USA, 1987, 84, 8961-5.
[74] Phillips, G. N., Jr.; Arduini, R. M.; Springer, B. A.; Sligar, S. G. Proteins, 1990, 7, 358-65.
[75] Fitch, W. M. Science, 1976, 194, 1173-4.
[76] Fiers, W.; Contreras, R.; Duerinck, F.; Haegeman, G.; Iserentant, D.; Merregaert, J.; Min Jou, W.; Molemans, F.; Raeymaekers, A.; Van den Berghe, A.; Volckaert, G.; Ysebaert, M. Nature, 1976, 260, 500-7.
[77] Gustafsson, C.; Govindarajan, S.; Minshull, J. Trends Biotechnol., 2004, 22, 346-53.
[78] Withers-Martinez, C.; Carpenter, E. P.; Hackett, F.; Ely, B.; Sajid, M.; Grainger, M.; Blackman, M. J. Protein Eng., 1999, 12, 1113-20.
[79] Kane, J. F.; Violand, B. N.; Curran, D. F.; Staten, N. R.; Duffin, K. L.; Bogosian, G. Nucleic Acids Res., 1992, 20, 6707-12.
[80] McNulty, D. E.; Claffee, B. A.; Huddleston, M. J.; Kane, J. F. Protein Expr. Purif., 2003, 27, 365-74.
[81] Haebel, P. W.; Gutmann, S.; Ban, N. Curr. Opin. Struct. Biol., 2004, 14, 58-65.
[82] Withey, J. H.; Friedman, D. I. Annu. Rev. Microbiol., 2003, 57, 101-23. Epub 2003 May 01.
[83] Hayes, C. S.; Bose, B.; Sauer, R. T. Proc. Natl. Acad. Sci. USA, 2002, 99, 3440-5. Epub 2002 Mar 12.
[84] Karzai, A. W.; Roche, E. D.; Sauer, R. T. Nat. Struct. Biol., 2000, 7, 449-55.
[85] Derewenda, Z. S. Methods, 2004, 34, 354-63.
[86] Meiler, J.; Baker, D. Proc. Natl. Acad. Sci. USA, 2003, 100, 12105-10. Epub 2003 Oct 03.
[87] Vucetic, S.; Obradovic, Z.; Vacic, V.; Radivojac, P.; Peng, K.; Iakoucheva, L. M.; Cortese, M. S.; Lawson, J. D.; Brown, C. J.; Sikes, J. G.; Newton, C. D.; Dunker, A. K. Bioinformatics, 2005, 21, 137-40. Epub 2004 Aug 13.
[88] Longenecker, K. L.; Garrard, S. M.; Sheffield, P. J.; Derewenda, Z. S. Acta Crystallogr. D Biol. Crystallogr., 2001, 57, 679-88. Epub 2001 Apr 24.
[89] Mateja, A.; Devedjiev, Y.; Krowarsch, D.; Longenecker, K.; Dauter, Z.; Otlewski, J.; Derewenda, Z. S. Acta Crystallogr. D Biol. Crystallogr., 2002, 58, 1983-91. Epub 2002 Nov 23.
[90] Derewenda, Z. S. Structure (Camb.), 2004, 12, 529-35.
[91] Charron, C.; Kern, D.; Giege, R. Acta Crystallogr. D Biol. Crystallogr., 2002, 58, 1729-33. Epub 2002 Sep 26.
[92] Cipollaro, M.; Galderisi, U.; Di Bernardo, G. J. Cell. Physiol., 2005, 202, 315-22.
[93] Thornton, J. W.; Need, E.; Crews, D. Science, 2003, 301, 1714-7.
[94] Paabo, S.; Poinar, H.; Serre, D.; Jaenicke-Despres, V.; Hebler, J.; Rohland, N.; Kuch, M.; Krause, J.; Vigilant, L.; Hofreiter, M. Annu. Rev. Genet., 2004, 30, 30.
[95] Kodumal, S. J.; Patel, K. G.; Reid, R.; Menzella, H. G.; Welch, M.; Santi, D. V. Proc. Natl. Acad. Sci. USA, 2004, 101, 15573-8. Epub 2004 Oct 20.
[96] Martin, V. J.; Pitera, D. J.; Withers, S. T.; Newman, J. D.; Keasling, J. D. Nat. Biotechnol., 2003, 21, 796-802. Epub 2003 Jun 01.
[97] Elowitz, M. B.; Leibler, S. Nature, 2000, 403, 335-8.
[98] Gardner, T. S.; Cantor, C. R.; Collins, J. J. Nature, 2000, 403, 339-42.
[99] Kobayashi, H.; Kaern, M.; Araki, M.; Chung, K.; Gardner, T. S.; Cantor, C. R.; Collins, J. J. Proc. Natl. Acad. Sci. USA, 2004, 101, 8414-9. Epub 2004 May 24.
[100] Taga, M. E.; Bassler, B. L. Proc. Natl. Acad. Sci. USA, 2003, 100, 14549-54. Epub 2003 Aug 29.
[101] Cello, J.; Paul, A. V.; Wimmer, E. Science, 2002, 297, 1016-8. Epub 2002 Jul 11.
[102] Smith, H. O.; Hutchison, C. A., 3rd; Pfannkoch, C.; Venter, J. C. Proc. Natl. Acad. Sci. USA, 2003, 100, 15440-5. Epub 2003 Dec 02.
[103] Hutchison, C. A.; Peterson, S. N.; Gill, S. R.; Cline, R. T.; White, O.; Fraser, C. M.; Smith, H. O.; Venter, J. C. Science, 1999, 286, 2165-9.
[104] Prusiner, S. B. Proc. Natl. Acad. Sci. USA, 1998, 95, 13363-83.
[105] Legname, G.; Baskakov, I. V.; Nguyen, H. O.; Riesner, D.; Cohen, F. E.; DeArmond, S. J.; Prusiner, S. B. Science, 2004, 305, 673-6.
[106] Babu, M. M.; Luscombe, N. M.; Aravind, L.; Gerstein, M.; Teichmann, S. A. Curr. Opin. Struct. Biol., 2004, 14, 283-91.
[107] Martin, S. L.; Vrhovski, B.; Weiss, A. S. Gene, 1995, 154, 159-66.
[108] Moore, G. L.; Maranas, C. D. Nucleic Acids Res., 2002, 30, 2407-16.
[109] Zhang, Z.; Gildersleeve, J.; Yang, Y. Y.; Xu, R.; Loo, J. A.; Uryu, S.; Wong, C. H.; Schultz, P. G. Science, 2004, 303, 371-3.
[110] Wang, L.; Schultz, P. G. Angew. Chem. Int. Ed. Engl., 2005, 44, 34-66.
[111] Mehl, R. A.; Anderson, J. C.; Santoro, S. W.; Wang, L.; Martin, A. B.; King, D. S.; Horn, D. M.; Schultz, P. G. J. Am. Chem. Soc., 2003, 125, 935-9.
[112] Hohsaka, T.; Muranaka, N.; Komiyama, C.; Matsui, K.; Takaura, S.; Abe, R.; Murakami, H.; Sisido, M. FEBS Lett., 2004, 560, 173-7.
[113] Anderson, J. C.; Wu, N.; Santoro, S. W.; Lakshman, V.; King, D. S.; Schultz, P. G. Proc. Natl. Acad. Sci. USA, 2004, 101, 7566-71. Epub 2004 May 11.
[114] Hirao, I.; Harada, Y.; Kimoto, M.; Mitsui, T.; Fujiwara, T.; Yokoyama, S. J. Am. Chem. Soc., 2004, 126, 13298-305.
[115] Forster, A. C.; Cornish, V. W.; Blacklow, S. C. Anal. Biochem., 2004, 333, 358-64.
[116] Lipovsek, D.; Pluckthun, A. J. Immunol. Methods, 2004, 290, 51-67.
[117] Tian, J.; Gong, H.; Sheng, N.; Zhou, X.; Gulari, E.; Gao, X.; Church, G. Nature, 2004, 432, 1050-4.
[118] Zhou, X.; Cai, S.; Hong, A.; You, Q.; Yu, P.; Sheng, N.; Srivannavit, O.; Muranjan, S.; Rouillard, J. M.; Xia, Y.; Zhang, X.; Xiang, Q.; Ganesh, R.; Zhu, Q.; Matejko, A.; Gulari, E.; Gao, X. Nucleic Acids Res., 2004, 32, 5409-17. Print 2004.
[119] Zolotukhin, S.; Potter, M.; Hauswirth, W. W.; Guy, J.; Muzyczka, N. J. Virol., 1996, 70, 4646-54.
[120] Slimko, E. M.; Lester, H. A. J. Neurosci. Methods, 2003, 124, 75-81.
[121] Kalman, M.; Cserpan, I.; Bajszar, G.; Dobi, A.; Horvath, E.; Pazman, C.; Simoncsits, A. Nucleic Acids Res., 1990, 18, 6075-81.
[122] Neves, F. O.; Ho, P. L.; Raw, I.; Pereira, C. A.; Moreira, C.; Nascimento, A. L. Protein Expr. Purif., 2004, 35, 353-9.
[123] Richard, C.; Drider, D.; Elmorjani, K.; Marion, D.; Prevost, H. J. Bacteriol., 2004, 186, 4276-84.
[124] Tamura, T.; Holbrook, S. R.; Kim, S. H. Biotechniques, 1991, 10, 782-4.
[125] Hale, R. S.; Thompson, G. Protein Expr. Purif., 1998, 12, 185-8.
[126] Hoover, D. M.; Lubkowski, J. Nucleic Acids Res., 2002, 30, e43.
[127] Stewart, L.; Burgin, A. B. Whole gene synthesis: A gene-o-matic future. In Frontiers in Drug Design and Discovery; Eds. Atta-ur-Rahman; Springer, B. A.; Caldwell, G. W.; Bentham Science Publishers: San Francisco, 2005; Vol. 1, pp. these pages.
[128] Nakamura, Y.; Gojobori, T.; Ikemura, T. Nucleic Acids Res., 2000, 28, 292.
[129] Henaut, A.; Danchin, A. Analysis and predictions from Escherichia coli sequences. In Escherichia coli and Salmonella typhimurium cellular and molecular biology; Neidhardt, F. C. et al., Eds.; ASM Press: Washington D.C., 1996; Vol. 2, pp. 2047-2066.
[130] Karlin, S.; Mrazek, J.; Campbell, A. M. Mol. Microbiol., 1998, 29, 1341-55.
[131] Karlin, S.; Theriot, J.; Mrazek, J. Proc. Natl. Acad. Sci. USA, 2004, 101, 6182-7. Epub 2004 Apr 06.
[132] Karlin, S.; Barnett, M. J.; Campbell, A. M.; Fisher, R. F.; Mrazek, J. Proc. Natl. Acad. Sci. USA, 2003, 100, 7313-8. Epub 2003 May 29.
[133] Karlin, S.; Mrazek, J.; Gentles, A. J. Curr. Opin. Struct. Biol., 2003, 13, 344-52.
[134] Sharp, P. M.; Li, W. H. Nucleic Acids Res., 1987, 15, 1281-95.
[135] Katz, L.; Burge, C. B. Genome Res., 2003, 13, 2042-51.
[136] Rouillard, J. M.; Lee, W.; Truan, G.; Gao, X.; Zhou, X.; Gulari, E. Nucleic Acids Res., 2004, 32, W176-80.
[137] Powell, J. R.; Moriyama, E. N. Proc. Natl. Acad. Sci. USA, 1997, 94, 7784-90.
[138] Moriyama, E. N.; Powell, J. R. Nucleic Acids Res., 1998, 26, 3188-93.
[139] O'Reilly, D. R.; Miller, L. K.; Luckow, V. A. Baculovirus Expression Vectors: A Laboratory Manual; W. H. Freeman: New York, 1992.
[140] Vaughn, J. L.; Goodwin, R. H.; Tompkins, G. J.; McCawley, P. In Vitro, 1977, 13, 213-7.
[141] Wickham, T. J.; Davis, T.; Granados, R. R.; Shuler, M. L.; Wood, H. A. Biotechnol. Prog., 1992, 8, 391-6.
[142] Kim, D. M.; Kigawa, T.; Choi, C. Y.; Yokoyama, S. Eur. J. Biochem., 1996, 239, 881-6.
[143] Hoffmann, M.; Nemetz, C.; Madin, K.; Buchberger, B. Biotechnol. Annu. Rev., 2004, 10, 1-30.
[144] Chen, H.; Bjerknes, M.; Kumar, R.; Jay, E. Nucleic Acids Res., 1994, 22, 4953-7.
[145] Shine, J.; Dalgarno, L. Proc. Natl. Acad. Sci. USA, 1974, 71, 1342-6.
[146] Kozak, M. Nucleic Acids Res., 1987, 15, 8125-48.
[147] Guet, C. C.; Elowitz, M. B.; Hsing, W.; Leibler, S. Science, 2002, 296, 1466-70.
[148] Seligmann, H.; Pollock, D. D. DNA Cell Biol., 2004, 23, 701-705.
[149] Hofacker, I. L. Nucleic Acids Res., 2003, 31, 3429-31.
[150] Voges, D.; Watzele, M.; Nemetz, C.; Wizemann, S.; Buchberger, B. Biochem. Biophys. Res. Commun., 2004, 318, 601-14.
[151] Somogyi, P.; Jenner, A. J.; Brierley, I.; Inglis, S. C. Mol. Cell. Biol., 1993, 13, 6931-40.
[152] Washietl, S.; Hofacker, I. L. J. Mol. Biol., 2004, 342, 19-30.
[153] Cho, K. S.; Elizondo, L. I.; Boerkoel, C. F. Curr. Opin. Genet. Dev., 2004, 14, 308-15.
[154] Bird, A. P.; Wolffe, A. P. Cell, 1999, 99, 451-4.
[155] Reisenauer, A.; Kahng, L. S.; McCollum, S.; Shapiro, L. J. Bacteriol., 1999, 181, 5135-9.
[156] Thanaraj, T. A.; Argos, P. Protein Sci., 1996, 5, 1973-83.
[157] Thanaraj, T. A.; Argos, P. Protein Sci., 1996, 5, 1594-612.
[158] Wu, X.; Jornvall, H.; Berndt, K. D.; Oppermann, U. Biochem. Biophys. Res. Commun., 2004, 313, 89-96.
[159] Brantl, S. Trends Microbiol., 2004, 12, 473-5.
[160] Nudler, E.; Mironov, A. S. Trends Biochem. Sci., 2004, 29, 11-7.
[161] Gao, X.; Yo, P.; Keith, A.; Ragan, T. J.; Harris, T. K. Nucleic Acids Res., 2003, 31, e143.
[162] SantaLucia, J., Jr. Proc. Natl. Acad. Sci. USA, 1998, 95, 1460-5.
[163] Mullis, K. B.; Faloona, F. A. Methods Enzymol., 1987, 155, 335-50.
[164] Saiki, R. K.; Scharf, S.; Faloona, F.; Mullis, K. B.; Horn, G. T.; Erlich, H. A.; Arnheim, N. Science, 1985, 230, 1350-4.
[165] Saiki, R. K.; Gelfand, D. H.; Stoffel, S.; Scharf, S. J.; Higuchi, R.; Horn, G. T.; Mullis, K. B.; Erlich, H. A. Science, 1988, 239, 487-91.
[166] Letsinger, R. L.; Mahadevan, V. J. Am. Chem. Soc., 1965, 87, 3526-7.
[167] Letsinger, R. L.; Ogilvie, K. K.; Miller, P. S. J. Am. Chem. Soc., 1969, 91, 3360-3365.
[168] Letsinger, R. L.; Lunsford, W. B. J. Am. Chem. Soc., 1976, 98, 3655-3661.
[169] Caruthers, M. H.; Barone, A. D.; Beaucage, S. L.; Dodds, D. R.; Fisher, E. F.; McBride, L. J.; Matteucci, M.; Stabinsky, Z.; Tang, J. Y. Methods Enzymol., 1987, 154, 287-313.
[170] Matteucci, M. D.; Caruthers, M. H. J. Am. Chem. Soc., 1981, 103, 3185-3191.
[171] Temsamani, J.; Kubert, M.; Agrawal, S. Nucleic Acids Res., 1995, 23, 1841-4.
[172] Lin, Y.; Cheng, G.; Wang, X.; Clark, T. G. Gene, 2002, 288, 85-94.
[173] Young, L.; Dong, Q. Nucleic Acids Res., 2004, 32, e59.
[174] Hostomsky, Z.; Smrt, J.; Arnold, L.; Tocik, Z.; Paces, V. Nucleic Acids Res., 1987, 15, 4849-56.
[175] Beattie, K. L.; Logsdon, N. J.; Anderson, R. S.; Espinosa-Lara, J. M.; Maldonado-Rodriguez, R.; Frost, J. D., 3rd. Biotechnol. Appl. Biochem., 1988, 10, 510-21.
[176] Beattie, K. L.; Fowler, R. F. Nature, 1991, 352, 548-9.
[177] Gao, X.; LeProust, E.; Zhang, H.; Srivannavit, O.; Gulari, E.; Yu, P.; Nishiguchi, C.; Xiang, Q.; Zhou, X. Nucleic Acids Res., 2001, 29, 4744-50.
[178] Mandecki, W.; Bolling, T. J. Gene, 1988, 68, 101-7.
[179] Mandecki, W.; Hayden, M. A.; Shallcross, M. A.; Stotland, E. Gene, 1990, 94, 103-7.
[180] Hayden, M. A.; Mandecki, W. DNA, 1988, 7, 571-7.
[181] Mandecki, W. Proc. Natl. Acad. Sci. USA, 1986, 83, 7177-81.
[182] Sugisaki, H.; Kanazawa, S. Gene, 1981, 16, 73-8.
[183] Klenow, H.; Henningsen, I. Proc. Natl. Acad. Sci. USA, 1970, 65, 168-75.
[184] Jacobsen, H.; Klenow, H.; Overgaard-Hansen, K. Eur. J. Biochem., 1974, 45, 623-7.
[185] Dillon, P. J.; Rosen, C. A. Biotechniques, 1990, 9, 298, 300.
[186] Prodromou, C.; Pearl, L. H. Protein Eng., 1992, 5, 827-9.
[187] Sandhu, G. S.; Aleff, R. A.; Kline, B. C. Biotechniques, 1992, 12, 14-6.
[188] Stemmer, W. P.; Crameri, A.; Ha, K. D.; Brennan, T. M.; Heyneker, H. L. Gene, 1995, 164, 49-53.
[189] Nelson, M.; McClelland, M. Methods Enzymol., 1992, 216, 279-303.
[190] Zheng, L.; Baumann, U.; Reymond, J. L. Nucleic Acids Res., 2004, 32, e115.
[191] Krebs, M. P.; Spudich, E. N.; Khorana, H. G.; Spudich, J. L. Proc. Natl. Acad. Sci. USA, 1993, 90, 3486-90.
[192] Hadden, J. M.; Convery, M. A.; Declais, A. C.; Lilley, D. M.; Phillips, S. E. Nat. Struct. Biol., 2001, 8, 62-7.
[193] Parkinson, M. J.; Lilley, D. M. J. Mol. Biol., 1997, 270, 169-78.
[194] Picksley, S. M.; Parsons, C. A.; Kemper, B.; West, S. C. J. Mol. Biol., 1990, 212, 723-35.
[195] Howard, J. T.; Ward, J.; Watson, J. N.; Roux, K. H. Biotechniques, 1999, 27, 18-9.
[196] Beard, P.; Morrow, J. F.; Berg, P. J. Virol., 1973, 12, 1303-13.
[197] Vogt, V. M. Eur. J. Biochem., 1973, 33, 192-200.
[198] Raaijmakers, H.; Vix, O.; Toro, I.; Golz, S.; Kemper, B.; Suck, D. EMBO J., 1999, 18, 1447-58.
[199] Oleykowski, C. A.; Bronson Mullins, C. R.; Godwin, A. K.; Yeung, A. T. Nucleic Acids Res., 1998, 26, 4597-602.
[200] Kulinski, J.; Besack, D.; Oleykowski, C. A.; Godwin, A. K.; Yeung, A. T. Biotechniques, 2000, 29, 44-6, 48.
[201] Colbert, T.; Till, B. J.; Tompa, R.; Reynolds, S.; Steine, M. N.; Yeung, A. T.; McCallum, C. M.; Comai, L.; Henikoff, S. Plant Physiol., 2001, 126, 480-4.
[202] Smith, J.; Modrich, P. Proc. Natl. Acad. Sci. USA, 1997, 94, 6847-50.
[203] Cleary, M. A.; Kilian, K.; Wang, Y.; Bradshaw, J.; Cavet, G.; Ge, W.; Kulkarni, A.; Paddison, P. J.; Chang, K.; Seth, N.; LeProust, E.; Coffey, E. M.; Burchard, J.; McCombie, W. R.; Linsley, P.; Hannon, G. J. Nat. Methods, 2004, 1, 241-248.
[204] Richmond, K. E.; Li, M. H.; Rodesch, M. J.; Patel, M.; Lowe, A. M.; Kim, C.; Chu, L. L.; Venkataramaian, N.; Flickinger, S. F.; Kaysen, J.; Belshaw, P. J.; Sussman, M. R.; Cerrina, F. Nucleic Acids Res., 2004, 32, 5011-8. Print 2004.
[205] Sierzchala, A. B.; Dellinger, D. J.; Betley, J. R.; Wyrzykiewicz, T. K.; Yamada, C. M.; Caruthers, M. H. J. Am. Chem. Soc., 2003, 125, 13427-41.
[206] Gryaznov, S. M.; Letsinger, R. L. Nucleic Acids Res., 1993, 21, 1403-8.
[207] Gryaznov, S. M.; Schultz, R.; Chaturvedi, S. K.; Letsinger, R. L. Nucleic Acids Res., 1994, 22, 2366-9.
[208] Burgin, A. B. J.; Stewart, L. J. Use of phosphorothiolate polynucleotides in ligating nucleic acids. US Patent Application # 20030165841, 2001.
[209] Ramstrom, O.; Lehn, J. M. Nat. Rev. Drug Discov., 2002, 1, 26-36.
[210] Hochgurtel, M.; Kroth, H.; Piecha, D.; Hofmann, M. W.; Nicolau, C.; Krause, S.; Schaaf, O.; Sonnenmoser, G.; Eliseev, A. V. Proc. Natl. Acad. Sci. USA, 2002, 99, 3382-7. Epub 2002 Mar 12.
[211] Shendure, J.; Mitra, R. D.; Varma, C.; Church, G. M. Nat. Rev. Genet., 2004, 5, 335-44.
[212] Khosla, C.; Keasling, J. D. Nat. Rev. Drug Discov., 2003, 2, 1019-25.
[213] Garcia-Ojalvo, J.; Elowitz, M. B.; Strogatz, S. H. Proc. Natl. Acad. Sci. USA, 2004, 101, 10955-60. Epub 2004 Jul 15.
[214] Endy, D.; Brent, R. Nature, 2001, 409, 391-5.
[215] Endy, D.; You, L.; Yin, J.; Molineux, I. J. Proc. Natl. Acad. Sci. USA, 2000, 97, 5375-80.
[216] McAdams, H. H.; Shapiro, L. Science, 1995, 269, 650-6.
[217] Tomita, M.; Hashimoto, K.; Takahashi, K.; Shimizu, T. S.; Matsuzaki, Y.; Miyoshi, F.; Saito, K.; Tanida, S.; Yugi, K.; Venter, J. C.; Hutchison, C. A., 3rd. Bioinformatics, 1999, 15, 72-84.
[218] Ball, P. Nature, 2004, 431, 624-6.
[219] Morton, O. Life, Reinvented. In Wired; A Conde Nast Publication: San Francisco, USA, 2005; Vol. January 2005, pp. 169-175.
[220] Benner, S. A. Acc. Chem. Res., 2004, 37, 784-97.
Contributors

Aaron Krikava
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Agustina Gómez-Hens
Department of Analytical Chemistry, Faculty of Sciences, “Marie Curie” Building Annex, Campus of Rabanales, University of Córdoba, 14071-Córdoba, Spain
Alex B. Burgin
deCODE biostructures, Inc., 7869 N.E. Day Rd. West, Bainbridge Is., WA 98110, USA
Beata Starosciak
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Becky Hastings
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Cheng-Pang (Matt) Hsu
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh & McKean Roads, P.O. Box 776, Spring House, PA 19477, USA
David M. Ritchie
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh & McKean Roads, P.O. Box 776, Spring House, PA 19477, USA
Frances Separovic
School of Chemistry, The University of Melbourne, Parkville Melbourne VIC 3010, Australia
Francisco Torrens
Institut Universitari de Ciència Molecular, Universitat de València, Dr. Moliner 50, E-46100 Burjassot (València), Spain
Gary W. Caldwell
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh & McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Gregory C. Leo
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Guido J.R. Zaman
N.V. Organon, Molecular Pharmacology Unit, P.O. Box 20, 5340 BH Oss, The Netherlands
Jaclyn Scowcroft
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
John A. Masucci
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh & McKean Roads, P.O. Box 776, Spring House, PA 19477, USA
John Spurlino
Department of Structural Biology, Johnson and Johnson Pharmaceutical Research and Development, Exton, Pennsylvania 19341
Jonathan D. Wren
The University of Oklahoma, Department of Botany and Microbiology, Advanced Center for Genome Technology, Norman, OK, 73019, USA
Karlen Gazarian
Department of Molecular Biology and Biotechnology of the Institute of Biomedical Research, Mexican National University, Mexico D.F, Mexico
Kristin Snyder
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Lance Stewart
deCODE biostructures, Inc., 7869 N.E. Day Rd. West, Bainbridge Is., WA 98110, USA
Letizia Brandi
Vicuron Pharmaceuticals, Via R. Lepetit 34, 21040 Gerenzano, Italy
Mª Paz Aguilar-Caballos
Department of Analytical Chemistry, Faculty of Sciences, “Marie Curie” Building Annex, Campus of Rabanales, University of Córdoba, 14071-Córdoba, Spain
Makoto Tsuruoka
School of Bionics & School of Engineering, Tokyo University of Technology, Katakura 1404-1, Hachiouji, Tokyo, 192-0981 Japan
Margherita Sosio
Vicuron Pharmaceuticals, Via R. Lepetit 34, 21040 Gerenzano, Italy
Patrick Englebienne
Biomedical Consultant, Englebienne & Associates, Strijpstraat 21, B-9750 Zingem, Belgium, and Biocybernetics Unit, Laboratory of Experimental Medicine, Free University of Brussels, Place Van Gehuchten 4, B-1020 Brussels, Belgium
Richard Alexander
Department of Structural Biology, Johnson and Johnson Pharmaceutical Research and Development, Exton, Pennsylvania 19341
Richard M. Eglen
DiscoveRx Corp., 42501 Albrae St., Fremont, CA 94358, USA
Sofia Stinchi
Vicuron Pharmaceuticals, Via R. Lepetit 34, 21040 Gerenzano, Italy
Stefania Serina
Vicuron Pharmaceuticals, Via R. Lepetit 34, 21040 Gerenzano, Italy
Stefano Donadio
Vicuron Pharmaceuticals, Via R. Lepetit 34, 21040 Gerenzano, Italy
Sylvia Urban
School of Applied Sciences (Applied Chemistry), RMIT University, GPO Box 2476V Melbourne VIC 3001, Australia
William Hageman
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh and McKean Roads, P.O. Box 776, Springhouse, PA 19477-0776, USA
Zhengyin Yan
Johnson & Johnson Pharmaceutical Research and Development, LLC, Welsh & McKean Roads, P.O. Box 776, Spring House, PA 19477, USA
Subject Index
Frontiers in Drug Design & Discovery, 2005, Vol. 1, No. 1
347
SUBJECT INDEX TO VOLUME 1
Absorption systems 198 Acute pancreatitis 276 and thalidomide 276 Acyl-homoserine lactone 307 ADME assays 197 in drug discovery process 197 Aequorin 174 use in bioluminescence (BL) 174 Agonist binding 98 coupling of GPCR to G protein 98 AIDS 115 treatment of 115 Alzheimer’s β-amyloid precursor protein mRNA 23 targeting of 23 Amino-acylated tRNAs 298 Aminoglycoside antibiotics 18 affinities of 19 binding with RNA component 18 side effect of 19 toxicity due to 19 α-Amylase 38 binding activity of 38 Andscape library system 32 Angina 267 use of Viagra (sildenafil citrate) 267 Angiogenesis inhibitors 48 Antibacterial miniaturized HTS 8 Antibiotic classes 3 and discovery of molecules 4 approaches used to identify novel 4 chemical classes of 4 effects on bacterial translation 5 effects on cell wall biogenesis 5 mechanism of action of 4 modification of 4
Antimalarial agent 115 from Cinchona officinalis 115 Antimicrobial activity 18 against Gram-negative bacteria 18 against Mycobacterium tuberculosis 18 Antisense drug 18 use in treatment of cytomegalovirus retinal infection 18 Anti-tumor activity 48 peptides with 48 Antiviral activity 254 of fullerenes 254 APL (dehydrodidemnin) 117 in clinical development 117 Apoptosis 76 Aptamers 35 for avian myeloblastosis virus 35 for Moloney murine leukemia virus 35 Aspergillus penicillium 119 Assay type/format 73 cell-based assay system as 73 cell-free assay system as 73 enzyme immobilization in 73 plate formats as 73 Atropine 115 from Atropa belladonna 115 Autographa californica nucleopolyhedrovirus (AcNPV) 301 codon bias in 301 Autoimmune diseases 53 antiphospholipid syndrome (APS) 53 diabetes mellitus 53 multiple sclerosis 53 rheumatoid arthritis 53 thrombocytopenia purpura 53
Bacillus subtilis 5 essential genes of 5 genomic databases of 5 Bacterial sepsis 44 Basic fibroblast growth factor (bFGF) 35 in angiogenesis 35 Beta secretase 293 cyclic inhibitors of 293 Binding assays 7 effect on target receptor 7 inhibitor affinity in 7 Binding modes 290 of B-secretase inhibitor 290 of trifluoroacetyl-dipeptide-anilides 291 Bioactive natural product 114 synthesis of 114 semi-synthesis of 114 Biochemical probes 114 Bioinformatic analysis 6 design of effective counterscreens for 6 Biological screening 114 of extract 114 Biological switches 306 synthetic gene network as 306 Biological target 70 enzymatic/binding assays for 72 identification of 70 protein interaction/ion flux assays for 72 use of HTS methodology 72 Bioluminescence (BL) 173 determination of enzymatic inhibitors using 178 usefulness of 173 use of aequorin in 174 use of green fluorescent protein (GFP) in 174 Bioluminescence resonance energy transfer (BRET) 169
and insulin receptor 175 role in intracellular signalling 174 role in protein-protein interaction 174 Biopanning 31 Bioprospecting 116 of marine natural products 116 Biosensors 179 in early drug discovery process 184 use in HTS 179 use of plasmon waveguide resonance (PWR) 181 use of surface plasmon resonance (SPR) technology 179 Boolean queries 270 Bryostatins 116 as protein kinase C (PKC) 116 B-Secretase inhibitor 290 binding modes of 290 Caco-2 database 200 characteristics of 200 Caco-2 permeabilities 197,199 application of IDEA pkEXPRESS™ 199 distribution of 201 in-silico predictions of 198 prediction of 197 Cancer 30 Capillary electrochromatography (CEC) 152 and micro-coil NMR probes 153 Capillary electrophoresis (CE) 115,268 for drug discovery 168 Capillary separations 152 Cardiac hypertrophy 277
CD polymers 233 application of 233 cDNA libraries 306 Cell adhesion 49 integrin ligands as 49 selectin ligands as 49 Cell-based assays 7 inhibition of selected targets 7 Chemical libraries 287 screening of 287 Chemiluminescence (CL) 173 determination of enzymatic inhibitors using 178 usefulness of 173 Chemometric analysis 215 Chip-eluted oligonucleotides 329 gene assembly from 329 Chromodacryorrhea 223 Chromatography 168 Chronic hepatitis C 276 and thalidomide 276 Cinnamoyl catalpol glycoside esters 140 from Jamesbrittenia fodina 140 Cladosporium spp. 119 Cloning 31 Closed-discovery model 272 Codon adaptation index (CAI) 310 Codon bias 300 and gene regulation 300 in Autographa californica nucleopolyhedrovirus (AcNPV) 301 in Caenorhabditis elegans 300 in Drosophila melanogaster 300 in Escherichia coli 300 in gene expression 300 Codon context 301 Colloidal metals 81 surface plasmon resonance spectra 81
Combinatorial chemistry 113 Combinatorial phage libraries 42 peptides selected from 42 properties of protein binding sites in 42 Combinatorial phage-display protein-peptide libraries 29 Combinatorial processes 59 in laboratory tube 59 Combinatorial technologies 167 Compound screening libraries 269 in high-throughput technologies 269 Concanavalin 45 Coupled transcription/translation (T/T) assay 8 Crohn’s disease 278 and curcumin 278 Crystallographic techniques 288 and amide hydrogen/deuterium exchange 289 automated fitting in 290 Bayesian statistics for 290 CCP4 project to 290 ELVES to 290 improvement in 288 use of electron map generation 289 use of entropy refinement methods 290 use of multiwavelength anomalous dispersion methods (MAD) 290 use of robotics 289 use of synchrotron radiation 289 Curcumin 278 and Crohn’s disease 278 and spinal cord related problems 278 and retinal diseases 278 health benefits of 278 Cussonia barteria 140
Cyclic inhibitors 293 of beta secretase 293 Cyclodextrins 233 racemic-mixture separations by 233 Cyclopyranoses 231 Cystic fibrosis transmembrane conductance regulator 22 Cytokines 40 and growth factors 40 Data mining methods 268 De novo synthesis 309 of genes 309 DeltaG free energies 313 Diagnostic peptides 51 in mimicking immunodominant epitopes of pathogens 51 selection of 51 Diastereomers 233 Diode array detection (DAD) 122 DNA hybridization rate 87 variety of 87 DNA polymerase 35,38 binding of mRNA loops to 35 DNA sequencing 4 and sequenced bacterial genomes 4 binding site on RNA 5 DNA/RNA hybridization 73 evaluation of 73 Drosophila melanogaster 300 codon bias in 300 Drugs 212 amiodarone 212 rosiglitazone 212 Drug design 29,177 protein kinases in 177 using HTS 29 Drug discovery 126,167,168,207,212 automated data analysis tools for 127
capillary electrophoresis (CE) for 168 flow NMR in 126 high throughput screening in 167 kinetic processes in 167 liquid chromatography (LC) for 168 metabonomic urinalysis in 212 use of IDEA pkEXPRESS™ in 207 Drug discovery process 184,197 application of biosensors 184 using ADME assays 197 Drug discovery program 7 Drug enantiomers 168 separation of 168 Drug metabolism 167 drug-drug interactions in 167 toxic metabolites in 167 Drug-drug interaction 211 Drug-receptor interactions 180 kinetic aspects of 180 with G-protein coupled receptor (GPCRs) 180 Duchenne muscular dystrophy (DMD) 22 treatment of 22 Dynamic light scattering (DLS) 239 Dystrophin allele 22 expression of 22 Ecteinascidia turbinata 117 Electromagnetic radiation 169 use in photoluminescence processes 169 Electrophoresis 168 Electrospray ionization (ESI) 77,199 in nanoflow electrospray formation on microchips 77 liquid chromatography mass spectrometry (LC/MS) methods using 199
Elysia rufescens 117 kahalalide F from 117 Enantioselective enzymes 179 development of 179 End-point measurements 168 of enzymatically produced products 168 Engineered proteins 303 expression optimization of 303 for X-ray crystallography 304 Enzymes 18 Enzymatic assays 75 catalytic activity of 75 Enzymatic inhibitors 178 application of bioluminescence (BL) 178 application of chemiluminescence (CL) 178 Enzyme-substrate engineering 46 substrate-phage in 46 Erythrina vogelii 140 antifungal constituents of 140 Erythropoietin 42 Escherichia coli 300 codon bias in 300 Escherichia coli rRNA 19 mutation of 19 Escherichia coli cytochrome b562 39 ESI-MS 77 Eukaryotic genes 30 Expressed sequence tags (EST) 306 Expression optimization 303 of engineered proteins 303 FGF receptor 42 dimerization of 42 Flow injection analysis (FIA) 168 Flow systems 176 in HTS 176
Fluorescein-labeled ligand 170 fluorescence polarization (FP) of 170 Fluorescence correlation spectroscopy (FCS) 169 application of 173 for miniaturized HTS 172 Fluorescence polarization (FP) 76,169 for screening steroid hormone receptors 171 in drug discovery 170 of fluorescein-labeled ligand 170 principle of 170 Fluorescence polarization analysis (FPA) 88 instrumentation of 88 use for sequences of primers/probes 89 use of amplification of genes of Shiga toxins 89 use of detection of amplified products 89 use of reagents/samples 88 Fluorescence resonance energy transfer (FRET) 75,171 applications of 172 monitor conformational changes 75 Fragment based design 288,291 use of CrystaLEAD system 291 use of X-ray crystallography 291 using mass spectroscopy 288 using NMR 288 virtual screening by computational methods 288 Fragment self-assembly 331 Fullerenes 238 antiviral activities of 254 immobilization of biomolecules on 250 organic solvent-water partition of 238 Fullerene-SWNT 231 bioactivities of 231 Functional cell-free assays 7
Fuzzy set theory (FST) 274 Gal80 repressor 45 Geiger counter 268 Gel permeation technology 8 space requirements in 7 Genes 309 de novo synthesis of 309 Gene assembly 329 from Chip-eluted oligonucleotides 329 Gene composer 310 Gene design software 311 Gene expression 300,313 and methylation 313 codon bias in 300 Gene fusions 9 role of reporter gene 9 role of stress promoter 9 Gene regulation 300 and codon bias 300 Gene reporter assays 75 in discrimination of receptor agonists from antagonists 75 in monitoring of transcriptional activation of specific genes 75 Gene shuffling 42 for generating novel proteins 42 Gene synthesis 297,301 for yeast alanine tRNA 298 for post-genomics era 301 to create synthetic gene network 306 utility of 303 Gene TNF-alpha 282 Gene-O-matic 297 future of 297 Genes code 79 Genetic code 297 redundancy of 300
Genome engineering 307 cellular based 307 viral based 307 Genome-scale libraries 31 generation of 31 Glycopeptide antibiotics 6 in peptidoglycan formation 6 target for 6 Gp130 family cytokines 41 biological activity of 41 GPCR signal transduction 104 monitoring of 104 G-protein coupled receptor (GPCRs) 18,72,97,101,180 cellular expression of 99 class of targets for drug discovery 97 druggable class of proteins 97 interactions of drug-receptor 180 measurement of ligand binding 101 radioligand affinity of 101 radioisotopic labeling of 101 Green fluorescent protein (GFP) 307 Guanine nucleotide binding 102 measurement of 102 radiometric binding of 102 Guest genes 31 expression of 31 Ham sandwich 268 Helicobacter pylori 276 and thalidomide 276 Hepatitis C 52 epitopes of 52 High density microplates 7 and liquid handling robotics 7 High throughput ligand binding assays 101 High throughput screening 11,69,97,167 and chemical diversity 11 flow systems in 176 in drug discovery process 167
kinetic methodology in 175 pharmaceutical target identification using 70 scintillation proximity assay (SPA) in 178 role in biological activity 70 role of kinetics in 167 use of biosensors in 179 use for compound identification 97 High throughput screening (HTS) programmes 114 High-speed counter-current chromatography 120 High-throughput technologies 269 compound screening libraries 269 microarrays 269 proteomics gels 269 sequencing 269 Hit-to-lead development process 70 and drug-like properties 70 to identify lead compound 70 HIV-1 promoter 36 inhibition of 36 HIV-1 viral RNAs 22 development of 22 elongation of 22 targeting of 22 target sites of 23 transcriptional activation of 22 Human colon carcinoma cell line Caco-2 198 Human diseases 113 Human genome 71 in identification of gene 71 in identification of ion channel 72 in identification of nuclear/membrane receptor 71 in identification of protein targets 71 sequencing of 18,71 Human immunodeficiency virus 30,70
Human immunodeficiency virus type 1 (HIV-1) 52 Human interferon-alpha genes 42 for multigene DNA shuffling 42 Human lipoprotein-associated coagulation inhibitor (LACI-DI) 40 as phage displayed scaffold 40 Human pancreatic secretory trypsin inhibitor (PSTI) 40 Human pharmacology 115 Hypertension 267 use of Viagra (sildenafil citrate) 267 Hyphenated spectroscopic techniques 150 future of 150 recent advances in 150 IDEA pkEXPRESS™ 197 assessment of 197 for in silico prediction of Caco-2 permeabilities 199 in drug discovery 207 predictions of 204 IFC-interchangeable flow cell probes 151 Ile-tRNA synthetases (IleRS) 7 targets for mupirocin 7 Immobilized artificial membrane (IAM) chromatography 184 Immobilized liposome chromatography (ILC) 185 in membrane partitioning studies 185 Immunogenic 31 Immunogenic mimicry 55 Immunogenicity 58 Immunoglobulins 35 Immunoglobulin-like scaffolds 37 In silico methods 79 quantitative structure activity relationship models 80
pharmacodynamic characteristics of libraries 79 pharmacological characteristics of libraries 79 use in ADMET properties 79 In silico screening processes 4 Inflammatory diseases 44 Information retrieval (IR) 270 In-silico predictions 198 of Caco-2 permeabilities 198 Insulin-like growth factor 1 (IGF-1) binding proteins 46 Ion channels 18 Iron response element (IRE) 23 and translational silencer 23 Isoprenoids synthesis 6 glyceraldehyde 3-phosphate-pyruvate pathway for 6 mevalonate pathway for 6 Jasminum subtriplinerve 140 Kinetic factors 168 Kinetic methodology 175 in high throughput screening 175 Klenow DNA polymerase 322 Knowledge discovery (KD) 270 Label-based assays 75 use of tracer molecule 75 Label-based detection 75 use of fluorescent labels 75 use of radioactive isotopes 75 Label-free assays 75 physical characteristic of 75 Label-free detection 77 Lanthanide-sensitized luminescence 171 LC-13C NMR 134 LC-MS 126 use in bioactivity screening 126
LC-NMR 128 data handling of 133 detection limits of 132 HPLC component of 129 limitations of 134 modes of operation 130 NMR component of 129 on-flow 131 other detect use in 133 resolution of 132 sample recovery in 144 sensitivity of 132 stop-flow 131 use in natural product profiling 136 with (HP)LC/CD 138 Lead compounds 10 affinity of inhibitor of 10 antibacterial activity/selectivity of 10 assay for 10 library screening of 10 target 10 Leucine-Zipper motif 41 Liquid chromatography (LC) 168 for drug discovery 168 Liquid chromatography mass spectrometry (LC/MS) methods 199 using electrospray ionization 199 Live cell surface markers 50 panning on 50 Luciferase 173 insulin receptor 175 Lyme disease 52 Lysozyme 234 β-N-acetylglucosaminidase activity of 234 molecular lipophilicity pattern of 237 physicochemical properties of 235 secondary structure regions in 236
structure-function relationship of 234 Macrocycle 293 Madin-Darby canine kidney (MDCK) cells 185 Marine organisms 116 defence agents against predators 116 Mass spectrometry (MS) 77,168,178 use for biomolecular interaction studies 77 Mass spectroscopy 288 in fragment based design 288 Matrix-assisted laser-desorption ionization (MALDI) MS 77 Measurement of uncertainty (MOU) 208 Metabonomic urinalysis 211 as toxicity screen 211 in drug discovery 212 viability of 211 Metabotropic-glutamate-receptor-like and calcium sensing receptors 98 Methylation 313 and gene expression 313 Micellar electrokinetic capillary chromatography (MEKC) 184 Microarrays 269 in high-throughput technologies 269 Microchip CE 179 Monoterpene dimers 137 from Lisianthius seemannii 137 from Cordia linnaei 137 naphthoquinones as 137 Morphine 115 from Papaver somniferum 115 Mosher’s ester synthesis 139 Mouse knockout technology 79 Multi-cellular parasites 52 Mutagenesis strategy 31
Mutual information measure (MIM) 274 Myasthenia gravis 276 and thalidomide 276 Mycoplasma capricolum 299 non-universal genetic code in 299 β-N-Acetylglucosaminidase activity 234 of lysozyme 234 Nanotubes 241 organic solvent-water partition of 241 immobilization of biomolecules on single-wall carbon 252 Naphthoquinone 115 from Conospermum incurvum 115 Natural products 113 cardiac glycosides as 114 characterisation of 114 origin of 114 Natural product profiling 113 chemical methods for 114 development in 113 hyphenated spectroscopic methods in 113 structure elucidation in 114 structure-activity of 114 Neutrophil elastase inhibitors 45 N-methyl-D-aspartate receptor 44 N-minus-mer 317 NMR flow probe design 130 NMR pulse sequences 133 Non-radiometric assay 97 use for GPCR screening 97 Non-recombinant cell lines 100 Non-universal genetic code 299 in Mycoplasma capricolum 299 in Tetrahymena spp. 299 UGA stop codon as 299 Novel antibacterial agents 3 against pathogenic bacteria 3 antibiotic as 4
blocking bacterial growth 4 discovery of 3 effectiveness of 4 efficacy of 4 interaction of 4 mechanism of action of 4 safety in human 4 spectrum of 4 Nuclear magnetic resonance (NMR) 77,179,212,288 in fragment based design 288 usefulness of 179 Nuclear receptors 18 Nucleic acid hybridization 91 Nucleosomes 30 Oligonucleotide building blocks 315 Open reading frames (ORFs) 300 Open-discovery model 272 Optical biosensors 77 study of biomolecular interactions by 77 Organic solvent-water partition of 234 Orthogonal tRNA-synthetase 308 Oscillators 306 synthetic gene network as 306 Overlap extension PCR (OE-PCR) 324 Pandemic infections 30 Parallel artificial membrane permeation assay (PAMPA) 185 Paratope-derivative peptides 36 and parental antibody 36 Pathogen 29 Penicillium 116 Peptides 54 epitope-mimicking potentials 54 from combinatorial libraries 54 Peptide deformylase 293 inhibitors of 293
Peptidomimetics 23 inhibit HIV-1 replication by 23 Peptidoglycan biosynthesis 6 role of MurA 6 Phage display 30 principle of 30 Phage display libraries 32 activity of gpIII 32 activity of gpIII proteins 32 combinatorial oligonucleotide libraries as 33 landscape libraries as 33 linear/constrained libraries as 32 primary (random/secondary-generation biased libraries) as 33 screening of 355 Phage display technology 31 Phage-display engineering 39 helices in proteins 39 of proteins of signal transduction 40 of loops 39 Phage-displayed ligands 49 internalization via 49 Phage-peptide libraries 50 intra-vascular biopanning of 50 Pharmaceutical targets 71 Pharmacodynamic (PD) properties 197 of drugs 197 Pharmacokinetic 167,197,211 screening of 183 physiologically based 188 Phorbol 115 from Homalanthus nutans 115 Phosphomycin 6 effect on MurA enzymes 6 Photoluminescence processes 169 use of electromagnetic radiation in 169 Photo-programmable microfluidic picoarray technology 329
preparation of 330 purification of 330 Physical kinetic process 168 molecular diffusion as 168 Physiologically based (PB) computational models 198 GASTROPLUS™ 198 iDEA™ 198 Plasmid DNA 87 Plasmon waveguide resonance (PWR) 181 application of biosensors 181 Polyacrylamide gel electrophoresis 252 Poly-deoxyribonucleotides 298 Polyetherketone (PEEK) 130 Polyuridylic acid 297 Post-genomics era 301 gene synthesis for 301 Prenylated flavone 137 from Monotes engleri 137 Principal component analysis (PCA) 215 Probe 151 and hardware developments 151 Probe DNAs 90 for stx1 90 Probe hybridization 92 Probe technology 128 Probes hyphenated 153 miniaturisation of 153 Protease inhibitors 40 Proteins 231 Protein classes 17 families of 17 Protein crystallography 287 of inhibitor complexes 288 Protein engineering 288 Protein kinases 177 in drug design 177 Protein NMR analysis 214
Protein phosphatase-1 47 Proteomics gels 269 in high-throughput technologies 269 Putative inhibitors 4 Quantitative structure activity relationship (QSAR) models 188 Radiometric techniques 97 Rapid assay development 87 for identification of specific gene sequence 87 Raynaud’s disease 276 Receptorsomes 99 Recombinant stable cell lines 100 Reporter assays 9 independent of growth inhibition 10 measure transcriptional response 10 Reporter gene assays 8,103 use of expression systems 104 Respiratory syncytial virus 52 Retinal diseases 278 and curcumin 278 Reverse phase (RP)-HPLC 77 Rheumatoid arthritis (RA) 44 Rhodopsin-like receptor 98 class 1 98 Riboswitches 20,315 affinities of 21 binding with flavin mononucleotide (FMN) 20 development of novel antimicrobials 22 in regulation of gene expression 20 mechanism of 20 phylogenetic distribution of 22
Riboswitch aptamers 20 structure of 20 RNA enzymes 18 RNA molecules 72 gene expression of 72 signal transduction of 72 RNA polymerase 298 RNA polymerase II 45 coactivators in 45 RNA-protein assembly modulation 73 rRNA 6 target of antibiotics 6 Saccharomicins heptadecaglycoside antibiotics 11 as broad-spectrum compounds 11 Scaffold strategy 36 emergence of 36 principle of 36 to antibodies 37 to hormones 37 to natural proteins 37 Scintillation proximity assay (SPA) 102,178,180 activation of scintillating beads 102 in HTS 178 Screening diverse chemical entities 4 Second messenger assays 104 action of novel compound 104 determination of accumulation of cAMP 104 Secretin-like receptors 98 class 2 98 Sensors 306 synthetic gene network as 306 Sequencing 269 in high-throughput technologies 269
Shiga toxin genes 92 secondary structures of DNA segments of 92 Short interfering RNA (siRNA) 303 Sialadenitis 276 and thalidomide 276 Signal detection 75 physico-chemical principles for 75 Signal transducers 76 Single fluorophore assay design 76 Single-wall carbon nanotubes (SWNT) 231 effect on polymerase chain reaction 232 Small inhibitory RNAs (siRNA) 18 Small molecule drug 17 biochemical assays of 17 efficacy of 18 targeting of RNA 17 use of high-throughput screening (HTS) 17 Smaller probes 151 micro-coils as 151 Sodium dodecyl sulfate (SDS) 252 Soft independent modeling of class analogy (SIMCA) 215 Spinal cord related problems 278 and curcumin 278 Staphylococcal protein A 38 as Ig-binding protein 38 Steroid hormone receptors 171 screening by fluorescence polarization (FP) of 171 Streptomycin 19 as selective RNA targeting drug 19 breakthrough discovery of 19 isolation of 19 Structure based combinatorial design 292
Structure-based drug design 287 in lead generation 287 Surface plasmon resonance (SPR) technology 77,179 application of biosensors 179 in receptor-ligand studies 77 in imaging of peptide/nucleic acid arrays 77 Swanson’s ABC model 271 building upon 271 improvements on 271 Synthetic gene networks 306 as biological switches 306 as oscillators 306 as sensors 306
T4-joining enzyme 298 Tandem repeat sequence 303 Terminalia macroptera 140 Tetrahymena spp. 299 non-universal genetic code in 299 Thalidomide 276 and acute pancreatitis 276 and chronic hepatitis C 276 and Helicobacter pylori 276 and myasthenia gravis 276 and sialadenitis 276 therapeutic uses of 276 Thrombopoietin 44 Time resolved fluorescence resonance energy transfer (TR-FRET) 169 applications of 172 Time-of-flight (TOF) analysis 77 Toxicity screen 211 metabonomic urinalysis as 211 Toxicokinetics 211 Toxins 212 ANIT as 212 TA as 212 Transfer messenger RNA (tmRNA) 304 Translocation 255
Transmission electron microscopy (TEM) 239 Transporters 18 Trifluoroacetyl-dipeptide-anilides 291 binding modes of 291 tRNA isoacceptors 300 Tumor necrosis factor-alpha (TNF-α) 44 induced cytotoxicity/apoptosis 44 Two-component signal transduction systems 6 and response regulator 6 role of histidine kinase 6 Two-hybrid screening systems 74 Type 2 diabetes 277
UGA stop codon 299 as non-universal genetic code 299 Vascular endothelial growth factor (VEGF) 48 Viagra (sildenafil citrate) 267 in angina 267 in hypertension 267 Vinblastine 115 from Catharanthus roseus 115 Vincristine 115 from Catharanthus roseus 115 Viral promoters 101 antibiotic resistant genes as 101 cytomegalovirus (CMV) as 101 Rous sarcoma virus (RSV) as 101 Virulence factors 4 impact of genomic revolution 4 VIS-UV absorption spectroscopy 248 Water-soluble fullerenes 255 environmental remediation of 255 therapeutics of 255 toxicity of 255 Whole-cell assays 8
X-ray crystallography 291,304 engineering proteins for 304 use in fragment based design 291 X-ray diffraction 297
Zinc-fingers libraries 38 DNA binding domain of 38