Bulletin of Mathematical Biology (2006) 68: 735–751 DOI 10.1007/s11538-005-9006-3
ORIGINAL ARTICLE
2D Autocorrelation Modelling of the Inhibitory Activity of Cytokinin-derived Cyclin-dependent Kinase Inhibitors

Maykel Pérez González a,b, Julio Caballero c,d, Aliuska Morales Helguera b,e, Miguel Garriga f, Gerardo González f, Michael Fernández c,d,∗

a Unit of Service, Drug Design Department, Experimental Sugar Cane Station "Villa Clara-Cienfuegos," Ranchuelo, Villa Clara, C.P. 53100, Cuba
b Chemical Bioactive Center, Central University of Las Villas, Santa Clara, Villa Clara, C.P. 54830, Cuba
c Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, Matanzas, Matanzas, C.P. 44740, Cuba
d Probiotic Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, Matanzas, Matanzas, C.P. 44740, Cuba
e Department of Chemistry, Faculty of Chemistry and Pharmacy, Central University of Las Villas, Santa Clara, Villa Clara, C.P. 54830, Cuba
f Plant Biotechnology Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, Matanzas, Matanzas, C.P. 44740, Cuba

Received: 18 January 2005 / Accepted: 3 March 2005 / Published online: 7 April 2006
© Society for Mathematical Biology 2006
Abstract  The inhibitory activity towards the p34cdc2/cyclin B kinase (CBK) enzyme of 30 cytokinin-derived compounds has been successfully modelled using 2D spatial autocorrelation vectors. Predictive linear and non-linear models were obtained by forward stepwise multi-linear regression analysis (MRA) and artificial neural network (ANN) approaches, respectively. A variable selection routine that selects relevant non-linear information from the data set was employed prior to network training. The best ANN, with three input variables, was able to explain about 87% of the data variance, in comparison with 80% by the linear equation using the same number of descriptors. Similarly, the neural network had the higher predictive power. The MRA model showed a linear dependence between the inhibitory activities and the spatial distributions of masses, electronegativities and van der Waals volumes on the inhibitor molecules. Meanwhile, the ANN model evidenced the occurrence of non-linear relationships between the inhibitory activity and the mass distribution at different topological distances on the cytokinin-derived compounds. Furthermore, the inhibitors were well distributed with regard to their activity levels in a Kohonen

∗Corresponding author. E-mail address: [email protected] (Michael Fernández).
self-organizing map (SOM) built using the input variables of the best neural network.

Keywords  QSAR · Autocorrelation vectors · Multilinear regression · Artificial neural networks · Plant hormones
1. Introduction

Cytokinins are a class of plant-specific hormones that play a central role during the cell cycle and influence numerous developmental programs (Werner et al., 2001). These biomolecules were discovered during the 1950s through their ability to induce plant cell division. Since then, cytokinins have been shown to regulate a host of additional developmental events, such as de novo bud formation, release of buds from apical dominance, leaf expansion, delay of senescence, promotion of seed germination and chloroplast formation (Werner et al., 2001; Haberer and Kieber, 2002). The biological activities and chemistry of cytokinins are well defined, but very little is known about their mode of action (Havlíček et al., 1997), and it is only recently that cytokinin genes have been identified in plants (Mok et al., 2000). Chemically, natural cytokinins are N6-substituted purine derivatives. Isopentenyladenine, zeatin and dihydrozeatin are the predominant cytokinins found in higher plants. Synthetic cytokinins include adenine derivatives, such as kinetin, as well as compounds structurally unrelated to natural cytokinins, such as certain phenylureas (Mok et al., 2000; Werner et al., 2001). Natural cytokinins were found to be rather non-specific inhibitors of various cyclin-dependent kinase (CDK) enzymes, a family of enzymes directly involved in cell cycle control (D'Agostino and Kieber, 1999). Among adenine derivatives, 6-(benzylamino)-2-[(2-hydroxyethyl)amino]-9-methylpurine, named Olomoucine, has been discovered to inhibit some CDKs specifically at micromolar concentrations. One of the inhibited kinases, the p34cdc2/cyclin B kinase (CBK), is a key mitotic factor, which is highly conserved and strongly implicated in cell cycle transitions in all eukaryotic cells (Meijer and Raymond, 2003).
Recently, it has been discovered that a strong specific CDK inhibitor (CKI), named Roscovitine (6-(benzylamino)-2(R)-[[1-(hydroxymethyl)propyl]amino]-9-isopropylpurine), displays an enhanced inhibition of the cdc2 CDK, a higher selectivity toward some CDKs, and an increased antimitotic activity (Meijer et al., 1999; Meijer and Raymond, 2003). Controlling the cell cycle by inhibiting the proteins that regulate its progression is an attractive strategy for addressing cancer and other diseases associated with abnormal cellular proliferation (Arris et al., 2000). The identified CKIs display antimitotic and apoptosis-inducing properties, and are being evaluated as potential antitumor agents because of the numerous abnormal regulations of CDKs in human cancer (Havlíček et al., 1997; Meijer et al., 1999). Nowadays, the synthesis of novel, highly selective CKIs as candidates for CDK-targeted therapy in cancer treatment is in high demand (Arris et al., 2000). Computational models that are able to predict the biological activity of compounds from their structural properties
are powerful tools for designing highly active molecules. In this sense, quantitative structure–activity relationship (QSAR) studies have been successfully applied to modelling biological activities of natural and synthetic chemicals (Kubinyi, 1993). Graph-theoretical and topological methods are included in most QSAR studies (González and Terán, 2004a,b; González et al., 2004). Among these methods, 2D spatial autocorrelation has been successfully used for modelling log P values (Devillers and Domine, 1997) and biological activities (Moreau and Broto, 1980a,b), and in pharmaceutical (Wagener et al., 1995) and toxicological research (Devillers, 1999). In this work, autocorrelation vectors were used for encoding structural information of CKI molecules, and linear and non-linear models of the CDK inhibitory activity were built using multivariate linear regression analysis (MRA) and artificial neural networks (ANNs). A comparative study was carried out on the basis of the data fitting and the predictive power of the models, measured by cross-validation. The versatility of ANNs was also exploited for mapping the CKI inhibitory activities on a topological map using competitive neural networks.

2. Spatial autocorrelation approach

The binding of a substrate to its receptor depends on the shape of the substrate and on a variety of effects such as the molecular electrostatic potential, polarizability, hydrophobicity and lipophilicity. Therefore, in a QSAR study the strategy for encoding molecular information must in some way, either explicitly or implicitly, account for these physicochemical effects. Furthermore, data sets usually include molecules of different sizes with different numbers of atoms, so the structural encoding must allow comparison of such molecules (Bauknecht et al., 1996). Autocorrelation vectors have several useful properties. First, a substantial reduction in data can be achieved by limiting the topological distance, l. Second, the autocorrelation coefficients are independent of the original atom numbering, so they are canonical. Third, the length of the autocorrelation vector is independent of the size of the molecule (Bauknecht et al., 1996). For the autocorrelation vectors, the H-depleted molecular structure is represented as a graph G, and the physicochemical properties of the atoms as real values assigned to the vertices of G (Table 1). These descriptors are obtained by summing up the products of certain properties of two atoms located at a given topological distance, or spatial lag, in G.

Table 1  Representation of different molecular graphs G and topological distances or spatial lags d_ij.
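The topological distances d_ij of Table 1 are simply bond counts along shortest paths in G. As an illustrative sketch (the function name and adjacency-list encoding are our own assumptions, not from the paper), they can be computed by breadth-first search from every atom:

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs topological distances d_ij (numbers of bonds along
    shortest paths) for an H-depleted molecular graph given as
    adjacency lists, via breadth-first search from each atom."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:       # first visit = shortest path
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist
```

For a three-atom chain (propane-like skeleton), for example, the terminal atoms end up at distance 2.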
Three spatial autocorrelation vectors were employed for modelling the inhibitory activity.

Moran's index (Moran, 1950):

    I(p_k, l) = [N Σ_i Σ_j δ(l, d_ij)(p_ki − p̄_k)(p_kj − p̄_k)] / [2L Σ_i (p_ki − p̄_k)²]    (1)

Geary's coefficient (Geary, 1954):

    c(p_k, l) = [(N − 1) Σ_i Σ_j δ(l, d_ij)(p_ki − p_kj)²] / [4L Σ_i (p_ki − p̄_k)²]    (2)

Broto–Moreau's autocorrelation coefficient (Moreau and Broto, 1980a):

    A(p_k, l) = Σ_i Σ_j δ(l, d_ij) p_ki p_kj    (3)

where I(p_k, l), c(p_k, l) and A(p_k, l) are Moran's index, Geary's coefficient and Broto–Moreau's autocorrelation coefficient at spatial lag l, respectively; p_ki and p_kj are the values of property k at atoms i and j, respectively; p̄_k is the average value of property k; N is the number of atoms; L is the number of atom pairs at lag l; and δ(l, d_ij) is a Dirac delta function defined as

    δ(l, d_ij) = 1 if d_ij = l, 0 if d_ij ≠ l    (4)

where d_ij is the topological distance, or spatial lag, between atoms i and j.

Spatial autocorrelation measures the level of interdependence between properties, and the nature and strength of that interdependence. It may be classified as either positive or negative: in the positive case, similar values appear together, while in the negative case dissimilar values appear in close association (Moran, 1950; Geary, 1954). In a molecule, Moran's and Geary's spatial autocorrelation analyses test whether the value of an atomic property at one atom in the molecular structure is independent of the values of that property at the neighbouring atoms. If dependence exists, the property is said to exhibit spatial autocorrelation. Moreau and Broto (1980a,b) first applied the autocorrelation function to the topology of molecular structures. The autocorrelation vectors represent the degree of similarity between molecules. A data matrix is generated from the spatial autocorrelation vectors calculated for each compound; afterwards, dimensionality reduction methods are employed to select the most relevant vector components for building the linear and neural network models.
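As a hedged illustration, Eqs. (1)–(4) can be evaluated from a topological distance matrix and an atomic property vector along the following lines (a NumPy sketch; the function name is our own, and the double sums are taken over ordered atom pairs, so that Σ_i Σ_j δ(l, d_ij) = 2L):

```python
import numpy as np

def autocorrelation_descriptors(dist, p, lag):
    """Broto-Moreau (ATS), Moran (MATS) and Geary (GATS) autocorrelations
    of Eqs. (1)-(3) at a topological lag, from the matrix of topological
    distances d_ij and an atomic property vector p."""
    dist = np.asarray(dist)
    p = np.asarray(p, dtype=float)
    n = len(p)
    delta = (dist == lag)                # Dirac delta of Eq. (4), ordered pairs
    two_L = delta.sum()                  # each pair at this lag counted twice
    dev = p - p.mean()
    ssd = (dev ** 2).sum()
    ats = (delta * np.outer(p, p)).sum()                            # Eq. (3)
    mats = n * (delta * np.outer(dev, dev)).sum() / (two_L * ssd)   # Eq. (1)
    diff2 = (p[:, None] - p[None, :]) ** 2
    gats = (n - 1) * (delta * diff2).sum() / (2 * two_L * ssd)      # Eq. (2)
    return ats, mats, gats
```

For a three-atom chain with property values (1, 2, 3) at lag 1, for instance, the Moran index vanishes because the deviations of the two terminal atoms cancel against the central atom's zero deviation.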
3. Data set and models

3.1. Data set

Inhibitory activities against the p34cdc2/CBK enzyme and molecular structures of 30 CKIs were taken from the literature (Havlíček et al., 1997). p34cdc2/cyclin B
was purified from M phase oocytes of the starfish Marthasterias glacialis. The purified CDKs were assayed with [γ-32P]ATP and histone H1 as protein substrate, in the presence of increasing concentrations of the potential inhibitors, and maximum phosphate incorporation was determined by 32P radioactivity. The kinase activity is expressed as micromoles of phosphate groups incorporated into histone H1. IC50 refers to the micromolar concentration of the compound required for 50% inhibition of the enzyme activity (Havlíček et al., 1997). Molecular structures, numbering of the substituents and biological activities are summarized in Table 2. Activities reported as lower or higher than threshold values were set equal to the threshold value. Prior to molecular descriptor calculations, the 3D structures of the studied compounds were geometry-optimised using the semi-empirical quantum-chemical method AM1 (Dewar et al., 1985) implemented in the MOPAC 6.0 (Frank and Seiler Research Laboratory, 1993) computer software. The Dragon (Todeschini et al., 2003) computer software was used for calculating unweighted and weighted Moran, Geary and Broto–Moreau 2D autocorrelation vectors. We tried atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities and atomic polarizabilities as weighting properties. Autocorrelation vectors were calculated at spatial lags l ranging from 1 up to 8. The total number of computed descriptors was 96. Descriptors with constant values were discarded. For the remaining descriptors, pairwise correlation analysis was performed in order to reduce, in a first step, the collinearity and correlation among descriptors: from each pair of descriptors whose pair correlation coefficient has modulus higher than a predefined value Rmax (0.90), the descriptor with the lower variance was eliminated. Afterwards, 39 descriptors remained.

3.2.
Forward stepwise multi-linear regression analysis

The most significant parameters for the MRA model were identified from the data set using the forward stepwise regression method (Kowalsky and Wold, 1982), in which independent variables are individually added to or deleted from the model at each step of the regression, depending on the Fisher ratio values selected to enter and to remove, until the best model is obtained. Statistical analysis and data exploration were carried out using the Statistica version 6.0 (StatSoft, 2001) computer software. The quality of the model was judged by examining the regression coefficients, the standard deviations, the Fisher ratio, the significance levels and the number of variables in the equation.

3.3. Artificial neural networks

3.3.1. Basics

ANNs are computer-based models in which a number of processing elements, also called neurons, units or nodes, are interconnected by links in a net-like structure, forming "layers" (Aoyama et al., 1990; Sumpter et al., 1994). A variable value is assigned to every neuron. Neurons can be of three different kinds. The input neurons receive their values from the independent variables and form the input layer. The
Table 2  Chemical structures of cytokinin-derived cyclin-dependent kinase inhibitors, and experimental and predicted inhibitory activities, log(1/IC50), from the forward stepwise multi-linear regression analysis (MRA) and artificial neural network (ANN 1) models.

No.  R1                 R2                   R3             Experimental   MRA      ANN 1
1    H                  H                    H                −2.301      −2.659   −2.356
2    H                  NH2                  H                −2.000      −2.149   −2.574
3    H                  Cl                   H                −3.000      −2.107   −2.726
4    H                  CH3                  H                −2.505      −2.743   −2.573
5    H                  NH–CH2–CH2OH         H                −2.301      −1.676   −2.501
6    H                  H                    CH3              −2.544      −2.848   −2.563
7    H                  Cl                   CH3              −1.845      −2.279   −1.844
8    H                  NH–CH2–CH2OH         CH3              −1.699      −2.026   −2.150
9    Benzyl             H                    H                −2.301      −2.490   −2.305
10   Benzyl             H                    CH3              −1.602      −1.799   −2.152
11   Benzyl             NH2                  H                −1.954      −2.301   −2.346
12   Benzyl             H                    NH–CH(OH)CH3     −1.398      −1.125   −1.396
13   Benzyl             NH–CH2–CH2OH         CH3              −0.778      −0.785   −0.826
14   Benzyl             NH2                  CH3              −1.602      −1.600   −1.769
15   Benzyl             Cl                   CH3              −1.079      −1.234   −1.586
16   CH2–CH=CH(CH2)2    Cl                   CH3              −1.602      −2.106   −1.524
17   CH2–CH=CH(CH2)2    CH2–CH2OH                             −1.813      −1.637   −1.240
18   Cyclohexyl         Cl                   CH3              −2.114      −1.698   −1.983
19   Cyclohexyl         NH–CH2–CH2OH         CH3              −0.778      −1.056   −0.908
20   Benzyl             Cl                   CH(CH3)2         −1.230      −1.095   −1.009
21   Benzyl             NH–CH2–CH2OH         CH2–CH2OH        −0.903      −0.383   −0.578
22   Benzyl             NH–CH2–CH2OH         CH(CH3)2         −0.301      −0.623   −0.388
23   Benzyl             NH–CH2–CH2OH         CH3               0.046      −0.396   −0.127
24   Benzyl             NH–CH(C2H5)CH2OH     CH(CH3)2          0.187      −0.333   −0.106
25   H                  CH3                                   −3.000      −2.820   −2.524
26   Benzyl             CH3                                   −2.602      −2.820   −2.610
27   H                  CH3                                   −2.699      −2.820   −2.578
28   Benzyl             CH3                                   −3.000      −2.126   −2.388
29   H                  CH3                                   −2.477      −1.851   −2.217
30   Benzyl             CH3                                   −3.000      −2.610   −2.423
hidden neurons collect values from other neurons, giving a result that is passed to a successor neuron. The output neurons take values from other units and correspond to different dependent variables, forming the output layer. In this sense, network architecture is commonly represented as I–H–O, where I, H and O are the number of neurons in the input, hidden and output layers respectively. The links between units have associated values, named weights, that condition the values assigned to the neurons. There exist additional weights assigned to bias values that act as neuron value offsets. The weights are adjusted through a training process in order to minimize network error. Commonly, neural networks are adjusted, or trained, so that a particular input leads to a specific target output. The characteristics of ANNs have been found to be suitable for data processing, in which the functional relationship between the input and the output is not previously defined. This is due to the fact that structure–activity relationships are often non-linear and very complex, and neural networks are able to approximate any kind of analytical continuous function, according to Kolmogorov’s (1957) theorem. 3.3.2. Neural network data reduction method A neural network feature selection procedure that extracts non-linear information from the data set was employed for data dimensionality reduction before
network training. In this regard, the neuro-genetic input selection routine (NGISR) of the Statistica Neural Networks package from the Statistica 6.0 computer software was used. This tool combines genetic algorithms with probabilistic and generalized regression neural networks to search automatically for optimal combinations of input variables (StatSoft, 2004).

3.3.3. Feed-forward neural networks

Feed-forward networks had a 3–3–1 architecture and training functions that updated weights and bias values according to gradient descent with momentum and an adaptive learning rate. The network training parameters were optimised by varying both the learning rate and the momentum from 0.01 to 0.99. Matlab version 6.5 (The MathWorks, 2002) was used to implement fully connected, three-layer, feed-forward computational neural networks with backpropagation training and non-automatic regularization. In these nets, the transfer functions of the input and output layers were linear, and the hidden layer had neurons with a hyperbolic tangent transfer function. A weight decay regularization algorithm was implemented with the aim of improving network generalization by avoiding model overfitting (Guo-Zheng et al., 2004). The network performance function (Eq. (5)) was modified by adding a term consisting of the mean of the sum of squares of the network weights and biases. Using this modified performance function (Eq. (6)) causes the network to have smaller weights and biases, forcing the network response to be smoother and less likely to overfit.

    F = MSE = (1/N) Σ_{i=1..N} (Y_i − A_i)²    (5)

    MSEREG = γ × MSE + (1 − γ) × MSW    (6)

    MSW = (1/n) Σ_{j=1..n} w_j²    (7)
In these equations, F is the network performance function, MSE the mean of the sum of squares of the network errors, N the number of compounds, Y_i the predicted biological activity of compound i, A_i the experimental biological activity of compound i, γ the performance ratio, which was optimised by varying it from 0.90 to 1.0, MSEREG the modified network performance function, MSW the mean of the sum of squares of the network weights, and w_j the weight of neuron j (Demuth and Beale, 2003a). Network training was stopped when the minimum gradient of 0.001 was reached, and the adjusted network weights and biases were then stored.

3.3.4. Self-organizing maps

In order to settle structural similarities among the cytokinin-derived CKIs, a Kohonen self-organizing map (SOM) was built. The autocorrelation descriptors selected by NGISR were used for unsupervised training of a 7 × 7 neuron map.
Kohonen (1982) introduced a neural network model that generates SOMs. The neurons are arranged in a 2D network, into which molecules characterized by m descriptors are projected. With m > n, where n is the dimensionality of the map, a Kohonen network can be used to project a higher-dimensional space into a lower-dimensional one (Gasteiger and Zupan, 1995). Such maps of surface properties have been used for comparing a wide variety of biologically active compounds (Gasteiger and Li, 1994).

    c_s ← min_j Σ_{i=1..m} (x_si − w_ji)²    (8)
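The competitive step of Eq. (8) reduces to an argmin over squared Euclidean distances between a molecule's descriptor vector and each neuron's weight vector. A minimal sketch (function and variable names are our own):

```python
import numpy as np

def winning_neuron(x, W):
    """Eq. (8): return the index of the neuron whose weight vector is
    closest, in squared Euclidean distance, to the descriptor vector x.
    W: (n_neurons, m) weight matrix of a flattened 2D map (e.g. 7x7 = 49
    rows); x: m autocorrelation descriptors of one molecule."""
    d2 = ((W - x) ** 2).sum(axis=1)   # squared distance to every neuron
    return int(np.argmin(d2))
```

During training, the winning neuron's weights (and, to a lesser extent, its neighbours') are pulled toward x; this function alone suffices for projecting molecules onto a trained map.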
The Kohonen network is trained using an unsupervised, competitive learning process. In our case, a molecule s, characterized by m descriptors x_si, is projected onto the (central) neuron c_s whose weights w_ji are most similar to the input variables (Eq. (8)). During the learning process, the weights of the neurons in the network are changed to make them even more similar to the input variables. The weights of all neurons are adjusted, but to an extent that decreases with increasing distance from the central, winning neuron c_s. Finally, a molecule is projected onto the neuron of the network whose weights come closest to the description of the molecule by the autocorrelation vector. It should be noted that the criterion embedded in Eq. (8) for determining the winning neuron essentially constitutes the measure of similarity of molecular structures: molecules with similar autocorrelation vectors, x_s, are projected onto the same or closely adjacent neurons. The SOM was implemented in Matlab 6.5. Neurons were initially located on a grid topology. The ordering phase ran for 1000 steps with a learning rate of 0.9 until the tuning neighbourhood distance (1.0) was reached. The tuning-phase learning rate was 0.02. Training was performed for 2000 epochs in an unsupervised manner (Demuth and Beale, 2003b).

3.4. Model validation

Models were validated by calculating Q² values from "Leave-One-Out" (LOO) cross-validation. A data point is removed (left out) from the set and the model refitted; the predicted value for that point is then compared to its actual value.
This is repeated until each datum has been omitted once; the sum of squares of these deletion residuals can then be used to calculate Q², a statistic analogous to R²:

    Q² = 1 − [Σ_{i=1..N} (Y_i − A_i)²] / [Σ_{i=1..N} (A_i − Ā)²]    (9)

where N is the number of compounds, Y_i and A_i are the predicted and experimental biological activities of left-out compound i, respectively, and Ā is the average experimental activity of the left-in compounds different from i. The Q² values can be considered a measure of the predictive power of a model: whereas R² can always be increased artificially by adding more parameters
(descriptors or neurons), Q 2 decreases if a model is overparameterized (Hawkins, 2004), and is therefore a more meaningful summary statistic for predictive models.
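For a linear model, the LOO Q² of Eq. (9) can be sketched as follows (ordinary least squares via NumPy; function name is ours, and for simplicity the denominator uses the full-sample mean rather than the leave-in mean of Eq. (9)):

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out Q^2 in the spirit of Eq. (9): each compound is left
    out, an OLS model with intercept is refitted on the rest, and the
    deletion residual is accumulated into PRESS."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        y_hat = np.concatenate([[1.0], np.atleast_1d(X[i])]) @ beta
        press += (y_hat - y[i]) ** 2
    ss_tot = ((y - y.mean()) ** 2).sum()      # total sum of squares
    return 1.0 - press / ss_tot
```

On exactly linear data the deletion residuals vanish and Q² equals 1; adding useless descriptors drives PRESS, and hence 1 − Q², upward.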
4. Results and discussion

4.1. Multi-linear regression analysis

2D autocorrelation descriptors were used to obtain, as a first approach, an MRA model of the inhibitory activities of cytokinin-like compounds against the CDK enzyme with acceptable statistical significance and predictive power (Eq. (10)). Following the principle of parsimony (Hawkins, 2004), we chose a three-variable model as the "best" model.

    log(1/IC50) = 0.1793 × ATS1m + 2.219 × MATS8v − 2.208 × MATS1e − 6.052    (10)

    N = 30, R = 0.889, S = 0.433, F = 32.732, p < 10⁻⁵, Q² = 0.712, S_cv = 0.508.

In Eq. (10), N is the number of compounds included in the model, R the correlation coefficient, S the standard deviation of the regression, F the Fisher ratio, Q² the squared correlation coefficient of the cross-validation, p the significance of the variables in the model, and S_cv the standard deviation of the cross-validation. Inhibitory activities of the CKIs predicted by the linear model appear in Table 2. This model explains about 80% of the data variance and, more importantly, it is quite stable to the inclusion or exclusion of compounds, as measured by the squared cross-validation correlation coefficient (Q² > 0.7). The variables in the model correspond to Moran's and Broto–Moreau's spatial autocorrelation coefficients
Table 3  Symbols of the descriptors selected by forward stepwise multi-linear regression analysis and by the neuro-genetic input selection routine, and their definitions.

Forward stepwise multi-linear regression analysis
  ATS1m    Broto–Moreau autocorrelation of lag 1, weighted by atomic masses
  MATS8v   Moran autocorrelation of lag 8, weighted by atomic van der Waals volumes
  MATS1e   Moran autocorrelation of lag 1, weighted by atomic Sanderson electronegativities

Neuro-genetic input selection routine
  ATS1m    Broto–Moreau autocorrelation of lag 1, weighted by atomic masses
  MATS5m   Moran autocorrelation of lag 5, weighted by atomic masses
  MATS6m   Moran autocorrelation of lag 6, weighted by atomic masses
  MATS8m   Moran autocorrelation of lag 8, weighted by atomic masses
  GATS8m   Geary autocorrelation of lag 8, weighted by atomic masses

The definitions of these terms are given in full in Todeschini and Consonni (2000).
weighted by atomic masses, atomic van der Waals volumes and atomic Sanderson electronegativities (Todeschini and Consonni, 2000) (Table 3). These autocorrelation descriptors represent the degree of similarity between inhibitor molecules based on these properties at spatial lags 1, 8 and 1, respectively. Inhibition of the CDK enzyme by purine-based inhibitors is reported to involve competitive interactions with the ATP-binding site in the monomeric CDK. The inhibitors bind in the adenine-binding pocket of the enzyme, but in an unexpected orientation, different from that of the adenine of the authentic ligand ATP (Meijer and Raymond, 2003). CKIs are all flat, hydrophobic heterocycles, which bind in the ATP-binding pocket through two to three hydrogen bonds with the backbone atoms of Leu83 and Glu81 in the active site, together with hydrophobic and van der Waals interactions (Meijer and Raymond, 2003). This agrees well with our linear model, in which the inhibitory activity depends on the mass, van der Waals volume and electronegativity distributions on the inhibitor molecule.

4.2. Feed-forward neural networks

Since biological interactions are non-linear by nature, the main goal of this work was to train ANNs for modelling the inhibitory activities of CKIs against the CDK enzyme. Choosing the optimum architecture for a network is always a difficult task; in this work we followed the criterion that 1.8 < ρ < 2.2, where ρ = (number of data points in the training set)/(number of adjustable weights and biases in the network) (So and Richards, 1992). In this sense, the network inputs were fixed to three descriptors and the ANN architecture was 3–3–1, giving ρ = 1.88. The selection of the optimum variable subset from a large number of descriptors is a key question in model building. Many reports have described the use of MRA and principal component analysis for dimensionality reduction in ANN modelling (Zahouily et al., 2002; Hemmateenejad et al., 2003).
Recently, several novel approaches that attempt to select variables carrying non-linear information have been published. Most of these methods combine genetic algorithms with different ANN approaches (Yasri and Hartsough, 2001; Vanyúr et al., 2003). In this work, we used the NGISR approach implemented in the Statistica Neural Networks package (see Section 3.3.2) to select a second variable subset capable of retaining non-linear information. This method is a feature selection routine based on a neuro-genetic algorithm, which reduces data dimensionality by removing redundant information. Applying this method, the data were reduced to the five descriptors listed in Table 3. The correlation matrix of these variables is reported in Table 4. As can be observed, the selected variables do not correlate significantly. It is worth noting that all of the selected autocorrelation descriptors are weighted by atomic masses. Afterwards, all possible combinations of three variables within this reduced data set were tested for training feed-forward ANNs. The subset of three descriptors giving the lowest MSE was used as network inputs for generating the model ANN 1 (Table 5), using the inhibitory activities toward the CDK enzyme as target outputs. Network inputs and outputs were normalized before training. First, the learning rate and momentum were optimised by response surface analysis.
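The learning-rate/momentum optimisation described above amounts to scanning a grid of the two training parameters and keeping the pair with the lowest training error. A minimal sketch, where `train_network` is a hypothetical stand-in for the Matlab training routine (assumed to return the training MSE for a given parameter pair):

```python
import itertools

def grid_search(train_network, rates, momenta):
    """Scan all learning-rate/momentum combinations and return the pair
    giving the lowest MSE (the response-surface scan described above).
    `train_network` is a hypothetical callable, not the paper's code."""
    best_pair, best_mse = None, float("inf")
    for lr, mom in itertools.product(rates, momenta):
        mse = train_network(learning_rate=lr, momentum=mom)
        if mse < best_mse:
            best_pair, best_mse = (lr, mom), mse
    return best_pair, best_mse
```

In the paper's case the scan over [0.01, 0.99] × [0.01, 0.99] located the optimum at a learning rate of 0.75 and a momentum of 0.85.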
Table 4  Correlation matrix of the descriptors selected by the neuro-genetic input selection routine.

          ATS1m   MATS5m  MATS6m  MATS8m  GATS8m
ATS1m     1.000
MATS5m    0.000   1.000
MATS6m    0.000   0.382   1.000
MATS8m    0.016   0.413   0.277   1.000
GATS8m    0.013   0.087   0.010   0.103   1.000
The learning rate and momentum were both varied from 0.01 to 0.99. Figure 1 depicts the projection of the fitted response surface of the performance function of the trained feed-forward networks. The minimum MSE was reached for an optimum learning rate and momentum of 0.75 and 0.85, respectively. ANN models often tend to overfit the data, yielding low predictive power (Hawkins, 2004). Therefore, with the aim of improving network generalization, we modified the network performance function by adding a term consisting of the mean of the sum of squares of the network weights and biases (Eq. (6)). In this equation, the parameter γ was optimised by calculating R² and Q² of LOO cross-validation using the optimum learning rate and momentum. Figure 2 depicts R² and Q² of LOO cross-validation for networks trained with γ values ranging from 0.80 to 1.00. Fitting of the data improved as γ increased, and a maximum R² of 0.945 was reached at the maximum γ. However, Q² of LOO cross-validation increased only up to a maximum of 0.756 at γ = 0.97; beyond this value, Q² began to decrease. This result confirms that the better the network fits the data, the worse its predictive power becomes. Since an important feature of a QSAR model is its ability to make predictions, we consider γ = 0.97 the optimum value for generating the optimum ANN 1. Table 5 summarizes the statistics of data fitting and LOO cross-validation for the ANN 1 model, where the learning rate, momentum and γ were fixed at their optimum values of 0.75, 0.85 and 0.97, respectively. The predicted inhibitory activities for this model are also reported in Table 2. This non-linear model outperforms the linear model obtained in this work. The network fitted the data with a higher R², describing about 87% of the data variance compared with 80% for the linear model. At the same time, the MSE of data fitting was about 0.102 for the ANN 1 model, while MRA showed a higher MSE value of 0.187. Moreover, the
Table 5  Statistics of artificial neural network models for the inhibitory activity of cytokinin-derived cyclin-dependent kinase inhibitors.

Model   Selection method   Descriptors              R       MSE     Q²      MSE_cv*
ANN 1   NGISR              ATS1m, MATS6m, GATS8m    0.932   0.102   0.759   0.155
ANN 2   MRA                ATS1m, MATS8v, MATS1e    0.926   0.110   0.646   0.238

*Network mean square error (MSE) of LOO cross-validation.
Fig. 1 Projection of the fitted response surface of network mean square errors (MSE) varying both learning rate and momentum from 0.01 to 0.99.
network was able to predict the inhibitory activity of unknown compounds with higher accuracy. Its higher Q² of LOO cross-validation, 0.759, emphasizes that ANN 1 had the higher predictive power. The ANN accommodated the data set well on an error-surface minimum by means of smooth adjustment of weights and biases, which allowed adequate network generalization. In the light of these facts, the neural network
Fig. 2  R² (•) and Q² () of LOO cross-validation for the ANN 1 model, varying γ from 0.80 to 1.0.
approach proved to be more reliable than the forward stepwise MRA method for modelling this biological property. The network reveals a non-linear dependence between the inhibitory activities of the cytokinin-derived CKIs and the spatial autocorrelations of atomic masses on the inhibitor structure. The variables selected by forward stepwise MRA were also used for training neural networks. In this case, the learning rate, momentum and γ were fixed at their optimum values. Nevertheless, the statistics of this network (ANN 2) show that these descriptors did not give a good network model (Table 5). Compared with ANN 1, this model showed higher MSE values for both data fitting and LOO cross-validation. Despite an R² value higher than the linear model's, a lower Q² of LOO cross-validation (Q² < 0.7) shows that ANN 2 loses stability and predictive power. In this case, the neural network was unable to outperform the MRA model. This result agrees well with previous reports in which ANNs trained with variables selected by MRA were less predictive than the corresponding linear model (Zahouily et al., 2002). In such cases, the networks were less robust and the data were overfitted. This result points out that relevant non-linear information was lost from the data set during the forward stepwise variable selection procedure.

4.3. Kohonen self-organizing map

The variables selected by the NGISR approach were used to obtain a SOM of the inhibitory activity of the cytokinin-derived compounds. We built a 7 × 7 Kohonen
Fig. 3 Kohonen self-organizing map of the inhibitory activities of the cytokinin-derived cyclin-dependent kinase inhibitors. The colour legend runs from minimum activity, log(1/IC50) = −3.000, to maximum activity, log(1/IC50) = −0.187. Numbers give the count of compounds (where more than one) placed in the same neuron. Underlined numbers mark conflictive neurons.
SOM. Figure 3 depicts the SOM of the data; 26 of a total of 49 neurons were occupied. Four neurons were occupied by two compounds at a time. Only two neurons were classified as "conflictive", holding less active and more active compounds at the same time. In the map, the activity level of a "conflictive" neuron was assigned according to the activity level of its neighbourhood. As can be observed, compounds within a similar activity range were grouped into neighbouring areas. It is worth noting that CKIs with low inhibitory activities, which include structures with non-free 1, 3 and 7 positions, with a removed or changed side chain at position 2, and with a removed hydrophobic group at position 6, were placed adjacent to less active neurons. The literature reports that those positions are fundamental for the interaction between purine-based inhibitors and the adenine-binding site of the monomeric CDK enzyme (Meijer and Raymond, 2003). On the other hand, the most active cytokinin derivatives, olomoucine (compound 23) and closely similar compounds, were placed adjacent to highly active neurons; in this case, two neurons were doubly occupied, one by compounds 21 and 22 and another by compounds 23 and 24. Similarly, half-active compounds were spread across the medium-active neurons of the map.

5. Concluding remarks

Biological phenomena are complex by nature. In this work, the inhibitory activity against the CDK enzyme of a set of cytokinin-derived compounds was successfully modelled using MRA and ANN. 2D spatial autocorrelation descriptors were used to encode the structural information of the studied compounds. The neural network approach outperformed the linear model, having a higher correlation coefficient and greater predictive power. Our results also indicate that dimensionality reduction using a combination of genetic algorithms and probabilistic and generalized regression neural networks is a better option for variable selection before network training.
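To illustrate the mapping technique used above, the sketch below trains a small Kohonen SOM on a generic descriptor matrix. This is a minimal NumPy re-implementation for illustration only, not the code used in this study; the learning-rate and neighbourhood decay schedules are assumptions, and only the 7 × 7 grid size follows the text.

```python
import numpy as np

def train_som(X, grid=(7, 7), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a minimal Kohonen self-organizing map.

    X: (n_samples, n_features) descriptor matrix (assumed standardized).
    Returns the weight matrix W (n_neurons, n_features) and the grid
    coordinates of each neuron.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each neuron, used by the neighbourhood function
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # Best-matching unit: neuron whose weight vector is closest to x
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        # Exponentially decaying learning rate and neighbourhood radius
        frac = t / n_iter
        lr = lr0 * np.exp(-3.0 * frac)
        sigma = sigma0 * np.exp(-3.0 * frac)
        # Gaussian neighbourhood around the BMU, measured on the grid
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its neighbours towards the sample
        W += lr * h[:, None] * (x - W)
    return W, coords

def map_samples(X, W):
    """Return the index of the winning neuron for each sample."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
```

After training, compounds with similar descriptor vectors win neighbouring neurons, which is what produces the clustering of activity levels seen in Fig. 3; colouring each neuron by the mean activity of the compounds it wins reproduces that kind of map.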
A non-linear dependence between the CKIs' inhibitory activities and the spatial autocorrelations of atomic masses over the inhibitor structure was found. The selected autocorrelation descriptors were also able to distribute the data set well on a Kohonen SOM.

References

Aoyama, T., Suzuki, Y., Ichikawa, H., 1990. Neural networks applied to structure–activity relationships. J. Med. Chem. 33, 905–908.
Arris, C.E., Boyle, F.T., Calvert, A.H., Curtin, N.J., Endicott, J.A., Garman, E.F., Gibson, A.E., Golding, B.T., Grant, S., Griffin, R.J., Jewsbury, P., Johnson, L.N., Lawrie, A.M., Newell, D.R., Noble, M.E.M., Sausville, E.A., Schultz, R., Yu, W., 2000. Identification of novel purine and pyrimidine cyclin-dependent kinase inhibitors with distinct molecular interactions and tumor cell growth inhibition profiles. J. Med. Chem. 43, 2797–2804.
Bauknecht, H., Zell, A., Bayer, H., Levi, P., Wagener, M., Sadowski, J., Gasteiger, J., 1996. Locating biologically active compounds in medium-sized heterogeneous datasets by topological autocorrelation vectors: Dopamine and benzodiazepine agonists. J. Chem. Inform. Comput. Sci. 36, 1205–1213.
D'Agostino, I.B., Kieber, J.J., 1999. Molecular mechanisms of cytokinin action. Curr. Opin. Plant Biol. 2, 359–364.
Demuth, H., Beale, M., 2003a. Neural Network Toolbox User's Guide for Use with MATLAB, 4th edn. The MathWorks Inc., Massachusetts, pp. 51–61, Chapter 5.
Demuth, H., Beale, M., 2003b. Neural Network Toolbox User's Guide for Use with MATLAB, 4th edn. The MathWorks Inc., Massachusetts, pp. 9–23, Chapter 8.
Devillers, J., 1999. Autocorrelation descriptors for modelling (eco)toxicological endpoints. In: Devillers, J., Balaban, A.T. (Eds.), Topological Indices and Related Descriptors in QSAR and QSPR. Gordon and Breach Science Publishers, pp. 595–612.
Devillers, J., Domine, D., 1997. Comparison of reliability of log P values calculated from a group contribution approach and from the autocorrelation method. SAR QSAR Environ. Res. 7, 195–232.
Dewar, M.J.S., Zoebisch, E.G., Healy, E.F., Stewart, J.J.P., 1985. AM1: A new general purpose quantum mechanical molecular model. J. Am. Chem. Soc. 107, 3902–3910.
Frank J. Seiler Research Laboratory, 1993. MOPAC version 6.0. U.S. Air Force Academy.
Gasteiger, J., Zupan, J., 1995. Neural networks in chemistry. Angew. Chem. Int. Ed. Engl. 32, 503–527.
Gasteiger, J., Li, X., 1994. Abbildung elektrostatischer Potentiale muscarinischer und nicotinischer Agonisten mit künstlichen neuronalen Netzen. Angew. Chem. 106, 671–674.
Geary, R.C., 1954. The contiguity ratio and statistical mapping. Incorp. Stat. 5, 115–145.
González, M.P., Helguera, A.M., González-Díaz, H., 2004. A TOPS-MODE approach to predict permeability coefficients. Polymer 45, 2073–2079.
González, M.P., Terán, C., 2004a. A TOPS-MODE approach to predict adenosine kinase inhibition. Bioorg. Med. Chem. Lett. 14, 3077–3079.
González, M.P., Terán, C., 2004b. QSAR study of N6-(substituted-phenylcarbamoyl)adenosine-5′-uronamides as agonists for A1 adenosine receptors. Bull. Math. Biol. 66, 907–920.
Guo-Zheng, L., Jie, Y., Hai-Feng, S., Shang-Sheng, Y., Wen-Cong, L., Nian-Yi, C., 2004.
Semiempirical quantum chemical method and artificial neural networks applied for λmax computation of some azo dyes. J. Chem. Inform. Comput. Sci. 44, 2047–2050.
Haberer, G., Kieber, J.J., 2002. Cytokinins, new insights into a classic phytohormone. Plant Physiol. 128, 354–362.
Havlíček, L., Hanuš, J., Veselý, J., Leclerc, S., Meijer, L., Shaw, G., Strnad, M., 1997. Cytokinin-derived cyclin-dependent kinase inhibitors: Synthesis and cdc2 inhibitory activity of olomoucine and related compounds. J. Med. Chem. 40, 408–412.
Hawkins, D.M., 2004. The problem of overfitting. J. Chem. Inform. Comput. Sci. 44, 1–12.
Hemmateenejad, B., Akhond, M., Miri, R., Shamsipur, M., 2003. Genetic algorithm applied to the selection of factors in principal component-artificial neural networks: Application to QSAR study of calcium channel antagonist activity of 1,4-dihydropyridines (nifedipine analogues). J. Chem. Inform. Comput. Sci. 43, 1328–1334.
Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biol. Cybernet. 43, 59–69.
Kolmogorov, A.N., 1957. Doklady Akademiia Nauk SSSR 114, 953–954.
Kowalski, B.R., Wold, S., 1982. Pattern recognition in chemistry. In: Krishnaiah, P.R., Kamal, L.N. (Eds.), Handbook of Statistics. North-Holland, Amsterdam, pp. 673–697.
Kubinyi, H., 1993. QSAR: Hansch Analysis and Related Approaches. VCH, New York.
Meijer, L., Raymond, E., 2003. Roscovitine and other purines as kinase inhibitors from starfish oocytes to clinical trials. Acc. Chem. Res. 36, 417–425.
Meijer, L., Leclerc, S., Leost, M., 1999. Properties and potential applications of chemical inhibitors of cyclin-dependent kinases. Pharmacol. Ther. 82, 279–284.
Mok, M.C., Martin, R.C., Mok, D.W., 2000. Cytokinins: Biosynthesis, metabolism and perception. In Vitro Cell. Dev. Biol. Plant. 36, 102–107.
Moran, P.A.P., 1950. Notes on continuous stochastic processes. Biometrika 37, 17–23.
Moreau, G., Broto, P., 1980a.
Autocorrelation of a topological structure: A new molecular descriptor. Nouv. J. Chim. 4, 359–360.
Moreau, G., Broto, P., 1980b. Autocorrelation of molecular structures: Application to SAR studies. Nouv. J. Chim. 4, 757–764.
So, S., Richards, W.G., 1992. Application of neural networks: Quantitative structure–activity relationships of the derivatives of 2,4-diamino-5-(substituted-benzyl)pyrimidines as DHFR inhibitors. J. Med. Chem. 35, 3201–3207.
StatSoft Inc., 2001. STATISTICA (data analysis software system), version 6. www.statsoft.com.
StatSoft Inc., 2004. Electronic Statistics Textbook. StatSoft, Tulsa, OK, web: http://www.statsoft.com/textbook/stathome.html.
Sumpter, B.G., Getino, C., Noid, D.W., 1994. Theory and applications of neural computing in chemical science. Annu. Rev. Phys. Chem. 45, 439–481.
The MathWorks Inc., 2002. MATLAB version 6.5. www.mathworks.com.
Todeschini, R., Consonni, V., 2000. Handbook of Molecular Descriptors. Wiley-VCH, Weinheim.
Todeschini, R., Consonni, V., Pavan, M., 2003. DRAGON, version 2.1.
Vanyúr, R., Héberger, K., Jakus, J., 2003. Prediction of anti-HIV-1 activity of a series of tetrapyrrole molecules. J. Chem. Inform. Comput. Sci. 43, 1829–1836.
Wagener, M., Sadowski, J., Gasteiger, J., 1995. Autocorrelation of molecular properties for modelling corticosteroid binding globulin and cytosolic Ah receptor activity by neural networks. J. Am. Chem. Soc. 117, 7769–7775.
Werner, T., Motyka, V., Strnad, M., Schmülling, T., 2001. Regulation of plant growth by cytokinin. Proc. Natl. Acad. Sci. U.S.A. 98, 10487–10492.
Yasri, A., Hartsough, D., 2001. Toward an optimal procedure for variable selection and QSAR model building. J. Chem. Inform. Comput. Sci. 41, 1218–1227.
Zahouily, M., Rhihil, A., Bazoui, H., Sebti, S., Zakarya, D., 2002. Structure–cytotoxicity relationships for a series of HEPT derivatives. J. Mol. Model. 8, 168–172.