Computational Neurogenetic Modeling
TOPICS IN BIOMEDICAL ENGINEERING INTERNATIONAL BOOK SERIES Series Editor: Evangelia Micheli-Tzanakou Rutgers University Piscataway, New Jersey
Signals and Systems in Biomedical Engineering: Signal Processing and Physiological Systems Modeling Suresh R. Devasahayam
Models of the Visual System Edited by George K. Hung and Kenneth J. Ciuffreda
PDE and Level Sets: Algorithmic Approaches to Static and Motion Imagery Edited by Jasjit S. Suri and Swamy Laxminarayan
Frontiers in Biomedical Engineering Edited by Ned H.C. Hwang and Savio L-Y. Woo
Handbook of Biomedical Image Analysis: Volume I: Segmentation Models Part A Edited by Jasjit S. Suri, David L. Wilson, and Swamy Laxminarayan
Handbook of Biomedical Image Analysis: Volume II: Segmentation Models Part B Edited by Jasjit S. Suri, David L. Wilson, and Swamy Laxminarayan
Handbook of Biomedical Image Analysis: Volume III: Registration Models Edited by Jasjit S. Suri, David L. Wilson, and Swamy Laxminarayan
Complex Systems Science in Biomedicine Edited by Thomas S. Deisboeck and J. Yasha Kresh Computational Neurogenetic Modeling Lubica Benuskova and Nikola Kasabov
A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher.
Computational Neurogenetic Modeling
Dr. Lubica Benuskova Senior Research Fellow Knowledge Engineering and Discovery Research Institute AUT, Auckland, New Zealand
and
Professor Nikola Kasabov Founding Director and Chief Scientist Knowledge Engineering and Discovery Research Institute AUT, Auckland, New Zealand
Springer
Dr. Lubica Benuskova Senior Research Fellow Knowledge Engineering and Discovery Research Institute, www.kedri.info AUT, Auckland, New Zealand
[email protected]
Professor Nikola Kasabov Founding Director and Chief Scientist Knowledge Engineering and Discovery Research Institute, www.kedri.info AUT, Auckland, New Zealand
[email protected]
Library of Congress Control Number: 2006936901
ISBN-10: 0-387-48353-5 ISBN-13: 978-0-387-48353-5
eISBN-10: 0-387-48355-1 eISBN-13: 978-0-387-48355-9
Printed on acid-free paper. © 2007 Springer Science + Business Media, LLC All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science + Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
9 8 7 6 5 4 3 2 1 springer.com
Dedication
To the memory of my parents.
Lubica
To my mother and the memory of my father.
Nikola
Preface
It is likely that future progress in many important areas of science (e.g.
brain science, bioinformatics, information science, physics, communication engineering and social sciences) can be achieved only if the areas of computational intelligence, brain science and bioinformatics share and integrate their methods and knowledge. This book offers some steps in this direction.

The book presents the background knowledge and methods for the integration of gene information and brain activity information, with the purpose of creating biologically plausible computational models aimed at modeling and understanding the brain. The book aims at encouraging research in information sciences in the direction of human-like and human-oriented information processing. In this context, "human-like" means that principles from the brain and genetics are used for the creation of new computational methods. "Human-oriented" means that these machines can be used to discover and understand more about the functioning of the brain and the genes, about memory and learning, about speech and language, about image and vision, and about ourselves.

This work was partially supported by the research grant AUTX02001 "Connectionist-based intelligent information systems", funded by the New Zealand Foundation for Research, Science, and Technology - FRST, through the New Economy Research Fund - NERF.

There are a number of people whom we would like to thank for their encouragement and contribution to the book. These are several colleagues, research associates and postgraduate students we have worked with at the Knowledge Engineering and Discovery Research Institute in Auckland, New Zealand, in the period 2002-2006: Dr Qun Song, Dr Zeke S. Chan, Dr Paul S. Pang, Dr Liang Goh, Vishal Jain, Tian-Min Ma (Maggie), Peter Hwang, Paulo Gottgtroy, Natalia Bedran, Joyce D'Mello and especially Simei Gomes Wysoski, who did the technical editing of the book and developed specialized simulation tools we used for our computer experiments.
We appreciate the discussions we had with a number of colleagues from different laboratories and countries. Among them are Walter Freeman - University of California at Berkeley; Takeshi Yamakawa - Kyushu Institute of Technology; John G. Taylor - King's College London; Cees van Leeuwen - RIKEN, Japan; Michael Arbib - University of Southern California; Dimiter Dimitrov - National Cancer Institute in Washington DC; Alessandro E. P. Villa - University of Lausanne; Bassem Hassan - Catholic University of Leuven; Gary Marcus - New York. We thank the Springer, New York team, and especially Aaron Johnson and Beverly Rivero, for the encouragement and patience.

When one of the authors, N. Kasabov, presented his talk at ICONIP 2002 in Shanghai and suggested that gene interaction information should be used in biologically plausible neural network models, Walter Freeman commented: "Yes, that makes sense, but how do we do that?" Michael Arbib, when visiting Auckland in 2004, remarked that integrating genes (the molecular level) into neural networks may require going down to the atomic (quantum) level. This book presents some initial answers to these questions. It presents the foundations and the concepts of computational neurogenetic modeling (CNGM), initially introduced by the authors in 2004 (Kasabov and Benuskova 2004).

The book was written by the two authors in close collaboration: Lubica Benuskova wrote chapters 2, 3, 8, 9, 10 and compiled Appendix 1, while Nikola Kasabov wrote chapters 1, 4, 5, 6, 7 and compiled Appendices 2 and 3. Each author also made smaller contributions to the other's chapters.

The book is intended for postgraduate students and researchers in the areas of information sciences, artificial intelligence, neurosciences, bioinformatics, and cognitive sciences. It is structured so that every chapter can be used as reading material for research-oriented courses at a postgraduate level. Additional materials, including data, simulation programs, lecture notes, color figures, etc., can be found on the web site www.kedri.info.

Dr Lubica Benuskova
Prof. Dr. Nikola Kasabov
Knowledge Engineering and Discovery Research Institute (www.kedri.info), Auckland University of Technology, Auckland, New Zealand
September 2006
Contents

Preface
1 Computational Neurogenetic Modeling (CNGM): A Brief Introduction
  1.1 Motivation - The Evolving Brain
  1.2 Computational Models of the Brain
  1.3 Brain-Gene Data, Information and Knowledge
  1.4 CNGM: How to Integrate Neuronal and Gene Dynamics?
  1.5 What Computational Methods to Use for CNGM?
  1.6 About the Book
  1.7 Summary
2 Organization and Functions of the Brain
  2.1 Methods of Brain Study
  2.2 Overall Organization of the Brain and Motor Control
  2.3 Learning and Memory
  2.4 Language and Other Cognitive Functions
    2.4.1 Innate or Learned?
    2.4.2 Neural Basis of Language
    2.4.3 Evolution of Language, Thinking and the Language Gene
  2.5 Neural Representation of Information
  2.6 Perception
  2.7 Consciousness
    2.7.1 Neural Correlates of Sensory Awareness
    2.7.2 Neural Correlates of Reflective Consciousness
  2.8 Summary and Discussion
3 Neuro-Information Processing in the Brain
  3.1 Generation and Transmission of Signals by Neurons
  3.2 Learning Takes Place in Synapses: Toward the Smartness Gene
  3.3 The Role of Spines in Learning
  3.4 Neocortical Plasticity
    3.4.1 Developmental Cortical Plasticity
    3.4.2 Adult Cortical Plasticity
    3.4.3 Insights into Cortical Plasticity via a Computational Model
  3.5 Neural Coding: the Brain is Fast, Neurons are Slow
    3.5.1 Ultra-Fast Visual Classification
    3.5.2 Hypotheses About a Neural Code
      Coding Based on Spike Timing
      The Rate Code
  3.6 Summary
4 Artificial Neural Networks (ANN)
  4.1 General Principles
  4.2 Models of Learning in Connectionist Systems
  4.3 Unsupervised Learning (Self Organizing Maps - SOM)
    4.3.1 The SOM Algorithm
    4.3.2 SOM Output
      Sample Distribution
      Clustering Information
      Visualization of Input Variables
      Relationship Between Multiple Descriptors
      The Connection Weights
      Interpretation by the Fuzzy Set Theory
    4.3.3 SOM for Brain and Gene Data Clustering
  4.4 Supervised Learning
    4.4.1 Multilayer Perceptron (MLP)
    4.4.2 MLP for Brain and Gene Data Classification Example
  4.5 Spiking Neural Networks (SNN)
  4.6 Summary
5 Evolving Connectionist Systems (ECOS)
  5.1 Local Learning in ECOS
  5.2 Evolving Fuzzy Neural Networks EFuNN
  5.3 The Basic EFuNN Algorithm
  5.4 DENFIS
    5.4.1 Dynamic Takagi-Sugeno Fuzzy Inference Engine
    5.4.2 Fuzzy Rule Set, Rule Insertion and Rule Extraction
  5.5 Transductive Reasoning for Personalized Modeling
    5.5.1 Weighted Data Normalization
  5.6 ECOS for Brain and Gene Data Modeling
    5.6.1 ECOS for EEG Data Modeling, Classification and Signal Transition Rule Extraction
    5.6.2 ECOS for Gene Expression Profiling
  5.7 Summary
6 Evolutionary Computation for Model and Feature Optimization
  6.1 Lifelong Learning and Evolution in Biological Species: Nurture vs. Nature
  6.2 Principles of Evolutionary Computation
  6.3 Genetic Algorithms
  6.4 EC for Model and Parameter Optimization
    6.4.1 Example
  6.5 Summary
7 Gene/Protein Interactions - Modeling Gene Regulatory Networks (GRN)
  7.1 The Central Dogma of Molecular Biology
  7.2 Gene and Protein Expression Data Analysis and Modeling
    7.2.1 Example
  7.3 Modeling Gene/Protein Regulatory Networks (GPRN)
  7.4 Evolving Connectionist Systems (ECOS) for GRN Modeling
    7.4.1 General Principles
    7.4.2 A Case Study on a Small GRN Modeling with the Use of ECOS
  7.5 Summary
8 CNGM as Integration of GPRN, ANN and Evolving Processes
  8.1 Modeling Genetic Control of Neural Development
  8.2 Abstract Computational Neurogenetic Model
  8.3 Continuous Model of Gene-Protein Dynamics
  8.4 Towards the Integration of CNGM and Bioinformatics
  8.5 Summary
9 Application of CNGM to Learning and Memory
  9.1 Rules of Synaptic Plasticity and Metaplasticity
  9.2 Toward a GPRN of Synaptic Plasticity
  9.3 Putative Molecular Mechanisms of Metaplasticity
  9.4 A Simple One Protein-One Neuronal Function CNGM
  9.5 Application to Modeling of L-LTP
  9.6 Summary and Discussion
10 Applications of CNGM and Future Development
  10.1 CNGM of Epilepsy
    10.1.1 Genetically Caused Epilepsies
    10.1.2 Discussion and Future Developments
  10.2 CNGM of Schizophrenia
    10.2.1 Neurotransmitter Systems Affected in Schizophrenia
    10.2.2 Gene Mutations in Schizophrenia
    10.2.3 Discussion and Future Developments
  10.3 CNGM of Mental Retardation
    10.3.1 Genetic Causes of Mental Retardation
    10.3.2 Discussion and Future Developments
  10.4 CNGM of Brain Aging and Alzheimer Disease
  10.5 CNGM of Parkinson Disease
  10.6 Brain-Gene Ontology
  10.7 Summary
Appendix 1
  A.1 Table of Genes and Related Brain Functions and Diseases
Appendix 2
  A.2 A Brief Overview of Computational Intelligence Methods
    A.2.1 Probabilistic and Statistical Methods
      Stochastic Models
    A.2.2 Boolean and Fuzzy Logic Models
      Boolean Models
      Fuzzy Logic Models
    A.2.3 Artificial Neural Networks
      Evolving Classifier Function (ECF)
    A.2.4 Methods of Evolutionary Computation (EC)
Appendix 3
  A.3 Some Sources of Brain-Gene Data, Information, Knowledge and Computational Models
References
Index
1 Computational Neurogenetic Modeling (CNGM): A Brief Introduction
This chapter introduces the motivation and the main concepts of computational neurogenetic modeling (CNGM). It argues that, given the large amount of both brain and gene data related to brain functions and diseases, sophisticated computational models are required to facilitate new knowledge discovery that helps in understanding the brain in its complex interaction between genetic and neuronal processes. The chapter points to sources of data, information and knowledge related to neuronal and genetic processes in the brain. CNGM is concerned with the integration of all this diverse information into a computational model that can be used for modeling and prediction purposes. The models integrate knowledge from mathematical and information sciences (e.g. computational intelligence - CI), neurosciences, and genetics. The chapter also discusses what methods can be used for CNGM and how. The concepts and principles introduced in this chapter are presented in detail and illustrated in the rest of the book.
1.1 Motivation - The Evolving Brain

According to the Concise Oxford English Dictionary (1983), "evolving" means "revealing", "developing". It also means "unfolding, changing". The term "evolving" is used here in a broader sense than the term "evolutionary". The latter is related to a population of individual systems traced over generations (Darwin 1859, Holland 1975, Goldberg 1989), while the former, as it is used in this book, is mainly concerned with a continual change of the structure and the functionality of an individual system during its lifetime (Kasabov 2003, Kasabov 2006). In living systems, and in the human brain in particular, evolving processes are observed at different levels (Fig. 1.1) (Kasabov 2006). At the quantum level, particles are in a complex evolving state all the time, being at several locations at the same time, which is defined by probabilities.
At a molecular level, DNA, RNA and protein molecules, for example, evolve and interact in a continuous way. The area of science that deals with the information processing and data manipulation at this level is Bioinformatics. At the cellular level (e.g. a neuronal cell) all the metabolic processes, the cell growth, cell division etc., are evolving processes.

6. Evolutionary (population/generation) processes
5. Brain cognitive processes (learning, thinking, etc.)
4. System level information processing (e.g. auditory system)
3. Information processing in a cell (neuron)
2. Molecular level of information processing (genes, proteins)
1. Quantum level of information processing

Fig. 1.1. Six levels of evolving processes in the brain: evolution, cognitive brain processes, brain functions in neural networks, single neuron functions, molecular processes, and quantum processes
At the level of cell ensembles, or at the neural network level, an ensemble of cells (neurons) operates in concert, defining the function of the ensemble or the network, for instance the perception of sound. In the human brain, complex dynamic interactions between groups of neurons can be observed when certain cognitive functions are performed, e.g. speech and language processing, visual pattern recognition, reasoning and decision making.

At the level of populations of individuals, species evolve through evolution (Darwin 1859) - the top level in Fig. 1.1. Evolutionary processes have inspired the creation of computational modeling techniques called evolutionary computing (EC) (Holland 1975, Goldberg 1989). A biological system evolves its structure and functionality through both lifelong learning by an individual and the evolution of populations of many such individuals. In other words, an individual is a result of the evolution of many generations of populations, as well as a result of its own developmental lifelong learning processes.

There are many physical and information processes of dynamic interaction within each of the six levels from Fig. 1.1 and across the levels. Interactions are what make an organism a living one, and they are also a challenge for computational modeling. For example, there are complex interactions between DNA, RNA and protein molecules. There are complex interactions between the genes and the functioning of each neuron, a neural network, and the whole brain. Some of these interactions are known to have caused brain diseases, but most of them are unknown at present.

An example of interactions between genes and neuronal functions is the dependence of the development of a brain with human characteristics on the expression of genes such as FOXP2, the gene involved in speech production (Enard et al. 2002), ASPM and Microcephalin, which affect brain size (Evans et al. 2005, Mekel-Bobrov et al. 2005), and HAR1F, which is of fundamental importance in specifying the six-layer structure of the human cortex (Pollard et al. 2006). Another example is the observed dependence between long-term potentiation (learning) in the synapses and the expression of the immediate early genes and their corresponding proteins, such as Zif/268 (Abraham et al. 1994). Yet another example is the set of putative genetic mutations for many brain diseases that have already been discovered (see Appendix 1).

Generally speaking, neurons from different parts of the brain, associated with different functions, such as memory, learning, control, hearing and vision, function in a similar way. Their functioning is defined by several factors, one of them being the level of neurotransmitters. These factors are controlled both through genetics and through external inputs. There are genes that are known to regulate the level of neurotransmitters for different types of neurons from different areas of the brain. The functioning of these genes and the proteins they produce can be controlled through nutrition and drugs. This is a general principle that can be exploited for different models of the processes from Fig. 1.1 and for different systems performing different tasks (learning, hearing, etc.).
We will refer to the above as neurogenetics (Kasabov and Benuskova 2004).

The evolving processes in the brain are based on several major principles (Arbib 1972, Grossberg 1982, Arbib 1987, Taylor 1999, Freeman 2000, Arbib 2003, van Ooyen 2003, Marcus 2004a), such as:
• Evolving is achieved through both genetically defined information and learning.
• The evolved neurons have a spatial-temporal representation where similar stimuli activate close neurons.
• The evolving processes lead to a large number of neurons being involved in each task, where many neurons are allocated to respond to a single stimulus or to perform a single task; e.g. when a word is heard, there are millions of neurons that are immediately activated.
• Memory-based learning, i.e. the brain stores exemplars of facts that can be recalled at a later stage.
• Evolving is achieved through interaction with the environment and other systems.
• Inner processes take place, such as sleep memory consolidation.
• The evolving processes are continuous and lifelong.
• Through evolving brain structures, higher-level functions emerge which are embodied in the structure and can be represented as a level of abstraction (e.g. the acquisition and development of speech and language).

The advancements in brain science, molecular biology and computational intelligence result in a large amount of data, information and knowledge on brain functioning, brain-related genetics, brain diseases and new computational intelligence methods. All these constitute a strong motivation for the creation of a new area of science that we call computational neurogenetic modeling (CNGM), with the following general objectives:
1. To create biologically plausible neuronal models.
2. To facilitate a better understanding of the principles of the human brain, the genetic code, and life in general.
3. To enable modeling of the brain.
4. To create new generic methods of computational intelligence and a new generation of intelligent machines.
1.2 Computational Models of the Brain

A project called the Blue Brain Project marks the beginning of studying how the brain works by building very large scale models of neural networks (http://bluebrainproject.epfl.ch/index.html). This endeavor follows a century of experimental "wet" neuroscience and the development of many theoretical insights into how neurons and neural networks function (Arbib 2003). The Blue Brain Project was launched by the Brain Mind Institute, EPFL, Switzerland and IBM, USA in May 2005. Scientists from both organizations will work together, using the huge computational capacity of IBM's Blue Gene supercomputer, to create a detailed model of the circuitry in the neocortex - the largest and most complex part of the human brain. The neocortex constitutes about 85% of the human brain's total mass and is thought to be responsible for the cognitive functions of language, learning, memory and complex thought. The Blue Brain Project will also build models of other cortical and subcortical parts of the brain and models of
sensory and motor organs. By expanding the project to model other areas of the brain, scientists hope to eventually build an accurate, computer-based model of the entire brain. The project is a massive undertaking because of the hundreds of thousands of parameters that need to be taken into account. EPFL's Brain Mind Institute's world's most comprehensive set of empirical data on the micro-architecture of the neocortex will be turned into a working 3-dimensional model recreating the high-speed electrochemical interactions of the brain's interior.

The first objective is to create a software replica of the neocortical column at a cellular level for real-time simulations. An accurate replica of the neocortical column is the essential first step to simulating the whole brain. The second and subsequent phases will be to expand the simulation to include circuitry from other brain regions and eventually the whole brain. In the Blue Column, a nickname for the software replica of the neocortical column, not only the cells but an entire microcircuit of cells will be replicated (like the duplication of a tissue). The neocortical column is stereotypical in many respects from mouse to man, with subtle variations in different ages, brain regions and species. The Blue Column will first be based on the data obtained from rat somatosensory cortex at 2 weeks of age, because these data are the most abundant. Once built and calibrated with iterative simulations and experiments, comparative data will be used to build columns in different brain regions, ages and species, including humans.

The Blue Column will be composed of 10^4 morphologically complex neurons with active ionic channels to enable the generation of electrical currents and potentials. The neurons will be interconnected in a 3-dimensional (3D) space with 10^7-10^8 dynamic synapses. The Blue Neuron will receive about 10^3-10^4 external input synapses and generate about 10^3-10^4 external output synapses.
Neurons will transmit information according to dynamic and stochastic synaptic transmission rules. The Blue Column will self-adapt according to synaptic learning algorithms running on 10^7-10^8 synapses, and according to metaplasticity, supervised and reward learning algorithms running on all synapses. The column project will also involve a database of 3D reconstructed model neurons, synapses, synaptic pathways, microcircuit statistics, and computer model neurons. Single synapses and whole neurons will be modeled with molecular-level detail; a neocortical column, however, will be modeled at the cellular level.

In the future, the research will go in two directions simultaneously:
1. The first direction will be the simplification of the column and its software or hardware duplication to build larger parts of the neocortex and eventually the entire neocortex.
2. The second direction will stay with a single neocortical column, moving down to the molecular level of description and simulation. This step will be aimed at moving towards genetic-level simulations of the neocortical column.

A very important reason for going to the molecular level is to link gene activity with electrical activity, as the director of the project, Henry Markram, reckons. A molecular-level model of the neocortical column will provide the substrate for interfacing gene expression with the network structure and function. The neocortical column lies at the interface between the genes and complex cognitive functions. Establishing this link will allow predictions of the cognitive consequences of genetic disorders and allow reverse engineering of cognitive deficits to determine their genetic and molecular causes. It is expected that this level of simulation will become a reality in the most advanced phases of the Blue Gene and Blue Brain Project development.

Just as the model replica of the cortical column is based on many computational models of neurons, channel kinetics, learning, etc., we can ask whether there are any models of the computational neurogenetic type that could be employed to model the interaction between the genes and neural networks. In other words, are there already corresponding neurogenetic models for brain functions and processes? The goal of this book is to address this question and to introduce the development of such models, which we call computational neurogenetic models (CNGMs).
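The kind of gene-to-activity link discussed above can be made concrete with a toy experiment. The following sketch is our own illustration, not code from the Blue Brain Project or from this book: a leaky integrate-and-fire neuron receives stochastic synaptic input whose efficacy is scaled by the expression level of a single hypothetical "receptor gene", so under-expressing the gene lowers the neuron's firing rate. All names and parameter values are invented for illustration.

```python
import random

def simulate(gene_expression, n_steps=1000, seed=0):
    """Count spikes of a leaky integrate-and-fire neuron whose synaptic
    efficacy is scaled by a hypothetical gene expression level in [0, 1]."""
    rng = random.Random(seed)
    tau, threshold, w = 10.0, 0.9, 0.15   # invented membrane parameters
    v, spikes = 0.0, 0
    for _ in range(n_steps):
        syn_input = w * gene_expression * rng.random()  # stochastic input
        v += -v / tau + syn_input                       # leaky integration
        if v >= threshold:                              # spike and reset
            spikes += 1
            v = 0.0
    return spikes

print("normal expression:", simulate(gene_expression=1.0))
print("under-expressed: ", simulate(gene_expression=0.5))  # fires much less
```

Even this caricature shows the qualitative effect a CNGM aims to capture: a change at the gene level propagates to a measurable change in neuronal activity.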
1.3 Brain-Gene Data, Information and Knowledge

The brain is a complex system that evolves its structure and functionality over time. It is an information-processing and control system, collaborating with the spinal cord and peripheral nerves. Each part of the brain is responsible for a particular function, for example: the Cerebrum integrates information from all sense organs, motor functions, emotions, memory and thought processes; the Cerebellum coordinates movements, walking, speech, learning and behavior; the Brain stem is involved in controlling the eyes, in swallowing, breathing, blood pressure, pupil size, alertness and sleep. A simplified view of the outer structure of the human brain is given in Fig. 2.1. The structure and the organization of the brain and how it works at a higher and a lower level are explained in Chaps. 2 and 3, respectively.

Since the 1950s, experimental brain data gathering has been accompanied by the development of explanatory computational brain models. Many models have been created so far, for example:
- Brain models created at USC by a team led by Michael Arbib, at http://www-hbp.usc.edu/Projects/bmw.htm;
- Mathematical brain function models maintained by the European Bioinformatics Institute (EBI): http://www.ebi.ac.uk;
- Wayne State Institute Brain Injury Models at http://rtb.eng.wayne.edu/braini;
- The Neural Micro Circuits Software: www.lsm.tugraz.at;
- Neuroscience databases (Koetter 2003);
- Genetic data related to the brain (Chin and Moldin 2001) and many more (see Appendix 3).

None of these brain models incorporates genetic information, despite the growing volume of data, information and knowledge on the importance and the impact of particular genes and genetic processes on brain functions. Brain functions, such as learning and memory, brain processes, such as aging, and brain diseases, such as Alzheimer disease, are strongly related to the level of expression of genes and proteins in the neurons (see Appendix 1, Appendix 3 and also Chaps. 9 and 10). Both popular science books and world brain research projects, such as the NCBI (the National Center for Biotechnology Information), the Allen Brain Institute, the Blue Brain Project, the Sanger Centre in Cambridge, and many more, have already revealed important and complex interactions between neuronal and genetic processes in the brain, creating a massive world repository of brain-gene data, information and knowledge. Some of this information, and references to it, are given in Appendices 1 and 3 and Chaps. 9 and 10.

The central dogma of molecular biology states that DNA, which resides in the nucleus of a cell or a neuron, transcribes into RNA, which then translates into proteins. This process is continuous and evolving, so that proteins in turn cause genes to transcribe, etc. (Fig. 1.2). The DNA is a long, double-stranded sequence (a double helix) of millions or billions of four base molecules (nucleotides), denoted A, C, T and G, that are chemically and physically connected to each other through other molecules.
In the double helix, the bases make pairs such that every A from one strand is connected to a corresponding T on the opposite strand, and every G is connected to a C. A gene is a sequence of hundreds to thousands of bases, part of the DNA, that is translated into one protein or several proteins. Less than 5% of the DNA of the human genome contains protein-coding genes; the other part is a non-coding region that may contain useful information as well. For instance, it contains the RNA genes and regulatory regions, but mostly its function is not currently well understood.
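The base-pairing and transcription rules above are simple enough to state in a few lines of code. The following sketch, our own illustration on a hypothetical 8-base template strand, derives the opposite strand and an mRNA copy:

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the base-paired opposite DNA strand."""
    return "".join(PAIR[base] for base in strand)

def transcribe(template):
    """mRNA copy of a DNA template strand: the complement, with T -> U."""
    return complement(template).replace("T", "U")

template = "TACGGCAT"           # hypothetical template strand
print(complement(template))     # opposite DNA strand: ATGCCGTA
print(transcribe(template))     # mRNA copy: AUGCCGUA
```

Complementing a strand twice returns the original, which is why the two strands of the helix carry the same information.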
The DNA of each organism is unique and resides in the nucleus of each of its cells. But what makes a cell alive are the proteins that are expressed from the genes and define the function of the cell. The genes and proteins in each cell are connected in a dynamic regulatory network consisting of regulatory pathways - see Chap. 7. Normally, only a few hundred genes are expressed as proteins in a particular cell. At the transcription phase, one gene is transcribed into many RNA copies, and their number defines the expression level of this gene. Some genes may be "over-expressed", resulting in too much protein in the cell, whereas some genes may be "under-expressed", resulting in too little protein. In both cases the cell may function in a wrong way, which may cause a disease. Abnormal expression of a gene can be caused by a gene mutation - a random change in the code of the gene, where a base molecule is inserted, deleted, or altered into another base molecule. Drugs can be used to stimulate or to suppress the expression of certain genes and proteins, but how that will indirectly affect the other genes related to the targeted one has to be evaluated, and that is where computational modeling of gene regulatory networks (GRN) and CNGM can help.

Fig. 1.2. The genes in the DNA transcribe into RNA, which then translates into proteins that define the function of a cell (the central dogma of molecular biology); proteins feed back on gene transcription through transcription factors. Gene information processing is presented in more detail in Chap. 7
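The protein-gene feedback loop of Fig. 1.2 can be caricatured as a small dynamical system, in the spirit of the discrete GRN models discussed in Chap. 7: the expression level of each gene at the next time step is a squashed, weighted sum of the current expression levels of all genes. The 3x3 interaction matrix below is invented purely for illustration (positive weights stand for activation, negative for repression):

```python
import math

def step(g, W):
    """One GRN update: each gene's next expression level is a sigmoid of
    the weighted sum of all current expression levels."""
    return [1 / (1 + math.exp(-sum(w * x for w, x in zip(row, g))))
            for row in W]

W = [[ 0.0,  1.2, -0.8],   # gene 1: activated by gene 2, repressed by gene 3
     [ 0.9,  0.0,  0.0],   # gene 2: activated by gene 1
     [-0.5,  0.7,  0.0]]   # gene 3: repressed by gene 1, activated by gene 2

g = [0.5, 0.5, 0.5]        # initial expression levels, scaled to [0, 1]
for _ in range(50):        # iterate toward a steady state
    g = step(g, W)
print([round(x, 3) for x in g])
```

With small weights like these the map is a contraction, so the network settles into a stable expression profile; an over- or under-expressed gene corresponds to a shifted steady state.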
It is always difficult to establish the interactions between genes and proteins. The question "What will happen to a cell or to the whole organism if one gene is under-expressed or missing?" is now being answered using a technology called gene knockout (Chin and Moldin 2001). This technology is based on the removal of a gene sequence from the DNA
and letting the cell/organism develop; parameters are then measured and compared with the same parameters when the gene was present. The obtained data can be further used to create a CNGM, as described in Chap. 8. Information about the relationship between genes and brain functions is given in many sources (see Appendices 1 and 3). In the on-line published book "Genes and Diseases" (www.ncbi.nlm.nih.gov/books) the National Center for Biotechnology Information (NCBI) has made available a large amount of gene information related to brain diseases, for example:
- Epilepsy: One of several types of epilepsy, Lafora disease (progressive myoclonic epilepsy, type 2), has been linked to mutations of the genes EPM2A and EPM2B found on chromosome 6 (see Chap. 10).
- Parkinson disease: Several genes - Parkin, PARK7, PTEN, alpha-synuclein and others - have been related to Parkinson disease, first described in 1817 by James Parkinson (see Chap. 10).
- Huntington disease: A mutation in the HD gene on chromosome 4 has been linked to this disease.
- Sclerosis: The gene SOD1 was found to be related to familial amyotrophic lateral sclerosis (see Chap. 10).
- Rett syndrome: The gene MeCP2, on the long arm of chromosome X (Xq28), has been found to be related to this disease. The gene is expressed differently in different parts of the brain (see Fig. 1.3).
The Allen Brain Institute has completed a map of most of the genes expressed in different sections of the mouse brain and has published it freely as the Allen Brain Atlas (www.alleninstitute.org). In addition to the gene ontology (GO) of the NCBI, a brain-gene ontology (BGO) of the Knowledge Engineering and Discovery Research Institute KEDRI (www.kedri.info) contains genes related to brain functions and brain diseases, along with computational simulation methods and systems (Fig. 1.4).
The BGO allows users to "navigate" the brain areas and find the genes expressed in its different parts, or, for a particular gene, to find which proteins are expressed in which cells of the brain. An example is given in Fig. 1.5. Gene expression data for thousands of genes, measured with microarray equipment in tens of samples collected from two categories of patients - control (class 1) and cancer (class 2) - in relation to the brain and the central nervous system, have been published by (Ramaswamy et al. 2001) and (Pomeroy et al. 2002). The first question is how to select the most discriminating genes for the two classes, which can possibly be used as drug targets. The second question is how to build a classifier system that can correctly
classify (predict) which class a new sample is likely to belong to, which can be used as an early diagnostic test. The answer to the latter question is illustrated in Chaps. 4, 5 and 6, where classification and prediction computational models are presented.
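A minimal sketch of such a classifier is a nearest-centroid rule: assign a new sample to the class whose mean expression profile it is closest to. The toy expression values below are hypothetical illustrations, not the published data:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # X: samples x genes expression matrix; y: class labels
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, sample):
    # assign the new sample to the class with the closest mean profile
    return min(centroids, key=lambda c: np.linalg.norm(sample - centroids[c]))

# toy data: 4 samples x 3 genes (hypothetical values)
X = np.array([[1.0, 5.0, 0.2], [1.2, 4.8, 0.1],
              [4.0, 1.0, 2.0], [3.8, 1.2, 2.2]])
y = np.array([0, 0, 1, 1])
centroids = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(centroids, np.array([4.1, 0.9, 1.9])))  # -> 1
```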
Fig. 1.3. The gene MeCP2, related to Rett syndrome, is expressed differently in different parts of the human brain (left vertical axis), the highest expression level being in the Cerebellum (source: Gene Expression Atlas at http://expression.gnf.org/cgi-bin/index.cgi)
Fig. 1.4. A snapshot of the structure of the Brain-Gene Ontology (BGO) of the KEDRI Institute (http://www.kedri.info/)
Fig. 1.5. The expression of the GABRA2 gene produces the alpha-2 subunit of the GABA-A receptor in the synapses of neurons; the gene is expressed differently in different parts of the brain (source: Gene Expression Atlas at http://expression.gnf.org/cgi-bin/index.cgi)
To answer the former question, many techniques for gene selection have been developed (Baldi and Brunak 2001) and made available as part of Bioinformatics tools. As an example, here we will take publicly available gene expression data of 60 samples of CNS cancer (medulloblastoma), representing 39 child patients who survived the cancer after treatment and 21 who did not respond to the treatment (Pomeroy et al. 2002). Fig. 1.6 illustrates the selection of the top 12 genes out of 7129, as numbered in the original publication (Pomeroy et al. 2002), based on 60 samples, using the signal-to-noise ratio (SNR) method in the software environment NeuCom (www.theneucom.com). The SNR method used in the analysis from Fig. 1.6 ranks all variables according to their discriminative power between the two class samples. The following formula is used to calculate for each gene G_i its SNR coefficient S_i:

S_i = |M_i(1) - M_i(2)| / (Std_i(1) + Std_i(2))    (1.1)

where M_i(1) and M_i(2) are the mean values of the gene G_i in class 1 and class 2 samples, and Std_i(1) and Std_i(2) are the corresponding standard deviations. The CNS cancer data set shown in Fig. 1.6 is used further in the book to illustrate different methods that can be used in building a CNGM. The selected smaller number of genes, out of thousands, can be further analyzed and modeled in terms of their interaction and relation to the functioning of neurons, neural networks, the brain and the CNS, which is the subject of CNGM.
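As an illustrative sketch of Eq. 1.1 (on a made-up toy expression matrix, not the real CNS data set), the SNR ranking can be implemented as follows:

```python
import numpy as np

def snr_rank(X, y, top=3):
    """Rank genes by the signal-to-noise ratio of Eq. 1.1:
    S_i = |M_i(1) - M_i(2)| / (Std_i(1) + Std_i(2))."""
    X1, X2 = X[y == 1], X[y == 2]
    s = np.abs(X1.mean(axis=0) - X2.mean(axis=0)) / (X1.std(axis=0) + X2.std(axis=0))
    return np.argsort(s)[::-1][:top]  # gene indices, most discriminative first

# toy expression matrix: 4 samples (rows) x 3 genes (columns)
X = np.array([[10.0, 1.0, 5.0],
              [11.0, 1.2, 5.1],
              [ 1.0, 1.1, 5.0],
              [ 2.0, 0.9, 5.2]])
y = np.array([1, 1, 2, 2])   # class labels of the samples
print(snr_rank(X, y))        # gene 0 differs most between the classes
```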
Fig. 1.6. The top 12 discriminating genes selected from the Central Nervous System (CNS) cancer data, discriminating two classes - survivors and non-responders to treatment (Pomeroy et al. 2002). The NeuCom software system (www.theneucom.com) and the signal-to-noise ratio method are used for the analysis. See Color Plate 1
1.4 CNGM: How to Integrate Neuronal and Gene Dynamics?

A CNGM integrates genetic, proteomic and brain activity data and performs data analysis, modeling, prognosis and knowledge extraction that reveals relationships between brain functions and genetic information. Let us look at this process as building a mathematical function or a computational algorithm, as follows. A future state M' of a molecule or a group of molecules (e.g. genes, proteins) depends on its current state M and on an external signal E_m:

M' = F_m(M, E_m)    (1.2)

A future state N' of a neuron, or an ensemble of neurons, depends on its current state N, on the state of the molecules M (e.g. genes), and on external signals E_n:

N' = F_n(N, M, E_n)    (1.3)

And finally, a future cognitive state C' of the brain will depend on its current state C, on the neuronal state N, on the molecular state M, and on the external stimuli E_c:
C' = F_c(C, N, M, E_c)    (1.4)
The above set of equations (or algorithms) is a general one, and in different cases it can be implemented differently, e.g.:
• One gene - one neuron/brain function (see Chaps. 9 and 10).
• Multiple genes - one neuron/brain function, no interaction between genes (see Chaps. 9 and 10).
• Multiple genes - multiple neuron/brain functions, where genes interact in a gene regulatory network (GRN) and neurons interact in a neural network architecture (see Chap. 8 and also Fig. 1.7).
• Thousands of genes - complex brain/cognitive function(s), where genes interact within a GRN and neurons interact in several hierarchical neural networks (discussed in Chap. 8).
Fig. 1.7. A more complex case of CNGM, where a GRN of many genes represents the interaction of genes and an ANN is employed to model a brain function. The model output is compared against real brain data to validate the model and to verify the derived GRN of gene interactions after model optimization (see Chap. 8)
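The coupled state updates of Eqs. 1.2 and 1.3 can be sketched in a toy form, where a gene state vector feeds into a neuronal state vector. The linear-plus-tanh update rules and the random weight matrices below are illustrative assumptions, not the book's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_neurons = 4, 3

W_gg = rng.normal(scale=0.3, size=(n_genes, n_genes))    # gene-gene interactions (GRN)
W_gn = rng.normal(scale=0.3, size=(n_neurons, n_genes))  # gene influence on neurons
W_nn = rng.normal(scale=0.3, size=(n_neurons, n_neurons))

def F_m(M, E_m):     # Eq. 1.2: next molecular (gene) state
    return np.tanh(W_gg @ M + E_m)

def F_n(N, M, E_n):  # Eq. 1.3: next neuronal state, coupled to the gene state
    return np.tanh(W_nn @ N + W_gn @ M + E_n)

M = np.zeros(n_genes)
N = np.zeros(n_neurons)
for t in range(100):  # iterate the coupled dynamics
    M = F_m(M, 0.1)
    N = F_n(N, M, 0.0)
```

The neuronal trajectory N would then be compared against real brain data, as in Fig. 1.7, to validate the model and the derived GRN.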
The most common models of brain functions so far are based on the methods of artificial neural networks (ANN) (Rolls and Treves 1998), which is not surprising, as ANN are designed using principles of the brain. Other methods, such as differential equations, statistical regression, evolutionary computation (EC), etc., are also used to model the brain. Generally speaking, different methods of the mathematical and information sciences can be applied to the development of CNGM. In the next section we briefly discuss these methods, their principles, advantages, limitations and roles in building CNGMs. A short description of the methods is given in Appendix 2, and a more detailed description of the main methods, ANN and evolutionary computation (EC), is given in Chaps. 4, 5 and 6.
1.5 What Computational Methods to Use for CNGM?

In order to implement Eqs. 1.2 and 1.3 and to build a CNGM for classification or prediction and for knowledge discovery from brain-gene data, different mathematical and information science methods can be used. Here we present a list of methods, most of them falling in the areas of computational intelligence (CI) and knowledge engineering (KE), with a brief comment on their applicability to building a CNGM.
• Experimentally derived analytical functions and statistically derived regression functions (Pevzner 2000) are applicable, for instance, to the derivation of a GRN when sufficient experimental data and expert knowledge are available.
• Probabilistic learning methods, e.g. Hidden Markov Models (HMM) (Hunter 1994, Pevzner 2000, Somogyi et al. 2001), are applicable when a priori information is available as prior probabilities and the distribution of the data is also known in advance.
• Statistical learning methods, e.g. Support Vector Machines (SVM) and Bayesian classifiers (Vapnik 1998, Baldi and Brunak 2001), are applicable when a statistically significant amount of data is available; a certain probability distribution may be required.
• Case-based reasoning methods (e.g. k-NN; transductive reasoning) (Baldi and Brunak 2001) are applicable when a smaller number of variables and a relatively small number of samples with reliable variable values are used. The approach is adaptable to new data and applicable to personalized modeling (Song and Kasabov 2006).
• Decision trees (Hunter 1994) are applicable for classification tasks but are not adaptive to new data. They are good for extracting structured domain information.
• Rule-based systems (propositional logic dating back to Aristotle) and fuzzy systems (introduced by (Zadeh 1965)) are applicable if domain knowledge on the problem at hand is available, even in an imprecise or incomplete form. Usually fuzzy logic methods are combined with ANN methods to create fuzzy-ANN (Kasabov 1996a).
• Artificial neural networks (ANN) (Grossberg 1982, Kohonen 1984, Bishop 1995, Kasabov 1996a), in their different models such as self-organizing maps (SOM), multilayer perceptrons (MLP) and radial basis function networks (RBF), are "model-free" and are applicable when some data is available but there is no knowledge of what analytical function would be appropriate (Chap. 4). Some ANN models, such as the evolving connectionist systems (ECOS) (Kasabov 2001), are adaptive to new data. These methods are described in Chap. 5.
• Evolutionary computation (EC) methods, such as genetic algorithms (GA) (Holland 1975, Goldberg 1989), are applicable when neither data nor much knowledge is available on the problem. They are based on the "generate and test" approach, using fitness (goodness) criteria to evaluate how good a generated solution is. For applications of EC methods, see (D'Haeseleer et al. 2000, Fogel and Corne 2003). For a more detailed description of the principles of EC and some of its applications, see Chap. 6.
• Hybrid systems, e.g. knowledge-based neural networks, neuro-fuzzy systems, neuro-fuzzy-genetic systems and evolving connectionist systems (Kasabov 1996a, Kasabov and Song 2002), have the advantages of both ANN and rule-based systems (see Chap. 5).
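The "generate and test" loop of a GA can be sketched on a toy problem, where the fitness of a candidate bit string counts how many bits match a hypothetical optimum (all the names and parameters below are illustrative):

```python
import random
random.seed(1)

TARGET = [1] * 8  # hypothetical optimum; fitness counts matching bits

def fitness(bits):
    return sum(b == t for b, t in zip(bits, TARGET))

def mutate(bits, rate=0.1):
    # flip each bit with a small probability (the "generate" step)
    return [1 - b if random.random() < rate else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)   # the "test" step: rank by fitness
    parents = pop[:10]                    # keep the fittest half
    pop = parents + [mutate(random.choice(parents)) for _ in range(10)]
best = max(pop, key=fitness)
```

After a few dozen generations the population converges toward the optimum, illustrating how EC searches without prior data or analytical knowledge of the problem.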
1.6 About the Book

This book consists of 5 main parts, as shown in Fig. 1.8:
1. Background knowledge on brain information processing (Chaps. 2, 3);
2. Background knowledge on gene information processing (Chap. 7);
3. Background knowledge on computational methods and methods of computational intelligence (CI), mainly ANN and EC (Chaps. 4, 5, 6);
4. Methodologies for building CNGM (Chaps. 8, 9);
5. Applications of CNGM for modeling brain functions and diseases and as novel generic techniques of CI (Chaps. 9, 10).
Fig. 1.8. The five parts of the book and their connections
1.7 Summary

This chapter presents the motivations and the rationale behind CNGM, along with an introduction to the main concepts, expectations and the organization of the book. To conclude, we raise several questions that will be addressed in the book. Hopefully the readers will be able to answer them upon completion of the reading:
• Is it possible to create a truly adequate CNGM of the whole brain? Would gene-brain maps help in this respect (see http://alleninstitute.org)?
• How can a dynamic CNGM be used to trace over time and predict the progression of brain diseases, such as epilepsy and Parkinson disease?
• Is it possible to use CNGM to model gene mutation effects?
• Is it possible to use CNGM to predict drug effects?
• How can CNGM facilitate a better understanding of brain functions, such as memory and learning?
• Which generic problems of artificial intelligence (AI), such as classification, prediction, feature selection, pattern discovery, adaptation, visualization, etc., can be efficiently solved with the use of a brain-gene inspired CNGM (Bentley 2004)?
2 Organization and Functions of the Brain
This chapter gives an overview of the organization of the brain and the functions performed by its different parts. We will try to answer the following questions: How is the human brain organized at the macroscopic and microscopic levels? Which functions are performed by the brain? How is the organization of the human brain related to its functions? These and many other questions about the brain are still under investigation by thousands of neuroscientists all over the world. The first Nobel Prize for pioneering discoveries related to the microscopic organization of the brain was given to the Spanish scientist Santiago Ramon y Cajal (1852-1934) and the Italian Camillo Golgi (1843-1926). These scientists are considered to be the founders of neuroscience and of modern brain study. The German physicist Hermann Ludwig Ferdinand von Helmholtz (1821-1894) is the founder of psychophysics, the quantitative experimental and theoretical study of the relations between mental and brain functions. The division of the cerebral cortex based on its microstructure, introduced by the neuroanatomist Korbinian Brodmann (1868-1918), is used to this day. The Frenchman Paul Broca (1824-1880) and the Russian Alexander Romanovich Luriya (1902-1977) pioneered research on the brain localization of cognitive functions based upon the cognitive deficits caused by brain lesions. Even nowadays, invaluable information on brain functions comes from the study of patients with mental and neurological deficits caused by injuries to particular brain areas. Currently, however, these approaches are refined with noninvasive imaging techniques like functional magnetic resonance imaging (fMRI), positron emission tomography (PET), electroencephalography (EEG), and others, which provide a rich source of information about the dynamics and organization of the brain. We will begin this chapter with a description of the methods of brain study.
Next we will introduce the reader to the basics of how the human brain works: how its different parts specialize and how they are interconnected, how they form hierarchically organized neural networks, and how they cooperate to produce behavior. Although the focus will be on the human brain, many features are shared with animals, especially mammals and our closest evolutionary relatives (the great apes) in particular. We will mention what happens in the human brain during cognitive processes such as perception, learning, memory
storage and recall, thinking and language processing. We will conclude with a description of the relationships between different neural levels of complexity, in order to uncover the conditions that supposedly lead to consciousness, that is, the capability of reflecting the outer and inner world.
2.1 Methods of Brain Study

At present, a number of techniques are available to investigate where in the brain particular cognitive and other kinds of functions are based. In general, these methods are divided into invasive and noninvasive. In medicine, the term invasive refers to a technique in which the body is entered by puncture, incision or other intrusion. Noninvasive means the opposite, a technique that does not intrude into the body. We will provide an overview of both kinds of methods of brain study, starting with the invasive techniques. Information about the functions of the brain is still being gathered from brain-damaged subjects. Deficits in cognitive processing are observed in people who have suffered some kind of brain damage due to an accident, stroke, tumor, etc. The damaged areas indicate their involvement in those mental processes or brain functions which became disturbed. The main problem with this method is that observations are made after the event and therefore lack proper experimental control from before the accident. Very similar in nature are lesion studies. A comparison is made between cognitive performance before and after the deliberate removal or lesioning of a part of the brain. These types of studies are usually performed on animals. In humans they are performed only for therapeutic reasons, like dissecting the corpus callosum connecting the two hemispheres to treat life-threatening epilepsy, or the removal of a life-threatening tumor. The problem with this approach is that lesions may damage other systems which happen to lie next to, or pass through, the targeted part, and thus the actual involvement of the target part in a given function may be misjudged. Another invasive method of brain study is direct stimulation. Researchers perform electrical, magnetic or chemical stimulation of some neural circuit or part of it, and observe the consequences.
Electrical stimulation is delivered through microelectrodes inserted into the brain. This type of research is done routinely on animals. It can be done on human subjects during brain surgery, when the skull has to be opened anyway and surgeons have to map the functions of the operated area and its surrounding parts. Electrical stimulation of the brain (ESB) can also be used to
treat chronic tremors associated with Parkinson disease, the chronic pain of patients suffering from back problems, and other chronic injuries and illnesses. ESB is administered by passing an electrical current through a microelectrode implanted in the brain. With chemical stimulation, a particular chemical compound is administered into a chosen part of the brain, which is supposed either to stimulate or to inhibit the neurons within it. The least invasive of the stimulation methods is magnetic stimulation, called Transcranial Magnetic Stimulation (TMS). TMS and rTMS (repetitive TMS) are simply applications of the principle of electromagnetic induction to get electric currents across the insulating tissues of the scalp and skull without tissue damage. The electric current induced in the surface structure of the brain, the cortex, activates nerve cells in much the same way as if the currents were applied directly to the cortical surface. However, the path of this current is complex to model because the brain is a non-uniform conductor with an irregular shape. With stereotactic, MRI-based control (see below), the precision of targeting TMS can be as good as a few millimeters. However, besides the invasiveness there are other problems with the methods of direct stimulation. The intensity of an artificial stimulation can be stronger or weaker than the level of spontaneous activity in the target circuit. Therefore artificial stimulation can engage, respectively, more or less of the brain circuitry than is normally involved in the studied function. Thus, there are difficulties in determining which brain circuits have actually been affected by the stimulation and, consequently, which brain structures actually mediate the studied function. Often-used methods of brain study in animal research are single- and multi-unit recordings.
Microelectrode recordings from individual neurons or from an array of neighboring neurons reveal specific neural networks dedicated to the processing of particular stimuli (e.g. bars of a certain orientation, movement in a particular direction, particular objects like faces, and so on). The problem with this method, albeit very precise, is that it is invasive, i.e. it requires an intrusion into the brain and into the brain cells. Moreover, without post-mortem histology, it is almost impossible to tell where exactly the recordings were actually made. The classical anatomical methods, by means of which Cajal and other pioneers made their discoveries, are histology and staining. Anatomists still dissect dead brains, stain their cells with different dyes (Golgi stain, Lucifer yellow, etc.), and study them under the microscope. Thus they can reveal the microscopic structure of the brain in terms of cell types and the neural connectivity between cells. The biggest disadvantage is that this study can be performed on dead specimens only.
The oldest noninvasive method to measure the electrical activity of the brain is electroencephalography (EEG). An EEG is a recording of electrical signals from the brain made by attaching surface electrodes to the subject's scalp. These electrodes record electric signals naturally produced by the brain, called brainwaves. EEGs allow researchers to follow electrical potentials across the surface of the brain and observe changes over split seconds of time. An EEG can show what state a person is in (e.g., asleep, awake, epileptic seizure, etc.) because the characteristic patterns of brainwaves differ for each of these states. One important use of EEG has been to show how long it takes the brain to process various stimuli. A major drawback of EEG, however, is that it cannot show us the structures and anatomy of the brain or tell us which specific regions of the brain do what. In recent years, EEG has undergone technological advances that have increased its ability to read brain activity from the entire head, from up to 128 sites simultaneously. The greatest advantage of EEG is that it can record changes in brain activity almost instantaneously. On the other hand, its spatial resolution is poor, and thus it should be combined with CT or MRI (see below). A method related to EEG, magnetoencephalography (MEG), measures millisecond-long changes in the magnetic fields created by the brain's electrical currents. MEG is a rare, complex and expensive neuroimaging technique. A MEG machine uses a non-invasive, whole-head, 248-channel superconducting quantum interference device (SQUID) to measure the small magnetic signals reflecting changes in the electrical signals in the human brain. The incorporation of liquid helium creates the incredibly cold conditions (4.2 K) necessary for the MEG's SQUIDs to be able to measure these brain magnetic fields, which are billions of times weaker than the Earth's magnetic field.
Investigators use MEG to measure magnetic changes in the active, functioning brain at the speed of milliseconds. Besides its precision, another advantage of MEG is that the biosignals it measures are not distorted by the body, as they are in EEG. Used in conjunction with MRI or fMRI (see below) to relate the MEG sources to anatomical brain structures, researchers can localize brain activity and measure it in the same temporal dimension as the functioning brain itself. This allows investigators to measure, in real time, the integration and activity of neuronal populations while the subject is either working on a task or at rest. The brains of healthy subjects and of those suffering from dysfunction or disease are imaged and analyzed. The oldest among the noninvasive methods to study brain anatomy is Computed Tomography (CT). It is based on the classical X-ray principle: X-rays reflect the relative density of the tissue through which they pass. If a narrow X-ray beam is passed through the same point at many different
angles, it is possible to construct a cross-sectional visual image of the brain. The 3D X-ray technique is called CAT (Computerized Axial Tomography). CT is noninvasive and shows only the anatomical structure of the brain, not its function. Positron Emission Tomography (PET) is used for studying the activity of the living brain. This noninvasive method involves the on-site use of a machine called a cyclotron to label specific drugs or analogues of natural body compounds (such as glucose or oxygen) with small amounts of radioactivity. The labeled compound (a radiotracer) is then injected into the bloodstream, which carries it into the brain. Radiotracers break down, giving off sub-atomic particles (positrons). By surrounding the subject's head with a detector array, it is possible to build up images of the brain showing different levels of radioactivity and, therefore, cortical activity. Thus, depending on whether glucose (oxygen) or some drug is used, PET can provide images of ongoing cortical or biochemical activity, respectively. Among the problems with this method are the expense, including the on-site cyclotron, and technical limitations like the limited temporal (40 seconds) and spatial (4 mm - 1 cm) resolution. Usually the PET scan is combined with either CT or MRI to correlate the activity with brain anatomy. Single-Photon Emission Computed Tomography (SPECT) uses gamma rays. Similar to PET, this noninvasive procedure also uses radiotracers and a scanner to record the different levels of radioactivity over the brain. SPECT imaging is performed by using a gamma camera to acquire multiple images (also called projections) from multiple angles. A computer can then be used to apply a tomographic reconstruction algorithm to the multiple projections, yielding a 3D dataset (as in CT). Special SPECT tracers have a long decay time, so no on-site cyclotron is needed, which makes this method much less expensive than PET.
However, the temporal and spatial resolution of brain activity is even poorer than in PET. Magnetic Resonance Imaging (MRI) uses the properties of magnetism, instead of injecting radioactive tracers into the bloodstream, to reveal the anatomical structure of the brain. A large (and loud) cylindrical magnet creates a magnetic field around the subject's head. Detectors measure the local magnetic fields caused by the alignment of atoms in the brain with the externally applied magnetic field. The degree of alignment depends upon the structural properties of the scanned tissue. MRI provides a precise anatomical image of both surface and deep brain structures, and thus can be combined with PET. MRI images provide greater detail than CT images. Problems: expense; cannot be used in patients with metallic devices; the patient must hold still for 40-90 min.
Functional MRI (fMRI) combines the visualization of brain anatomy with a dynamic image of brain activity in one comprehensive scan. This noninvasive technique measures the ratio of oxygenated to deoxygenated hemoglobin, which have different magnetic properties. Active brain areas have higher levels of oxygenated hemoglobin than less active areas. An fMRI can produce images of brain activity as fast as every 1-2 seconds, with a very precise spatial resolution of about 1-2 mm. Thus, fMRI provides both an anatomical and a functional view of the brain and is very precise. fMRI is a technique for determining which parts of the brain are activated by different types of brain activity, such as sight, speech, imagery, memory processes, etc. This brain mapping is achieved by setting up an advanced MRI scanner in a special way, so that the increased blood flow to the activated areas of the brain shows up on fMRI scans. The subject in a typical experiment lies in the magnet and a particular form of stimulation is set up (auditory, visual, etc.). For example, the subject may wear special glasses so that pictures can be shown during the experiment. Then, MRI images of the subject's brain are taken. First, a high-resolution single scan is taken. This is used later as a background for highlighting the brain areas activated by the stimulus. Next, a series of low-resolution scans are taken over time, for example, 150 scans, one every few seconds. For some of these scans the stimulus (sound, picture) will be presented, and for some of the scans the stimulus will be absent. The low-resolution brain images in the two cases can be compared to see which parts of the brain were activated by the stimulus. After the experiment has finished, the set of images is analyzed. First, the raw input images from the MRI scanner require a mathematical transformation to reconstruct the images in space, so that the images look like brains.
The rest of the analysis is done using a series of tools which correct for distortions in the images, remove the effect of the subject moving their head during the experiment, and compare the low-resolution images taken when the stimulus was off with those taken when it was on. The final statistical image shows up bright in those parts of the brain which were activated by the experiment. These activated areas are then shown as colored blobs on top of the original high-resolution MRI scan, for interpretation of the experiment. This combined activation image can be rendered in 3D, and the rendering can be calculated from any angle. By means of fMRI a very comprehensive picture of the brain in action can be derived. Comparisons between healthy and ill brains can be made and correlated with structural changes, if these are present. There is, however, a serious interpretation problem with all the methods that measure cortical activity (PET, fMRI, EEG, MEG): how does an experimenter decide which cortical activity is specifically related to the psychological process in question? This is done by the so-called subtraction method.
The experimenter calculates the difference between the image of the process and that of a control situation. The difference images from individual subjects are averaged to produce a group mean difference image. It can be quite problematic to determine which situation should represent the control background. It is also problematic to align individual or average functional images with anatomical structures when comparing the coordinates of activation to a standard atlas, which has by no means been proven to reflect the borders of particular areas for all people. In fact, the converse is true: most Brodmann areas differ between individuals. Nevertheless, by comparing images taken during some cognitive processing to those taken before or after it, scientists are gaining many new insights into brain structure and function.
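The subtraction method described above can be sketched numerically; the "scans" below are random toy arrays with a simulated active region, so all sizes, thresholds and values are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(42)
shape = (8, 8)  # toy "brain image" grid, one value per voxel

def subject_difference(n_scans=10):
    """Difference image for one subject: mean stimulus-on minus mean control scan."""
    active = rng.normal(1.0, 0.5, size=(n_scans, *shape))
    active[:, 2:4, 2:4] += 2.0  # a region activated by the task (simulated)
    control = rng.normal(1.0, 0.5, size=(n_scans, *shape))
    return active.mean(axis=0) - control.mean(axis=0)

# average the difference images across subjects: group mean difference image
group_mean = np.mean([subject_difference() for _ in range(12)], axis=0)
activated = group_mean > 1.0  # threshold to highlight the "activated" voxels
```

Averaging across subjects suppresses the voxel-wise noise, so only the consistently activated region survives the threshold.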
2.2 Overall Organization of the Brain and Motor Control

It is estimated that there are 10¹¹ to 10¹² neurons in the human brain (Kandel et al. 2000). Three quarters of the neurons form the 4-6 mm thick cerebral cortex that constitutes the heavily folded brain surface. The cerebral cortex is thought to be the seat of cognitive functions, like perception, imagery, memory, learning, thinking, etc. The cortex cooperates with evolutionarily older subcortical nuclei that are located in the middle of the brain, in and around the so-called brain stem (Fig. 2.1). Subcortical structures and nuclei comprise, for instance, the basal ganglia, thalamus, hypothalamus, limbic system and dozens of other groups of neurons with more or less specific functions in the operation of the whole brain. For example, the input from all sensory organs comes to the cortex preprocessed in the thalamus. Emotions and memory functions depend upon an intact limbic system. When one of its crucial parts, the hippocampus, is damaged, humans (and animals) lose the ability to store new events and form new memories. When a particular cortical area has been damaged, a particular cognitive deficit follows. However, all the brain parts, either cortical or subcortical, are directly or indirectly heavily interconnected, thus forming a huge recurrent neural network (in the terminology of artificial neural networks). Thus, we cannot speak of totally isolated neuroanatomic modules. Fig. 2.1 shows a schematic functional division of the human cerebral cortex. One third of the cortex is devoted to the processing of visual information in the primary visual cortex and higher-order visual areas in the parietal cortex and in the inferotemporal cortex. Association cortices take up about one half of the whole cortical surface. In the parietal-temporal-occipital association cortex, sensory and language information are associated. Memory and emotional information are associated in the limbic association cortex (internal and bottom portion of the hemispheres). The prefrontal association cortex takes care of all associations, evaluation, planning ahead and attention. Language processing takes place within the temporal cortex, parietal-temporal-occipital association cortex, and frontal cortex.
Fig. 2.1. Gross anatomical and functional division of the human cerebral cortex. The same division applies for the right hemisphere. Dashed curves mark the position of evolutionarily older subcortical nuclei in and around the brainstem. Each of the depicted areas has far more subdivisions.
At the border between the frontal and parietal lobes, there is the somatic sensory cortex, which processes touch and other somatosensory signals (temperature, pain, etc.) from the body surface and interior. In front of it, there is the primary motor cortex, which issues signals for voluntary muscle movements including speech. These signals are preceded by the preparation and anticipation of movements that takes place in the premotor cortex. The plan of actions and their consequences, and the inclusion and exclusion of motor actions into and from the overall goal of an organism, are performed within the prefrontal association cortex. The subcortical basal ganglia participate in the preparation and tuning of motor outputs, in the sense of the initiation and extent of movements. The cerebellum executes routine automatic movements like walking, biking, driving, etc. We want to point out that
there are far more anatomical and functional subdivisions within each of the mentioned areas. The functions, or rather dominances, of the right and left hemispheres in different cognitive functions differ (Kandel et al. 2000). This was shown by Roger Sperry and Michael Gazzaniga in studies of the so-called "split-brain" patients, in whom the connections between the two hemispheres had been cut for therapeutic reasons. The dominant hemisphere (usually the left one) is specialized for language, logical reasoning, awareness of cognitive processes and awareness of the results of cognitive processes. Although the non-dominant hemisphere (usually the right one) is able to carry out cognitive tasks, it is not aware of them or their results. It is specialized for emotional and holistic processing, and for intra- and extrapersonal representation of space. Its intactness is crucial for awareness of the body's integrity (Damasio 1994). Lesion of the parietal cortex, including the somatosensory cortex, leads to so-called anosognosia. The limbs and the body are intact but their cortical and mental representations become missing. Patients who have undergone a stroke in the right parietal lobe neglect the left half of their body, even though they can see it. This is not a consequence of the left hemiparalysis. Mirror damage to the left parietal lobe does not lead to anosognosia. It seems that the right hemisphere is dominant in mental representations of intra- and extrapersonal space. In other words, subjective experience of the bodily self depends upon specific brain mechanisms, namely the integrity of primary and higher-order somatosensory cortical areas in the right hemisphere (Damasio 1994). Although the right half of the body is represented in the left somatosensory cortex and the left half in the right hemisphere, the latter seems to have a special role in integral self-awareness.
2.3 Learning and Memory

The capability of learning and memory formation is one of the most important cognitive functions. Our identity largely depends upon what we have learned and what we can remember. We can divide the study of learning and memory into two levels:
1. The system level (where?), which attempts to answer the question of which brain parts and pathways the memory trace is stored in - the top-down approach; this will be the topic of this section.
2. The molecular level (how?), which is devoted to investigating how information is coded and stored at the cellular and molecular level - the bottom-up approach; this will be introduced in the next chapter.
It is interesting that learning and memory are relatively independent of other cognitive functions and as such can be studied separately. This was first shown in a famous patient who, in 1953, underwent a bilateral removal of the middle portion of his temporal lobe, including the hippocampus, to treat severe epilepsy. He lost the ability to store new memories (so-called anterograde amnesia). He could not keep a memory of new people, objects, facts, or places for longer than a few minutes. Moreover, he lost all memories from about two years before the operation (so-called retrograde amnesia), the period which scientists consider to be the consolidation period during which memories enter long-term memory. However, all these severe cognitive deficits did not affect other mental functions like language processing and thinking, including IQ and implicit memory (see below).
Fig. 2.2. Different kinds of long-term memory fall under two general categories: explicit and implicit. Explicit memory of facts and events involves the hippocampus and neocortex; implicit memory comprises priming (neocortex), skills (basal ganglia and cerebellum), classical and emotional conditioning (cerebellum and amygdala, respectively), and habituation and sensitisation (reflex pathways).
It has been long recognized that there is a short-term memory and a long-term memory. Short-term memory lasts for a few minutes and is also called the working memory. It occurs in the prefrontal cortex, although
other parts of the cortex relevant to the memory content are activated too (Roberts et al. 1998). The learning process and the process of long-term memory formation can be divided into four stages:
1. Encoding. Focusing attention and entering new information into working memory; finding associations with already stored memories.
2. Consolidation. The process of stabilization of new information and its transformation into a long-term memory by means of rehearsal.
3. Storage. Long-term storing of information in memory.
4. Recall. Retrieval of information into working memory.
Based on clinical, imaging and animal studies we can divide long-term memory into two main categories that have different subtypes with different mechanisms and different localizations in the brain (Fig. 2.2). Explicit (declarative) memory is a memory of facts (semantic memory) and a memory of events (episodic memory). Recall from explicit memory requires conscious effort, and stored items can be expressed in language. The hippocampus is a crucial but only transitory stage in explicit memory. How is explicit memory formed? Information comes to the brain through the sensory organs (visual, auditory, olfactory, tactile), and proceeds through subcortical sensory nuclei and sensory cortical areas into multimodal association areas, for instance the parietal-temporal-occipital association cortex, limbic association cortex and the prefrontal association cortex. From there the information is relayed through the parahippocampal cortex, perirhinal cortex and entorhinal cortex into the hippocampus. From the hippocampus the information is relayed to the subiculum, from where it returns to the entorhinal cortex and all the way back to the association cortical areas. Thus the brain circuit for long-term explicit memory storage forms a re-entrant closed loop.
According to experimental data, the "synaptic re-entry reinforcement" or SRR hypothesis and the corresponding computational model have been formulated and simulated (Wittenberg et al. 2002, Wittenberg and Tsien 2002). According to this hypothesis, after initial learning, reactivation of hippocampal memory traces repeatedly drives cortical learning. Thus, a memory trace (engram) is stored after many repetitions. Repeated reinforcement of synapses during the reactivation of memory traces could lead to a situation in which memory traces compete, such that the strengthening of one memory is always at the expense of others, which are either weakened or lost entirely. In other words, a single memory stored in a neural network is either lost (owing to synaptic decay) or strengthened and maintained by repeated rounds of synaptic potentiation each time the memory is reactivated. Once cortical connections are fully consolidated and stabilized, the hippocampus itself becomes dispensable. Differences in the frequency with which memory traces are
either consciously or subconsciously recalled could be another factor affecting the selection of which memories are consolidated. An increasing amount of evidence suggests a role of sleep in memory consolidation, by means of learning-induced correlations in the spontaneous activity of neurons and the replaying of patterns of waking neural activity during sleep (Maquet 2001, Stickgold et al. 2001). Others, however, point out that people lacking REM sleep do not show memory deficits and that a major role of sleep in memory consolidation remains unproven (Siegel 2001). An interesting question is how the degradation of outdated hippocampal memory traces occurs after memory consolidation is finished. The most recent hypothesis is that memory clearance may actually involve newborn neurons. Neurogenesis in the dentate gyrus of the hippocampus persists throughout life in many vertebrates, including humans. The progenitors of these new neurons reside in the subgranular layer of the dentate gyrus (Seri et al. 2001). Deletion of the Presenilin-1 gene in excitatory neurons of the adult mouse forebrain led to a pronounced deficiency in enrichment-induced neurogenesis in the dentate gyrus (Feng et al. 2001). This reduction in neurogenesis did not result in appreciable learning deficits, indicating that the addition of new neurons is not required for memory formation. However, the post-learning enrichment experiments led to the postulate that adult dentate neurogenesis may play a role in the periodic clearance of outdated hippocampal memory traces. The clearance can happen because these adult-born neurons are short-lived, with a life span of several weeks in rodents. The implicit or nondeclarative memory serves to store perceptual and motor skills and conditioned reactions. Recall of stored implicit information occurs automatically, without conscious effort, and the information is not expressed verbally.
The basal ganglia and cerebellum are important for the acquisition of motor habits and skills that are characterized by precise patterns of movements and fast automatic reactions. The cerebellum is the key structure for classical conditioning. Conditioned emotional reactions require the amygdala in the limbic system. Nonassociative learning, like habituation and sensitization, occurs in primary sensory and reflex pathways. Priming is an increase in the speed or accuracy of a decision that occurs as a consequence of prior exposure to some of the information in the decision context, without any intention or task-related motivation, and occurs in the neocortex. Although implicit and explicit learning concern different memory contents, they share cellular and molecular mechanisms (Bailey et al. 2004). These mechanisms will be one of the topics of the next chapter. Later we also introduce the genetics of learning and memory and the neurogenetic computational model.
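The competition between decaying and repeatedly reinforced traces posited by the synaptic re-entry reinforcement hypothesis discussed earlier can be illustrated with a toy simulation. The decay rate, reinforcement size and reactivation periods below are arbitrary assumed values for illustration, not parameters of the cited Wittenberg and Tsien model:

```python
# A cortical memory trace is caricatured as a single synaptic efficacy that
# decays passively and is re-potentiated at each hippocampally driven
# reactivation (the "synaptic re-entry reinforcement" idea).
decay = 0.05   # fractional decay per time step (assumed)
boost = 0.5    # potentiation added at each reactivation (assumed)
steps = 200

def simulate(reactivation_period):
    w = 1.0
    for t in range(1, steps + 1):
        w *= (1.0 - decay)                # passive synaptic decay
        if t % reactivation_period == 0:
            w = min(w + boost, 1.0)       # saturating reinforcement
    return w

frequent = simulate(reactivation_period=5)    # often-reactivated trace
rare = simulate(reactivation_period=150)      # rarely-reactivated trace

print(f"frequently reinforced trace: {frequent:.3f}")
print(f"rarely reinforced trace:     {rare:.3f}")
```

The frequently reactivated trace settles near its maximum, while the rarely reactivated one decays toward zero between reinforcements, matching the intuition that a memory is either maintained by repeated rounds of potentiation or lost to synaptic decay.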
2.4 Language and Other Cognitive Functions

Language is distinguished from other forms of communication by its form, content, use and creativity (Mayeux and Kandel 1991):
• Form. Language is formed from a limited set of "nonsense" elementary sounds (i.e. phonemes) that are arranged into various predictable sequences that signal content.
• Content. Language provides a means of communicating contents whose meanings are independent of the immediate situation.
• Use. Through language we organize our sensory experience and express our thoughts, feelings, and expectations.
• Creativity. With every new thought we speak we create original sentences, and we readily interpret original sentences spoken by others.
2.4.1 Innate or Learned?

Although the acquisition of language undoubtedly involves learning, developmental and neurobiological studies indicate a large innate component to it. First, both spoken and sign language functions are dominantly localized to the left hemisphere, as revealed by lesion studies and brain imaging. Second, there are universal regularities in the acquisition of language across all human cultures (see Table 2.1). Children learn the words and rules of language effortlessly by simply listening to the speech around them. There exist about 6000 living languages and all of them have principal characteristics in common. Currently, there exists strong evidence from comparative linguistics and genetic anthropology that language appeared just once, in Africa, and all other languages descend from it (Cavalli-Sforza 2001). Noam Chomsky assumes that humans have some innate program (generative grammar) that prepares them to learn language in general. In more modern terms, these innate neural mechanisms determine the possible characteristics of acquired language, and the process of its acquisition. In addition, there is a critical period during postnatal development for learning language. Investigation of feral children has shown that it is impossible to acquire a fully developed language (especially with respect to syntax and grammar) once this critical period is over, at about 8 years of age. In summary, psychologists and linguists now believe that the mechanisms for the universal aspects of language acquisition are determined by structures in the human brain. Thus, the human brain is innately prepared to learn and use language. The particular language spoken and the
dialect and accent are determined by the social environment. The questions now being debated are which language characteristics derive from neural structures specifically related to language acquisition and which from cognitive characteristics that are more general.

Table 2.1. Stages of development in the acquisition of language in humans

Average Age    Language ability
6 months       Beginning of distinct babbling.
1 year         Beginning of language understanding, one-word utterances.
1½ years       Words used singly; child uses 30-50 words (simple nouns, adjectives, action words) one at a time but cannot link them to make phrases, and does not use functors (the, and, can, be) necessary for syntax.
2 years        Two-word (telegraphic) speaker; 50 to several hundred words in the vocabulary, much use of two-word phrases that are ordered according to syntactic rules; child understands propositional rules.
2½ years       Three or more words in many combinations; functors begin to appear; many grammatical errors and idiosyncratic expressions; good understanding of language.
3 years        Full sentences, few errors, vocabulary of around 1000 words.
4 years        Close to adult speech competence.
2.4.2 Neural Basis of Language

The basic model of language processing during the simple task of repeating a word that has been heard is the Wernicke-Geschwind model (Mayeux and Kandel 1991) (Fig. 2.3). According to this model, the language task involves the transfer of information from the inner ear through the auditory nucleus in the thalamus to the primary auditory cortex (Brodmann's area 41), then to the higher-order auditory cortex (area 42), before it is relayed to the angular gyrus (area 39). The angular gyrus is a specific region of the parietal-temporal-occipital association cortex, which is thought to be concerned with the association of incoming auditory, visual and tactile information. From here, the information is projected to Wernicke's area (area 22) and then, by means of the arcuate fasciculus, to Broca's area (areas 44, 45), where the perception of language is translated into the grammatical structure of a phrase and where the memory for word articulation is stored. This information about the sound pattern of the phrase is then relayed to the facial area of the motor cortex that controls articulation so that the word can be spoken. It turned out that a similar pathway is involved in naming an object that has been visually recognized. This time, the input
proceeds from the retina and the LGN (lateral geniculate nucleus) to the primary visual cortex, then to area 18, before it arrives at the angular gyrus, from where it is relayed by a particular component of the arcuate fasciculus directly to Broca's area, bypassing Wernicke's area. Lesions in different parts of the cerebral cortex cause selective language disturbances, called aphasias, rather than an overall reduction in language ability. Normal language depends not only on cortical but also on subcortical structures and connections. Lesions that do not affect the cerebral cortex, typically vascular lesions in the basal ganglia and/or thalamus, can also result in aphasia. The basal ganglia take part in motor output and the thalamus in perception. Furthermore, damage to the brain language areas often also affects other cognitive and intellectual skills to some degree.
Fig. 2.3. Lateral view of the exterior surface of the left hemisphere with the main language processing areas. Broca's area (Brodmann's areas 44/45) is adjacent to the regions of premotor (6) and motor (4) cortices that control the movements of facial expression, articulation and phonation. Wernicke's area (area 22) lies in the posterior superior temporal lobe near the primary and higher-order auditory cortices in the superior temporal lobe (areas 41/42). Wernicke's and Broca's areas are joined by a fiber tract called the arcuate fasciculus.
Wernicke's aphasia is characterized by a prominent deficit in language comprehension. The lesion primarily affects area 22 (Wernicke's area), and often extends to the superior portions of the temporal lobe (areas 39/40) and inferiorly to area 37. Comprehension of both auditory and visual language inputs is severely impaired, accompanied by severe reading
and writing disabilities. Speech is fluent and grammatical, but lacks meaning. Phenomena like empty speech, neologisms, and logorrhea occur. Patients are generally unaware of these speech failures, probably because of the lack of their own language comprehension. Occasionally, a right visual field defect is encountered. Conduction aphasia is the result of damage to the arcuate fasciculus. Symptoms resemble those of Wernicke's aphasia. Many patients with conduction aphasia have some degree of impairment of voluntary movement.
Fig. 2.4. Recent model of the neural processing of language, built upon the original Wernicke-Geschwind model. The simplified scheme shows the relationships between various anatomical structures and functional components of language: visual processing (visual cortex, areas 17, 18, 19) and auditory processing (auditory cortex, areas 41/42) feed language comprehension (temporo-parietal areas 39, 22, 37, 40), which connects to semantic association (left anterior inferior frontal cortex) and to the motor control of speech and writing (left premotor cortex, areas 44, 45, 6). Connections are actually reciprocal.
Broca's aphasia is characterized by a prominent deficit in language production. Lesions affect areas 44 and 45 (Broca's area), and in severe cases also other prefrontal regions (8, 9, 10, 46) and premotor regions (area 6). The most severe case is complete muteness. Usually, speech contains
only key words; nouns are expressed in the singular, verbs in the infinitive. Articles, adjectives, adverbs and grammar are missing altogether. Unlike in Wernicke's aphasia, patients with Broca's aphasia are generally aware of these errors. Reading and writing are also impaired, because they also include motor components. Some defects in comprehension related to syntax may be encountered. Right hemiparesis and loss of vision are almost always present in this type of aphasia. Lesions to prefrontal cortical regions other than Broca's area, or to parietal-temporal cortical regions other than Wernicke's area, can result in various language deficits in production or comprehension, respectively. When certain portions of higher-order visual areas are damaged, specific disorders of reading and/or writing follow (dyslexias, alexias and agraphias). Homologous language areas in the right hemisphere process affective components of language like musical intonation (prosody) and emotional gesturing. Disturbances in the affective components of language associated with damage to the right hemisphere are called aprosodias. The organization of prosody in the right hemisphere seems to mirror the anatomical organization of the cognitive aspects of language in the left hemisphere. Thus, patients with posterior lesions do not comprehend the affective content of other people's language. On the other hand, a lesion to the anterior portion of the right hemisphere leads to a flat tone of voice whether one is happy or sad. To sum up, recent cognitive and imaging studies have revealed that language processing involves a larger number of areas and a more complex set of interconnections than just a serial interconnection of Wernicke's area to Broca's area. Thus, a more realistic scheme illustrating the neural processing of language is shown in Fig. 2.4.

2.4.3 Evolution of Language, Thinking and the Language Gene
In most individuals the left hemisphere is dominant for language, and the cortical speech area of the temporal lobe (the planum temporale) is larger in the left than in the right hemisphere. Since important gyri and sulci often leave impressions upon the skull, it is possible to examine human fossils in search of such impressions. Marjorie LeMay searched fossil skulls for the morphological asymmetries associated with speech and found them not only in modern Homo sapiens but also in Neanderthal man (Homo sapiens neanderthalensis, dating back 30,000 to 50,000 years) and in Peking man (Homo erectus pekinensis, dating back 300,000 to 500,000 years). The left hemisphere is also dominant for the recognition of species-specific cries in Japanese macaque monkeys, and asymmetries similar to those of humans are present in the brains of modern-day great apes. G. Rizzolatti et al. (DiPellegrino et al. 1992, Rizzolatti et al. 1996) have found that neurons in the ventral premotor cortex of macaque monkeys are active not only when the monkey executes motor actions, but also when it watches others, either monkeys or humans, perform the same actions. Thus, these mirror neurons follow or imitate what others are doing. They may form a neural basis for learning by imitation, which is very important for language acquisition. In humans, the ventral premotor area includes Broca's area (areas 44/45), a specific cortical area associated with the expressive and syntactical aspects of language. Thus, perhaps, the evolution of the ventral premotor area with its mirror neurons played an important role in the evolution of the neural basis for language (Rizzolatti and Arbib 1998). It is also intriguing to see areas responsible for the contemplation of (motor) actions and areas processing language located at the same place in the brain. Thus thinking can be hypothesized to be a contemplation of actions in real or abstract spaces. According to Rizzolatti and Arbib (1998) and Corballis (2003), speech and language evolved from communication gestures and not from the vocal communication of primates. Vocal production in primates is controlled in the emotional cortical and subcortical centers and serves mainly to communicate emotional state (anger, fear, contentment, etc.). On the other hand, the communication modality of gestures is visual and motor and involves the premotor cortex. Our ancestors probably started to use gestures and mirror neurons to communicate nonemotional contents. Brachial and manual gestures were probably accompanied by oro-facial movements and differentiating sounds.
Communication gestures started to be associated with these accompanying sounds on a regular basis, which led to the liberation of these sounds in language. Although the anatomical structures that are prerequisites for language may have arisen early, many linguists believe that language per se emerged rather late in the prehistoric period of human evolution (about 100,000 years ago). There exists strong evidence from comparative linguistics and genetic anthropology that language arose only once, in Africa, and all other languages descend from a single original language (Cavalli-Sforza 2001). Modern Homo sapiens, which evolved in Africa, began to leave Africa some 100,000-80,000 years ago, taking this first language with them. This theory fits well with the estimated time of fixation of the so-called language gene FOXP2 in modern humans (Enard et al. 2002). FOXP2 (fork-head box P2) is located on human chromosome 7q31. It codes for a protein of 715 amino acids, which belongs to the class of so-called fork-head transcription factors that control the transcription of DNA. So far it is not known which transcription is controlled by FOXP2. This gene is present in all mammals; however, humans gained two mutations compared to chimpanzees, gorillas and macaques, which have identical FOXP2 proteins. It seems that two functional copies of the human gene FOXP2 must be present to acquire full language. People with a point mutation in one of the gene copies have serious problems with articulation, grammar, expression and comprehension of language. One of the consequences of this mutation is a disorder of the sequencing of subtle oro-facial movements. This motor disorder is accompanied by serious problems with sequencing syllables into words and words into grammatically correct sentences. Over the years much evidence has accumulated to support the idea that aspects of our genetic makeup are critical for the acquisition of language (Marcus and Fisher 2003). To gain insights into the evolution of language and also into its neural basis, studies of the communication systems and language abilities of great apes like chimpanzees and bonobos have been very helpful (Savage-Rumbaugh and Lewin 1994). In short summary, bonobos' capacity for processing grammar even after many years of proper training remains limited. With respect to language production, they are at the level of a two-year-old child (see Table 2.1). With respect to language understanding, they can be at the level of a 2.5-3-year-old child. Thus, so far it seems that fully developed language is an exclusive form of human communication. An important topic in the study of language is the relation of language and its evolution to other cognitive functions and their evolution, respectively (Gardenfors 2000, Marcus 2004b). Language serves to communicate about something that is not here and not now.
Thus, more general cognitive abilities, such as being able to create detached representations and being able to make anticipatory plans (planning about future needs, goals, events, etc.), can be necessary (but not sufficient) cognitive prerequisites for language. Grammar is an enhanced formal means for the organization of language. Besides the evolution of general cognitive abilities, the evolution of language may have gone hand in hand with the development of advanced forms of cooperation. Without the aid of symbolic communication about detached contents, we would not be able to share visions about the future. We need language in order to convince each other that a future goal is worth striving for (Gardenfors 2000).
2.5 Neural Representation of Information

The first principle of the representation of information in the brain is redundancy. Redundancy means that every piece of information (in any sense) is stored, transmitted and processed by a redundant number of neurons and synapses, so that it does not become lost when neural networks undergo damage, for instance due to aging. When neural networks get damaged, their performance does not drop to zero abruptly, as in a computer; instead it degrades gracefully. Computer models of neural networks also confirm the idea that the degradation of performance with the loss of neurons and synapses is not linear; neural networks can withstand quite substantial damage and still perform well. Next, the contemporary view on the nature of neural representation is that information (in the sense of content or meaning) is represented by place in the cortex (or, in general, in the brain). However, this placing is a result of the anatomical framework and of shaping by input, i.e. by experience-dependent plasticity. For instance, the sound pattern of the word "apple" is represented in the auditory areas of the temporal cortex. It is represented as a spatial pattern of active versus inactive neurons. This neural representation is associated (connected) through synaptic weights with the neural representation of the visual image of an apple in the parietal cortex, with the neural representation of an apple's odor in the olfactory cortex, with memories of grandma's garden and facts about apples represented in other areas of the cortex, etc. Neural representations (that is, distributions or patterns of active neurons) within particular areas, and their associations between areas, appear as a result of learning (i.e. synaptic plasticity). Different objects are represented by means of different patterns or distributions of active neurons within cortical areas. Therefore we speak of so-called distributed representations.
The current hypothesis states that recall from memory is an active process. Instead of passively processing all the electrical signals that arrive from hierarchically lower processing levels, cortical neural networks should be able to use fragments of activity patterns to fill in the gaps, and thus quickly recreate the whole neural representation. This filling-in process can be nicely modeled by means of model neural networks (Fig. 2.5). Neural representations (patterns of activity) are stored in the matrix of synaptic weights through which the neurons in the network are interconnected. The weight distribution storing a particular object representation is created by experience-dependent synaptic plasticity (learning). When a sufficiently large portion of this neural representation is activated from outside the network,
a few electric signals along the synapses in the network quickly switch on the correct remaining neurons in the representation. Neural representations in the sense of patterns of activity have a holistic character: patterns of activity are recalled (restored) as a whole. Thus, we can see a nice relation between the character of neural representations and gestalts. Gestalt psychology was developed at the beginning of the 20th century by Max Wertheimer, Kurt Koffka and Wolfgang Köhler in Germany. Gestalt psychology considers holistic mental gestalts (shapes, forms) to be the basic mental elements. For a gestalt to be stored and recalled, certain rules must be fulfilled, like the rules of proximity, good continuation, symmetry, etc. These rules have been experimentally verified.
Fig. 2.5. Illustration of spontaneous re-creation of a neural representation after a few input impulses (figure in the uppermost left corner). A black pixel represents a firing neuron, while a blank pixel represents a silent neuron. Between each pattern of activity from left to right (1 ms time frame), neurons in the network exchange only one impulse. Thus, after exchanging only two to three spikes, the memory pattern is re-created. The network remains in the restored memory pattern until a different external input arrives
To conclude, neural representations of objects are stored in the matrix of synaptic weights as a whole. We are not able to trace down a sequence of steps leading to the holistic percept. Synaptic weights implicitly bind together parts of the pattern.
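As a concrete illustration, the attractor-style pattern completion described above can be sketched with a minimal Hopfield-type network. This is a toy sketch, not the authors' model; the network size, the number of stored patterns and the amount of cue corruption are arbitrary choices made for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Store two random binary (+1/-1) patterns in a Hebbian weight matrix.
n = 100
patterns = rng.choice([-1, 1], size=(2, n))
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0.0)                 # no self-connections

# Cue the network with a corrupted fragment of pattern 0:
# flip 30 of its 100 bits, then let the dynamics settle.
state = patterns[0].copy()
flip = rng.choice(n, size=30, replace=False)
state[flip] *= -1

for _ in range(10):                      # a few synchronous updates
    state = np.sign(W @ state)
    state[state == 0] = 1                # break ties toward +1

# Overlap of 1.0 means the stored pattern was recalled perfectly.
overlap = float(state @ patterns[0]) / n
print(overlap)
```

With only two stored patterns the network is far below its storage capacity, so even a cue with 30% of its bits flipped settles back onto the stored pattern within a few updates, mirroring the filling-in behavior illustrated in Fig. 2.5.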
2.6 Perception

Perception is accompanied by sensory awareness, and therefore we will describe the underlying neural processes in relation to the next section on consciousness. We will concentrate on visual perception and visual awareness, since similar principles apply to all sensations. Neurons in different areas of the visual cortex respond to various elementary features, like oriented edges of light intensity (bars), binocular disparity, movement, color, etc. (Kandel et al. 2000). Visual areas in the occipital, parietal and inferior temporal cortex, though reciprocally connected, are hierarchically organized. Results of processing at lower hierarchical levels are relayed to higher-order areas. Neurons in higher-order areas respond to various combinations of elementary features from lower-order areas.

In primates, based on matching psychophysical and physiological data, three main visual systems, relatively independent but mutually heavily interconnected, have been identified: the "magno", "parvo" and color systems (Livingstone and Hubel 1988). The "magno" system is responsible for perception of movement, depth and space, and for separation of objects. Several cues leading to depth perception have been identified: stereopsis, depth from perspective, depth from mutual movement and occlusion, etc. The "parvo" system is responsible for shape recognition. For separation and recognition of objects, we use separation based on movement, separation from background, filling in of borders, shape from shading, etc. The color system is responsible for color perception. Cortical neurons belonging to these three systems possess different combinations and ranges of four physiological properties: sensitivity to color (small/large), sensitivity to light contrast (small/large), temporal resolution (small/large), and spatial resolution (small/large). These are the so-called elementary features of visual objects.

Elementary features belonging to one visual object activate different and spatially separated groups of neurons within the cerebral cortex. Scientists at the Max Planck Institute in Germany, under the leadership of Wolf Singer, analyzed trains of spikes of neurons within the visual cortex. They have proposed an intriguing hypothesis about the neural correlate of perception and sensory awareness (Gray et al.
1989, Singer 1994, Roelfsema et al. 1997). Binding of spatially separated neurons coding for features belonging to one visual object could be performed by transient synchronization of the firing of these neurons (Fig. 2.6). Similar synchronous oscillations of neurons have also been detected in the auditory, somatosensory, parietal, motor and prefrontal cortices during auditory, tactile and other perceptions (Traub et al. 1996). Oscillations of neurons with frequencies around and above 40 Hz (long known as gamma oscillations) have been detected in the cerebral cortex of humans, primates and other investigated mammals, in particular as a result of sensory stimulation. This synchronization occurs over relatively long distances (mm to cm), between different cortical areas, between cortex and thalamus, and between the two hemispheres. Synchronization means that neurons discharge with the same frequency and the same phase (Fig. 2.6). This results in a distributed pattern of simultaneously firing neurons. Neural correlates of different objects can differ in (a) which neurons are members of the pattern, (b) the particular frequency of their synchronization, and (c) the phase of their synchronization. Thus, transient synchronous gamma oscillations have been suggested as a possible candidate for the mechanism of binding the many elementary features belonging to one object into one transient whole corresponding to a percept. Establishment of transient synchrony is based upon the underlying synaptic connectivity.

In the laboratory of W. Singer, an interesting experiment was performed to demonstrate that when these synchronizations are disturbed, perception is also disturbed (Konig et al. 1996). With normal rearing, kittens develop normal sharp vision. Neurons in their primary visual cortex are sharply orientation selective. The distribution of ocular dominance favors binocular neurons; however, there are still also monocular neurons present. Firing of neurons responding to the left and right eye is synchronized when they are visually stimulated with the same object. After birth, the eyesight of kittens was disturbed: they were surgically made strabismic in one eye. They developed a syndrome typical of a strong uncorrected strabismus. Objects are fixated only with the good eye, whereas the strabismic eye is associated with a perceptual deficit called strabismic amblyopia (something like blurred vision). Analysis of spike trains in response to visual stimuli in the primary cortex of these cats has revealed several facts: (i) neurons that respond only to stimulation of the strabismic eye have normal physiological characteristics in terms of orientation selectivity; (ii) however, synchronization of discharges of these neurons is very weak compared to synchronization of discharges of neurons responding to stimulation of the good eye.
(iii) There is no synchronization between the firing of neurons responding to the good and to the strabismic eye. Thus, these animals have a perceptual deficit connected to the strabismic eye, and this perceptual deficit is accompanied by desynchronization in the visual cortex. Absence of synchronization means trouble in the simultaneous binding of features belonging to one object, which results in blurred vision. Another experimental phenomenon strongly suggesting a one-to-one correspondence between transient synchronizations and perception is binocular rivalry. During binocular rivalry, each eye is constantly stimulated with a different pattern. The visual percept is neither an average of these two patterns nor their sum. Instead, a random alternation between the two percepts occurs, as if they were competing with each other; hence the term binocular rivalry. Fries et al. (1997) discovered that neurons which respond to one or the other pattern are synchronized only during the corresponding percept. Thus, although the pattern is constantly stimulating
an eye, cortical neurons get synchronized only when the pattern is perceived.
Fig. 2.6. (a) Scheme illustrating spatially separated specialized neurons in the visual cortex. Each of them responds to a different elementary feature of one object; (b) Record of the firing of these neurons before, during and after the presentation of that object. During the presence of the object that activates a given set of neurons, their spikes are synchronized. These synchronizations repeat several times with some period; thus we speak about synchronous oscillations. The period of object presentation is denoted by a thick bar under the spikes. Activities before and after the object presentation are desynchronized spontaneous discharges of neurons; (c) The same as in (a), but this time a different object is presented, which activates a different set of neurons (features); (d) The same as in (b): the spikes of a different set of neurons are synchronized, thus binding together the features belonging to that object; (e) Illustration of different frequencies and phases of oscillations. The full and dashed curves denote oscillations of the same frequency but occurring with different phases. The dotted curve denotes an oscillation with three times the frequency of the first two oscillations
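The binding-by-synchrony mechanism of Fig. 2.6 can be caricatured with a Kuramoto model of coupled phase oscillators: oscillators standing for feature-coding neurons of one object are mutually coupled and phase-lock, while uncoupled ones drift apart. All numbers below (group size, coupling strength, frequency jitter) are illustrative assumptions, not physiological values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Kuramoto model: six "neurons" oscillating near 40 Hz.
# Oscillators 0-2 code features of the same object and are mutually
# coupled; oscillators 3-5 are left uncoupled (nothing binds them).
n, dt, steps = 6, 1e-4, 5000                          # 0.5 s of simulated time
omega = 2 * np.pi * (40.0 + rng.normal(0.0, 1.0, n))  # intrinsic rates, rad/s
K = np.zeros((n, n))
K[:3, :3] = 30.0                                      # coupling in the bound group
np.fill_diagonal(K, 0.0)

theta = rng.uniform(0.0, 2 * np.pi, n)
for _ in range(steps):
    # Kuramoto update: dtheta_i/dt = omega_i + sum_j K_ij * sin(theta_j - theta_i)
    theta += dt * (omega + (K * np.sin(theta[None, :] - theta[:, None])).sum(axis=1))

# Order parameter r = |mean(exp(i*theta))|: 1 means perfect synchrony.
r_bound = np.abs(np.exp(1j * theta[:3]).mean())
r_unbound = np.abs(np.exp(1j * theta[3:]).mean())
print(r_bound, r_unbound)
```

After half a simulated second the coupled group is locked (its order parameter is close to 1), while the uncoupled oscillators keep their independent phases; which neurons are coupled plays the role of "which features belong to one object".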
An important study by Rodriguez et al. (1999) has demonstrated that the perception of faces in humans is accompanied by a transient (~180 ms) synchronization of gamma activity in the hierarchically highest visual areas in the parietal cortex and in premotor areas in the frontal cortex. They used sophisticated computational procedures for the analysis of EEG
signals recorded during a face recognition task. Humans were presented with so-called Mooney faces, either in the canonical upright position or upside down. These black-and-white pictures are made from very high contrast photographs of human faces. It is very difficult to recognize a human face when the picture is presented upside down. Viewing these images, either in the normal or in the turned position, is always accompanied by an increase of gamma activity in the visual areas. However, precise transient synchronization of gamma oscillations occurs only when the subject perceives a face. It is intriguing that this synchronization occurs only in the left hemisphere, the so-called conscious hemisphere. When the subject did not perceive a face, but only a nonsense arrangement of black-and-white patches, no synchronization happened in the cortex. A second transient episode of synchronization occurred in the premotor and motor areas of both hemispheres during the motor response by which subjects indicated whether they saw a face or a non-face. Thus, transient synchronizations may accompany other cognitive processes as well, not only perception. W. Miltner et al. (1999) indeed detected synchronization of gamma oscillations during associative learning. Humans were supposed to learn to associate a visual stimulus with a tactile stimulus. A selective synchronization occurred, during and after learning, between the visual cortex and the part of the somatosensory cortex that represented the stimulated hand. When people forgot the learned association, the synchronization between the neural responses to these two stimuli disappeared.
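Analyses of this kind quantify how stable the phase difference between two recorded signals is. A common measure is the phase-locking value (PLV); the sketch below computes it for two synthetic, noisy 40 Hz signals. The sampling rate, noise level and phase lag are made-up values, and a real analysis would band-pass filter around the frequency of interest first:

```python
import numpy as np

def analytic_signal(x):
    # Analytic signal via the FFT (the standard Hilbert-transform
    # construction): zero out negative frequencies, double positive ones.
    n = x.size
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0
    h[n // 2] = 1.0            # Nyquist bin (n is even here)
    return np.fft.ifft(spec * h)

fs = 1000                      # sampling rate in Hz (synthetic data)
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(2)

# Two synthetic "electrode" signals: 40 Hz with a fixed phase lag plus noise.
x = np.sin(2 * np.pi * 40 * t) + 0.3 * rng.normal(size=t.size)
y = np.sin(2 * np.pi * 40 * t + 0.7) + 0.3 * rng.normal(size=t.size)

phi_x = np.angle(analytic_signal(x))
phi_y = np.angle(analytic_signal(y))

# Phase-locking value: |mean over time of exp(i*(phi_x - phi_y))|.
# PLV is near 1 for a stable phase relation and near 0 for none.
plv = np.abs(np.exp(1j * (phi_x - phi_y)).mean())
print(plv)
```

Because the two signals keep a fixed 0.7 rad phase lag, the PLV comes out close to 1 despite the noise; signals with independent phases would yield a value near 0.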
2.7 Consciousness

When we speak of consciousness we usually mean reflective or secondary consciousness, "the inner eye of our mind". Present-day neuroscience has good reason to assume that the neural mechanisms of reflective consciousness are derived from the mechanisms of sensory awareness related to perception (Singer 1999b). Thus, in building the picture of the neural correlates of reflective consciousness, we will proceed through the assumed neural correlates of sensory awareness, which is sometimes referred to as primary or phenomenal consciousness.
2.7.1 Neural Correlates of Sensory Awareness

Currently, transient (100–200 ms) synchronous gamma oscillations are being studied as a promising candidate for the mechanism of binding many
elementary features belonging to one object into one transient whole corresponding to a percept of that object (Engel et al. 1999, Singer 1999a). Such synchronized activity summates more effectively than nonsynchronized activity in the target cells at subsequent processing stages, and the activity can spread over longer distances. If so, synchronization could increase the effect that a selected population of neurons has on other populations, with great temporal specificity (in the range of milliseconds). There is also evidence that synchrony is important for inducing changes in synaptic efficacies and hence facilitates the transfer of information into memory. Different objects in one scene may be associated with different phase-locked synchronous oscillations within the gamma frequency band. Thus, increased coherence between brain areas, confined to a narrow band around 40 Hz, may denote holistic perception of a complex stimulus. Based on experimental findings, the crucial neural condition for a conscious percept to be experienced is (Koch and Crick 1994, Singer 1994, Crick and Koch 1995, Koch 1996, Rodriguez et al. 1999):

• Over the chain of primary and higher-order sensory areas, with the areas that have direct connections to the frontal cortex at the end of this chain (e.g. the posterior parietal cortex), and over the evolutionarily youngest cortical areas, i.e. the frontal and prefrontal cortex, a certain suprathreshold quantity (number) of neurons must be coherently active for a certain time of 100–200 milliseconds (see Fig. 2.7).
Fig. 2.7. Corticocortical connections between the posterior parietal cortex and the main subdivisions of the frontal cortex. The illustrated areas showed increased coherence within the 40 Hz band in Rodriguez et al.'s experiment on recognition of Mooney faces. When a human face was recognized, transient coherence occurred in the time window of 180–360 ms after the beginning of the picture presentation
Let us go through this condition in greater detail. Why higher-order sensory areas? Because they code for invariant object features, and thus come closer to invariant object identification (Engel et al. 1999). Primary sensory areas also take part in the chain, and their activity must take part in a reciprocal reverberant interaction with higher-order areas (Silvanto et al. 2005). With respect to the quantitative condition, that a certain suprathreshold number of neurons must be synchronized within the relevant cortical areas: electrophysiological measurements on blindsighted monkeys and fMRI on blindsighted humans have shown that besides the superior colliculus, the hierarchically higher visual cortical areas also remain responsive to visual stimuli when V1 is inactivated or damaged (Sahraie et al. 1997). Thus, blindsight seems to be mediated both by intact relays within the extra-geniculostriate pathway, which leads to the superior colliculus, and by the sparse spared relays within the retino-geniculate-cortical pathways themselves. However, neither subcortical structures alone nor an insufficient number of active cortical neurons can lead to a conscious percept. There is also an intuition from theory: in order for a large synchronization to occur in some physical system, a certain threshold number of elements must start the process; otherwise the synchronization does not spread over distance. Generating sensory awareness involves the process of attention. Several areas in the prefrontal cortex are crucially involved in attention, namely areas 8Av (major connections with the visual system), 8Ad (major connections with the auditory system) and 8B (major connections with the limbic system) (Roberts et al. 1998).
Attentional selection may depend on appropriate binding (coherence) of neuronal discharges in sensory areas in two simultaneously active directions: an attentional mechanism in the prefrontal cortex could induce synchronous oscillations in selected neuronal populations (top-down interaction), and strongly synchronized cell assemblies could engage attentional areas into coherence (bottom-up interaction) (Singer 1994). Other prefrontal areas activated during sensory perception include areas 9, 10, 45, 46 and 47 (see Fig. 2.8). These prefrontal areas are known to be involved in extended action planning. In addition, these prefrontal areas, plus the posterior parietal cortex, are known to be involved in working memory. The posterior parietal cortex is also known to be involved in mental imagery. For the planning of actions it is necessary to keep track of at least one sequence of partial actions, hence the overlap between planning and memory mechanisms. It might be that sensory contents reach awareness only if they are bound to prefrontal areas via the posterior parietal cortex and thus have the possibility of becoming part of working memory and action planning (Engel et al. 1999). In turn, action planning may influence
the organization of attentional mechanisms and thus what is being perceived. Actually, the underlying action planning can occur at a subconscious level (Libet 1985, 1999). Coherences in the involved areas are generated internally within the cortex, and although they are phase-locked, they are not stimulus-locked. They are superimposed upon global thalamocortical gamma oscillations, which are generated and maintained during cognitive tasks (Ribary et al. 1991). Thalamocortical oscillations may provide the basic oscillatory modulation of cortical oscillations. Other cortical mechanisms are then responsible for the precise phase-locking of internal cortical synchronous oscillations. In particular, these are lateral inhibitory and excitatory interactions, regularly bursting layer V pyramidal cells, and spike-timing-dependent rapid synaptic plasticity. In the latter, synapses, and thus the inputs, which do not drive the postsynaptic cell in synchrony are temporarily weakened.
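The spike-timing-dependent rule just mentioned is usually written as an asymmetric exponential window over the difference between pre- and postsynaptic spike times. A minimal sketch, with illustrative (not physiological) amplitudes and time constants:

```python
import math

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a single pre/post spike pair.

    dt_ms = t_post - t_pre. A presynaptic spike shortly before the
    postsynaptic one (dt_ms > 0) potentiates the synapse; the reverse
    order (dt_ms < 0) depresses it. Amplitudes and time constants here
    are illustrative assumptions, not measured values.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    return -a_minus * math.exp(dt_ms / tau_minus)

# An input that helps drive the cell (pre 5 ms before post) strengthens;
# an out-of-sync input (pre 5 ms after post) weakens.
print(stdp_dw(5.0), stdp_dw(-5.0))
```

The exponential decay means that only spike pairs within a few tens of milliseconds matter, which is why such a rule can selectively weaken inputs that fail to fire in synchrony with the postsynaptic cell.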
Fig. 2.8. Human prefrontal cortex. Lateral view (from outside), medial view (from inside) and orbito-frontal view (from below) of the left hemisphere. The same divisions hold also for the right hemisphere. Numbers denote the corresponding Brodmann areas. CC denotes the corpus callosum
2.7.2 Neural Correlates of Reflective Consciousness
Since early childhood, we are engaged in learning, first through nonverbal and later through verbal communication, to assume what is going on inside of other people. Our reflective consciousness and our self-reflection develop gradually, step by step. Due to learning to assume what is going on
inside of other people, we can learn to assume what is going on inside of ourselves. Self-reflection is possible only thanks to communication and social interaction. However, it seems that our brains already possess certain structures that have been prepared and selected for this task: mirror neurons (Rizzolatti et al. 1996) and the mentalization module (Frith 2001). In 1996, the scientific community was stirred by the discovery of mirror neurons. G. Rizzolatti et al. (1996) recorded the activity of neurons in the ventral premotor cortex of macaque monkeys. They found that these neurons are active not only when the monkey executes its own actions, but also when it watches others, either monkeys or humans, perform the same action. Thus these neurons fire in the same way when others perform a given action as when the monkey performs that action itself. This observation/execution matching system represents a given action irrespective of who performs it. Mirror neurons may be part of the mind-reading system. People have the ability to explain and predict the behavior of others in terms of their presumed thoughts and feelings. The ability to attribute mental states to others and to ourselves is called "mentalization" or "theory of mind". Noninvasive brain imaging has shown that the ability to attribute various mental states, desires and beliefs to others and also to ourselves depends upon the full functioning of a specific neurocognitive module (Frith 2001). The mentalization module includes, in both hemispheres: (i) the medial prefrontal cortex (area 32), in particular the most anterior part of the paracingulate cortex, a region on the border between the anterior cingulate and medial prefrontal cortex (very medial); (ii) the temporal-parietal junction at the top of the superior temporal gyrus (stronger on the right); and (iii) the temporal poles adjacent to the amygdala (somewhat stronger on the left).
Neural activity in all three parts, or at least in the prefrontal part, of this mentalization module, as revealed by brain imaging, is significantly lower in autistic people (Frith 2001). Autistic people are able to "read out" neither the minds of others nor their own minds, though there is a whole spectrum of severity of the autistic disorder. The mentalization module overlaps substantially with the brain's higher-order emotional system. Medial parts of the prefrontal cortex (area 32) and orbitofrontal parts (i.e. areas 10, 11, 12, 13, 14) are evolutionarily younger parts of the brain's emotional system (Damasio 1994). These medial and orbital prefrontal areas are thought to be responsible for the so-called secondary emotions. Secondary emotions are emotional feelings based on a learned variety of associations between primary emotions and life situations. The hierarchy of these associations involves planning and strategies related to one's social role and personal goals in relation to the past and future. Evaluation, planning and feelings in the social and emotional spheres are therefore linked, being processed by the same structures in the prefrontal cortex, and
these overlap with the mentalization module. According to Damasio (1994), the feeling of self, and consequently the awareness of self, would depend also on the intactness of the somatosensory system, on the signaling from the cortex down to the body and back. As we have said, it might be that sensory contents reach awareness only if they are temporarily synchronized with activity in the prefrontal areas, thus displaying a highly coherent joint activity. In such a way they can become part of conscious working memory and action planning. According to Singer (1999a), reflective consciousness would be based upon the same processes, i.e. highly coherent activity, occurring over the prefrontal areas involved in planning and working memory and between areas devoted to representations of our inner world. Secondary consciousness or meta-awareness would result from iteration of the very same processes that support primary consciousness, except that they are applied not to the signals arriving from the sensory organs, i.e. from the outer world, but to the outputs of previous cognitive operations (Singer 1999b). By recording the electromagnetic activity of the brain it is possible to capture and visualize the fast semi-global coherent activity that accompanies conscious perception of a stimulus (Tononi and Edelman 1998). It is an almost magical view, because this semi-global coherent activity changes with time like a burning fire which is boosted from the centre of the brain, its flames transiently engaging the currently synchronized brain areas (see Fig. 2.9). Why is the fire boosted from the centre of the brain? Clinical research has revealed that damage to the intralaminar nuclei of the thalamus leads to the loss of consciousness and to coma. Neurons in the intralaminar nuclei possess dense reciprocal connections to and from the cortex. The intralaminar nuclei are the source of arousal, without which the cortex cannot function. G. Edelman and G.
Tononi (Edelman and Tononi 2000) call this ever-changing semi-global coherent activity the dynamic core. The dynamic core corresponds to a large (semi-global) continuous cluster of neuronal groups that are coherently active on a time scale of hundreds of milliseconds. Its participating neuronal groups interact much more strongly among themselves than with the rest of the brain. The dynamic core must also have an extremely high complexity, as opposed to, for instance, convulsions. Roughly every 150 ms, a new pattern of semi-global activity must be selected, in well under a second, out of a very large, almost infinite repertoire of options. Thus, the dynamic core changes in composition over time. As suggested by imaging, the exact composition of the core varies significantly not only over time within one individual, but also across individuals.
2.7 Consciousness
47
According to Edelman and Tononi (2000), the dynamic core consists of a large number of distributed groups of neurons which enter the core temporarily based on their mutual coherence. Connecting groups of neurons into a temporarily synchronized whole requires dense recurrent connections between brain areas, along which a reiterated reentry of signals occurs. The neural reference space for any conscious state may be viewed as an abstract N-dimensional space, where each axis (dimension) stands for some participating group of neurons that codes for (represents) a given aspect of the conscious experience. There can be hundreds of thousands of dimensions. The distance from the origin along an axis represents the salience of that aspect. It may, for instance, correspond to the number of firing neurons within the given group. We would like to point out the interesting similarity between this abstract N-dimensional neural space and the conceptual spaces introduced by Gardenfors (2000).
Fig. 2.9. (a) Illustration of the dynamic core, a changing coherent semi-global activity of the brain, which is supposed to be a neural correlate of consciousness. One configuration of the core lasts for about 150 ms; (b) Interpretation of the dynamic core as an N-dimensional neuronal reference space, where each axis (dimension) denotes some group of neurons which encodes (represents) a given aspect of the conscious experience. Each axis can be broken down into more elementary axes. There can be hundreds of thousands of dimensions
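The N-dimensional reference space can be pictured concretely as a salience vector over neuronal groups, with nearby points corresponding to similar conscious states. The group names and salience values below are invented purely for illustration:

```python
import numpy as np

# Each axis of the toy "reference space" is the salience (e.g. the
# fraction of firing neurons) of one neuronal group in the core.
groups = ["red", "green", "motion", "face", "tone"]

state_red_morning = np.array([0.90, 0.05, 0.10, 0.00, 0.05])
state_red_evening = np.array([0.80, 0.10, 0.05, 0.05, 0.10])
state_face        = np.array([0.05, 0.05, 0.20, 0.85, 0.05])

def distance(a, b):
    # Euclidean distance between core states: nearby points stand for
    # similar conscious experiences, distant points for different ones.
    return float(np.linalg.norm(a - b))

# "Redness in the morning" and "redness in the evening" are distinct
# but nearby points; the "face" state lies far from both.
print(distance(state_red_morning, state_red_evening))
print(distance(state_red_morning, state_face))
```

This also illustrates the point made below about qualia: the two "redness" states are not identical points, so the same nominal experience can differ slightly across occasions and across individuals, yet both remain far from qualitatively different states.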
What would be, in this theory, the neural basis of subconsciousness? The same group of neurons may at times be part of the dynamic core and underlie conscious experience, while at other times it may not be part of it and thus be involved in subconscious processing. Koch and Crick (1994) have proposed that those active neurons which are not at the moment taking part in the semi-global activity keep processing their inputs, and the results of this processing may still affect behavior.

We would like to mention also the explanation of the neural correlate of qualia, or the hard problem of consciousness, according to Edelman and Tononi (2000). Qualia are specific qualities of subjective experiences, like redness, blueness, warmth, pain, and so on. According to the dynamic core hypothesis, pure redness would be represented by one particular state of the dynamic core, that is, by one and only one point in the N-dimensional neural space. This core state would certainly include a large participation of neurons that code for the red color and a small participation of neurons that code for other colors and for anything else. The coordinates of a point in the N-dimensional reference neural space are determined by the activities of all the neuronal groups that are at the moment part of the core. And these activities vary in time and across individuals. Thus, the subjective experience of redness will be different in different people, and can be different for the same individual, for instance, in the morning and in the evening.

Another mystery is why consciousness fades as we fall asleep, even though we nowadays know that the brain, and especially the cortex, remains highly active. Sleep research has revealed that during sleep humans normally go through several cycles of two sleep phases. One of these two phases is so-called REM sleep, named after the accompanying rapid eye movements. EEG activity of the brain during the REM phase is very similar to the EEG activity of the awake brain during cognitive activity; hence the term paradoxical sleep for the REM sleep phase, as if it were not sleep at all. We dream mostly during REM sleep phases. When awakened during the REM phase, we can recall the content of a dream. We experience self-awareness when we dream, but not when we are in deep sleep (Llinas and Ribary 1994).
Thus the "I" is preserved during dreaming, as is the awake-like EEG activity of the brain. When awakened around the end of the REM phase, we can remember that we dreamt, without knowing about what. When awakened during the non-REM sleep phase, we mostly deny any experience of dreaming. The non-REM sleep phase is also called deep sleep, and the brain activity occurs in typical slow, large, regular waves. Recently, experiments on the spread of activity within the neocortex during sleep have revealed that different cortical areas stop communicating with each other over distance during non-REM sleep, a stage of sleep for which people mostly report no or very little conscious experience on waking (Massimini et al. 2005). Thus, it seems that the coherent semi-global activity is disrupted during non-REM sleep, and so is conscious awareness.
2.8 Summary and Discussion

Neurobiologists have made great progress in trying to find the so-called neural correlates of consciousness. However, not every scientist finds this research compelling. In his influential book Shadows of the Mind, physicist Roger Penrose brings the problem of explaining consciousness into the domain of the elementary laws of physics (Penrose 1994). According to him, no present scientific (including physical) theory helps us to come to terms with the puzzle of mentality, including consciousness, within a physically determined universe. He has been calling for a radical upheaval in the very basis of physical theory. He sees the link to consciousness in a new physical theory based upon the union of Einstein's general relativity with quantum theory. His critics question the competence of physics ever having anything of importance to say about mental phenomena in general, and consciousness in particular (http://psyche.cs.monash.edu.au/psyche-index-v2.html#som). The grounds for this criticism are diverse, ranging from computational to neurobiological arguments (Koch and Hepp 2006). Nevertheless, some neurobiologists find quantum states of microtubules, tiny elongated organelles spanning the interior of neurons, to be the gate to consciousness (Hameroff and Penrose 1996). There is indeed a puzzling temporal aspect to consciousness. Famous experiments by Benjamin Libet on human subjects showed that it takes a while, about 0.5 s, for conscious awareness to develop (Libet 1985). Acts initiated by a subject's free will are preceded by a specific electrical change in the brain, the so-called readiness potential (RP), which begins 550 ms before the act itself. Human subjects became aware of their own intention to act 350–400 ms after the RP started, but about 200 ms before the execution of the act (which is about the time needed for the initiation and generation of motor movements).
These data pose many questions, including philosophical ones about the nature of free will and whether it exists at all (Libet 1999). While the author remains optimistic about this issue, the question still remains why it takes so long for conscious awareness to develop. Why does it develop at all? Which processes are responsible for it, and how do they lead to consciousness? Even concerning the mystery of mentality, there is no general agreement among scientists. Some think that there is no mystery at all, and that consciousness and other mental phenomena are equivalent to a particular underlying process, be it specific computations (Dennett 1991), brain-specific processes (Searle 2002), or complexity (Edelman and Tononi 2000). In another influential book written on consciousness, philosopher
and mathematician David J. Chalmers clearly argues that consciousness and mentality are indeed genuinely puzzling and not explainable by present theories (Chalmers 1996). Chalmers asks: why should quantum processes (or any other specific physical processes) in microtubules (or any other brain substructures), or any specific computational processes, give rise to consciousness? If one takes consciousness seriously, Chalmers says, one has to go beyond a strict materialist framework. The fundamental laws linking the physical and the experiential are yet to be discovered. An exercise for the reader is to think about the nature of such fundamental laws.

Although a lot is known about the brain, issues about its functioning and its representation and processing of information are still subjects of intense research. The nature of brain dynamics is still unknown. Some researchers find evidence of chaos, whereas others are doubtful (Theiler 1995). The main proponents of chaotic dynamics, W.J. Freeman (Freeman 2003) and I. Tsuda (Tsuda 2001), argue in favor of chaotic itinerancy based on EEG and other neurophysiological data. According to the picture of chaotic itinerancy, a complex system such as the (human) brain evolves by steps along a trajectory in the state space. Each step corresponds to a shift from one basin of attraction to another. Attractors represent classes for abstraction and generalization. Thus, brain states evolve aperiodically through sequences of attractors. In a closed system the next attractor would be chosen solely by internal dynamics. In an open system, such as the brain, external inputs interfere with internal dynamics. Moreover, due to the changes induced by learning, trajectories continually change. Chaotic itinerancy occurs in a sequence of cortical states marked by state transitions that appear as temporal discontinuities in neural activity patterns (Freeman 2003).
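As a loose illustration of wandering among attractor basins, consider the following toy model (not taken from the cited authors; the network size, noise level and run length are arbitrary choices for illustration): a small stochastic Hopfield network stores two patterns, and update noise lets the state settle into one basin for a while and occasionally hop toward the other.

```python
import math
import random

random.seed(0)

N = 8
patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],   # attractor 0
    [1, -1, 1, -1, 1, -1, 1, -1],   # attractor 1 (orthogonal to attractor 0)
]

# Hebbian weights: each stored pattern digs a basin of attraction.
w = [[sum(p[i] * p[j] for p in patterns) / N if i != j else 0.0
      for j in range(N)] for i in range(N)]

def overlap(state, p):
    """Normalized similarity between the network state and a pattern."""
    return sum(s * x for s, x in zip(state, p)) / N

def step(state, T):
    """One asynchronous Glauber update; noise level T lets the state escape basins."""
    i = random.randrange(N)
    h = sum(w[i][j] * state[j] for j in range(N))
    p_up = 1.0 / (1.0 + math.exp(-2.0 * h / T))
    state[i] = 1 if random.random() < p_up else -1

state = [random.choice([-1, 1]) for _ in range(N)]
visited = []  # which basin the state is closest to, sampled over time
for t in range(4000):
    step(state, T=0.6)
    if t % 50 == 0:
        ms = [overlap(state, p) for p in patterns]
        visited.append(max(range(len(patterns)), key=lambda k: abs(ms[k])))

print("basins visited over time:", sorted(set(visited)))
```

In a closed, noiseless version the state would stay in the first basin it reaches; the noise term plays the role of the external and internal perturbations that drive itinerant transitions.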
Experimental EEG data show that the entire cerebral cortex is constantly wandering in fractal distributions of phase transitions that give the 1/f^α form of the temporal and spatial frequency spectra, with α ∈ (1,3) (Freeman 2003). From this type of frequency spectrum it appears that the brain maintains a state of self-organized criticality (Bak et al. 1987). The self-organized criticality state can form the basis of the brain's capacity to rapidly adjust to new external and internal stimuli. State changes resembling phase transitions occur continually everywhere in the cortex at scales ranging from millimeters to ~0.1 m. Local neural activity can trigger a massive state change. However, several issues of caution should be pointed out. Despite the compelling evidence for self-organized criticality in the brain, the neurobiological interpretation of the critical state is still unknown. The spatial and temporal power spectral densities often show the 1/f^α form; however, this form often breaks down due
to distortions by clinically defined peaks. Therefore the measurements of α vary widely. Aperiodic oscillations giving the 1/f^α power spectral densities are commonly referred to as chaotic. However, brain activity is not at all consistent with low-dimensional deterministic chaos (Theiler 1995, Benuskova, Kanich et al. 2001). It is high-dimensional, noisy, non-Gaussian, and nonstationary (Freeman 2003). The tremendous physical complexity of the brain arises also from the fact that it is not a homogeneous tissue. Each part of the brain is morphologically different and has its own genetic profile, as can be seen from analyses of large-scale human and mouse transcriptomes. Therefore the conditions for assessing the type of dynamics are difficult to meet. Moreover, brains are open systems driven by stochastic input. Thus it seems that brain activity can hardly conform to the mathematical definitions of chaos. Whether the term chaotic itinerancy (or any other term from the chaotic vocabulary) is appropriate to describe state transitions in the brain, and in the cortex in particular, remains open to challenge. Thus, the complex spatio-temporal activity data from the brain still await explanation.
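The exponent α of a 1/f^α spectrum is typically estimated by a straight-line fit of log power against log frequency. The sketch below is a minimal, self-contained illustration of that procedure (the signal is synthetic, not EEG): it builds a signal with a known α = 2, computes its periodogram with a naive DFT, and recovers the exponent from the log-log slope.

```python
import math
import random

random.seed(1)
N = 512
ALPHA = 2.0                       # target spectral exponent, within (1, 3)
freqs = list(range(1, 65))

# Synthesize a signal whose power at frequency f scales as f^(-ALPHA):
# amplitude f^(-ALPHA/2) with a random phase per component.
phases = {f: random.uniform(0, 2 * math.pi) for f in freqs}
x = [sum((f ** (-ALPHA / 2)) * math.cos(2 * math.pi * f * t / N + phases[f])
         for f in freqs) for t in range(N)]

def power_at(f):
    """Periodogram value at integer frequency bin f (naive DFT)."""
    re = sum(x[t] * math.cos(2 * math.pi * f * t / N) for t in range(N))
    im = sum(x[t] * math.sin(2 * math.pi * f * t / N) for t in range(N))
    return re * re + im * im

# Least-squares slope of log-power vs. log-frequency gives -alpha.
lf = [math.log(f) for f in freqs]
lp = [math.log(power_at(f)) for f in freqs]
mf, mp = sum(lf) / len(lf), sum(lp) / len(lp)
slope = (sum((a - mf) * (b - mp) for a, b in zip(lf, lp))
         / sum((a - mf) ** 2 for a in lf))
alpha_hat = -slope
print(f"estimated alpha = {alpha_hat:.3f}")
```

The caveats in the text apply directly to this fit: in real recordings, clinically defined spectral peaks sit on top of the 1/f^α background and distort the regression, which is one reason measured values of α vary so widely.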
3 Neuro-Information Processing in the Brain
While Chapter 2 presented the higher-level organization of the brain, this chapter presents a view of the low-level information processing in the brain. Neuro-information processing in the brain depends not only on the organization of the brain and the properties of brain neural networks, but also on the properties of the processing units - neurons and the signal-processing networks within neurons. These internal networks involved in signal processing comprise second and third messengers, enzymes, transcription factors and genes.
3.1 Generation and Transmission of Signals by Neurons

A neuron (Fig. 3.1) receives and sends out electric and chemical signals. The place of signal transmission is the synapse. In the synapse, the signal can be nonlinearly strengthened or weakened. The efficacy of synaptic transmission is also called the synaptic weight or synaptic strength. One neuron receives and sends out signals through 10³ to 10⁵ synapses. Dendrites and the soma, i.e. the body of a neuron, constitute the input surface for receiving signals from other neurons.
Fig. 3.1. Schematic illustration of a neuron and its parts (axon, basal dendrites, dendritic tree, synapse). There is a synapse at every dendritic spine. Synapses are also formed on the dendritic shafts and on the soma
The dendritic tree consists of thousands of dendrites which are covered by tiny extensions called spines. Most synapses are formed on dendrites, particularly on spines. Spines are very important devices in relation to learning and memory, as we will see later. Electrical signals transmitted by synapses can have a positive or negative electric sign. In the former case, we speak about excitatory synapses, and in the latter case about inhibitory synapses. When the sum of positive and negative contributions (signals) weighted by synaptic weights exceeds a particular value, called the excitatory threshold, a neuron fires, that is, emits an output signal called a spike. A spike is also called an action potential or a nerve impulse. Usually, as a result of synaptic stimulation and summation of positive and negative signals, a neuron fires a whole series (train) of spikes (Fig. 3.2). Mean frequencies of these spike trains range from 1 to 10² Hz. The output frequency is proportional to the overall sum of positive and negative synaptic contributions. Spikes are produced at the initial segment of the axon, the only neuronal output extension. Then they propagate very quickly along the axon towards other neurons within a network. The propagation speed of nerve impulses is 5-100 m/s. At its distant end, an axon makes thousands of branches, each of which ends in a synaptic terminal (bouton).
Fig. 3.2. Generation and propagation of spikes in neurons. EPSP = excitatory postsynaptic potential, IPSP = inhibitory postsynaptic potential, ΣPSP = total postsynaptic potential equal to EPSP - IPSP, θ = excitatory threshold
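The threshold-and-fire behavior described above, with the output spike rate growing with net synaptic input, can be sketched with a standard leaky integrate-and-fire model. This is a textbook abstraction, not a model from this book, and all parameter values below are illustrative assumptions.

```python
# Leaky integrate-and-fire neuron: the membrane potential V integrates
# the (net) synaptic input and emits a spike when V crosses a threshold.
def lif_spike_count(i_syn, steps=1000, dt=1.0, tau=20.0,
                    v_rest=0.0, v_thresh=15.0, v_reset=0.0):
    """Number of spikes fired in `steps` ms of constant net input current."""
    v, spikes = v_rest, 0
    for _ in range(steps):
        v += dt * (-(v - v_rest) + i_syn) / tau   # leaky integration
        if v >= v_thresh:                          # excitatory threshold reached
            spikes += 1
            v = v_reset                            # reset after the spike
    return spikes

for i_syn in (10.0, 20.0, 40.0):
    print(f"net input {i_syn:5.1f} -> {lif_spike_count(i_syn)} spikes/s")
```

Sub-threshold input (here 10.0, which cannot drive the potential past the threshold of 15.0) produces no spikes at all, while stronger input produces a proportionally higher firing rate, mirroring the 1-10² Hz range quoted in the text.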
Transmission of signals from one neuron to another occurs in synapses. A synapse consists of a presynaptic terminal (bouton), the synaptic cleft and the postsynaptic membrane (Fig. 3.3). In the presynaptic terminal, there are dozens of vesicles filled with molecules of neurotransmitter (NT) ready to be released. When a presynaptic spike arrives at the terminal, calcium ions rush in and cause the fusion of vesicles with the presynaptic membrane. This process is also called exocytosis. Molecules of NT are released into the synaptic cleft (Fig. 3.3b) and diffuse towards the receptors within the postsynaptic membrane. Molecules of NT form a transient bond with the
molecules of receptors. This causes opening of the ion channels associated with the postsynaptic receptors. In an excitatory synapse, receptors are associated with sodium (Na⁺) ion channels, and a positive excitatory postsynaptic potential (EPSP) is generated. In an inhibitory synapse, receptors are associated with chloride (Cl⁻) ion channels, and a negative inhibitory postsynaptic potential is generated. Alternatively, there can be ion channels for potassium (K⁺), which flows out and thereby lowers the interior potential. Eventually, NT releases its bond with the receptors and diffuses back to the presynaptic membrane and out of the synaptic cleft. Special molecular transporters within the presynaptic membrane take molecules of NT back inside the terminal, where they are recycled into new vesicles. This process is called reuptake of NT. The whole synaptic transmission lasts for about 1 millisecond. Such a synapse is called a chemical synapse, because the transmission of an electric signal is performed in a chemical way.
10 " m
pos tsynaptic membrane
Fig. 3.3. Scheme of synaptic transmission. (a) Synapse is ready to transmit a signal. NT = neurotransmitter, R = postsynaptic receptor. (b) Transmission of an electric signal in a chemical synapse upon arrival of an action potential into the terminal (plus signs around the terminal). AMPAR = AMPA-receptor-gated ion channel for sodium, NMDAR = NMDA-receptor-gated ion channel for sodium and calcium
The postsynaptic potential (PSP), either excitatory or inhibitory, has some amplitude and duration. The amplitude and duration of the PSP depend upon the number of activated receptor-ion channels and upon how long they stay open. This may last for milliseconds, tens of milliseconds or hundreds of milliseconds. The duration of channel opening depends upon the number of released NT molecules and upon the type of receptors associated with the ion channels. The amplitude of the PSP also
56
3 Neuro-Information Processing in the Brain
depends upon the electric input resistance for ions, which in turn depends upon the size and shape of the postsynaptic spine and dendrites, and upon the distance of the synapse from the soma. For instance, a short and stubby dendritic spine has a much smaller electric resistance than a long and thin spine. All these pre- and postsynaptic factors determine the weight (strength, efficacy) of a particular synapse.

Within the postsynaptic membrane, there are also kinds of receptors that are not associated with an ion channel, but instead with an enzyme. When the overall amount of released NT reaches some critical concentration, these receptor-enzyme complexes activate particular cytoplasmic enzymes, the so-called second messengers. Second messengers trigger chains of various biochemical reactions which may lead to a change in synaptic weight, or even to transient changes in gene expression leading to alterations in the biomolecular synthesis of receptors, neurotransmitters and enzymes. Thus, second messengers may act locally within the synapse itself, or they may activate further (third and so on) messengers that carry the message to the genome of the neuron, thus causing a change in its biochemical machinery related to signal processing. Therefore, it is now widely accepted that the activity of a neuron itself influences its processing of information, and even whether it survives or not.
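The summation of excitatory and inhibitory PSPs of different amplitudes and durations can be sketched with the commonly used alpha-function waveform. This is an illustrative model only; the time constants and weights below are assumed values, not data from the text.

```python
import math

def psp(t, weight, tau):
    """Alpha-function postsynaptic potential: peaks at t = tau with
    amplitude `weight`; tau reflects how long receptor channels stay open."""
    if t < 0:
        return 0.0
    return weight * (t / tau) * math.exp(1.0 - t / tau)

# Total PSP = sum of excitatory (positive) and inhibitory (negative)
# contributions, each scaled by its synaptic weight. The receptor labels
# are hypothetical examples of fast vs. slow channel kinetics.
synapses = [
    {"weight": +1.0, "tau": 5.0},    # fast EPSP (AMPA-like, assumed values)
    {"weight": +0.5, "tau": 50.0},   # slow EPSP (NMDA-like, assumed values)
    {"weight": -0.8, "tau": 10.0},   # IPSP (inhibitory, assumed values)
]

total = [sum(psp(t, s["weight"], s["tau"]) for s in synapses)
         for t in range(0, 200)]
peak = max(total)
print(f"peak total PSP = {peak:.3f} (arbitrary units)")
```

The choice of an alpha function is conventional in neural modeling: its peak amplitude maps naturally onto the synaptic weight, and its time constant onto the channel-opening duration discussed above.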
3.2 Learning Takes Place in Synapses: Toward the Smartness Gene

For major discoveries in the field of synaptic mechanisms of learning, the 2000 Nobel Prize in Physiology or Medicine went to the neuroscientists Eric R. Kandel and Paul Greengard. The third laureate, Arvid Carlsson, got his share of the prize for discoveries concerning the actions of the neurotransmitter dopamine. At present, it is widely accepted that learning is accompanied by changes of synaptic weights in cortical neural networks (Kandel et al. 2000). Changes of synaptic weights are also called synaptic plasticity. In 1949, the Canadian psychologist Donald Hebb formulated a universal rule for these changes: "When an axon of cell A excites cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells so that A's efficiency as one of the cells firing B is increased" (Hebb 1949). This rule has since been verified in many experiments and its mechanisms elucidated.

In the cerebral cortex and in the hippocampus of humans and animals, learning takes place in excitatory synapses formed upon dendritic spines that use glutamate as their neurotransmitter. In the regime of learning, glutamate
3.2 Learning Takes Place in Synapses: Toward the Smartness Gene
57
acts on specific postsynaptic receptors, the so-called NMDA receptors (N-methyl-D-aspartate). NMDA receptors are associated with ion channels for sodium and calcium (see Fig. 3.3). The influx of these ions into spines is proportional to the frequency of incoming presynaptic spikes. Calcium acts as a second messenger, triggering a cascade of biochemical reactions which lead either to long-term potentiation of synaptic weights (LTP) or to long-term depression (weakening) of synaptic weights (LTD). In experimental animals, it has been recorded that these changes in synaptic weights can last for hours, days, even weeks and months, up to a year. Induction of such long-term synaptic changes involves transient changes in gene expression (Mayford and Kandel 1999, Abraham et al. 2002).

A subcellular switch between LTD and LTP is the concentration of calcium within spines (Shouval, Bear et al. 2002). We speak about an LTD/LTP threshold. In turn, the intra-spine calcium concentration depends upon the intensity of synaptic stimulation, that is, upon the frequency of presynaptic spikes: more presynaptic spikes mean more glutamate within the synaptic cleft. Release of glutamate must coincide with a sufficient depolarization of the postsynaptic membrane to remove the magnesium block of the NMDA receptor. The greater the depolarization, the more calcium ions enter the spine. Postsynaptic depolarization is primarily achieved via AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid) receptors; however, a significant role of backpropagating postsynaptic spikes has recently been pointed out (Markram et al. 1997). Calcium concentrations below or above the LTD/LTP threshold switch on different enzymatic pathways that lead either to LTD or LTP, respectively. However, the current value of the LTD/LTP threshold (i.e.
the properties of these two enzymatic pathways) can be influenced by levels of other neurotransmitters, the average previous activity of the neuron, and possibly other biochemical factors as well. This phenomenon is called metaplasticity, a plasticity of synaptic plasticity (Abraham and Bear 1996). Dependence of the LTD/LTP threshold upon different postsynaptic factors is the subject of the Bienenstock, Cooper and Munro (BCM) theory of synaptic plasticity (Bienenstock et al. 1982) (for a nice overview see for instance (Jedlicka 2002)). The BCM theory of synaptic plasticity has been successfully applied in computer simulations to explain experience-dependent changes in the normal and ultrastructurally altered brain cortex of experimental animals (Benuskova et al. 1994).

The ease with which LTD and LTP can be evoked in the developing and in the adult brain is not the same. One of the factors responsible for this difference may be the genetically programmed difference in NMDA receptor composition (Bliss 1999). The NMDA receptor is made up of an NR1 subunit, which is obligatory for channel function, and a selection of developmentally and regionally regulated NR2 subunits (A to D). For example, the glutamate-evoked positive current has a longer duration in receptors containing NR2B subunits than in those containing NR2A subunits. The proportion of NR2B subunits is higher in young animals than in adults, and this probably accounts for the greater degree of synaptic plasticity seen in young animals, which is accompanied by the greater ease with which youngsters form new memories (Bliss 1999). This conclusion was indeed confirmed in another study, in which scientists inserted an extra copy of the NR2B gene into mice (Ping et al. 1999). Mice with the NR2B gene insertion performed better than mice without the insertion on all of the tests used to evaluate their intelligence and memory. Adding a single gene to mice significantly boosted the animals' ability to solve maze tasks, to learn from objects and sounds in their environment and to retain that knowledge. This strain of mice also retained into adulthood certain brain features of juvenile mice, which, like young humans, are widely believed to be better than adults at grasping large amounts of new information. The research indicates that the NR2B gene is a key switch that controls the brain's ability to associate one event with another, the core feature of learning. The finding also shows that genetic improvement of intelligence and memory in mammals is now feasible, thus offering a striking example of how genetic technology may affect mankind and society in the future. This research can ultimately lead to human gene therapy for use in areas such as dementia, mental retardation, etc., although more research is needed comparing the regulation and functions of human and mouse NMDA receptors.
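The core of the BCM idea discussed above can be sketched in a few lines: the weight change depends on postsynaptic activity relative to a sliding LTD/LTP threshold that tracks the average squared activity. This is a minimal single-synapse sketch; the learning rate, time constant and initial values are arbitrary illustrative choices, not parameters from the cited studies.

```python
# BCM synaptic plasticity: the sign of the weight change depends on
# postsynaptic activity y relative to a sliding LTD/LTP threshold theta.
def bcm_run(x, steps, eta=0.01, tau_theta=100.0):
    """Single synapse driven by a constant presynaptic input x.
    Returns the final weight and threshold."""
    w, theta = 0.5, 0.25
    for _ in range(steps):
        y = w * x                              # postsynaptic response
        w += eta * y * (y - theta) * x         # LTD if y < theta, LTP if y > theta
        w = max(w, 0.0)                        # weights stay non-negative
        theta += (y * y - theta) / tau_theta   # threshold slides with <y^2>
    return w, theta

w_weak, _ = bcm_run(x=0.2, steps=20)    # weak stimulation: y stays below theta
w_strong, _ = bcm_run(x=2.0, steps=20)  # strong stimulation: y exceeds theta
print(f"weak input:   w 0.500 -> {w_weak:.4f} (depression)")
print(f"strong input: w 0.500 -> {w_strong:.4f} (potentiation)")
```

The sliding threshold is what makes this a model of metaplasticity: a history of high activity raises theta, making further LTP harder to evoke, which qualitatively matches the dependence of the LTD/LTP threshold on previous neuronal activity described in the text.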
3.3 The Role of Spines in Learning

Dendrites of cortical excitatory pyramidal neurons are abundant in tiny membrane extensions called spines. They are so named because they resemble in shape the spines on a rose stem. About 80% of all synaptic connections in the cerebral cortex are excitatory, and the vast majority of them are formed on the heads of dendritic spines. For many years the role of spines was a mystery. Nowadays it is accepted that they play several important roles in synaptic plasticity and learning.

First, it was discovered that spines change their size, shape and number during the induction and maintenance of LTP (Lee et al. 1980, Geinisman et al. 1991). There are growth changes on spines, like spine head swelling, spine neck thickening, and an increase in the appearance of spines with mushroom-shaped heads. Morphological properties of spines and changes
in their shape were first supposed to play a role in affecting the efficacy of synaptic transmission by means of changes in the input resistance (Koch and Poggio 1983). Long, thin spines create a big input electrical resistance, while short, stubby spines create a smaller input resistance. Later, a role in sequestering and amplifying calcium concentrations was suggested to be the main role of spines (Zador et al. 1990). Through this role, a mechanism for saturating and stopping the infinite growth of synaptic weights was proposed, as well as a role in the LTP/LTD threshold (Gold and Bear 1994). While all these effects can take place, another important role for spines was suggested in the transport of new receptors into the spine head (Benuskova 2000). This model is based on our older hypothesis that the changes in efficacy of excitatory dendritic spine synapses can result from the fusion of transport vesicles carrying new membrane material with the postsynaptic membrane of spines (Fedor et al. 1982). Spacek and Harris indeed found structural evidence for exocytotic activity within spines in hippocampal CA1 pyramidal neurons (Spacek and Harris 1997). Smooth vesicles with diameters of around 50 nm occurred in the cytoplasm of spine heads, adjacent to the spine plasma membrane, and fusing with the plasma membrane. In addition, Lledo et al. (1998) showed that inhibitors of membrane fusion blocked or strongly reduced LTP when introduced into CA1 pyramidal cells. On the other hand, an increase in synaptic strength was elicited when membrane fusion was facilitated. In the CA1 region, LTP requires the activation of NMDA glutamate receptors and a subsequent rise in postsynaptic calcium concentration. Besides other roles, Ca²⁺ plays a crucial role in the final stage of vesicle fusion with the membrane, and the number of fused vesicles is proportional to [Ca²⁺] (Sudhof 1995).
Since LTP in CA1 neurons is accompanied by the appearance of the AMPA subclass of glutamate receptors (Liao et al. 1995), it is reasonable to assume that vesicles can be a means of their insertion. Indeed, Kharazia et al. observed GluR1-containing vesicles (GluR1 is a subunit of AMPA receptors) associated with the cytoplasmic side of some GluR1-containing cortical synapses (Kharazia et al. 1996). Moreover, tetanic stimulation induces a rapid delivery of GluR1 into spines, and this delivery requires activation of NMDA receptors (Shi et al. 1999). Another effect of vesicle fusion with the spine membrane would be the shaping and growth of the spine, which were observed during the induction and maintenance of LTP. However, prior to fusion the vesicles must get very close to the plasma membrane. The main mechanism for displacement of vesicles within axons and dendrites is fast active transport, with speeds of 0.001-0.004 μm/ms (Schnapp and Reese 1986). Fast transport depends on the direct interaction of transported vesicles with microtubules via translocator kinesin-like molecules
(Schnapp and Reese 1986). However, microtubules do not enter spines (Spacek and Harris 1997). Thus, while fast transport can bring vesicles close to the walls of dendritic shafts, another mechanism must come into play within the spines themselves. The first natural candidate for this mechanism is the diffusion of vesicles. However, we have shown that an electrophoretically driven, directed motion of negatively charged vesicles towards the spine head, evoked by the synapse stimulation itself, can be ten times faster (Benuskova 2000). We have estimated the intensity of intra-spine electric fields triggered by stimulation of excitatory spine synapses. We have shown that this electric force can cause fast electrophoretic movement of negatively charged vesicles, which bring new postsynaptic receptors and membrane for insertion during the induction of LTP. Due to the direction of the intra-spine electric field, the movement of vesicles is electrophoretically directed along the longitudinal spine axis towards the spine head. Spinulae, small extrusions in the middle of the spine head, might be witnesses of such directed vesicle fusion. The number of fused vesicles may be proportional not only to the increased calcium concentration within the spine head during the induction of LTP, but also to the magnitude of the electric force which drives vesicles towards the postsynaptic membrane. The thicker the spine gets, the smaller the electric field becomes; thus few or no vesicles would get to the postsynaptic membrane in time to catch up with the increased [Ca²⁺] during NMDAR stimulation. This effect is congruent with the effect of spine dimensions on [Ca²⁺] (Gold and Bear 1994), and thus may also be a part of the saturation mechanism in LTP. The electrophoretic hypothesis is illustrated in Fig. 3.4.
Fig. 3.4. Schematic illustration of the electrophoretic hypothesis linking the stimulation of the spine excitatory synapses with the morphological changes and insertion of new receptors during LTP. Arrows point in the direction of the intra-spine electric field E. E gets smaller as the spine gets thicker due to new membrane insertion. Spinulae in the middle of the spine head would be the place where the vesicles fuse most often due to the direction of E. SER = smooth endoplasmic reticulum
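An order-of-magnitude comparison of the two intra-spine transport mechanisms can be sketched as follows. All numerical values are rough assumptions chosen only for illustration; they are not the estimates of Benuskova (2000), which should be consulted for the actual calculation.

```python
import math

# Rough timescale comparison for moving a ~50 nm vesicle along a ~1 um
# dendritic spine: free diffusion vs. directed electrophoretic drift.
L = 1.0e-6       # spine length: ~1 um (assumed)
r = 25.0e-9      # vesicle radius: ~25 nm (50 nm diameter, as in the text)
kT = 4.1e-21     # thermal energy at ~300 K, in joules
eta = 0.05       # assumed effective cytoplasmic viscosity, Pa*s (crowded medium)

D = kT / (6 * math.pi * eta * r)   # Stokes-Einstein diffusion coefficient
t_diffusion = L ** 2 / (2 * D)     # characteristic 1-D diffusion time over L

v_drift = 2.0e-6                   # assumed electrophoretic drift speed, m/s
t_drift = L / v_drift              # directed transport time over L

print(f"diffusion: ~{t_diffusion:.2f} s, electrophoretic drift: ~{t_drift:.2f} s")
print(f"directed drift is ~{t_diffusion / t_drift:.1f}x faster here")
```

The qualitative point survives the crude numbers: directed drift scales with L while diffusion scales with L², so over a micrometer-scale spine a directed electrophoretic mechanism can deliver vesicles substantially faster than diffusion alone, consistent with the hypothesis above.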
Morphological changes on spines and dendrites and the growth of synapses probably constitute the mechanism of long-term and permanent memory storage (Rhawn 1996, Bailey et al. 2004, Maviel et al. 2004). There is a local machinery for protein synthesis at spines and dendrites, consisting of polyribosomes, tRNAs, microRNAs, initiation factors and mRNAs for glutamate receptors, structural proteins and kinases like CaMKII (Steward 1997). MicroRNAs are small, non-coding RNAs that control the translation of target messenger RNAs (mRNAs). It has been shown that both sensory experience and synaptic stimulation lead to translation of α-CaMKII (Wu et al. 1998), the kinase that is crucially involved in the induction of LTP. In addition, it has been shown that a brain-specific microRNA, miR-134, is localized at the synapto-dendritic compartment of rat hippocampal neurons and regulates the size of dendritic spines. This effect is mediated by miR-134 inhibition of the translation of an mRNA encoding the protein kinase Limk1 (LIM-domain-containing protein kinase), which controls spine development (Schratt et al. 2006).
3.4 Neocortical Plasticity

The ability of the brain to wire and rewire itself in response to changes in experience has become known as experience-dependent plasticity. The brain is able to remodel its connections in order to adjust the organism's response to changing conditions. Experience-dependent neocortical plasticity means that cortical neurons change their response characteristics due to altered stimulation of cortical inputs (i.e. experience). The amount of experience-dependent cortical plasticity, i.e. the degree of changes, is greatest in early development and decays with age; however, it never ceases totally. Developmental plasticity refers to the cortical changes during the early stages of postnatal brain development.

3.4.1 Developmental Cortical Plasticity

With respect to developmental cortical plasticity, the most studied and best known neural system is the visual system of mammals. It is assumed that the visual system of primates has around twenty hierarchical processing levels (areas), starting from the eye retina, going through the LGN in the thalamus and the primary visual cortex in the occipital cortex, to tens of higher-order visual areas in the parietal and temporal cortices. Neurons at every hierarchical level of the visual system respond to different abstract elementary features of visual objects. These elementary features are, for
example, edges of light intensity, angles between edges, distances between points (disparity), colors, movements, the direction and/or speed of movement, and so on. Higher-order visual neurons respond to various combinations of elementary features processed at lower levels. The most studied cortical visual area is the primary visual cortex (V1), which is the hierarchically lowest visual cortical area. The Nobel laureates David Hubel and Torsten Wiesel discovered that neurons in V1 have different orientation selectivities and ocular dominances. Orientation selectivity means that different neurons respond to optical stimulation by bars (edges) of different angular orientations, such as \ | / -, and so on, over the 360° circle. Ocular dominance means the degree to which a neuron is dominated by the right or left eye. When a neuron is dominated by both eyes equally, we say it is binocular.

Hubel, Wiesel, and many other scientists have studied whether the distribution (pattern) of ocular dominances and orientation selectivities within neurons of V1 is inborn or acquired by experience. Currently, it is widely accepted that these patterns are in part inborn; however, for the development of normal sharp binocular vision, normal visual stimulation is necessary. Immediately after eye opening, neurons in V1 are not silent; instead, each one responds to optical stimulation by bars of several close orientations. It is said that the neurons are broadly tuned. There is also some initial ocular dominance distribution, in which degrees of ocular dominance are equally distributed. It can be said that nature (i.e. innate factors) provides some initial "scaffold", in the sense of a "framework outlining parts to be formed on it later". In time, and experiencing normal (natural) visual stimulation, neurons become sharply tuned and the distribution of ocular dominance becomes skewed towards binocular neurons.
The development of sharply tuned binocular neurons is completed during the so-called critical period of postnatal development. In other words, the critical period is a limited time period after birth, only during which the development of normal response characteristics of neurons is possible. After eye opening, the critical period of cats lasts for about 2-3 months, the critical period of monkeys lasts for about 6 months, and in humans it lasts up to about 5-6 years after birth. Developmental plasticity, i.e. the degree of evoked long-lasting changes in the response characteristics of cortical neurons, decays with time during the critical period and ceases by its end. Abnormal visual experience during the critical period causes the development of irreversible abnormal response characteristics of cortical neurons. Such abnormal visual experience can change even the initial response properties of neurons. Thus, the "scaffold" itself, i.e. the innate basis for normal response
characteristics, can be disrupted by inappropriate experience (at least in the visual cortex). Examples of abnormal visual experience during the critical period that lead to the development of permanent and irreversible changes in the response characteristics of cortical neurons in V1 (the same outcomes were observed for monkeys and cats) (Clothiaux et al. 1991):

• Binocular deprivation (BD). When both eyes are kept shut during the whole critical period, both eyes are completely deprived of visual inputs. Neurons in V1 remain broadly orientation selective and the distribution of ocular dominances remains unchanged. At the perceptual level, poor orientation selectivity results in blurred vision, which cannot be corrected by any means.

• Monocular deprivation (MD). One of the eyes is kept shut during the whole critical period. Neurons in V1 become dominated only by the open eye. Neurons dominated by the open eye develop normal, i.e. sharp, tuning to oriented bars, and thus gain normal orientation selectivity. These response characteristics are irreversible, even when the closed eye is opened at the end of the critical period and remains open for the rest of life. At the perceptual level, the animal stays blind in the formerly closed eye for its whole life and has sharp vision connected only to the open eye that received normal visual input.

• Normal rearing (NR) after MD. When, after cortical neurons lose their responsivity to the closed eye, this eye is opened while the critical period is not yet over, neurons in V1 regain binocular properties and develop normal orientation selectivity. Normal sharp vision can be restored.

• Reverse suture (RS). One of the eyes is kept shut for some time from the beginning of the critical period. After cortical neurons lose their responsivity to the closed eye, this eye is opened and the formerly open eye is closed (before the end of the critical period).
Cortical neurons in V1 lose their responsivity to the newly closed eye, and regain and retain responsivity to the newly opened eye.

The properties and time course of cortical plasticity in the auditory cortex are not very well known. Only recently has it been shown that the development of normal cortical representations of sounds correlates with the capacity to hear. Prof. Klinke's team from the Goethe University in Frankfurt has advanced our knowledge about developmental plasticity in the auditory cortex (Klinke et al. 1999). First they demonstrated that cats that are deaf because of innate (congenital) degeneration of the organ of Corti do not have normal representations of sound frequencies in their auditory cortex, although there is some weak representation of sounds, because some sounds can be transmitted through the bones of the skull. The scientists implanted a so-called cochlear implant into the inner ear of deaf kittens which were 2 to 4 months old. The cochlear implant is an electronic device that transforms sound air waves into electric stimulation of the auditory nerve (which is spared in these cats) and thus enables sounds to be relayed to the auditory cortex. They used a cochlear implant that was a crude substitute for a true cochlea, in the sense that it was far from a faithful replication of natural stimulation. In spite of that, after a period of 1 to 3 weeks the kittens, completely deaf before they received the implant, started to hear. Perceptual improvement was accompanied by the development of almost normal sound maps (representations of sound frequencies) within the auditory cortices of these cats. Thus, the cats did not start to hear immediately after they received the cochlear implant. Instead, they started to hear only after almost normal neural sound representations had developed in their auditory cortex. The degree of plastic changes in the auditory cortex in response to auditory stimulation decreases with age, although it seems that there is no sharp critical period like that in the visual system.

3.4.2 Adult Cortical Plasticity
For the first time, experience-dependent plasticity of the adult cerebral cortex was demonstrated in the somatosensory cortex of monkeys. At any moment, millions of signals from the whole body, from its internal and external surfaces, travel to the brain through myriads of peripheral nerves and the spinal cord. Processing of bodily signals, like touch, warmth and pain, takes place in the somatic sensory cortex, which occupies the postcentral gyrus of the parietal cortex (see Fig. 3.5). The human (or monkey, or any other animal) body is mapped onto the somatosensory cortex in topographic order. Topography means that neighboring body parts occupy neighboring parts of the map. The body map is deformed, i.e. those body parts that are often actively used are represented by larger neural areas. Jon Kaas from Vanderbilt University and Michael Merzenich from the University of California were among the first to discover that no two somatosensory maps are the same. They discovered that individual variations in body representations reflect differences in the tactile experience of different individuals, especially with respect to the fingers (Kandel et al. 2000). The details of cortical maps represent individual experience. Many experiments support this conclusion. For instance, in one experiment, monkeys were trained to touch a rotating disk for one hour per day. Before the experiment and several weeks later, the scientists measured the extent of finger representation in the somatosensory cortex. They discovered that the cortical area representing the stimulated fingertips had enlarged. Each finger has its own discrete area in the hand map: neurons in area 1 respond only to touching finger 1, and so on. After differential tactile stimulation of the tips of fingers 2, 3 and 4, their cortical areas had enlarged. In another experiment researchers sutured together the skin of fingers 3 and 4, so that the monkey had to use these two fingers as a single one. In the somatosensory hand map, an entirely new representation developed: on the border between the exclusive representations of fingers 3 and 4, a band of neurons responding only to joint stimulation of these two fingers emerged.
Fig. 3.5. Schematic illustration of the bodily representation in the somatosensory cortex. The body surface is represented topologically, while more important parts like the face, lips, hands and especially the fingers take up more cortical representation than other parts like the back, legs, arms, etc.
Similar somatosensory plasticity has been revealed in humans by means of magnetoencephalography (Mogilner et al. 1993). Adult humans were studied before and after surgical separation of inborn webbed fingers (syndactyly). The presurgical map contained a nonsomatotopic hand representation. Within weeks after surgery, cortical reorganization over distances of 3-9 mm was evident, correlating with the new functional status of the separated digits. On the other hand, 95-100% of people who have lost an arm or leg experience a phantom limb (Melzack 1999). They feel their lost limb as if it were still there, they feel it moving when they move, and it is still part of their subjective body experience. They can feel pain in it, as well as touch, warmth, cold, and so on. At the beginning, the phantom limb
has a normal shape and size, but in time it starts to change: it may float freely in the air, retract into the stump, or acquire some other bizarre shape, size or connection with the rest of the body. Nevertheless, it always remains an integral part of the patient's self, as if reorganization of large somatotopic maps were not entirely possible.

3.4.3 Insights into Cortical Plasticity via a Computational Model

As we have seen, although some forms of experience-dependent cortical plasticity disappear at the end of a developmental critical period, the adult cortex retains a significant capacity to undergo functional changes in response to alterations in sensory input. We are interested in the rules that determine how the adult cortex changes its synaptic circuitry to adapt to changes in the pattern of afferent activity. A very well structured and suitable system for such studies is the rat whisker system. Whiskers can be thought of as tiny fingers with which the animal palpates objects all around it. The facial whiskers of rats are aligned in five rows (row A is dorsal and row E is ventral), and the whiskers within a row are numbered from caudal to rostral, like positions on a chessboard (see Fig. 3.6a). Each facial whisker projects via the trigeminal nuclei and "barreloids" in the ventral posterior medial nucleus (VPM) of the thalamus to a separate cluster of neurons in layer IV, called a barrel (Jensen and Killackey 1987). Clusters of neurons in layer IV resemble barrels, hence their name. Barrels form the morphological basis of a discrete one whisker-one column organization of the barrel cortex (Fig. 3.6b).
Fig. 3.6. (a) In the experiment to gain insight into adult experience-evoked cortical plasticity, all whiskers on one side of the rat's face but two (D2 and D3) were cut close to the fur. Dots mark the positions of the cut whiskers, aligned in five rows, A to E; (b) Illustration of the cortical one whisker-one barrel mapping and the position of the recording electrode. Clusters of neurons in layer IV resemble barrels, hence their name
To alter the pattern of sensory activity, all whiskers except two, D2 and one of its neighbors in the D row, were cut close to the fur on one side of the face. We can think of this procedure as the removal of all fingers but two (note, however, that neither pain nor nerve damage is involved in whisker clipping). After 3, 7-10 and 30 days of "whisker pairing" (rats could use only two whiskers for palpation; all other whiskers were re-clipped regularly), the activity of single neurons in barrel D2 was measured in response to controlled deflection of the two paired whiskers, D2 and "D-paired", and the three cut neighbors ("D-cut", C2, and E2). Prof. Ebner and his team recorded and documented progressive and complex changes in D2 barrel cell responses during the course of paired-whisker experience (Diamond et al. 1993, Armstrong-James et al. 1994). The physiological study outlined above motivated us to develop a computational model of a barrel D2 neuron to gain a deeper insight into which synapses in the cortex modify, how they modify and why they modify (Benuskova et al. 1994). The weights of the synaptic inputs to the modeled barrel neuron (see Fig. 3.7a) were modifiable according to the Bienenstock, Cooper and Munro (BCM) theory (Bienenstock et al. 1982, Cooper et al. 2004). The BCM theory postulates that the neuron possesses a synaptic modification threshold, θ_M, that changes as a nonlinear function of the time-averaged postsynaptic activity (see Fig. 3.7b).
Fig. 3.7. (a) Schematic illustration of the barrel cortex circuit used in our model. In our simplified circuitry, whiskers C2, D1, D2, D3, and E2 converge through polysynaptic pathways (broken lines) upon a cell in the VPM barreloid D2, and through separate polysynaptic intracortical pathways upon a model cell in the barrel D2. Synaptic weights (m) that were modifiable in our model are denoted by open triangles. (b) Schematic illustration of the BCM synaptic modification rule, together with how the same level of postsynaptic activity c' can result in synapse potentiation or depression depending on the current value of θ_M. According to the BCM synaptic modification rule, active excitatory inputs d_i > 0 are strengthened when postsynaptic activity c > θ_M and φ > 0. Active inputs weaken when c < θ_M and φ < 0
The current position of θ_M dictates whether a neuron's activity at any given instant will lead to strengthening or weakening of the synapses impinging on it. Whisker pairing was simulated by setting the input activities of the model barrel cell to the noise level, except for the two inputs that represented the untrimmed whiskers. Initially low levels of cell activity, resulting from whisker trimming, led to low values of θ_M. As certain synaptic weights potentiated, due to the activity of the paired whiskers, the values of θ_M increased and after some time their mean reached an asymptotic value. This saturation of θ_M led to the depression of some inputs that were originally potentiated. The changes in cell response generated by the model replicated those observed in the in vivo experiments (Benuskova et al. 1994). Previously, the BCM theory had explained salient features of developmental experience-dependent plasticity in the kitten visual cortex (Bienenstock et al. 1982, Clothiaux et al. 1991). Our results suggested that the idea of a dynamic synaptic modification threshold, θ_M, is general enough to explain plasticity in different species, in different sensory systems and at different stages of brain maturity. Another test of the BCM theory of synaptic plasticity was its application to experience-evoked plasticity in the developmentally altered neocortex (Benuskova, Rema et al. 2001). If a synaptic plasticity rule captures real relations in the biological system, then it should work even if this system is altered, provided the corresponding parameters in the synaptic plasticity equations are changed accordingly. Numerous experimental results indicate that prenatal ethanol exposure impairs the development of synaptic plasticity mechanisms in the rat neocortex: namely, there is a decrease in the number of NMDA receptor-gated ion channels and the development of long, ineffectual dendritic spines (Miller and Dow-Edwards 1988, Al-Rabiai and Miller 1989, Rema and Ebner 1999).
After the offspring were born, they were fed normally, that is, without ethanol, until they reached adulthood. Then the ability of their cerebral cortex to undergo plastic changes in response to a new sensory experience was tested. In these experiments, again, whisker D2 and its D3 neighbor, "D-paired", were left intact while all other whiskers were cut. The novel sensory experience caused plastic neural changes in both normal and impaired cortices (i.e., cortices that were exposed to ethanol in utero), but these changes differed markedly from each other (Rema and Ebner 1999). As we will show below, the BCM theory can explain both the normal and the impaired neocortical plasticity. The original BCM theory is a one-cell theory, that is, one cell in this theory represents a population of cells with the defined properties, and inputs are expressed as instantaneous spike frequencies. For instance, here one
model cell represents the whole population of excitatory cells in the barrel D2 (Fig. 3.7a). Similarly, the inputs and their weights represent particular relay pathways with many synaptic contacts, rather than single individual synapses. Thus, a change in synaptic weight on the model neuron may represent any kind of plastic change, i.e. biochemical and/or morphological. Synaptic plasticity is modeled according to the BCM synaptic modification rule (Cooper 1987). If we consider the case of a linear cell, the modification of the i-th synapse with the weight m_i at time t is proportional to the product of the input activity at the i-th synapse, d_i(t), and a function φ, in such a way that

dm_i(t)/dt = η φ[c(t), θ_M(t)] d_i(t)        (3.1)

The "modification rate", η, is equal to the magnitude of the synaptic modification for the i-th input in one time step, when φ = 1 and d_i(t) = 1. The modification function φ is a parabolic function of the cell's current firing rate c(t) and the modification threshold θ_M(t), i.e.

φ[c(t), θ_M(t)] = c(t)[c(t) − θ_M(t)]        (3.2)

The dynamic synaptic modification threshold θ_M is proportional to the squared postsynaptic response averaged over some recent past time τ, such that

θ_M(t) = a ⟨c²(t)⟩_τ        (3.3)
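Equations (3.1)-(3.3) are compact enough to sketch directly in code. The following Python fragment is only an illustration of the rule's dynamics: the parameter values (η, a, τ) and the inputs are arbitrary choices, not the values used in this book, and ⟨c²⟩ is implemented as a simple exponential running average, which is one common reading of Eq. (3.3).

```python
import numpy as np

def phi(c, theta):
    """BCM modification function, Eq. (3.2): phi = c * (c - theta_M)."""
    return c * (c - theta)

def bcm_step(m, d, c_sq_avg, eta=0.01, a=1.0, tau=100.0):
    """One discrete-time step of the BCM rule, Eqs. (3.1)-(3.3).

    m        -- synaptic weight vector (m_i)
    d        -- presynaptic input vector (d_i)
    c_sq_avg -- running average of the squared postsynaptic activity
    Parameter values are illustrative, not fitted values.
    """
    c = float(np.dot(m, d))               # linear cell: c = sum_i m_i d_i
    c_sq_avg += (c * c - c_sq_avg) / tau  # <c^2> over the recent past (Eq. 3.3)
    theta = a * c_sq_avg                  # sliding modification threshold
    m = m + eta * phi(c, theta) * d       # Eq. (3.1)
    return m, theta, c_sq_avg

# A strongly driven input potentiates at first; as activity (and hence
# theta_M) rises, further growth is limited -- the stabilizing effect of
# the sliding threshold.
m, c_sq_avg = np.array([0.1, 0.1]), 0.0
for _ in range(500):
    m, theta, c_sq_avg = bcm_step(m, np.array([1.0, 0.05]), c_sq_avg)
```

Because θ_M grows with the square of the response, it eventually overtakes the response of weakly driven inputs, which is what produces depression without any explicit weight-decay term.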
The positive constant a determines how far to the right on the x axis the actual, or effective, threshold for synaptic potentiation lies in the equation for the modification function φ. It will turn out that the value of the constant a is the key to the differences in simulated experience-evoked cortical plasticity between the normal and the faulty cortex. The cell's current firing rate, c, is calculated as a linear sum of the thalamocortical (VPM) and intracortical input activities, such that

c(t) = m^vpm(t) Σ_i d_i^vpm(t) + Σ_i m_i^cor(t) d_i^cor(t)        (3.4)

where m^vpm is the synaptic weight of the thalamocortical input from the VPM upon the cell in the barrel-column, and m_i^cor is the weight of the lateral intracortical connection from the neighboring barrel-column corresponding to whisker i = D2, D1, D3, C2, E2 (see also Fig. 3.7a). The thalamocortical and intracortical input activities d_i^vpm(t) and d_i^cor(t), respectively, relayed from the i-th whisker, i = D2, D1, D3, C2, E2, are comprised of the sum of the evoked response plus random noise:

d_i^vpm(t) = I_i^vpm d_i(t) + n_i^vpm(t)        (3.5)

d_i^cor(t) = I_i^cor d_i(t) + n_i^cor(t)        (3.6)

In these equations, d_i(t) is equal to either 1 or 0, depending on whether or not the i-th whisker is deflected. The parameter 0 < I_i^vpm < 1 is the input strength constant of the i-th whisker input relayed through the VPM to the model barrel-column D2. The parameter 0 < I_i^cor < 1 is the intracortical input strength constant of the i-th whisker input relayed through its own barrel-column to the model barrel-column D2. Qualitative and quantitative agreement with experimental data was obtained when I_D2^vpm ≫ I_i^vpm and I_D2^cor ≫ I_i^cor for i = D1, D3, C2, E2, and when I_D2^cor ≈ I_D2^vpm, in accordance with experimental data (Armstrong-James et al. 1991). The thalamocortical noise n_i^vpm(t) and the intracortical noise n_i^cor(t) are defined as random variables uniformly distributed in the interval [−A_i(noise), +A_i(noise)], where A_i is the noise amplitude. They represent random spontaneous neural activity. Whisker pairing was simulated by setting the input activities to the noise levels, except for the two inputs that represented the untrimmed whiskers. To match the simulated evolution of the cell response in time with the experimental testing procedure, the response, c, of the model barrel-column cell was calculated as the sum of the short-latency (<10 ms) and long-latency (10-100 ms) responses to 50 deflections of the corresponding whisker (Armstrong-James and Callahan 1991, Armstrong-James et al. 1991). The short-latency response, c_SL, is the number of spikes generated in response to activation of the multi-whisker thalamocortical input (see Fig. 3.7a). The long-latency response, c_LL, is the number of spikes generated in response to activation of the intracortical inputs. The total cell response reads

c = c_SL + 9 c_LL        (3.7)

At the testing intervals, the weight values were fixed and d_i^vpm = d_i^cor = 1 for the deflected whisker, otherwise d_i^vpm = d_i^cor = 0. Since the response to the intracortical inputs was measured over a time interval 9 times longer than that of the short-latency response, we multiply the second term by this number. For the values of the simulation parameters and their ranges, as well as other details of the model, we refer the reader to (Benuskova, Rema et al. 2001). In the following two figures, Fig. 3.8 and 3.9, we show the results of computer simulations of the above-described model of neocortical experience-dependent plasticity.
Fig. 3.8. Evolution of the long-latency responses evoked in the real and simulated barrel-column D2 by deflection of the paired whiskers, D2 and D-paired (D3). Long-latency responses (10-100 ms poststimulus) are mediated by intracortical pathways. Lines correspond to simulation results, and discrete points with S.E. bars correspond to experimental data averaged over the whole population of cells. Evolution of responses to (a) whisker D2 and (b) paired whisker D3, for the normal cortex (dashed lines and squares) and for the impaired cortex (solid lines and triangles)
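The whisker-pairing simulation defined by Eqs. (3.1)-(3.7) can be sketched, in deliberately simplified form, as follows. All constants here (strengths, noise amplitude, learning rate, averaging time) are illustrative placeholders rather than the published values, which are given in (Benuskova, Rema et al. 2001), and the spared whiskers are deflected on every step instead of following the experimental testing protocol.

```python
import numpy as np

rng = np.random.default_rng(1)
W = ["D2", "D1", "D3", "C2", "E2"]
PAIRED = {"D2", "D3"}                 # the two whiskers left untrimmed

# Illustrative constants only, chosen so that I_D2 >> I_i for the others:
I_vpm = {w: 0.9 if w == "D2" else 0.1 for w in W}
I_cor = {w: 0.9 if w == "D2" else 0.4 for w in W}
A = 0.05                              # noise amplitude
eta, a, tau = 0.002, 1.0, 50.0

m_vpm = 0.3                           # thalamocortical weight
m_cor = dict.fromkeys(W, 0.3)         # intracortical weights
avg_c2 = 0.0                          # running average of c^2 (for theta_M)

for t in range(4000):
    # Whisker pairing: only the spared whiskers are ever deflected; the
    # inputs of the cut whiskers are reduced to noise (Eqs. 3.5-3.6).
    d_vpm = {w: I_vpm[w] * (w in PAIRED) + rng.uniform(-A, A) for w in W}
    d_cor = {w: I_cor[w] * (w in PAIRED) + rng.uniform(-A, A) for w in W}

    # Eq. (3.4): linear response to thalamocortical plus intracortical input.
    c = m_vpm * sum(d_vpm.values()) + sum(m_cor[w] * d_cor[w] for w in W)

    # Eq. (3.3): sliding modification threshold theta_M = a * <c^2>.
    avg_c2 += (c * c - avg_c2) / tau
    theta = a * avg_c2

    # Eqs. (3.1)-(3.2): BCM update of every modifiable weight.
    p = c * (c - theta)
    m_vpm += eta * p * sum(d_vpm.values())
    for w in W:
        m_cor[w] += eta * p * d_cor[w]
```

In this sketch the threshold dynamics are kept fast relative to the weight changes, so the weights settle smoothly, with the paired-whisker pathways potentiating far more than the cut-whisker ones; the initial-rise-then-depression course seen in Fig. 3.8 arises when θ_M lags behind the weight growth and then saturates, as described in the text.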
Our model, based on the BCM theory of synaptic plasticity, explains the plastic changes in the barrel D2 as a result of the modification of excitatory synapses (i) between neighboring cortical barrel-columns, and (ii) between the VPM and the model cell representing the barrel D2. In the model of the normal barrel cortex: (1) short-latency (<10 ms) responses to all whiskers increase; (2) long-latency responses to the paired whiskers, D2 and its spared D row neighbor (D3), initially increase and then decrease. In the model of the barrel cortex of adult rats that were prenatally exposed to ethanol (the impaired cortex): (1) short-latency (<10 ms) responses to all whiskers decrease; (2) long-latency responses to the paired whiskers, D2 and its spared D row neighbor (D3), initially decrease and then increase. Compared to the values of the parameters of the model simulating the normal cortex, three factors made a difference in the model simulating the faulty cortex: (1) The levels of evoked intracortical activity and the levels of noise were much lower than in the model of the normal cortex. This is in accordance with
experimental findings in (Miller and Dow-Edwards 1988, Rema and Ebner 1999). (2) The thalamocortical noise had to be increased compared to the model of normal plasticity, reflecting the alterations of thalamocortical projections induced by prenatal ethanol (Granato et al. 1995). (3) The value of the constant a in the temporal dependence of θ_M upon the past activity of the neuron (equation 2.4) had to be twice as large as the value of a in the model of the normal cortex (Benuskova, Rema et al. 2001). As a result, θ_M behaves differently from the θ_M in the model of the normal cortex and determines a different course of plasticity.
Fig. 3.9. (a) Evolution of the short-latency responses evoked in the real and simulated barrel-column D2 by deflection of whisker D2. The short-latency response (<10 ms poststimulus) is mediated by the thalamocortical pathway. The dashed line corresponds to simulation results, and squares with S.E. bars correspond to experimental data averaged over the whole population of cells in the normal barrel D2. The solid line and triangles belong to the impaired cortex. (b) Illustration of the evolution of the mean value of the modification threshold θ_M for each day of whisker pairing. Note that in the model of the impaired cortex the value of θ_M decreases (triangles), whereas in the model of the normal cortex θ_M increases (squares), compared to the initial value of θ_M immediately after whisker trimming
Theoretical and experimental results indicate that the search for the mechanism of θ_M should be focused on the excitatory synapses that are formed on dendritic spines with NMDA receptor-gated ion channels (Bear et al. 1987, Bear 1995). According to the physiological model of θ_M, the value and sign of φ are determined by the Ca2+ movement into dendritic spines. Synaptic efficacy will increase when presynaptic activity evokes a large postsynaptic calcium signal (φ > 0). This will occur only when the membrane potential exceeds the level required to allow enough Ca2+ ions to enter through the NMDA receptor-activated channels (c > θ_M). When the amplitude of the evoked Ca2+ signal falls below a certain critical
level, corresponding to φ = 0 and c = θ_M, active synapses will be weakened over time. This is in agreement with the fact that the induction of long-term synaptic depression (LTD) requires a minimum level of postsynaptic depolarization and a rise in the intracellular Ca2+ concentration in the postsynaptic neuron (Artola et al. 1990). On the other hand, a relatively high level of intracellular Ca2+ concentration in the postsynaptic neuron leads to the induction of long-term synaptic potentiation (LTP). Thus, there is an LTD/LTP crossover point, with a critical level of NMDA receptor activation and Ca2+ entry, at which the direction of synaptic modification changes. In the BCM theory, the LTD/LTP crossover point corresponds to θ_M, i.e. to the point where the φ function changes its sign from minus to plus (see e.g. Fig. 3.7b). In our simulations this LTD/LTP crossover point was influenced by the value of the constant a in equation 2.4. It is noteworthy that in the model of the faulty cortex, the value of a had to be twice as large as the value of a in the model of the normal cortex. This is important especially in the light of the experimental finding that in the barrel cortex of adult rats that were prenatally exposed to ethanol, the number of NMDA receptors is reduced to about one half of the number in the normal adult barrel cortex (with respect to all NR1, NR2A, and NR2B subunits) (Rema and Ebner 1999). Thus, we have arrived at the hypothesis that the effective position of θ_M on the x axis may depend on both the number of NMDA receptors and the average past activity. That is, the smaller the number of NMDA receptors on the postsynaptic neuron, the higher the value of θ_M, and vice versa. On the other hand, low average activity lowers the θ_M value, and therefore the value of θ_M is in fact lower in the impaired cortex than in the normal cortex (Fig. 3.9). It is, however, still too high for the extremely low levels of activity in the impaired cortex.
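The arithmetic behind this hypothesis is easy to check. Assuming θ_M = a⟨c²⟩ as in Eq. (3.3), and treating ⟨c²⟩ simply as the square of an average activity level, the following sketch (with made-up activity values) shows both effects at once: the low activity of the impaired cortex pulls θ_M below its normal-cortex value, while the doubled a halves the steady-state range of activities for which potentiation (c > θ_M) is possible.

```python
def theta_m(a, c_avg):
    """Eq. (3.3) in rough steady state: theta_M = a * <c^2> ~ a * c_avg**2."""
    return a * c_avg ** 2

# Made-up activity levels, purely for illustration:
a_normal, a_impaired = 1.0, 2.0   # impaired cortex: a doubled, mirroring the
                                  # roughly halved NMDA receptor count
c_normal, c_impaired = 1.0, 0.4   # impaired cortex is far less active

# Low average activity pulls theta_M down in the impaired model (Fig. 3.9b)...
assert theta_m(a_impaired, c_impaired) < theta_m(a_normal, c_normal)

# ...yet the threshold becomes relatively harder to exceed: at steady state
# theta_M = a*c^2, so potentiation (c > theta_M) requires c < 1/a, and
# doubling a halves that range.
assert 1.0 / a_impaired < 1.0 / a_normal
```

This is only a back-of-the-envelope reading of Eq. (3.3), not a claim about the fitted model parameters.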
Exposing prenatally alcohol-exposed rats to enriched rearing conditions significantly improves all measured cortical functions, but does not restore normal values (Rema and Ebner 1999). These results predict that a combination of interventions will be necessary to completely restore cortical function after exposure of the fetal brain to alcohol. Perhaps insertion of the gene for NR2B would help in addition to an enriched environment. Relevant studies indicate that not only the number of NMDA receptors may influence the θ_M value; there are also other factors, related to the cascade of intracellular events triggered by Ca2+ influx, that can affect θ_M. We will investigate these factors more thoroughly in the chapter on the computational neurogenetic model of learning and memory.
3.5 Neural Coding: the Brain is Fast, Neurons are Slow

Neurons within and between different brain areas send messages to each other by means of output spikes. Thus, neural representations of objects "communicate with each other", since the neurons within and between these representations send messages to each other. Within the brain, and in the cortex in particular, the principle of topographic mapping of input features is obeyed throughout. Thus, information about the features of a stimulus is encoded in the places where the neurons responding to the object fire. These places are allocated through an interplay between the inborn scaffold and early experience. The principle of topology preservation holds in the somatosensory, auditory and visual cortices. It means that features that are close together in the object are close to each other in its neural representation too. For instance, ascending sound frequencies are mapped onto ordered neural stripes in the primary auditory cortex, and bars of similar orientations are mapped onto neighboring columns of the primary visual cortex. Every object, comprised of a number of features, is thus represented by a spatial pattern of activated columns within a map. The neurons within one cortical column (there may be 10,000 of them) redundantly represent one elementary feature. When activated, the neurons are supposed to send a message about their activation and about the salience of the present feature up and down the hierarchy of processing areas, and also within the same processing area. Thus, the messages exchanged by neurons concern the salience of the features that they represent. At present the nature of these messages, i.e. the nature of the neural code, is a mystery. Is it the number of spikes sent? Is it the timing of spikes? Is it the frequency of spikes that conveys the message? In this section we will introduce possible solutions to this problem.
3.5.1 Ultra-Fast Visual Classification

Simon Thorpe and his team at the University of Toulouse, France, performed an experiment in which human subjects had to classify pictures into two categories, animal or non-animal (Thorpe et al. 1996). Subjects were not asked to name the object, only to classify it. They had not seen the pictures before. For each category they pressed one of two response buttons. The catch was that each picture was shown for an extremely short period of only 20 milliseconds (1 second = 1000 ms). The exact timings were controlled by a computer. Hundreds of pictures were used for statistical evaluation.
Every two seconds, a picture from one or the other class was drawn at random. Each time it was a different picture, so there was no repetition. During the experiment the brain activity of the subjects was recorded. The reaction time, i.e. the time of pressing the button, was 250 ms on average. Activity in the inferotemporal (IT) cortex occurred on average at 150 ms, so the preparation and execution of the motor response took on average 100 ms. After the experiment was over, people would say that they did not have enough time to realize (to be aware of) what was actually in the picture. The decision to press one or the other button was made on feeling. In spite of that, they correctly classified the pictures in 94% of instances. The same experiment was carried out with monkeys (Thorpe and Fabre-Thorpe 2001). Their average reaction time was 170 ms; IT activity occurred after 100 ms, thus the motor response was prepared in about 70 ms after visual processing. (A monkey's brain is about 1/3 of the size of a human's.) Their classification accuracy reached 91%, so they were almost as good as humans. The experimenters varied the experimental protocol to discover that the ultra-fast classification did not depend on the classes of objects, did not depend on color, and did not depend on attention or eye fixation. It is amazing that after such an extremely short presentation of a complex stimulus upon the retina, the primate brain is able to perform a correct classification in more than 90% of instances in less than 200 ms.
Fig. 3.10. Serial processing of the visual stimulus in the image classification experiment with humans. The locations of the illustrated cortical areas are only schematic. V1 = primary visual cortex, V2 = secondary visual cortex, V4 = fourth visual area, IT = inferotemporal cortex, PFC = prefrontal cortex, PMC = premotor cortex, MC = motor cortex
Let us trace the visual processing in this experiment (see Fig. 3.10). The projected image stimulates the retina for 20 ms. In about 80 ms, neurons in the thalamic LGN (lateral geniculate nucleus) respond. The thalamic neurons activate neurons in the primary visual cortex (V1). Activation then proceeds to and through the higher-order visual areas V2, V4 and IT. We speak of the so-called "WHAT" visual system, which is assumed to be responsible mainly for the classification and recognition of objects. In the highest-order area of this system, the inferotemporal (IT) cortex, activity appears 150 ms after picture onset (on average). It is thought that here, in the IT area, the classification process is completed (Thorpe and Fabre-Thorpe 2001). If we divide the 150 ms from picture onset by the number of processing areas (retina, thalamus, V1, V2, V4), each of them has on average only 30 ms for processing signals. The frontal areas, PFC, PMC and MC, are responsible for the preparation and execution of the motor response, for which they need only 100 ms. Divided by three, again we get about 30 ms for each area. Since each of the mentioned areas has further subtle subdivisions, each subarea may have only 10 ms to process signals and send them higher in the processing hierarchy. At the same time, neurons in each area send signals up and down the stream of hierarchical processing. Whether 10 or 30 ms, it is an extremely short time for processing in a single area. Cortical neurons, when naturally stimulated, fire with frequencies of the order of 10 to 100 Hz. A neuron firing with an average frequency of 10 Hz (i.e., 10 impulses in 1000 ms) may fire its first spike 100 ms after the beginning of stimulation. Thus, during the first 10-30 ms there will be no spikes from this neuron. Another neuron, firing at 100 Hz, fires only 1-3 spikes during the first 10-30 ms.
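These spike-count figures can be checked with a short simulation. The sketch below assumes homogeneous Poisson firing, the stochastic model invoked in this section; the 30 ms window and the trial count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
window_s = 0.030          # a 30 ms processing window
trials = 100_000

# Under a homogeneous Poisson model the spike count in a window is
# Poisson-distributed with mean = rate * window, and its variance equals
# its mean -- the large dispersion discussed in the text.
for rate_hz in (10, 100):
    counts = rng.poisson(rate_hz * window_s, size=trials)
    p_zero = (counts == 0).mean()
    # 10 Hz:  mean 0.3 spikes per window, ~74% of windows empty (e^-0.3)
    # 100 Hz: mean 3 spikes per window, yet ~5% of windows still empty (e^-3)
```

So even a "fast" 100 Hz neuron stays completely silent in roughly one 30 ms window out of twenty, which is exactly why a downstream area cannot rely on a single neuron's count over such a short interval.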
In each of the above-mentioned areas there are millions, perhaps billions, of neurons; yet these neurons exchange only 1-3 spikes before the result of their processing is sent on to the higher-order areas and back to the lower-order areas. Each neuron receives signals from about 10,000 other neurons and sends signals to the next 10,000 neurons. The synaptic transmission delay at one synapse is about 1 ms. A neuron cannot wait 10,000 ms to receive signals from all its presynaptic neurons. Thus, the signals ought to arrive almost simultaneously, and not one after another. Another complication in the neuronal processing of inputs is the fact that firing is a stochastic process. A good model for it is a Poisson stochastic process, in which the dispersion is equal to the mean; thus the dispersion is large. When speaking of firing frequencies of 10 or 100 Hz, we mean average frequencies over relatively long time periods, let us say 500 ms (half a second). Thus, a neuron firing with an average frequency of 100 Hz does not have to fire a single spike during the first 10-30 ms from the beginning of stimulation, and a neuron firing with an average frequency of 10 Hz may fire four spikes. To summarize, how neurons code information is a real problem. So far, this problem has not been solved. In the following section we will introduce several current hypotheses.

3.5.2 Hypotheses About a Neural Code
These hypotheses can be divided into two categories: (A) spike timing hypotheses and (B) rate code hypotheses (Maass and Bishop 1999).

Coding Based on Spike Timing

1. Reverse correlation. The first option is that the information about the salience of an object feature is encoded in the exact temporal structure of the output spike train. Let us say that two neurons each fire three spikes within 30 ms, but with different temporal patterns (say, | || versus || | ). By means of the techniques of reverse correlation, it is possible to calculate which stimulus causes which temporal pattern in which neuron. The main proponents of this theory are Bialek and his coworkers, who have verified it in the fly visual system (Rieke et al. 1996).

2. Time to the first spike. Let a stimulus arrive at the neural network at time instant t0. The neurons that fire first (say, within a window of 10 ms) carry the information about the stimulus features; the rest of the neurons and the rest of the impulses are ignored. This theory is favored by S. Thorpe (Thorpe et al. 1996, Thorpe and Fabre-Thorpe 2001).

3. Phase. Information about the presence of a feature is encoded in the phase of a neuron's impulses with respect to a reference background oscillation. The impulses are either in phase lead or in phase lag, and the information can also depend on the magnitude of this lead (or lag). This coding is preferred by researchers investigating the hippocampus (Jensen 2001).

4. Synchronization. Populations of neurons that represent features belonging to one object can be bound together by synchronous firing. Such synchronization accompanying percepts was discovered in the laboratory of W. Singer in the cat visual cortex (Fries et al. 1997). It was also detected in the human cortex during perception of meaningful stimuli (faces) (Rodriguez et al. 1999).

The Rate Code

1. Temporal average rate.
In this respect, the works of the English physiologist Adrian from the 1930s are cited. Adrian found
out that the average frequency of a neuron in the somatosensory cortex is directly proportional to the pressure applied to its touch receptor. Similar dependencies have been discovered in the auditory and visual cortices. That is, in the auditory cortex, the heard frequency is encoded by the average firing frequency of auditory neurons, and in the visual cortex, the average frequency of a neuron encodes the salience of its visual elementary feature. This coding is still considered valid for stationary stimuli that last around 500 ms or longer, so that neurons have enough time to count (integrate) impulses over a long period. Neurons that have the highest frequency signal the presence of the relevant feature.

2. Rate as a population average. The average frequency is calculated not as a temporal average but rather as a population average. One feature is represented by a population of many (10,000) neurons, for instance in one cortical column. In the presence of the feature, most of them are activated. When we count the spikes of all these neurons in a 10 ms window and divide this number by the number of neurons, we get approximately the same average frequency as when calculating a temporal average rate of any one of these neurons (provided they all fire with the same average rate). This idea has been thoroughly investigated by Shadlen and Newsome (Shadlen and Newsome 1998). They showed on concrete examples that by means of population averaging one can obtain a reliable estimate of neurons' average rates even when they have a Poisson-like distribution of output spikes. Populations that relay the highest number of spikes signal the presence of the relevant feature.
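The population-average idea can be illustrated with a toy simulation (illustrative parameters, not the analysis of Shadlen and Newsome): 10,000 hypothetical Poisson neurons each fire at 100 Hz on average, and a single 10 ms window already yields a reliable rate estimate.

```python
import math
import random

random.seed(1)

N = 10_000      # neurons in the population (e.g. one cortical column)
RATE = 100.0    # each neuron's average firing rate, Hz
WINDOW = 0.010  # observation window, s (10 ms)

def poisson(lam):
    """Draw a Poisson spike count (Knuth's method, fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Each neuron contributes a spike count with mean RATE * WINDOW = 1 spike.
total_spikes = sum(poisson(RATE * WINDOW) for _ in range(N))
population_rate = total_spikes / (N * WINDOW)  # spikes per neuron per second
print(f"rate estimated from a single 10 ms window: {population_rate:.1f} Hz")
```

With 10,000 neurons the relative standard error of the estimate is about 1%, so even a single short window recovers the underlying rate, which no single neuron's spike train could reveal.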
3.6 Summary

The problem of coding of neural messages is not new. It was thought, however, that we already had its solution. That this is not the case became apparent several years ago, when researchers evaluated the temporal intervals of neural processing. Before this, it was thought that in the nervous system messages are coded by the rate code, that is, by the time-averaged frequency. The rate code still holds for stationary stimuli that last around 500 ms or more. Hubel and Wiesel, and many others, made their discoveries in many different brain areas using the idea of the rate code. However, the brain is able to process extremely complex stimuli at much shorter time scales, while the processing takes place in large, hierarchically organized neural networks comprised of billions of neurons. Neurons in
these networks do not have time to calculate average frequencies. Upon reviewing all the options for neural coding introduced in the previous section, one inevitably gets confused. Every hypothesis has some experimental support, so which one is correct and which one is not? Is it possible that different brain areas use different codes, a spike timing code in one area and a rate code in another? Or does the brain switch between different codes according to the task to be performed? These possibilities remain to be explored. Or, let us reason that all these hypotheses need not in fact be mutually exclusive. For instance, as an exercise, let us imagine that a population of neurons representing one feature fires synchronously when its feature is saliently present in the input. Those neurons that have the highest output rates also have the shortest time to the first spike. Their synchronization guarantees that all of them have the same phase difference with respect to the background oscillation; indeed, the background oscillation might have helped to synchronize them. The information about the intensity of a feature in the stimulus (feature salience) is encoded in the average frequency of the whole population of neurons (within 10 ms) and is relayed to the next population of neurons in the processing hierarchy. And an actual temporal pattern of spikes relayed by one population of neurons may indeed correspond to only one stimulus. It may be that the mentioned options are only different angles from which we can view the same process.

Experimental developmental neuroscience brings abundant evidence that in the developing brain there exist strong genetic programs determining the overall pattern of hierarchical organization and connections between brain areas. Nature provides an anatomical and physiological "scaffold" in the sense of a framework that outlines the structures to be formed later.
A newborn brain is not a "tabula rasa"; nurture and experience can shape it dramatically. The information needed to specify the precise differentiation of neurons and the subtle twists of interneuronal connectivity far surpasses that contained in genetic programs. Genetic programs provide for a vast overproduction of abundant and redundant synaptic connections in the developing brain. Individual differences in early-life experience cause selective pruning of the majority of these synapses; only the synapses that mediate the individual's genuine experience remain. This process of experience-dependent synaptic pruning during the early stages of brain development may constitute the basis of brain and mind individuality. The early developmental overproduction of redundant synapses lasts only for some time after birth. The time windows, i.e. their beginnings and durations, differ between brain systems. They also differ between animal species. In
general, what lasts weeks and months in rats, cats and monkeys usually lasts for years in humans. Later, in adolescence and adulthood, new experience is "burned" into the brain mainly by the selective creation of new synapses and by changes in the efficacy of synaptic transmission at existing connections. This does not mean that synapses cannot be removed, or new connections created, due to experience later in life. They can, but the prevailing process of experience-dependent adult cortical plasticity is not based on pruning an abundance of connections created once and for all, as in early development. Some cortical areas retain the capacity for synaptic plasticity over the whole course of life: the association cortical areas, the highest-order sensory areas, and the premotor and emotional cortical areas of the brain. The capacity of the brain to be plastic and change its microstructure with experience is of profound importance for discovering the rules of the mind/brain relation. Brain plasticity also relates to the genetic code, and future modeling of brain functions will require this information.
4 Artificial Neural Networks (ANN)
This chapter introduces the basic principles of artificial neural networks (ANN) as computational models that mimic the brain in its main principles. They have so far been used to model brain functions, as well as to solve complex problems of classification, prediction, etc. in all areas of science, engineering, technology and business. Here we present a classification scheme of the different types of ANN and some main existing models, namely self-organizing maps (SOM), multilayer perceptrons (MLP) and spiking neural networks (SNN). We illustrate their use to model brain functions, for instance the generation of electrical oscillations measured as local field potentials (LFP). Since ANNs are used as models of brain functions, they become an integral part of CNGM, where gene interactions are introduced as part of the structure and the functionality of the ANN (see e.g. Chap. 8).
4.1 General Principles

ANNs are massively parallel computational systems inspired by biological neural networks. They can have different architectures and different properties of their processing elements. An illustration of a general architecture of an ANN is shown in Fig. 4.1.
Fig. 4.1. A general architecture of an ANN with three kinds of connectivity: feedforward, feedback, and lateral. Neurons are usually organized into layers
There are different variations derived from this general architecture: for instance, there can be only feedforward connections between layers, or only feedback connections in a single layer with feedforward input, or feedback connections allowed only between some layers, etc. In general, ANNs perform mappings of vectors from an m-dimensional input space onto vectors from an n-dimensional output space. We can also say that ANNs learn how to associate vectors from one space with vectors from another space. There are many models of ANNs, each with different particular properties. However, they all are adaptable systems that can reorganize their internal structure based on their experience, in a process called training or learning. ANNs are very often referred to as connectionist systems (Kasabov 1996a, Arbib 2003). The basic processing elements (processing units) of ANNs are formal neurons based on the rate code hypothesis of a neural code, according to which the information sent from one neuron to another is encoded in the output rate of its spike train. Thus the input-output function of the i-th element of an ANN is

o_i(t) = g( Σ_k w_ik(t) x_k(t) )     (4.1)
where o_i, x_k, w_ik ∈ ℝ are the output rate of the i-th neuron, the input rate of the k-th input, and the synaptic weight between the k-th and i-th element, respectively. The function g is the so-called activation or transfer function of the neuron. It can be linear or nonlinear, continuous and differentiable or binary, depending on which ANN model we are working with.

Fig. 4.2. (a) A simplified drawing of a real neuron; (b) a diagram of a simple artificial rate-based neuron model
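The formal neuron of Eq. 4.1 can be sketched in a few lines of Python (the concrete weights, inputs, and the choice of a logistic sigmoid for g are illustrative assumptions):

```python
import math

def neuron_output(weights, inputs,
                  g=lambda a: 1.0 / (1.0 + math.exp(-a))):
    """Rate-based formal neuron of Eq. 4.1: o_i = g(sum_k w_ik * x_k).
    g is the activation (transfer) function; here a logistic sigmoid,
    but it could equally be linear or binary."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return g(activation)

# Example: three inputs with mixed excitatory/inhibitory weights.
o = neuron_output([0.5, -0.3, 0.8], [1.0, 0.5, 0.25])
print(f"output rate: {o:.3f}")

# The same neuron with a binary (threshold) transfer function.
o_bin = neuron_output([0.5, -0.3, 0.8], [1.0, 0.5, 0.25],
                      g=lambda a: 1.0 if a > 0 else 0.0)
```

Swapping in a different g is all it takes to move between the activation-function families shown in Fig. 4.3.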
Different models of ANNs differ with respect to their architecture, transfer functions of their elements and the rules used to modify the
4.1 General Principles
83
weights between neurons in the process of learning (Bishop 1995). A simple model of an artificial neuron and different activation functions are illustrated in Fig. 4.2 and Fig. 4.3, respectively.
Fig. 4.3. Different types of activation functions for artificial neuron models
Most of the known ANN training algorithms are influenced by a concept introduced by Donald Hebb (Hebb 1949). He proposed a model for unsupervised learning in which the synaptic strength (weight) is increased if both the source and the destination neurons are simultaneously activated. It is expressed as:

w_ij(t+1) = w_ij(t) + c o_i(t) o_j(t)     (4.2)

where w_ij(t) is the weight of the connection between the i-th and j-th neuron in the network at the moment t, o_i and o_j are the output signals of neurons i and j at the same moment t, and c is a learning rate coefficient. The weight w_ij(t+1) is the adjusted weight at the next time moment (t+1). Usually some kind of weight normalization is applied after each adjustment to prevent the weights from growing to infinity (Miller and MacKay 1994). In general terms, a connectionist system {S, W, P, F, J, L} that is defined by its structure S, its connection weights W, its parameter set P, its
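A sketch of the Hebbian update of Eq. 4.2 combined with a simple L2 weight normalization (the learning rate, initial weights, and normalization scheme are illustrative choices, not the specific method of Miller and MacKay):

```python
import math

def hebb_step(w, o_pre, o_post, c=0.1):
    """One Hebbian update (Eq. 4.2): strengthen weights whose pre- and
    postsynaptic neurons are active together, then L2-normalize the
    weight vector so the weights cannot grow without bound."""
    w = [wi + c * o_post * xi for wi, xi in zip(w, o_pre)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

w = [0.5, 0.5, 0.5]
# Inputs 1 and 2 are repeatedly co-active with the postsynaptic neuron;
# input 3 is always silent.
for _ in range(50):
    w = hebb_step(w, o_pre=[1.0, 1.0, 0.0], o_post=1.0)

print([round(wi, 3) for wi in w])
```

The weights onto the co-active inputs grow and settle near 1/√2 each, while the weight of the unused input decays toward zero: normalization turns unbounded Hebbian growth into competition among synapses.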
function F, its goal function J, and a learning procedure L, learns if the system optimizes at least part of its structure S and its function F when observing events z1, z2, z3, ... from the problem space Z. Through a learning process, the system improves its reaction to the observed events and captures useful information that may later be represented as knowledge. The goal of a learning system is defined as finding the minimum of an objective function J(S) named "the expected risk function" (Amari 1990, Amari and Kasabov 1998). The function J(S) can be represented by a loss function Q(Z,S) and an unknown probability distribution μ(Z). Most learning systems optimize a global goal function over a fixed part of the structure of the system. In ANN this part is a set of a predefined and fixed number of connection weights, i.e. a set number of elements in the set W. As an optimization procedure, known statistical methods for global optimization are applied (Amari and Kasabov 1998), for example the gradient descent method. The final structure S is expected to be globally optimal, i.e. optimal for data drawn from the whole problem space Z. In the case of a changing structure S and a changing (e.g. growing) part of its connections W, where the input stream of data is continuous and its distribution is unknown, the goal function could be expressed as a sum of local goal functions J', each optimized in a small sub-space Z' ⊂ Z as data is drawn from this sub-space. Moreover, while the learning process is taking place, the number of dimensions of the problem space Z may also change over time. These scenarios are reflected in different models of learning, as explained next. Let us introduce a general classification scheme of ANNs which will lead to an explanation of their properties, capabilities and drawbacks (Kasabov 2003).
4.2 Models of Learning in Connectionist Systems

There are many methods for learning that have been developed for connectionist architectures (for a review see (Arbib 2003)). It is difficult and quite risky to try to put all the existing methods into a clear classification structure (which should also allow "slots" for new methods), but this is necessary here in order to define the scope for further applications of ANN to build CNGM. A connectionist classification scheme is explained below. On the one hand, this scheme is a general one, as it is valid not only for connectionist
learning models, but also for other learning paradigms, for example evolutionary learning, case-based learning, analogy-based learning and reasoning, etc. On the other hand, the scheme is not a comprehensive one, as it does not cover all existing connectionist learning models. It is only a working classification scheme needed for the purpose of this work. A (connectionist) system that learns from observations z1, z2, z3, ... from a problem space Z can be designed to perform learning in different ways. The classification scheme below outlines the main questions and issues, and their alternative solutions, when constructing a connectionist learning system. Now, let us offer an explanation of the individual issues and alternatives (Kasabov 2006).

1. In what space does the learning system develop?

a) The learning system develops in the original problem space Z. The structural elements (nodes) of the connectionist learning system are points in the d-dimensional original data space Z. This is the case in some clustering and prototype learning systems. One of the problems here is that if the original space is high-dimensional (e.g. 6,000 genes expressed in the brain, or 64 EEG channels), it is difficult to visualize the structure of the system and observe some important patterns. For this purpose special visualization techniques, such as Principal Component Analysis (PCA) or Sammon mapping, are used to project the system structure S into a visualization space V.

b) The learning system develops in its own machine space M. The structural elements (nodes) of the connectionist learning system are created in a system (machine) space M, different from the d-dimensional original data space Z. An example is the Self-Organizing Map (SOM) neural network (Kohonen 1997). SOMs develop in two-, three-, or higher-dimensional topological spaces (maps) from the original data.
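The machine-space idea can be made concrete with a minimal SOM sketch; the map size, learning rate, neighborhood radius, and toy data below are illustrative assumptions, not Kohonen's full algorithm (which also shrinks the neighborhood and learning rate over time):

```python
import random

random.seed(0)

# A minimal 1-D SOM: 10 map nodes (the machine space M) learn to cover
# 2-D input data (the original space Z).
nodes = [[random.random(), random.random()] for _ in range(10)]

def bmu(x):
    """Index of the best-matching unit (closest node) for input x."""
    return min(range(len(nodes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(nodes[i], x)))

def train(data, epochs=20, lr=0.3, radius=2):
    for _ in range(epochs):
        for x in data:
            win = bmu(x)
            for i, node in enumerate(nodes):
                if abs(i - win) <= radius:        # neighborhood on the map
                    for d in range(len(node)):
                        node[d] += lr * (x[d] - node[d])

# Two clusters of 2-D points (hypothetical toy data).
data = ([[random.gauss(0.2, 0.05), random.gauss(0.8, 0.05)] for _ in range(50)]
        + [[random.gauss(0.8, 0.05), random.gauss(0.2, 0.05)] for _ in range(50)])
train(data)

# Average distance from each input to its best-matching node.
qe = sum(sum((a - b) ** 2 for a, b in zip(nodes[bmu(x)], x)) ** 0.5
         for x in data) / len(data)
print(f"quantization error after training: {qe:.3f}")
```

Note that the neighborhood is defined by distance on the one-dimensional map, not in the data space: this is what makes the nodes develop an ordered representation in the machine space M.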
2. Is the space open?

a) An open problem space is characterized by an unknown probability distribution P(Z) of the incoming data and a possible change in its dimensionality. Sometimes the dimensionality of the data space may change over time, involving more or fewer dimensions, for example adding new modalities to a person identification system.

b) A closed problem space has a fixed dimensionality, and either a known distribution of the data or a distribution that can be approximated in advance through statistical procedures.

3. Is learning on-line?
a) Batch-mode, off-line learning. In this case a pre-defined learning (training) set of data P = {z1, z2, ..., zp} is learned by the system through propagating this data set several times through the system. Each time, the system optimizes its structure S based on the average value of the goal function over the whole data set P. Many traditional algorithms, such as the backpropagation algorithm, use this type of learning (Rumelhart et al. 1986, Werbos 1990).

b) On-line, pattern-mode, incremental learning. On-line learning is concerned with learning each data example separately as the system operates (usually in real time), where the data might exist only for a short time. After observing each data example, the system makes changes in its structure (e.g. the W connections) to optimize the goal function J. A typical scenario for on-line learning is when data examples are drawn randomly from a problem space and fed into the system one by one for training. Although there are chances of drawing the same example twice or several times, this is considered a special case, in contrast to off-line learning, where one example is presented to the system many times as part of the training procedure. Methods for on-line learning in ANN are studied in (Albus 1975, Fritzke 1995, Saad 1999). In (Hassibi and Stork 1992) a review of some statistical methods for on-line learning, mainly gradient descent methods applied to fixed-size connectionist structures, is presented. Some other types of learning, such as incremental learning and lifelong learning, are closely related to on-line learning. Incremental learning is the ability of an ANN to learn new data without fully destroying the patterns learned from old data and without the need to be trained again on the old data.
According to Schaal and Atkeson (Schaal and Atkeson 1998), incremental learning is characterized by the following features:
• Input and output distributions of the data are not known, and these distributions may change over time.
• The structure of the learning system W is updated incrementally.
• Only a limited memory is available, so that data have to be discarded after they have been used.
On-line learning, incremental learning, and lifelong learning are typical adaptive learning methods. Adaptive learning aims at solving the well-known stability/plasticity dilemma: the system must be stable enough to retain patterns learned from previously observed data, while being flexible enough to learn new patterns from new incoming data.
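The flavor of these three features can be shown with the simplest possible example, a running mean that sees each example exactly once and then discards it (an illustrative sketch, not a connectionist model):

```python
# On-line (incremental) learning of a running mean: each example updates
# the estimate and is then discarded -- no memory of past data is kept.
def online_mean():
    n, mean = 0, 0.0
    def update(x):
        nonlocal n, mean
        n += 1
        mean += (x - mean) / n     # incremental update, no stored examples
        return mean
    return update

step = online_mean()
stream = [2.0, 4.0, 6.0, 8.0]
for x in stream:
    m = step(x)
print(m)  # 5.0 -- identical to the batch average, computed incrementally
```

The estimator stores only two numbers regardless of how long the stream is, which is exactly the limited-memory property listed above; the same incremental pattern underlies on-line weight updates in ANN.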
Adaptive learning is typical of many biological systems and is also useful in engineering applications, such as robotic systems and process control. Significant progress in adaptive learning has been achieved due to the Adaptive Resonance Theory (ART) (Carpenter and Grossberg 1991) and its various models, which include unsupervised models (ART1-3, FuzzyART) and supervised versions (ARTMAP, FuzzyARTMAP-FAM) (Carpenter et al. 1991).

c) Combined on-line and off-line learning. In this mode the system may work for some of the time in an on-line mode, after which it switches to off-line mode, etc. This is often used for optimization purposes, where a small "window" of data from the continuous input stream can be kept aside, and the learning system, which works in an on-line mode, can be locally or globally optimized through off-line learning on this window of data, through "window-based" optimization of the goal function J(W).

4. Is the learning process lifelong?

a) Single-session learning. The learning process happens only once over the whole set P of available data (even though it may take many iterations during training). After that, the system is set in operation and never trained again. This is the most common learning mode in many existing connectionist methods.

b) Lifelong learning. Lifelong learning is concerned with the ability of a system to learn from continuously incoming data in a changing environment during its entire existence. Growing, as well as pruning, may be involved in the lifelong learning process, as the system needs to restrict its growth while always maintaining good learning and generalization abilities.

5. Is output data available and in what form?

The availability of output data that can be compared with what the learning system produces on its outputs defines four types of learning:

a) Unsupervised learning. There are no desired output data attached to the examples z1, z2, z3, ... The data is considered as coming from an input space Z only.
b) Supervised learning. There are desired output data attached to the examples z1, z2, z3, ... The data is considered as coming in (x, y) pairs from both an input space X and an output space Y that collectively define the problem space Z. The connectionist learning system associates data from the input space X with data from the output space Y.

c) Reinforcement learning. In this case there are no exact desired output data, but some hints about the "goodness" of the system's reaction are available. The system learns and adjusts its structural parameters from these hints. In many robotic systems, a robot learns from feedback from the environment that may be used as a qualitative indication of the correctness of its movement.
d) Combined learning. This is the case when a connectionist system can operate in more than one of the above learning modes.

6. Is evolution of populations of individuals over generations involved in the learning process?

a) Individual, development-based learning. A system is developed independently and is not part of a development process of a population of individual systems.

b) Evolutionary learning, population-based learning over generations. Here, learning is concerned with the performance not only of an individual system, but of a population of systems that improve their performance through generations. The best individual system is expected to emerge, to evolve, from such populations. Evolutionary computation (EC) methods, such as genetic algorithms (GA), have been widely used for optimizing ANN structures (Yao 1993, Fogel 1995, Watts and Kasabov 1998). Such ANNs are called evolutionary neural networks. They utilize ideas from Darwinism. Most of the evolutionary computation methods developed so far assume that the problem space is fixed, i.e. the evolution takes place within a pre-defined problem space that does not change dynamically. Therefore, these methods do not allow for modeling real, on-line adaptation. In addition, they are very time consuming, which also prevents them from being used in real-world applications.

7. Is the structure of the learning system of a fixed size, or is it evolving?

Here we refer again to the bias/variance dilemma (see for example (Grossberg 1969, 1982, Carpenter and Grossberg 1991)). With respect to an ANN structure, the dilemma states that if the structure is too small, the ANN is biased toward certain patterns, while if the ANN structure is too large there are too many variances, which may result in over-training, poor generalization, etc. In order to avoid this problem, an ANN structure should
change dynamically during the learning process, thus better representing the patterns in the data and the changes in the environment.

a) Fixed-size structure. This type of learning assumes that the size of the structure S is fixed (e.g. the number of neurons and the number of connections), and through learning the system changes some structural parameters (e.g. W, the values of the connection weights). This is the case in many multilayer perceptron ANNs trained with the backpropagation algorithm (Rosenblatt 1962, Arbib 1972, Rumelhart et al. 1986, Arbib 1987, Amari 1990, Werbos 1990, Amari and Kasabov 1998).

b) Dynamically changing structures. According to Heskes and Kappen (Heskes and Kappen 1993) there are three different approaches to dynamically changing structures: constructivism, selectivism, and a hybrid approach. Connectionist constructivism is about developing ANNs that have a simple initial structure and grow during operation through the insertion of new nodes. This theory is supported by biological facts (see (Saad 1999)). The insertion can be controlled by a similarity measure of input vectors, by an output error measure, or by both, depending on whether the system performs learning in an unsupervised or supervised mode. A measure of difference between an input pattern and the already stored ones is used for deciding whether to insert new nodes in the adaptive resonance theory models ART1 and ART2 (Carpenter and Grossberg 1991) for unsupervised learning. There are other methods that insert nodes based on the evaluation of a local error, such as the Growing Cell Structure and Growing Neural Gas (Fritzke 1995). Other methods insert nodes based on a global error that evaluates the performance of the whole ANN; one such method is the Cascade-Correlation method (Fahlman and Lebiere 1990). Methods that use both similarity and output error for node insertion are used in Fuzzy ARTMAP (Carpenter et al. 1991) and also in EFuNN (Evolving Fuzzy NN) (Kasabov 2003).
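The constructivist principle (insert a node when no existing node matches the input closely enough) can be sketched in a few lines. This is a toy one-dimensional prototype layer with an illustrative similarity threshold, not the actual ART or EFuNN algorithms:

```python
# Constructivist sketch: a prototype layer that inserts a new node whenever
# no existing node is within a similarity threshold of the input
# (ART-style resonance); otherwise the best-matching node is adapted.
def evolve_layer(stream, threshold=0.3, lr=0.5):
    nodes = []
    for x in stream:
        if nodes:
            i = min(range(len(nodes)), key=lambda j: abs(nodes[j] - x))
            if abs(nodes[i] - x) <= threshold:
                nodes[i] += lr * (x - nodes[i])   # adapt the winner
                continue
        nodes.append(x)                           # grow: insert a new node
    return nodes

stream = [0.1, 0.12, 0.9, 0.88, 0.5, 0.11, 0.91]
nodes = evolve_layer(stream)
print(nodes)   # three nodes emerge, near 0.1, 0.9 and 0.5
```

The structure is not fixed in advance: the number of nodes is determined by the data itself, which is the defining property of constructivist (growing) systems.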
Connectionist selectivism is concerned with pruning unnecessary connections in an ANN that starts its learning with many, in most cases redundant, connections (Sankar and Mammone 1993, Rummery and Niranjan 1994). Pruning connections that do not contribute to the performance of the system can be done by using several methods: Optimal Brain Damage (LeCun et al. 1990), Optimal Brain Surgeon (Hassibi and Stork 1992), and Structural Learning with Forgetting (Ishikawa 1996).

8. How do structural modifications affect the partitioning of the problem space?
When a connectionist model is created, whether in a supervised or in an unsupervised mode, the nodes and the connections divide the problem space Z into segments. In the case of supervised learning, each segment of the input sub-space is mapped onto a segment of the output sub-space. The partitioning of the input sub-space imposed by the model can be one of the following two types:

a) Global partitioning (global learning). Learning causes a global partitioning of the space. Partitioning hyper-planes can be modified either after every example is presented (in the case of on-line learning), or after all examples are presented for one iteration (in the case of batch-mode learning). Through the gradient descent learning algorithm the problem space is partitioned globally. This is one of the reasons why global learning in multilayer perceptrons suffers from the catastrophic forgetting phenomenon (Miller and MacKay 1994, Robins 1996). Catastrophic forgetting (also called unlearning) is the inability of a system to learn new patterns without forgetting previously learned patterns. Methods to deal with this problem include rehearsing the ANN on a selection of past data, or on new data points generated from the problem space (Robins 1996). Other techniques that use global partitioning are Support Vector Machines (SVM) (see (Kecman 2001) for a comparative study of ANN, fuzzy systems and SVM). Through learning, an SVM optimizes the positioning of the hyper-planes to achieve maximum distance from all data items on both sides of the plane.

b) Local partitioning (local learning). In the case of local learning, the structural modifications of the system affect the partitioning of only the small part of the space from which the current data example is drawn. Each subspace is defined by a neuron, and the activation of each neuron is defined by a local function imposed on its subspace. As an example of such local functions, Gaussian kernels K are defined by the formula:
K(x) = exp(-x²/2) / √(2π),  while ∫ K(x) dx = 1 over x ∈ Z     (4.3)
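A quick numerical check of Eq. 4.3 in plain Python (the integration range and step are arbitrary choices; the tails beyond ±8 are negligible):

```python
import math

def K(x):
    """The Gaussian kernel of Eq. 4.3."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Numerically confirm that the kernel integrates to 1
# (Riemann sum over [-8, 8] with a small step).
dx = 0.001
integral = sum(K(-8 + i * dx) * dx for i in range(int(16 / dx)))
print(f"integral of K over [-8, 8]: {integral:.6f}")
```

Because each neuron's response function falls off quickly away from its center, an update driven by one data example barely disturbs the rest of the partitioning, which is why local learning avoids catastrophic forgetting.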
Other examples of local partitioning are partitioning the space by hyper-cubes, or by fractals in a 3D space. Before creating a model it is important to choose which type of partitioning is more suitable for the task at hand. In the evolving connectionist systems presented later in this book, the partitioning is local. Local partitioning is easier to adapt in an on-line mode, faster to calculate, and does not cause catastrophic forgetting.

9. What knowledge representation is facilitated in the learning system?
It is a well-known fact that one of the most important characteristics of the brain is that it can retain and build knowledge. However, it is not yet known how exactly the activities of the neurons in the brain are transferred into knowledge. For the purpose of the discussion in this chapter, knowledge can be defined as the information learned by a system that the system can interpret in different ways and can use in inference procedures to obtain new facts and new knowledge. Traditional ANNs and connectionist systems have been known as poor facilitators of representing and processing knowledge, despite some early investigations (Hinton 1989, 1990). However, some of the issues of knowledge representation in connectionist systems have already been addressed in the so-called knowledge-based neural networks (KBNN) (Towell and Shavlik 1993, Towell and Shavlik 1994, Cloete and Zurada 2000). KBNN are ANNs that are pre-structured in a way that allows for data and knowledge manipulation, including learning, knowledge insertion, knowledge extraction, adaptation and reasoning. KBNN have been developed either as a combination of symbolic AI systems and ANN (Towell et al. 1990), or as a combination of fuzzy logic systems and ANN (Jang 1993, Yamakawa et al. 1993, Furuhashi et al. 1994, Hauptmann and Heesche 1995, Kasabov 1996b). Rule insertion and rule extraction operations are examples of how a KBNN can accommodate existing knowledge along with data, and how it can "explain" what it has learned. There are different methods for rule extraction that are applied to practical problems (Hayashi 1991, Kasabov 1996c, Duch et al. 1998, Kasabov 1998, Mitra and Hayashi 2000, Kasabov 2001). Generally speaking, learning systems can be distinguished based on the type of knowledge they represent:
a) No explicit knowledge representation is facilitated in the system. An example of such a connectionist system is the traditional multilayer perceptron network trained with the backpropagation algorithm (Rosenblatt 1962, Amari 1967, Rumelhart et al. 1986, Arbib 1987, Werbos 1990).
b) Memory-based knowledge. The system retains examples, patterns, prototypes, cases; for example, instance-based learning (Aha et al. 1991), case-based reasoning systems (Mitchell et al. 1997), and exemplar-based reasoning systems (Salzberg 1990).
c) Statistical knowledge
4 Artificial Neural Networks (ANN)
The system captures conditional probabilities, probability distributions, clusters, correlations, principal components, and other statistical parameters (Bishop 1995).
d) Analytical knowledge. The system learns an analytical function f: X → Y that represents the mapping of the input space X into the output space Y. Regression techniques, and kernel regressions in particular, are well established (Haykin 1994, Bishop 1995).
e) Symbolic knowledge. Through learning, the system associates information with pre-defined symbols. Different types of symbolic knowledge can be facilitated in a learning system, as discussed further below.
f) Combined knowledge. The system facilitates learning of several types of knowledge.
g) Meta-knowledge. The system learns hierarchical levels of knowledge representation where meta-knowledge is also learned, for example, which piece of knowledge is applicable and when.
h) "Consciousness" of a system. The system becomes "aware" of what it is, what it can do, and what its position is among the rest of the systems in the problem space.
i) "Creativity" of a system. An ultimate type of knowledge would be knowledge that allows the system to act creatively, to create scenarios, and possibly to reproduce itself; for example, a system that generates other systems (programs) and improves over time based on its past performance.
10. What type of symbolic knowledge is facilitated by the system? If we can represent the knowledge learned in a learning system as symbols, different types of symbolic knowledge can be distinguished:
• Propositional rules
• First-order logic rules
• Fuzzy rules
• Semantic maps
• Schemata
• Meta-rules
• Finite automata
• Higher-order logic
11. If the system's knowledge can be represented as fuzzy rules, what types of fuzzy rules are facilitated by the system? Different types of fuzzy rules can be used, for example:
• Zadeh-Mamdani fuzzy rules (Zadeh 1965, Mamdani 1997)
• Takagi-Sugeno fuzzy rules (Takagi and Sugeno 1985)
• Other types of fuzzy rules, e.g. type-2 fuzzy rules (for comprehensive reading, see Mendel 2001)
Generally speaking, different types of knowledge can be learned from a process or from an object in different ways, all of them involving human participation. They include direct learning by humans, simple problem representation as graphs, analytical formulas, using ANN for learning and rule extraction, etc. All these forms can be viewed as alternative, and possibly equivalent, forms in terms of the final results obtained after a reasoning mechanism is applied to them. Elaborating analytical knowledge in a changing environment is a very difficult process, involving changing the parameters and formulas with the change of the data. If evolving processes are to be learned in a system and also understood by humans, neural networks that are trained in an on-line mode and whose structure can be interpreted as knowledge are the most promising models at present.
12. Is the learning process active? Humans and animals are selective in terms of processing only important information. They actively search for new information (Taylor 1999, Freeman 2000). Similarly, we can have two types of learning in an intelligent system:
• Active learning, in terms of data selection, filtering and searching for relevant data.
• Passive learning: the system accepts all incoming data.
4.3 Unsupervised Learning (Self Organizing Maps - SOM)
4.3.1 The SOM Algorithm
Self-organizing maps belong to the unsupervised artificial neural network modeling methods (Kohonen 1984). The model typically projects a high-dimensional dataset onto a lower-dimensional space. The SOM network consists of two layers: the input layer and the output layer. The dataset presented to the network comprises samples characterized by p descriptors (variables). Each sample is represented by a vector that includes all p descriptors, and there are as many sample vectors as samples. The input layer comprises p nodes (neurons) (Fig. 4.4). The output layer forms a d-dimensional map, where d < p. In this study, the map is a rectangular 2D grid with l by m neurons laid out on a hexagonal lattice (C = l × m neurons in the output layer). Each neuron c_j of the output layer, also called a cell, is linked to the neurons i = 1, 2, ..., p of the input layer by connections that have weights w_ij associated with them, forming a vector w_j. These weights represent the virtual values for each descriptor in each output neuron, so that each cell c_j of the output layer stores a virtual vector of connection weights w_j. These virtual vectors represent the coordinates of centers of groups of similar input vectors, where similarity is measured in terms of Euclidean distance:
D(x, w_j) = [ Σ_{i=1,...,p} (x_i − w_ij)² ]^{1/2}    (4.4)
for all neurons (cells) c_j, with x a sample vector. The aim of the SOM algorithm is to organize the distribution of sample vectors in a d-dimensional space (in our case, two-dimensional) using their relationship to the virtual vector distribution, thus preserving the similarity and the difference between the input vectors. Similar input vectors are allocated to the same virtual vector, and the virtual vector changes with the addition of new input vectors to it. Virtual vectors that are neighbors on the map (neighboring neurons) are expected to represent neighboring groups (clusters) of sample vectors; consequently, sample vectors that are dissimilar are expected to be distant from each other on the map.
Fig. 4.4. (a) Self-organizing map architecture. (b) The input layer is linked to the cells of the output layer by connections called weights, which define the virtual assembly of the input variables. Lateral connections are treated within the neighborhood function
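The distance-based assignment of Eq. 4.4 and sequential SOM training can be illustrated with a minimal pure-Python sketch. This is not the implementation used in the book's experiments: the learning-rate and neighborhood-radius schedules, and the use of a rectangular (rather than hexagonal) grid, are simplifying assumptions made here for illustration.

```python
import math
import random

def euclidean(x, w):
    # Distance D(x, w_j) between a sample and a cell's virtual vector (Eq. 4.4)
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def best_matching_unit(x, weights):
    # Index of the output cell whose virtual vector is closest to sample x
    return min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))

def train_som_sequential(samples, rows, cols, epochs=20, lr0=0.5, radius0=None):
    """Minimal sequential SOM on a rows x cols rectangular grid (a sketch)."""
    random.seed(0)
    p = len(samples[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[random.random() for _ in range(p)] for _ in cells]
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    t_max = epochs * len(samples)
    t = 0
    for _ in range(epochs):
        for x in samples:
            frac = t / t_max
            lr = lr0 * (1.0 - frac)                      # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighborhood
            b = best_matching_unit(x, weights)
            br, bc = cells[b]
            for j, (r, c) in enumerate(cells):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                weights[j] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[j], x)]
            t += 1
    return weights, cells
```

The map-size heuristic of Eq. 4.5 (C = 5√n) can guide the choice of rows × cols.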
Two different learning algorithms can be used in a SOM: sequential or batch. The first is an incremental algorithm that is commonly used, but learning is highly dependent on the order of the inputs. The batch algorithm overcomes this drawback and is, furthermore, significantly faster (Kohonen 1997). The process involves presenting all the sample vectors as input to the SOM at once. Using a distance measure, the sample vectors are compared to the virtual vectors that were randomly assigned to the output neurons at the beginning of the algorithm. Each sample vector is assigned to the nearest virtual vector according to the distance results, and each virtual vector is modified to the mean of the sample vectors assigned to it. Details about the algorithm can be found in (Kohonen 1984, 1990, 1997). At the end of the training, an output neuron has been determined for each sample vector, such that each sample is assigned to a neuron (cell) of the map.

4.3.2 SOM Output
Sample Distribution
A direct result of the SOM algorithm is a distribution of the samples on the SOM topological map. According to the properties of the algorithm, the samples that are in the same cell are very similar, and similar to those in neighboring cells. They are less similar to samples that are in distant cells. At this stage, each cell is a cluster, and the SOM training procedure constitutes a clustering method, clustering the samples into cells and similar cells together. The approximate number of cells in the output layer can be defined using the formula:
C = 5√n    (4.5)
where C is the number of cells and n is the number of training samples (sample vectors).

Clustering Information
Although a SOM clusters samples onto the cells of the map, it is of interest to define larger clusters by regrouping neighboring cells that contain similar samples. The definition of larger clusters can be achieved using several methods. A method well known to experienced SOM users is the unified-matrix (U-matrix) approach (Ultsch and Siemon 1990). The U-matrix displays the distances between the virtual sites and provides a landscape formed by light plains separated by dark ravines. Another method is a classical clustering analysis of the SOM output, using any of the classical distance measures and linkages, or the K-means method. These methods are applied to the results of the SOM model, or more precisely to the virtual vector of each neuron of the output layer.

Visualization of Input Variables
To analyze the contribution of input variables to the cluster structures of the trained SOM, each input variable, i.e. the connection weight of its associated descriptor calculated for each virtual vector during the training process, can be visualized in grey scale on the SOM map. Recalling that each cell of the map is represented by a virtual vector, and that each virtual vector is composed of as many weight values as there are descriptors, it is possible to visualize each descriptor's weight values associated with each cell of the trained SOM map. A map can be visualized separately for each descriptor.

Relationship Between Multiple Descriptors

It may be important to investigate the relationship between sets of descriptors (input variables) and to find meaningful patterns of their values in combination; for example, the relationship between biological and environmental variables across the samples. A second set of descriptors for each sample can be introduced into the SOM and trained along with the first set. Initially, each descriptor set is submitted to the trained SOM, and then the mean value of each descriptor in each output cell of the trained SOM is calculated. If a neuron was not occupied by input vectors, the value is replaced with the mean value of neighboring neurons. These mean values assigned on the SOM map can again be visualized in grey scale and compared with the map of the samples as well as with other descriptor maps.
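The per-cell descriptor maps just described (mean values per occupied cell, with unoccupied cells taking the mean of their occupied neighbors) can be sketched as follows. This is a hypothetical helper written for illustration, not the procedure of any particular SOM package.

```python
import math

def component_planes(samples, weights, cells):
    """Mean value of each descriptor per SOM cell; empty cells are filled
    with the mean of their occupied grid neighbours (illustrative sketch)."""
    p = len(samples[0])
    # 1. assign each sample to its nearest cell (best-matching unit)
    assigned = {}
    for x in samples:
        j = min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))
        assigned.setdefault(j, []).append(x)
    # 2. per-cell mean of each descriptor
    planes = [None] * len(cells)
    for j, members in assigned.items():
        planes[j] = [sum(m[i] for m in members) / len(members) for i in range(p)]
    # 3. unoccupied cells: mean of occupied neighbouring cells
    #    (fills are computed first so they do not count as occupied)
    fills = {}
    for j, (r, c) in enumerate(cells):
        if planes[j] is None:
            neigh = [planes[k] for k, (rr, cc) in enumerate(cells)
                     if planes[k] is not None
                     and abs(rr - r) <= 1 and abs(cc - c) <= 1 and k != j]
            if neigh:
                fills[j] = [sum(v[i] for v in neigh) / len(neigh)
                            for i in range(p)]
    for j, v in fills.items():
        planes[j] = v
    return planes
```

Each returned plane can then be rendered in grey scale, one map per descriptor, as described above.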
The Connection Weights
As described previously, the two layers of the SOM network are linked by connections called weights. The set of weight values for each output neuron comprises a virtual vector for that neuron. These weights represent the coordinates of each output neuron in a multidimensional space with as many dimensions as descriptors. In the case of binary data, because the observed (real) values are 0 or 1, the virtual values are constrained between 0 and 1. These values can be used as some sort of measurement, evaluation, gradient, or index, depending on the content of the data set and the meaning of the descriptors. One interpretation theory is given in the next section.

Interpretation by the Fuzzy Set Theory

In the classical approach of set theory, if a subset A of a set E is considered, the characteristic function is χ_A, a two-valued function taking its values in {0, 1} and defined as follows:

χ_A(x) = 1  if x ∈ A    (4.6)

χ_A(x) = 0  if x ∉ A    (4.7)
However, in some cases the membership of an element in a given subset is imprecise. Zadeh (1965) proposed fuzzy set theory to account explicitly for this. In this case, the characteristic function is replaced by the membership function f_A, where f_A is a real-valued function taking its values in [0, 1]. The value f_A(x) gives an indication of the degree of truthfulness of x being a member of the fuzzy subset A.

4.3.3 SOM for Brain and Gene Data Clustering

SOM can be trained on unlabeled data and can be used to identify groups (clusters) of data samples based on their similarity. Such data can be 64-channel EEG data, for example, and the SOM can identify similar channel vectors, visualizing them in close topological areas on the map. Gene expression data of tens and hundreds of genes can also be clustered, visualized and explored in terms of how similar the gene expression vectors are. This is illustrated in Fig. 4.5, where the brain cancer data is used after the top 12 genes are selected (see Fig. 1.6). In Fig. 4.5, the top left map shows the SOM output derived from 60 samples and 12 input gene expression variables from the CNS data (Pomeroy et al. 2002). In the top right map the class labels are mapped (class survival in the left blob, class fatal on the right side), and the bottom three maps show the contribution of genes G1, G3 and G4, respectively. None of them on its own can discriminate the samples correctly; good discrimination is achieved through their interaction and pattern formation.
Fig. 4.5. SOM output derived from 60 samples and 12 input gene expression variables from the CNS data (Pomeroy et al. 2002) (see Fig. 1.6): the top left map; in the top right map the class labels are mapped (class survival in the left blob, class fatal on the right side); the bottom three maps show the contribution of genes G1, G3 and G4, respectively. None of them on its own can discriminate the samples correctly; good discrimination is achieved through their interaction and pattern formation. The software system Viscovery SOMine was used for this experiment (http://www.somine.info/). See Color Plate 1
4.4 Supervised Learning
4.4.1 Multilayer Perceptron (MLP)
Multilayer perceptrons (MLP) have a feedforward architecture (see Fig. 4.1). MLP trained with the backpropagation algorithm (BP) use a global optimization function both in on-line (pattern-mode) training and in batch-mode training (Rumelhart et al. 1986, Amari 1990, Werbos 1990, Saad 1999). In the on-line, pattern learning mode of the backpropagation algorithm, after each training example is presented to the system and propagated through it, an error is calculated and then all
connections are modified in a backward manner. This is one of the reasons for the phenomenon called catastrophic forgetting: if examples are presented only once, the system tends to forget how to react properly to previously used examples (Robins 1996). MLP can be trained in an on-line mode, but they have limitations in this respect, as they have a fixed structure and the weight optimization is global if a gradient descent algorithm is used for this purpose. A very attractive feature of MLP is that they are universal function approximators (Cybenko 1989, Funahashi 1989, Hornik et al. 1989, Kurkova 1991), even though in some cases they may converge to a local minimum. Some connectionist systems that include MLP use a local objective (goal) function to optimize the structure during the learning process. In this case, when a data pair (x, y) arrives, the system always optimizes its functioning in a local vicinity of x from the input space X and in a local vicinity of y from the output space Y. In MLP, the activation function of the formal neurons has the shape of a sigmoid function; in the so-called RBF networks the neurons have a radial basis activation function, most frequently a Gaussian.
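The pattern-mode backpropagation described above can be sketched for a single-hidden-layer MLP with sigmoid units. This is a minimal illustration, not the algorithm as implemented in any software mentioned in this chapter; the learning rate, epoch count and weight initialization are arbitrary assumptions.

```python
import math
import random

def mlp_train(data, n_hidden=3, lr=0.5, epochs=4000, seed=1):
    """Minimal 1-hidden-layer MLP with sigmoid units, trained by on-line
    (pattern-mode) backpropagation on squared error; an illustrative sketch."""
    random.seed(seed)
    n_in = len(data[0][0])
    # each weight row carries a bias as its last element
    w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    def forward(x):
        h = [sig(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in w1]
        y = sig(sum(w * hi for w, hi in zip(w2, h + [1.0])))
        return h, y
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # output-layer delta, then propagate the error backwards
            d_out = (y - t) * y * (1.0 - y)
            d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
            w2[n_hidden] -= lr * d_out
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w1[j][i] -= lr * d_hid[j] * xi
    return lambda x: forward(x)[1]
```

Because the optimization is a global gradient descent, training may converge to a local minimum, as noted above, and results depend on the random initialization.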
4.4.2 MLP for Brain and Gene Data Classification

MLP can be trained on labeled data and used to identify the class a new sample belongs to. Such data can be EEG channel data measuring different states of the brain, labeled by categories (class labels). An MLP can be trained on this data and then used to identify which brain state (class label) a new data sample belongs to. Gene expression data of several genes, identifying different categories (class labels), can also be used to train an MLP. The trained system can then be used to classify a new gene expression vector into one of the pre-defined categories. In (Khan et al. 1998) a multilayer perceptron ANN was used to achieve a classification of 93% of Ewing's sarcomas, 96% of rhabdomyosarcomas and 100% of neuroblastomas. From within a set of 6567 genes, 96 genes were used as variables in the classification system. Whether these results would be different using different classification methods needs further exploration.
Example

The CNS cancer gene expression data from (Pomeroy et al. 2002) is used here to build a classifier based on MLP. First, 12 features (genes) are selected from the whole data set using a t-test, rather than the SNR used in Fig. 1.6. The selected features are shown in Fig. 4.6. Figure 4.7 shows the projection of the CNS data and the selected 12 features into a PCA space, which suggests that the features can be successfully used for building a classifier. After the features are selected and evaluated, an MLP classifier is built (Fig. 4.8). An MLP that has 12 inputs (the 12 gene expression variables), 1 output (the class: control or cancer) and 5 hidden nodes is trained on all 60 samples of the gene expression data from the CNS cancer case study. The error decreases with the number of iterations applied (Fig. 4.8). (The experiments were performed in the software environment NeuCom, www.theneucom.com.) For a full validation of the classification accuracy of the method used (here MLP) and the features selected (here 12 genes), a cross-validation experiment needs to be done, where a model is trained on part of the data and tested for generalization on the other part (Baldi and Brunak 2001, Kasabov 2006).
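The t-test feature selection step can be sketched as ranking each gene by the absolute value of a two-sample t statistic between the two classes and keeping the top k. This generic sketch assumes a Welch-type statistic; it is not the exact procedure implemented in SIFTWARE.

```python
import math

def t_statistic(a, b):
    """Welch-type t statistic between two groups of expression values."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def top_genes(expr, labels, k=12):
    """Rank genes by |t| between class 0 and class 1 and keep the top k.
    expr: samples x genes matrix; labels: 0/1 class label per sample."""
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        a = [expr[s][g] for s in range(len(expr)) if labels[s] == 0]
        b = [expr[s][g] for s in range(len(expr)) if labels[s] == 1]
        scores.append((abs(t_statistic(a, b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]
```

As the text notes, the selected features should then be validated by cross-validation rather than judged on the training data alone.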
Fig. 4.6. Selecting the top 12 genes from the case study CNS cancer data (Pomeroy et al. 2002), as in Fig. 1.6, but here using the t-test method. The selected genes compare well with the genes selected in Fig. 1.6. (A proprietary software system, SIFTWARE (www.peblnz.com), was used for this purpose.) See Color Plate 2
Fig. 4.7. The discriminative power of the 12 genes selected in Fig. 4.6 is evaluated through the Principal Component Analysis (PCA) method. It can be seen that the first principal component has a significant role in keeping the samples distant after the PCA transformation. (A proprietary software system, SIFTWARE (www.peblnz.com), was used for the analysis.) See Color Plate 2
Fig. 4.8. An MLP that has 12 inputs (the 12 gene expression variables from Fig. 4.6), 1 output (the class of survivors vs. those not responding to treatment) and 5 hidden nodes is trained on all 60 samples of the gene expression data from the CNS cancer case study. The error decreases with the number of iterations applied (altogether 500). (The experiments were performed in the software environment NeuCom, www.theneucom.com.) See Color Plate 3
4.5 Spiking Neural Networks (SNN)

SNN models are more biologically plausible with respect to brain principles than any of the above ANN methods. A spiking model of a neuron, the element of a spiking neural network (SNN), communicates with other neurons in the network by means of spikes (Maass and Bishop 1999, Gerstner and Kistler 2002). Neuron i receives input spikes from presynaptic neurons j ∈ Γ_i, where Γ_i is the pool of all neurons presynaptic to neuron i (Fig. 4.9). This is a more biologically realistic model of a neuron that is currently used to model various brain functions, for instance pattern recognition in the visual system (Delorme et al. 1999, Delorme and Thorpe 2001). We will describe the Spike Response Model (SRM) as a representative of spiking neuron models, which are all variations of the same theme (Gerstner and Kistler 2002). In the SRM, the state of a neuron i is described by the state variable u_i(t), which can be interpreted as a total somatic postsynaptic potential (PSP). The value of u_i(t) is the weighted sum of all excitatory and inhibitory synaptic PSPs, ε_ij(t − t_j − Δ_ij), such that:

u_i(t) = Σ_{j∈Γ_i} Σ_{t_j∈F_j} J_ij ε_ij(t − t_j − Δ_ij)    (4.8)

where Γ_i is the pool of neurons presynaptic to neuron i, F_j is the set of times t_j < t when presynaptic spikes occurred, and Δ_ij is an axonal delay between neurons i and j, which increases with the Euclidean distance between the neurons in the network. The weight of the synaptic connection from neuron j to neuron i is denoted by J_ij. It takes positive (negative) values for excitatory (inhibitory) connections, respectively. When u_i(t) reaches the firing threshold ϑ_i(t) from below, neuron i fires, i.e. emits a spike (see Fig. 4.10). The moment of crossing ϑ_i(t) defines the firing time t_i of an output spike.
Fig. 4.9. Spiking model of a neuron sends and receives spikes to and from other neurons in the network, exactly like biological neurons do
Immediately after firing an output spike at t_i, the neuron's firing threshold ϑ_i(t) increases k times and then returns to its initial value ϑ_0 in an exponential fashion. In this way, absolute and relative refractory periods are modeled:

ϑ_i(t − t_i) = k ϑ_0 exp(−(t − t_i)/τ_ϑ)    (4.9)
where τ_ϑ is the time constant of the threshold decay. The synaptic PSP evoked on neuron i when a presynaptic neuron j from the pool Γ_i fires at time t_j is expressed by the positive kernel ε_ij(t − t_j − Δ_ij) = ε_ij(s), such that:

ε_ij(s) = A ( exp(−s/τ_decay) − exp(−s/τ_rise) )    (4.10)
where the τ's are the time constants of the decay and rise of the double exponential, respectively, and A is the amplitude of the PSP. To make the model more biologically realistic, each synapse, be it excitatory or inhibitory, can have a fast and a slow component of its PSP, such that:

ε^type(s) = A^type ( exp(−s/τ^type_decay) − exp(−s/τ^type_rise) )    (4.11)
where type denotes one of the following: fast_excitation, fast_inhibition, slow_excitation, and slow_inhibition, respectively. These types of PSPs are based on neurobiological data (Destexhe 1998, Deisz 1999, Kleppe and Robinson 1999, White et al. 2000). Thus, in each excitatory and inhibitory synapse there can be a fast and a slow component of the PSP, based on different types of postsynaptic receptors (listed in Table 4.1). Table 4.1 represents a relationship between the activity of a neuron and its molecular (protein, gene) basis. This is at the core of the CNGM developed later in Chap. 8. As an example of a network comprised of spiking neurons, we present an architecture that can be used for modeling the generation of the cortical local field potential (LFP) (see Fig. 4.11). The model neural network has two layers. The input layer is supposed to represent the thalamus (the main subcortical sensory relay in the brain) and the output layer represents the cerebral cortex. An individual model neuron can be based upon the classical Spike Response Model (SRM) (Gerstner and Kistler 2002). The weight of the synaptic connection from neuron j to neuron i is denoted by J_ij. It takes positive (negative) values for excitatory (inhibitory) connections, respectively. Lateral and input connections have weights that decrease in value with distance from the center neuron i according to a Gaussian formula, while the connections themselves can be established at random (for instance with p = 0.5). For instance, the asynchronous thalamic activity in the awake state of the brain can be simulated by series of random input spikes generated in the input layer neurons. For the state of vigilance, a tonic, low-frequency, non-periodic and non-bursting firing of the thalamocortical input is typical (Beierlein et al. 2002). For simulation of the sleep state we can employ regular oscillatory activity coming out of the input layer, etc. The LFP can be defined as the average of all instantaneous membrane potentials, i.e.:

Φ(t) = (1/N) Σ_{i=1,...,N} u_i(t)    (4.12)
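Eqs. 4.8-4.12 can be sketched for a single SRM neuron driven by fixed presynaptic spike trains, with the LFP taken as the average potential over a population. All parameter values below are illustrative assumptions, not the constants used in the book's simulations; the threshold is clipped at ϑ_0 once the exponential of Eq. 4.9 has decayed.

```python
import math

def psp_kernel(s, A=1.0, tau_decay=10.0, tau_rise=2.0):
    # Double-exponential PSP kernel of Eqs. 4.10-4.11 (illustrative constants)
    if s < 0:
        return 0.0
    return A * (math.exp(-s / tau_decay) - math.exp(-s / tau_rise))

def simulate_srm(spike_trains, weights, delays, T=100, dt=1.0,
                 theta0=0.5, k=2.0, tau_theta=5.0):
    """One SRM neuron driven by presynaptic spike trains; returns the
    potential u(t) sampled at each step and the neuron's own firing times
    (a sketch of Eqs. 4.8-4.9)."""
    u = []
    out_spikes = []
    for step in range(int(T / dt)):
        t = step * dt
        # Eq. 4.8: weighted sum of delayed PSP kernels over all inputs
        ut = sum(w * psp_kernel(t - tj - d)
                 for train, w, d in zip(spike_trains, weights, delays)
                 for tj in train)
        # Eq. 4.9: threshold jumps k-fold after a spike, decays back to theta0
        theta = theta0
        if out_spikes:
            theta = max(k * theta0 *
                        math.exp(-(t - out_spikes[-1]) / tau_theta), theta0)
        if ut >= theta:
            out_spikes.append(t)
        u.append(ut)
    return u, out_spikes

def lfp(potentials):
    # Eq. 4.12: LFP as the average of instantaneous membrane potentials
    n = len(potentials)
    steps = len(potentials[0])
    return [sum(p[s] for p in potentials) / n for s in range(steps)]
```

Running `simulate_srm` for every cortical-layer neuron and averaging the resulting potentials with `lfp` corresponds to the LFP definition of Eq. 4.12.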
Fig. 4.10. (a) Suprathreshold summation of PSPs in the spiking neuron model. After each generation of a postsynaptic spike there is a rise in the firing threshold that decays back to the resting value between spikes. (b) Subthreshold summation of PSPs that does not lead to the generation of a postsynaptic spike, but can still contribute to the generation of the LFP/EEG. (c) A PSP is generated after some delay taken by the presynaptic spike to travel from neuron j to neuron i
Spiking neurons can be interconnected into neural networks of arbitrary architecture, just like traditional formal neurons. At the same time, it has been shown that SNN have the same computational power as traditional ANNs (Maass and Bishop 1999). With spiking neurons, however, new types of computation can be modeled, such as coincidence detection, synchronization phenomena, etc. (Konig et al. 1996). Spiking neurons are more easily implemented in hardware than traditional neurons (Tikovic et al. 2001) and integrated with neuromorphic systems (Smith and Hamilton 1998).
Table 4.1. Neuronal parameters and their related proteins in our model of SNN. This is used in the CNGM in Chap. 8.

Neuron's parameter P_j (amplitude and time constants of):   Relevant protein(s)*
Fast excitation PSP                                         AMPAR
Slow excitation PSP                                         NMDAR
Fast inhibition PSP                                         GABRA
Slow inhibition PSP                                         GABRB
Firing threshold                                            SCN, KCN, CLCN
*Abbreviations: PSP = postsynaptic potential, AMPAR = AMPA (amino-methylisoxazole-propionic acid) receptor, NMDAR = NMDA (N-methyl-D-aspartate) receptor, GABRA = GABA_A (gamma-aminobutyric acid) receptor, GABRB = GABA_B receptor, SCN = sodium voltage-gated channel, KCN = kalium (potassium) voltage-gated channel, CLCN = chloride channel.
Fig. 4.11. (a) ANN model of the thalamocortical (TC) system. (b) The SNN represents the cortex and the input layer the thalamus. About 10-20% of the N neurons are inhibitory (filled circles). The model does not have a feedback from the cortex to the thalamus
4.6 Summary

This chapter presented some principles and basic models of ANN, and demonstrated the use of these models on brain data or on gene expression data, but not on brain-and-gene data together. Artificial neural networks are very sophisticated modeling techniques capable of modeling extremely complex functions. Traditional ANNs like MLP suffer from frequent convergence to local minima of the error function. They are also difficult to train, and for practical applications require a lot of experimentation with the number of hidden neurons and other parameters. In spite of that, ANNs are applicable in virtually every situation in which a relationship between the predictor variables (independent variables, inputs) and predicted variables
(dependent variables, outputs) exists, even when that relationship is very complex and not easy to articulate in the usual terms of correlations or differences between classes. ANNs also keep in check the curse-of-dimensionality problem that bedevils attempts to model nonlinear functions with large numbers of variables (Bishop 1995). ANNs learn by example. The neural network user gathers representative data and then invokes training algorithms to automatically learn the structure of the data. All these advantages carry over to a new class of ANNs called spiking neural networks, or pulse-coupled networks, since they are comprised of neuron models emitting and receiving spikes like biological neurons. These latter models are nowadays used to model neural functions. However, in spite of the growing knowledge about genetic influence upon neural functions, computational models of brain and neural functions lack this important component. Classical ANNs with sigmoid neurons are no longer considered to be faithful models of the brain or the neural system in general. Instead, spiking neuron models are currently preferred in this respect. The SRM that we described in the previous section is a highly simplified spiking model of a neuron and neglects many aspects of neuronal dynamics. In particular, all postsynaptic potentials are assumed to have the same shape. The form of postsynaptic potentials depends on the location of the synapse on the dendritic tree. Synapses located at the distal end of the dendrite are expected to evoke a smaller postsynaptic response at the soma than a synapse located directly on the soma. If several inputs occur on the same dendritic branch within a few milliseconds, the first input will cause local changes of the membrane potential that influence the amplitude of the response to the input spikes that arrive slightly later. Such nonlinear interactions between different presynaptic spikes are neglected in the SRM.
A purely linear dendrite, on the other hand, can be incorporated into the model. For more detailed models of biological neurons, other tools may be more suitable, such as the General Neural Simulation System (GENESIS) (Bower and Beeman 1998) and NEURON (Carnevale and Hines 2006). However, these more detailed models are computationally very expensive and have many parameters to fit. In spite of that, they are widely used to model neural functions, especially at the single-cell level or at the level of relatively small networks. One of the problems of ANNs used to model brain functions is that they do not contain genes, despite the fact that genes are known to play a crucial role in determining the functioning of brain neural networks in norm and disease (Chin and Moldin 2001). In Chap. 8, we will take the modeling of brain functions one step further by incorporating genes and gene interactions as causal forces for neural dynamics.
5 Evolving Connectionist Systems (ECOS)
This chapter extends Chap. 4 and presents another type of ANNs that evolve their structure and functionality over time from incoming data and learn rules in an adaptive mode. They are called ECOS (Kasabov 2002b, Kasabov 2006). ECOS learn local models allocated to clusters of data that can be modified and created in an adaptive mode, incrementally. Several ECOS models are presented along with examples of their use to model brain and gene data.
5.1 Local Learning in ECOS Evolving connectionist systems (ECOS) are modular connectionist-based systems that evolve their structure and functionality in a continuous, selforganized, on-line, adaptive, interactive way from incoming information; they can process both data and knowledge in a supervised and/or unsupervised way (Kasabov 2002b, Kasabov 2006). ECOS learn local models from data through clustering of the data and associating a local output function for each cluster. Clusters of data are created based on similarity between data samples either in the input space (this is the case in some of the ECOS models, e.g. the dynamic neurofuzzy inference system DENFIS (Kasabov and Song 2002), or in both the input space and the output space (this is the case in the EFuNN models (Kasabov 2001)). Samples that have a distance to an existing cluster center (rule node) N ofless than a threshold Rmax (for the EFuNN models it is also needed that the output vectors of these samples are different from the output value of this cluster center in not more than an error tolerance E) are allocated to the same cluster Nc. Samples that do not fit into existing clusters, form new clusters as they arrive in time. Cluster centers are continuously adjusted according to new data samples, and new clusters are created incrementally. The similarity between a sample S = (x,y) and an existing rule node N = (WI, W2) can be measured in different ways, the most popular of them being the normalized Euclidean distance:
d(S, N) = sqrt( Σ(i=1..n) (x(i) - W1(i))² ) / n        (5.1)

where n is the number of input variables. ECOS learn from data and automatically create a local output function for each cluster, the function being represented in the W2 connection weights, thus creating local models. Each model is represented as a local rule with an antecedent (the cluster area) and a consequent (the output function applied to data in this cluster), e.g.:
IF (data is in cluster Nc) THEN (the output is calculated with a function Fc)        (5.2)

Implementations of the ECOS framework require connectionist models that support these principles. Such a model is the Evolving Fuzzy Neural Network (EFuNN).
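The allocation principle above (a sample joins the nearest rule node if its normalized distance is below Rmax, otherwise it founds a new cluster) can be sketched in a few lines of Python. This is an illustrative simplification, not the published ECM algorithm: the function name, the single learning rate and the centre-update rule are assumptions.

```python
import math

def evolve_clusters(samples, r_max, lr=0.5):
    """Sketch of ECOS-style incremental clustering: each incoming sample
    either joins the nearest existing rule node (if its normalized
    Euclidean distance, as in Eq. 5.1, is within r_max) or founds a new
    node; centres drift toward the samples they absorb."""
    centres = []
    for x in samples:
        if centres:
            d, idx = min(
                (math.dist(x, c) / len(x), i) for i, c in enumerate(centres)
            )
        else:
            d, idx = float("inf"), None
        if d <= r_max:
            # adjust the winning centre toward the new sample
            c = centres[idx]
            centres[idx] = [ci + lr * (xi - ci) for ci, xi in zip(c, x)]
        else:
            # sample does not fit any existing cluster: create a new node
            centres.append(list(x))
    return centres
```

Presenting the same data in a different order can produce different clusters, which is inherent to one-pass incremental clustering.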
5.2 Evolving Fuzzy Neural Networks (EFuNN)

In an EFuNN the nodes representing membership functions (MF) can be modified during learning. Each input variable is represented by a group of spatially arranged neurons that encode a fuzzy quantization of this variable. For example, three neurons can be used to represent the "small", "medium" and "large" fuzzy values of the variable. Different MFs can be attached to these neurons. New neurons can evolve in this layer if, for a given input vector, the corresponding variable value does not belong to any of the existing MFs to a degree greater than a set threshold. A new fuzzy input neuron, or an input neuron, can be created during the adaptation phase of an EFuNN. An optional short-term memory layer can be realized through feedback connections from the rule node layer (see Fig. 5.1). This layer of feedback connections can be used if temporal relationships between input data are to be memorized structurally. The third layer contains rule nodes that evolve through supervised/unsupervised learning. The rule nodes represent prototypes of input-output data associations, graphically represented as an association of hyper-spheres from the fuzzy input and fuzzy output spaces. Each rule node r is defined by two vectors of connection weights, W1(r) and W2(r), the latter being adjusted through supervised learning based on the output error, and the former being adjusted through unsupervised learning based on a
similarity measure within a local area of the problem space. The fourth layer of neurons represents the fuzzy quantization of the output variables, similar to the representation by the input fuzzy neurons. The fifth layer represents the real values of the output variables.
[Fig. 5.1 shows the five-layer architecture: inputs x1, x2, ..., xn; a fuzzy input layer; a rule (case) layer with feedback connections; a fuzzy output layer with weights W2; and outputs.]
Fig. 5.1. An EFuNN architecture with a short-term memory and feedback connections (adapted from (Kasabov 2001))
The evolving process can be based on two assumptions: either no rule nodes exist prior to learning and all of them are created during the evolving process, or there is an initial set of rule nodes that are not connected to the input and output nodes and become connected through the learning process. Each rule node (e.g. rj) represents an association between a hyper-sphere from the fuzzy input space and a hyper-sphere from the fuzzy output space (see Fig. 5.2), the W1(rj) connection weights representing the coordinates of the center of the sphere in the fuzzy input space, and the W2(rj) weights the coordinates in the fuzzy output space. The radius of the input hyper-sphere of a rule node is defined as (1 - Sthr), where Sthr is the sensitivity threshold parameter defining the minimum activation of a rule node (e.g. r1) to an input vector (e.g. (Xd2, Yd2)) in order for the new input vector to be associated with this rule node. Two pairs of fuzzy input-output data vectors d1 = (Xd1, Yd1) and d2 = (Xd2, Yd2) will be allocated to the first rule node r1 if they fall into the r1
input sphere and into the r1 output sphere, i.e. the local normalized fuzzy difference between Xd1 and Xd2 is smaller than the radius r, and the local normalized fuzzy difference between Yd1 and Yd2 is smaller than an error threshold Errthr.
Fig. 5.2. Each rule created during the evolving process associates a hyper-sphere from the fuzzy input space with a hyper-sphere from the fuzzy output space. Through accommodating new nodes, the center of the rule node moves slightly (adapted from (Kasabov 2001))
The local normalized fuzzy difference between two fuzzy membership vectors d1f and d2f, which represent the membership degrees to which two real data values d1 and d2 belong to the pre-defined MFs, is calculated as D(d1f, d2f) = Σ|d1f - d2f| / Σ(d1f + d2f). For example, if d1f = [0, 0, 1, 0, 0, 0] and d2f = [0, 1, 0, 0, 0, 0], then D(d1f, d2f) = (1 + 1) / 2 = 1, which is the maximum value of the local normalized fuzzy difference. If a data example d1 = (Xd1, Yd1), where Xd1 and Yd1 are respectively the input and the output fuzzy membership degree vectors, is associated with a rule node r1 with a center r1(1), then a new data point d2 = (Xd2, Yd2) that is within the shaded area shown in Fig. 5.2 will be associated with this rule node too. Through the process of associating (learning) new data points to a rule node, the center of this node's hyper-sphere is adjusted in the fuzzy input space depending on a learning rate lr1, and in the fuzzy output space depending on a learning rate lr2, as shown in Fig. 5.2 for two data points. The adjustment of the center r1(1) to its new position r1(2) can be represented mathematically by the change in the connection weights of the rule node r1 from W1(r1(1)) and W2(r1(1)) to W1(r1(2)) and W2(r1(2)), as presented in the following vector operations:
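The local normalized fuzzy difference can be computed directly; the small helper below reproduces the worked example from the text (the function name is illustrative).

```python
def fuzzy_difference(d1, d2):
    """Local normalized fuzzy difference between two membership vectors:
    the sum of absolute element differences divided by the sum of the
    element sums, giving a value in [0, 1]."""
    num = sum(abs(a - b) for a, b in zip(d1, d2))
    den = sum(a + b for a, b in zip(d1, d2))
    return num / den if den else 0.0
```

For the vectors in the text, [0, 0, 1, 0, 0, 0] and [0, 1, 0, 0, 0, 0], the result is (1 + 1) / 2 = 1, the maximum possible difference.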
W1(r1(2)) = W1(r1(1)) + lr1 · Ds(Xd1, Xd2)        (5.3)

W2(r1(2)) = W2(r1(1)) + lr2 · Err(Yd1, Yd2) · A1(r1(1))        (5.4)

where Err(Yd1, Yd2) = Ds(Yd1, Yd2) = Yd1 - Yd2 is the signed value, rather than the absolute value, of the difference vector, and A1(r1(1)) is the activation of the rule node r1(1) for the input vector Xd2.

The idea of dynamic creation of new rule nodes over time for time series data is graphically illustrated in Fig. 5.3. While the connection weights in W1 and W2 capture spatial characteristics of the learned data (the centers of the hyper-spheres), the temporal layer of connection weights W3 from Fig. 5.1 captures temporal dependences between consecutive data examples. If the winning rule node at the moment (t - 1) (to which the input data vector at the moment (t - 1) was associated) was r1 = inda1(t - 1), and the winning node at the moment t is r2 = inda1(t), then a link between the two nodes is established as follows:

W3(r1, r2)(t) = W3(r1, r2)(t-1) + lr3 · A1(r1)(t-1) · A1(r2)(t)        (5.5)

where A1(r)(t) denotes the activation of a rule node r at a time moment (t), and lr3 defines the degree to which the EFuNN associates links between rules (clusters, prototypes) that include consecutive data examples (if lr3 = 0, no temporal associations are learned in an EFuNN). The learned temporal associations can be used to support the activation of rule nodes based on temporal pattern similarity. Here, temporal dependences are learned through establishing structural links. These dependences can be further investigated and enhanced through synaptic analysis (at the synaptic memory level) rather than through neuronal activation analysis (at the behavioral level). The ratio of spatial similarity to temporal correlation can be balanced for different applications through two parameters Ss and Tc, such that the activation of a rule node r for a new data example dnew is defined as the following vector operation:

A1(r) = f(Ss · D(r, dnew), Tc · W3(r(t-1), r))        (5.6)

where f is the activation function of the rule node r, D(r, dnew) is the normalized fuzzy difference value, and r(t-1) is the winning neuron at the time moment (t - 1).
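The centre adjustments of Eqs. (5.3) and (5.4) can be sketched as follows. This is a simplified illustration: the function name and the use of the signed difference toward the new example as Ds are assumptions, and the fuzzy membership encoding is abstracted away.

```python
def update_rule_node(w1, w2, xd, yd, a1, lr1=0.1, lr2=0.1):
    """One EFuNN-style adjustment of a rule node's input centre W1 and
    output centre W2 toward a newly associated example (xd, yd).
    a1 is the node's activation for xd; the output update is scaled by
    it, mirroring Eq. (5.4), and signed differences are used throughout."""
    w1_new = [w + lr1 * (x - w) for w, x in zip(w1, xd)]          # Eq. (5.3)
    w2_new = [w + lr2 * (y - w) * a1 for w, y in zip(w2, yd)]     # Eq. (5.4)
    return w1_new, w2_new
```

With lr1 = lr2 = 0.5 and full activation, a node at the origin moves halfway toward the new example, matching the gradual centre drift shown in Fig. 5.2.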
Several parameters have been introduced so far for the purpose of controlling the functioning of an EFuNN. Some more parameters will be introduced later, which will bring the number of EFuNN parameters to a comparatively large number. In order to achieve better control of the functioning of an EFuNN structure, a three-level functional hierarchy is used, namely: genetic level, long-term synaptic level, and short-term activation level.
Fig. 5.3. The rule nodes in an EFuNN evolve in time depending on the similarity in the input data
At the genetic level, all the EFuNN parameters are defined as genes in a chromosome. These are:

1. Structural parameters, e.g. number of inputs, number of MFs for each of the inputs, initial type of rule nodes, maximum number of rule nodes, number of MFs for the output variables, number of outputs.

2. Functional parameters, e.g. activation functions of the rule nodes and the fuzzy output nodes; mode of rule node activation ('one-of-n' or 'many-of-n', depending on how many activation values of rule nodes are propagated to the next level); learning rates lr1, lr2 and lr3; sensitivity threshold Sthr for the rule layer; error threshold Errthr for the output layer; forgetting rate; various pruning strategies and parameters, as explained in the EFuNN algorithm below.
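The chromosome of parameters listed above can be represented as a plain configuration object; the field names and default values below are illustrative assumptions, not values from a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class EFuNNChromosome:
    """Illustrative container for the EFuNN 'genes' enumerated in the text;
    all names and defaults here are assumptions for the sake of example."""
    n_inputs: int = 4
    n_mf_per_input: int = 3            # e.g. small / medium / large
    max_rule_nodes: int = 100
    n_outputs: int = 1
    activation_mode: str = "one-of-n"  # or "many-of-n"
    lr1: float = 0.1                   # input-layer learning rate
    lr2: float = 0.1                   # output-layer learning rate
    lr3: float = 0.0                   # temporal-link learning rate
    s_thr: float = 0.9                 # sensitivity threshold Sthr
    err_thr: float = 0.1               # output error threshold Errthr
    forgetting_rate: float = 0.0
```

Encoding the parameters this way makes the later GA optimization (Chap. 6) natural: each field becomes a gene that can be mutated and recombined.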
5.3 The Basic EFuNN Algorithm

In an EFuNN, a new rule node rn is connected and its input and output connection weights are set. The EFuNN algorithm, which evolves EFuNNs
from incoming examples, is given below as a procedure of consecutive steps (Kasabov 2001). Vector and matrix operation expressions are used for simplicity of presentation.

1. Initialize an EFuNN structure with a maximum number of neurons and no (or zero-valued) connections. Initial connections may be set through inserting fuzzy rules into the structure. If initially there are no rule nodes connected to the fuzzy input and fuzzy output neurons, then create the first node rn = 1 to represent the first example d1 and set its input W1(rn) and output W2(rn) connection weight vectors as follows:

W1(rn) = EX;  W2(rn) = TE        (5.7)

where TE is the fuzzy output vector for the current fuzzy input vector EX.

2. WHILE (there are data examples) DO: Enter the current example (Xdi, Ydi), with EX denoting its fuzzy input vector. If new variables appear in this example that are absent in the previous examples, create new input and/or output nodes with their corresponding membership functions.

3. Find the normalized fuzzy local distance between the fuzzy input vector EX and the patterns (prototypes, exemplars) already stored in the rule (case) nodes rj, rj = r1, r2, ..., rn:

D(EX, rj) = Σ|EX - W1(rj)| / Σ(EX + W1(rj))        (5.8)
4. Find the activation A1(rj) of the rule (case) nodes rj, rj = r1, r2, ..., rn. Here, a radial basis activation function, radbas, or a saturated linear one, satlin, can be used, i.e.

A1(rj) = radbas(D(EX, rj)),  or  A1(rj) = satlin(1 - D(EX, rj))        (5.9)

The former may be appropriate for function approximation tasks, while the latter may be preferred for classification tasks. In the case of the feedback variant of an EFuNN, the activation is calculated as explained above:

A1(rj) = radbas(Ss · D(EX, rj) - Tc · W3),  or  A1(rj) = satlin(1 - Ss · D(EX, rj) + Tc · W3)        (5.10)
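The two activation functions named in Eq. (5.9) can be written directly; the definitions below follow the common neural-network conventions for radbas (a Gaussian of its argument) and satlin (a linear ramp clipped to [0, 1]), which is an assumption about the exact forms intended here.

```python
import math

def radbas(x):
    """Radial-basis activation: peaks at 1 when the distance x is 0 and
    decays toward 0 as the distance grows."""
    return math.exp(-x * x)

def satlin(x):
    """Saturated linear activation: the identity, clipped to [0, 1]."""
    return max(0.0, min(1.0, x))
```

With these definitions, a rule node whose stored pattern exactly matches the input (D = 0) fires with activation 1 under either function.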
5. Update the pruning parameter values for the rule nodes, e.g. age and average activation, as pre-defined in the EFuNN chromosome.

6. Find all case nodes rj with an activation value A1(rj) above the sensitivity threshold Sthr.

7. If there is no such case node, then create a new rule node using the procedure from step 1. ELSE

8. Find the rule node inda1 that has the maximum activation value (e.g. maxa1).

9. There are two modes: 'one-of-n' and 'many-of-n'. (a) In the case of 'one-of-n' EFuNNs, propagate the activation maxa1 of the rule node inda1 to the fuzzy output neurons:

A2 = satlin(W2(inda1) · maxa1)        (5.11)

(b) In the case of the 'many-of-n' mode, the activation values of all rule nodes that are above an activation threshold Athr are propagated to the next neuronal layer. Find the winning fuzzy output neuron inda2 and its activation maxa2.

10. Find the desired winning fuzzy output neuron indt2 and its value maxt2.

11. Calculate the fuzzy output error vector: Err = A2 - TE.

12. IF (inda2 is different from indt2) or (D(A2, TE) > Errthr), create a new rule node using the procedure from step 1; ELSE update (a) the input, (b) the output, and (c) the temporal connection vectors (if such exist) of the rule node k = inda1 as follows:

(a) Ds(EX, W1(k)) = EX - W1(k);  W1(k) = W1(k) + lr1 · Ds(EX, W1(k)), where lr1 is the learning rate for the first layer;        (5.12)

(b) W2(k) = W2(k) + lr2 · Err · A1(k), where lr2 is the learning rate for the second layer;        (5.13)

(c) W3(l, k) = W3(l, k) + lr3 · A1(k) · A1(l)(t-1), where l is the winning rule node at the previous time moment (t - 1),        (5.14)
and A1(l)(t-1) is its activation value kept in the short-term memory.

13. Prune rule nodes rj and their connections that satisfy the following fuzzy pruning rule to a pre-defined level:

14. IF (a rule node rj is OLD) AND (its average activation A1a(rj) is LOW) AND (the density of the neighboring area of neurons is HIGH or MODERATE, i.e. there are other prototypical nodes that overlap with rj in the input-output space; this condition applies only for some strategies of inserting rule nodes, as explained in a sub-section below) THEN the probability of pruning node rj is HIGH. The above pruning rule is fuzzy and requires that the fuzzy concepts OLD, HIGH, etc. are defined in advance (as part of the EFuNN's chromosome). As a partial case, a fixed value can be used, e.g. a node is OLD if it has existed during the evolving of an EFuNN for more than 1000 examples. The use of a pruning strategy and the way the values of the pruning parameters are defined depend on the application task.
15. Aggregate rule nodes, if necessary, into a smaller number of nodes.

16. END of the while loop and the algorithm.

17. Repeat steps 2 to 16 for a second presentation of the same input data, if needed.

With its good dynamic characteristics, the EFuNN model is a novel, efficient model, especially for on-line tasks. The EFuNN model has the following major strong points:

• Incremental, fast learning (possibly 'one-pass')
• On-line adaptation
• 'Open' structure
• Time and space representation based on biological plausibility
• Rule extraction and rule insertion
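The one-pass loop of steps 1-12 can be sketched in Python. This is a deliberately simplified illustration, not the published algorithm: crisp vectors stand in for fuzzy membership vectors, the satlin activation is used, and pruning, aggregation and the temporal W3 layer are omitted; the function name and data layout are assumptions.

```python
def train_efunn(examples, s_thr=0.9, err_thr=0.1, lr1=0.5, lr2=0.5):
    """One-pass sketch of the basic EFuNN algorithm.  Each rule node holds
    an input centre w1 and an output centre w2; an example either updates
    the winning node (steps 8 and 12) or creates a new one (steps 1, 7)."""
    nodes = []
    for x, y in examples:
        # steps 3-4: local normalized distance and satlin activation
        best, best_a = None, 0.0
        for node in nodes:
            num = sum(abs(a - b) for a, b in zip(x, node["w1"]))
            den = sum(a + b for a, b in zip(x, node["w1"])) or 1.0
            a1 = max(0.0, 1.0 - num / den)
            if a1 > best_a:
                best, best_a = node, a1
        out_err = (float("inf") if best is None else
                   sum(abs(t - w) for t, w in zip(y, best["w2"])))
        # steps 6-7 and 12: create a new node, or update the winning one
        if best_a < s_thr or out_err > err_thr:
            nodes.append({"w1": list(x), "w2": list(y)})
        else:
            best["w1"] = [w + lr1 * (a - w) for w, a in zip(best["w1"], x)]
            best["w2"] = [w + lr2 * (t - w) * best_a
                          for w, t in zip(best["w2"], y)]
    return nodes
```

Two nearly identical examples with the same output are absorbed by one rule node, while a distant example forces the creation of a second node, mirroring Fig. 5.3.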
5.4 DENFIS

The Dynamic Evolving Neural-Fuzzy Inference System (DENFIS) is also based on the ECOS principles and is motivated by EFuNNs; its approach is especially similar to the EFuNN 'many-of-n' mode. DENFIS is a kind of dynamic Takagi-Sugeno type fuzzy inference system (Takagi and Sugeno 1985). An evolving clustering method (ECM) is used in DENFIS models to partition the input space for creating the fuzzy rules. DENFIS evolves through incremental, hybrid (supervised/unsupervised) learning and accommodates new input data, including new features, new classes, etc., through local element tuning. New fuzzy rules are created and updated during the operation of the system. At each time moment, the output of DENFIS is calculated through a fuzzy inference system based on the m most activated fuzzy rules, which are dynamically selected from the existing fuzzy rule set. As knowledge, fuzzy rules can be inserted into DENFIS before or during its learning process, and they can also be extracted during or after the learning process. The fuzzy rules used in DENFIS have the following form:
Rl: IF x1 is Fl1 and x2 is Fl2 and ... and xP is FlP,
THEN yl = bl0 + bl1·x1 + bl2·x2 + ... + blP·xP        (5.15)

where "xj is Flj", l = 1, 2, ..., M; j = 1, 2, ..., P, are M × P fuzzy propositions that form the antecedents of the M fuzzy rules respectively; xj, j = 1, 2, ..., P, are antecedent variables defined over universes of discourse Xj, j = 1, 2, ..., P; and Flj, l = 1, 2, ..., M; j = 1, 2, ..., P, are fuzzy sets defined by their fuzzy membership functions μFlj: Xj → [0, 1]. In the consequent parts of the fuzzy rules, yl, l = 1, 2, ..., M, are the consequent variables defined by linear functions. In DENFIS, the Flj are defined by the following Gaussian type membership function:

GaussianMF = α exp( -(x - m)² / (2σ²) )        (5.16)
When the model is given an input-output pair (xi, di), it calculates the following output value:
Color Plate 1
Fig. 1.6. 12 genes selected as top discriminating genes from the Central Nervous System (CNS) cancer data, discriminating two classes: survivors and patients not responding to treatment (Pomeroy et al. 2002). The NeuCom software system is used for the analysis (www.theneucom.com) and the method is called "Signal-to-Noise ratio"
Fig. 4.5. SOM output derived from 60 samples and 12 input gene expression variables from the CNS data (Pomeroy et al. 2002) (see Fig. 1.6), the top left map. In the top right map the class labels are mapped (class survival in the left blob, and class fatal on the right side). The bottom three maps show the contribution of genes G1, G3 and G4 respectively; none of them on its own can discriminate the samples correctly, and a good discrimination is achieved through their interaction and pattern formation. The software system Viscovery SOMine was used for this experiment (http://www.somine.info/)
Color Plate 2
Fig. 4.6. Selecting the top 12 genes from the case study CNS cancer data (Pomeroy et al. 2002), as shown in Fig. 1.6, but here using the t-test method. The selected genes compare well with the genes selected in Fig. 1.6. (A proprietary software system SIFTWARE (www.peblnz.com) was used for the purpose)
Fig. 4.7. The discriminative power of the genes selected in Fig. 4.6 is evaluated through the Principal Component Analysis (PCA) method. It is seen that the first PC has a significant importance in keeping the samples distant after the PCA transformation. (A proprietary software system SIFTWARE (www.peblnz.com) was used for the purpose of the analysis)
Color Plate 3
Fig. 4.8. An MLP that has 12 inputs (the 12 gene expression variables from Fig. 4.6), 1 output (the class of survivors vs. patients not responding to treatment) and 5 hidden nodes is trained on all 60 samples of the gene expression data from the CNS cancer case study. The error decreases with the number of training iterations (altogether 500). (The experiments are performed in the software environment NeuCom (www.theneucom.com))
Fig. 5.8. The rule nodes of an evolved ECOS model from data of a person A using 37 EEG channels as input variables, plotted in a 3D PCA space. The circles represent rule nodes allocated for class 1 (auditory stimulus), asterisks - class 2 (visual stimulus), squares - class 3 (AV- auditory and visual stimulus combined) and triangles - class 4 (no stimulus). It can be seen that rule nodes allocated to one stimulus are close in the space, which means that their input vectors are similar
Color Plate 4
Fig. 5.10. A leave-one-out cross-validation method is applied to validate an ECF ECOS model on the 60 CNS cancer samples (Pomeroy et al. 2002): 60 models are created, each one on 59 samples after one example is taken out, and then the model is validated by classifying the taken-out example. The average accuracy over all 60 examples is 82%, with 49 samples classified accurately and 11 incorrectly. Class 1 is the non-responding group (21 samples) and class 2 is the group of survivors (39 samples)
Fig. 5.11. An ECOS classifier is evolved on the 12 CNS cancer genes from Fig. 4.6. Aggregated (across all clusters) general profiles for each of the two classes are shown. The profiles, which capture the interaction between genes, show that genes 1, 5 and 11 are differently expressed across samples of each class, gene 6 is highly expressed in both classes, and the other genes are expressed at low levels. This suggests an interesting interaction between some genes that possibly defines the outcome of cancer of the CNS. The analysis is performed with the use of a proprietary software system SIFTWARE (www.peblnz.com)
Color Plate 5
Fig. 5.12. As ECOS are local learning models based on clustering of data, it is possible to find the profiles of each cluster of the same class and see that the profiles are different, which points to the heterogeneity of the gene expressions in CNS cancer samples (data from (Pomeroy et al. 2002)). (a) Class 1; (b) Class 2. (A proprietary software system SIFTWARE (www.peblnz.com) was used)
Color Plate 6
Fig. 6.3. GA optimization of the parameters and the set of input variables (features) of an ECOS model for classification of CNS cancer samples into two classes: a class of survivors and a class of patients not responding to treatment (see (Pomeroy et al. 2002)). The best ECOS model, after 20 generations of populations of 20 individuals, has an accuracy of almost 94% when tested in a 3-fold cross-validation procedure. The model has the following input variables: 1, 3, 4, 7, 9, 11, 12 (represented in a lighter color), while variables 2, 5, 6, 8 and 10 are not used (represented in a darker color). Optimal values of the ECF parameters (Rmax, Rmin, m-of-n, epochs) are shown in the figure
Color Plate 7
Fig. 6.4. The optimal parameter values and input gene variables are used to derive the final ECOS model that has 22 clusters (rule nodes). This figure shows aggregated profiles for the two classes while the individual cluster profiles for each class are shown in Fig. 6.5 and Fig. 6.6, respectively
Color Plate 8
Fig. 6.6. Individual cluster profiles for class 2 (cancer survivors) obtained using 7 genes selected through GA optimization as shown in Fig. 6.3
Fig. 7.4. (b) A gene expression microarray contains in each cell the expression level of one gene in one sample, or the ratio of its expression between two samples (e.g. normal and diseased). The level of gene expression in each pixel is encoded on a black-and-white scale, with darker cells denoting lower gene expression and lighter cells denoting higher gene expression
Color Plate 9
Fig. 7.5. Class profiles of 14 types of cancer extracted from a trained EFuNN with 399 inputs (gene expression values) and 14 outputs, using data from (Ramaswamy et al. 2001). The profiles of each class can be modified through a threshold tuned for each individual class that defines the membership degree above which a gene should be either over-expressed (lighter sign) or under-expressed (darker sign) in all rules of this class in order for this gene to appear in the profile. The last profile is that of the CNS cancer
Fig. 7.6. Among the CNS cancer group there are 3 clusters that have different gene expression profiles, as detected by the EFuNN ECOS system trained in Fig. 7.5. The highly expressed genes (lighter lines) in clusters 1, 2 and 3 of the CNS cancer data are different
Color Plate 10
Fig. 7.8. A cluster of genes that are similarly expressed over time (17 hours)
Fig. 7.10. The time course data of the expression of genes in the Human fibroblast response to serum data
f(xi) = [ Σ(l=1..M) yl Π(j=1..P) αlj exp( -(xij - mlj)² / (2σlj²) ) ] / [ Σ(l=1..M) Π(j=1..P) αlj exp( -(xij - mlj)² / (2σlj²) ) ]        (5.17)

The goal is to design the system of (5.17) so that the following objective function is minimized:

E = (1/2) (f(xi) - di)²        (5.18)

For optimizing the parameters blj, mlj, αlj and σlj in DENFIS, the steepest descent algorithm can be used:

φ(k + 1) = φ(k) - ηφ · ∂E/∂φ        (5.19)

where ηφ is the learning rate associated with the respective parameter φ ∈ {blj, mlj, αlj, σlj}.
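The inference of Eqs. (5.15)-(5.17) can be sketched compactly: each rule's Gaussian firing strength weights its linear consequent, and the output is the normalized weighted sum. The rule dictionary layout and function name below are assumptions for illustration.

```python
import math

def ts_output(x, rules):
    """Takagi-Sugeno output as in Eq. (5.17): a firing-strength-weighted
    average of the rules' linear consequents, with Gaussian memberships
    (Eq. 5.16).  Each rule holds per-input (alpha, m, sigma) triples in
    "mf" and linear coefficients b = [b0, b1, ..., bP] in "b"."""
    num = den = 0.0
    for rule in rules:
        # firing strength: product of Gaussian memberships over inputs
        w = 1.0
        for xj, (alpha, m, sigma) in zip(x, rule["mf"]):
            w *= alpha * math.exp(-(xj - m) ** 2 / (2 * sigma ** 2))
        # first-order linear consequent, Eq. (5.15)
        y = rule["b"][0] + sum(bj * xj for bj, xj in zip(rule["b"][1:], x))
        num += w * y
        den += w
    return num / den if den else 0.0
```

A single rule centred at the input fires with full strength, so the output reduces to the rule's own linear consequent.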
• Building a Takagi-Sugeno fuzzy inference engine dynamically. The Takagi-Sugeno fuzzy inference engine is used in both the on-line and off-line modes of DENFIS. The difference between them is that, for forming a dynamic inference engine, only first-order Takagi-Sugeno fuzzy rules are employed in the DENFIS on-line mode, while both first-order Takagi-Sugeno fuzzy rules and expanded high-order Takagi-Sugeno fuzzy rules are used in the DENFIS off-line modes. To build such a fuzzy inference engine, several fuzzy rules are dynamically chosen from the existing fuzzy rule set depending on the position of the current input vector in the input space.

• Dynamic creation and updating of fuzzy rules. All fuzzy rules in the DENFIS on-line mode are created and updated during a 'one-pass' training process by applying the Evolving Clustering Method (ECM) and the Weighted Recursive Least Square Estimator with Forgetting Factors (WRLSE).

• Local generalization. Similar to EFuNNs, the DENFIS model uses local generalization to speed up the training procedure and to decrease the number of fuzzy rules in the system.

• Fast training speed.
In the DENFIS on-line mode, the training is a 'one-pass' procedure; in the off-line modes, WRLSE and small-scale MLPs are applied, which makes DENFIS train faster on complex tasks than some common neural networks or hybrid systems, such as the multi-layer perceptron with the backpropagation algorithm (MLP-BP) and the Adaptive Neural-Fuzzy Inference System (ANFIS), both of which adopt global generalization.

• Satisfactory accuracy. Using the DENFIS off-line modes, we can achieve a high accuracy, especially in non-linear system identification and prediction.

5.4.1 Dynamic Takagi-Sugeno Fuzzy Inference Engine
The Takagi-Sugeno fuzzy inference engine (Takagi and Sugeno 1985) utilized in DENFIS is a dynamic inference model. In addition to dynamically creating and updating fuzzy rules in the DENFIS on-line mode, the major differences between this inference engine and the general Takagi-Sugeno fuzzy inference engine are as follows:

- First, depending on the position of the current input vector in the input space, different fuzzy rules are chosen from the fuzzy rule set, which has been estimated during the training procedure, for constructing an inference engine. If two input vectors are very close to each other, especially in the DENFIS off-line modes, their two fuzzy inference engines may be exactly the same. In the on-line mode, however, even if two inputs are exactly the same, their corresponding inference engines are probably different. This is because these two inputs come into the system from the data stream at different moments, and the fuzzy rules have probably been updated during this interval.

- Second, also depending on the position of the current input vector in the input space, the antecedents of the fuzzy rules, which have been chosen from the fuzzy rule set for forming an inference engine, may be different. An example is illustrated in Fig. 5.4, where two fuzzy rule groups, FG1 and FG2, are estimated depending on two input vectors x1 and x2 respectively in a 2-D input space. We can see from this example that, for instance, the region C represents the linguistic meaning 'large' in FG1 on the x1 axis, but the linguistic meaning 'small' in FG2. Also, the region C is represented by different membership functions in FG1 and FG2, respectively.
Fig. 5.4. Two fuzzy rule groups corresponding to input vectors x1 and x2 in a 2-D input space
5.4.2 Fuzzy Rule Set, Rule Insertion and Rule Extraction
Fuzzy rules in a DENFIS are created during the training procedure or come from rule insertion. In the on-line mode, the fuzzy rules in the rule set can also be updated as new training data appear in the system (Kasabov and Song 2002). As DENFIS uses a Takagi-Sugeno fuzzy inference engine, the fuzzy rules inserted into or extracted from the system are Takagi-Sugeno type fuzzy rules. These rules can be inserted into the rule set before or during the training procedure, and they can also be extracted from the rule set during or after the training procedure.
The inserted fuzzy rules can be rules that were extracted from a fuzzy rule set created in a previous training of DENFIS, or they can be general Takagi-Sugeno type fuzzy rules. In the latter case, the corresponding nodes of the general Takagi-Sugeno fuzzy rules have to be found and located in the input space. For the on-line learning mode, their corresponding radii should also be defined. The region can be obtained from the antecedent of a fuzzy rule, and the centre of this region is taken as the node corresponding to the fuzzy rule. A value of (0.5 to 1) · Dthr can be taken as the corresponding radius.
5.5 Transductive Reasoning for Personalized Modeling

Most learning models and systems in artificial intelligence developed and implemented so far are based on inductive methods, where a model (a function) is derived from data representing the problem space and this model is then applied to new data. The model is usually created without taking into account any information about a particular new data vector (test data). An error is measured to estimate how well the new data fits into the model. The inductive learning and inference approach is useful when a global model ("the big picture") of the problem is needed, even in a very approximate form. In contrast to the inductive learning and inference methods, transductive inference methods estimate the value of a potential model (function) only at a single point of the space (the new data vector), utilizing additional information related to this point (Vapnik 1998). This approach seems more appropriate for clinical and medical applications of learning systems, where the focus is not on the model but on the individual patient. Each individual data vector (e.g. a patient in the medical area, a future time moment for predicting a time series, or a target day for predicting a stock index) may need an individual, local model that best fits the new data, rather than a global model in which the new data is matched without taking into account any specific information about this data. An individual model Mi is trained for every new input vector xi, using data samples Di selected from a data set D and data samples D0,i generated from an existing model (formula) M (if such a model exists). The data samples in both Di and D0,i are similar to the new vector xi according to defined similarity criteria.
[Fig. 5.5 block diagram: data Di, selected from D in the vicinity of the input vector xi, together with data D0,i, generated from the model M in the vicinity of xi, are used to train the individual model Mi, which produces the output yi.]

Fig. 5.5. A block diagram of a transductive reasoning system
Transductive inference is concerned with the estimation of a function at a single point of the space only. For every new input vector xi that needs to be processed for a prognostic task, the Ni nearest neighbors, which form a sub-data set Di, are derived from an existing data set D. If necessary, some vectors similar to xi, together with their outputs, can also be generated from an existing model M. A new model Mi is dynamically created from these samples to approximate the function at the point xi (Fig. 5.6). This model is then used to calculate the output value yi for the input vector xi.
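The transductive procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the throwaway local model is just a distance-weighted average of the neighbours' outputs (rather than a trained ECOS model), and generating extra samples from an existing model M is omitted.

```python
import math

def transductive_predict(x_new, data, n_neighbors=3):
    """Sketch of transductive inference: gather the nearest neighbours of
    the query point from the data set D and fit a local model for that
    point only (here, an inverse-distance-weighted average of the
    neighbour outputs).  data is a list of (x_vector, y_value) pairs."""
    neighbours = sorted(data, key=lambda xy: math.dist(xy[0], x_new))
    neighbours = neighbours[:n_neighbors]
    # inverse-distance weights; the epsilon guards against division by zero
    weights = [1.0 / (1e-9 + math.dist(x, x_new)) for x, _ in neighbours]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total
```

A fresh local model is built for every query, which is exactly the cost-for-personalization trade-off discussed above: no global model is ever stored.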
[Scatter plot: two new data vectors, x1 and x2, each surrounded by the nearest samples forming their sub-sets D1 and D2. Legend: • - a new data vector; o - a sample from D; Δ - a sample from M]
Fig. 5.6. In the centre of a transductive reasoning system is the new data vector (here illustrated with two of them - x1 and x2), surrounded by a fixed number of nearest data samples selected from the training data D and generated from an existing model M (Song and Kasabov 2006)
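The transductive procedure above can be sketched in a few lines: nothing is fitted until the new vector arrives, and each query then builds its own local model Mi from its neighbourhood Di. The sketch below uses a distance-weighted neighbour average as the local model; the function name and this choice of local model are illustrative assumptions, not the implementation referred to in the text.

```python
import numpy as np

def transductive_predict(x_new, X, y, n_neighbors=5):
    """Estimate the output for x_new only: select its n_neighbors nearest
    samples (the sub-set Di) and build a local model from them alone."""
    d = np.linalg.norm(X - x_new, axis=1)      # distance to every sample in D
    idx = np.argsort(d)[:n_neighbors]          # indices of the neighbourhood Di
    w = 1.0 / (d[idx] + 1e-9)                  # closer samples weigh more
    return float(np.sum(w * y[idx]) / np.sum(w))
```

In contrast to an inductive model, the computation is repeated from scratch for every new input vector, so each prediction is tailored to the point where it is needed.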
5 Evolving Connectionist Systems (ECOS)
5.5.1 Weighted Data Normalization

In many neural network and fuzzy models and applications, raw (not normalized) data is used. This is appropriate when all the input variables are measured in the same units. Normalization, or standardization, is reasonable when the variables are in different units, or when the variance between them is substantial. However, a general normalization means that every variable is normalized into the same range, e.g. [0, 1], with the assumption that all variables have the same importance for the output of the system. For many practical problems, variables have different importance and make different contributions to the output(s). Therefore, it is necessary to find an optimal normalization and to assign proper importance factors to the variables. Such a method can also be used for feature selection, or for reducing the size of the input vectors through keeping the most important ones. This is especially applicable to a special class of neural network and fuzzy models - the clustering-based (also called distance-based or prototype-based) models, such as RBF, ART and ECOS. In such systems, the distances between neurons or fuzzy rule nodes and input vectors are usually measured as Euclidean distance, so that variables with a wider normalization range have more influence on the learning process and vice versa. A method called TWNFI (transductive weighted neuro-fuzzy inference), which incorporates the ideas of transductive neuro-fuzzy inference and weighted data normalization, is published in (Song and Kasabov 2006).
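A minimal sketch of the weighted-normalization idea (not the TWNFI method itself): each variable is first scaled to [0, 1] and then multiplied by an importance weight, so that more important variables span a wider range and therefore contribute more to Euclidean distances. Function and parameter names are illustrative.

```python
import numpy as np

def weighted_normalize(X, weights):
    """Scale each variable (column) to [0, 1], then multiply it by an
    importance weight, so that its range - and hence its influence on
    Euclidean distances - reflects its importance."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)     # avoid division by zero
    return (X - mn) / span * weights           # column j now spans [0, w_j]
```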
5.6 ECOS for Brain and Gene Data Modeling
5.6.1 ECOS for EEG Data Modeling, Classification and Signal Transition Rule Extraction

In (Kasabov et al. 2006) a methodology for continuous adaptive learning and classification of human scalp electroencephalographic (EEG) data in response to multiple stimuli is introduced, based on ECOS. The methodology is illustrated on a case study of human EEG data recorded at resting, auditory, visual, and mixed audio-visual stimulation conditions. It allows for incremental, continuous adaptation and for the discovery of brain signal transition rules. The method results in a good classification accuracy of the EEG signals of a single individual, thus suggesting that ECOS could be successfully used in the future for the creation of intelligent personalized human-computer interaction models, continuously adaptable over time, as well as for the adaptive learning and classification of other EEG data, representing different human conditions. The method could help to better understand hidden signal transitions in the brain under certain stimuli when EEG measurement is used (see Fig. 5.7).
Fig. 5.7. Layout of the 64 EEG electrodes (extended International 10-10 System)
Fig. 5.8 shows the rule nodes of an evolved ECOS model from data of a person A using 37 EEG channels as input variables, plotted in a 3D PCA space.
Fig. 5.8. The rule nodes of an evolved ECOS model from data of a person A using 37 EEG channels as input variables, plotted in a 3D PCA space. The circles represent rule nodes allocated for class 1 (auditory stimulus), asterisks - class 2 (visual stimulus), squares - class 3 (AV - auditory and visual stimulus combined) and triangles - class 4 (no stimulus). It can be seen that rule nodes allocated to one stimulus are close in the space, which means that their input vectors are similar. See Color Plate 3

The allocation of the above nodes (cluster centers) back to the EEG channels for each stimulus is shown in Fig. 5.9.
Fig. 5.9. The allocation of the cluster centers from the ECOS model in Fig. 5.8 back to the EEG channels for each of the stimuli of classes 1 to 4 (i.e. A, V, AV, No - from left to right, respectively)
5.6.2 ECOS for Gene Expression Profiling
ECOS can be used for building adaptive classification or prognostic systems and for extracting the rules (profiles) that characterize data in local clusters (Kasabov 2002a, Kasabov 2006). This is illustrated in Fig. 5.10 and Fig. 5.11 on the 12 CNS genes from Fig. 4.6, where a classification system is evolved and the aggregated (across all clusters) general profiles for each of the two classes are shown. The profiles, which capture the interaction between genes, show that some genes are differently expressed across the samples of each class. This points to an interesting interaction between genes that possibly defines cancer of the CNS, rather than a single gene only. Before the final classifier is evolved in Fig. 5.11, a leave-one-out cross validation method is applied to validate the ECOS model on the 60 samples: 60 models are created, each one on 59 samples after one example is taken out, and then each model is validated on the taken-out example. The average accuracy over all 60 examples is 82%, as shown in Fig. 5.10: 49 samples out of 60 are classified accurately. This accuracy is further improved in Chap. 6, where EC is used to optimize the feature/gene set and the parameters of the ECOS model.
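The leave-one-out cross validation scheme described above can be sketched as follows; a 1-nearest-neighbour classifier stands in for the ECOS model, purely for illustration.

```python
import numpy as np

def leave_one_out_accuracy(X, y):
    """Train on N-1 samples, classify the held-out one, repeat for every
    sample, and return the fraction classified correctly."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i                  # leave sample i out
        d = np.linalg.norm(X[mask] - X[i], axis=1)
        correct += int(y[mask][np.argmin(d)] == y[i])  # 1-NN prediction
    return correct / len(X)
```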
Fig. 5.10. A leave-one-out cross validation method is applied to validate an ECF ECOS model on the 60 CNS cancer samples (Pomeroy et al. 2002): 60 models are created, each one on 59 samples after one example is taken out, and then each model is validated on the taken-out example. The average accuracy over all 60 examples is 82%, with 49 samples classified accurately and 11 incorrectly. Class 1 is the non-responding group (21 samples) and class 2 is the group of survivors (39 samples). See Color Plate 4
Fig. 5.11. An ECOS classifier is evolved on the 12 CNS cancer genes from Fig. 4.6. Aggregated (across all clusters) general profiles for each of the two classes are shown. The profiles, which capture the interaction between genes, show that genes 1, 5 and 11 are differently expressed across the samples of each class, gene 6 is highly expressed in both classes, and the other genes are expressed at low levels. This suggests an interesting interaction between some genes that possibly defines the outcome of cancer of the CNS. The analysis is performed with the use of a proprietary software system, SIFTWARE (www.peblnz.com). See Color Plate 4
The profiles shown in Fig. 5.11 are integrated, global class profiles. As ECOS are local learning models based on clustering of the data, it is possible to find the profiles of each cluster of the same class. We can see that the profiles are different, which points to the heterogeneity of the cancer CNS samples (see Fig. 5.12).
Fig. 5.12. As ECOS are local learning models based on clustering of data into clusters, it is possible to find the profiles of each cluster of the same class. Different profiles point to the heterogeneity of the gene expressions in CNS cancer samples (data from (Pomeroy et al. 2002)). (a) Class 1; (b) Class 2 (a proprietary software system SIFTWARE (www.peblnz.com)). See Color Plate 5
5.7 Summary

This chapter gives a brief introduction to a class of ANN models called ECOS. These techniques are illustrated on the analysis and profiling of both brain and gene expression data. A further development of these techniques is their use to combine gene and brain data, where each neuron (node) will have gene parameters that need to be adjusted for the optimal functioning of the neuron.
6 Evolutionary Computation for Model and Feature Optimization
This chapter introduces the main principles of evolutionary computation (EC) and presents a methodology for using it to optimize the parameters and the set of features (e.g. genes, brain signals) in a computational model. EC methods adopt principles from evolution in Nature (Darwin 1859). They are used in Chaps. 7 and 8 of the book to optimize gene interaction networks as part of a CNGM.
6.1 Lifelong Learning and Evolution in Biological Species: Nurture vs. Nature

Through evolutionary processes (evolution), genes are slowly modified through many generations of populations of individuals and selection processes (e.g. natural selection). Evolutionary processes imply the development of generations of populations of individuals, where crossover, mutation and selection of individuals, based on fitness (survival) criteria, are applied in addition to the developmental (learning) processes of each individual. A biological system evolves its structure and functionality through both lifelong learning of an individual and evolution of populations of many such individuals, i.e. an individual is part of a population and is a result of the evolution of many generations of populations, as well as a result of its own development, i.e. its lifelong learning process. The same genes in the genotype of millions of individuals may be expressed differently in different individuals, and within an individual - in different cells of the body. The expression of these genes is a dynamic process depending not only on the types of the genes, but also on the interaction between the genes, and the interaction of the individual with the environment (the Nurture versus Nature issue). Several principles are useful to take into account from evolutionary biology:
• Evolution preserves or purges genes.
• Evolution is a non-random accumulation of random changes.
• New genes cause the creation of new proteins.
• Genes are passed on through evolution - generations of populations and selection processes (e.g. natural selection).
6.2 Principles of Evolutionary Computation

Evolutionary computation (EC) is concerned with population-based search and the optimization of individual systems through generations of populations (Goldberg 1989, Koza 1992, Holland 1998). EC has been applied so far to the optimization of different structures and processes, one of them being connectionist structures and connectionist learning processes (Fogel et al. 1990, Yao 1993). Methods of EC include in principle two stages: 1. creating a new population of individuals, and 2. development of the individual systems, so that a system develops and evolves through interaction with the environment, which is also based on the genetic material embodied in the system. The process of individual (internal) development has been ignored or neglected in many EC methods as insignificant from the point of view of the long process of generating hundreds of generations, each of them containing hundreds or thousands of individuals.
6.3 Genetic Algorithms

Genetic algorithms (GA) are EC models that have been used to solve complex combinatorial and organizational problems with many variants, by employing an analogy with Nature's evolution. Genetic algorithms were introduced for the first time in the work of John Holland (Holland 1975). They were further developed by him and other researchers (Goldberg 1989, Koza 1992, Holland 1998). The most important terms used in a GA are analogous to the terms used to explain the evolution processes. They are:
• Gene - a basic unit, which defines a certain characteristic (property) of an individual.
• Chromosome - a string of genes; it is used to represent an individual, or a possible solution to a problem in the solution space.
• Population - a collection of individuals.
• Crossover (mating) operation - sub-strings of different individuals are taken and new strings (offspring) are produced.
• Mutation - random change of a gene in a chromosome.
• Fitness (goodness) function - a criterion which evaluates how good each individual is.
• Selection - a procedure of choosing the part of the population which will continue the process of searching for the best solution, while the other individuals "die".

A simple genetic algorithm consists of the steps shown in Fig. 6.1, where the process over time has been 'stretched' in space.
Fig. 6.1. A block diagram of a genetic algorithm - the evolution of individual models in time is presented as a trace in space, where several generations of populations of individuals are traced
While Fig. 6.1 shows graphically how a GA searches for the best solution in the solution space, the outline of a GA is given as:
1. Generate an initial population of individuals, each individual defined as a chromosome containing parameters (genes).
2. Evaluate the fitness of each individual using a fitness function.
3. Select a subset of individuals based on their fitness.
4. Apply a crossover procedure on the selected individuals to create a new generation of the population.
5. Apply mutation.
6. Continue with steps 2-5 until a desired solution (with a desired fitness) is obtained, or the run time is over.

When using the GA method for a complex multi-optional optimization problem, there is no need for in-depth problem knowledge, nor a need for many data examples stored beforehand. What is needed is merely a "fitness" or "goodness" criterion for the selection of the most promising individuals (partial solutions to the problem). This criterion may require a mutation as well, which is a heuristic approach of a "trial-and-error" type. This implies keeping (recording) the best solutions at each of the stages.

A class of simple genetic algorithms, introduced by John Holland, is characterized by:
• Simple, binary genes. The genes take values of 0 and 1 only.
• A simple, fixed single-point crossover operation. The crossover operation is done by choosing a point where a chromosome is divided into two parts, which are swapped with the corresponding parts taken from another individual.
• Fixed-length encoding. The chromosomes have a fixed length, e.g. of 6 genes.
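The steps above can be sketched as a minimal simple GA with binary genes, single-point crossover and bit-flip mutation; truncation selection (keeping the top half) stands in for step 3 for brevity, and all names and parameter values are illustrative, not taken from the book.

```python
import random

def simple_ga(fitness, n_genes=20, pop_size=30, generations=50,
              crossover_rate=0.8, mutation_rate=0.01, seed=0):
    """Minimal simple GA: binary chromosomes, truncation selection,
    single-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]           # keep the fitter half
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)    # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            for i in range(n_genes):               # bit-flip mutation
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# 'OneMax' fitness (count of 1-bits) - a standard toy problem
best = simple_ga(fitness=sum)
```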
Many complex optimization problems find their solution through genetic algorithms. Such problems are, for example: the Traveling Salesman Problem (TSP) - finding the cheapest way to visit n towns without visiting a town twice; the Min Cut problem - cutting a graph with minimum links between the cut parts; adaptive control; applied physics problems; optimization of the parameters of complex computational models; optimization of neural network architectures; finding fuzzy rules and membership functions (Furuhashi et al. 1994), etc. The main issues in using genetic algorithms relate to the choice of the genetic operations (crossover, selection, mutation). In the case of the Traveling Salesman Problem, the crossover operation can merge different parts of two possible roads (the 'mother' and 'father' roads) until new usable roads are obtained. The criterion for the choice of the most prospective ones is minimum length (or cost). Genetic algorithms comprise a great deal of parallelism: each of the branches of the search tree for the best individuals can be explored in parallel with the others. This allows for an easy realization of genetic algorithms on parallel architectures. Genetic algorithms are search heuristics for the "best" instance in the space of all possible instances. The following issues are important for any genetic algorithm:
• Encoding scheme. How to encode the problem in terms of genetic algorithms - what variables to choose as genes, how to construct the chromosomes, etc.
• Population size. How many possible solutions should be kept for further development.
• Crossover operations. How to combine old individuals and produce new, more prospective ones.
• Mutation heuristic. When and how to apply mutation.
In short, the major characteristics of genetic algorithms are the following:
• They are heuristic methods for search and optimization. As opposed to exhaustive search algorithms, genetic algorithms do not generate all variants in order to select the best one. Therefore, they may not lead to the perfect solution, but to one which is closest to it given the time limits. But nature itself is imperfect too (partly because the criteria for perfection keep changing), and what seems to be close to perfection according to one "goodness" criterion may be far from it according to another.
• They are adaptable, which means that they have the ability to learn, i.e. to accumulate facts and knowledge without having any previous knowledge. They begin only with a "fitness" criterion for selecting and storing individuals (partial solutions) which are "good", and dismissing those which are "not good".

Genetic algorithms can be incorporated in learning modules as part of an expert system or of other information processing systems. Other EC techniques are:
• Evolutionary strategies. These techniques use only one chromosome and a mutation operation, along with a fitness criterion, to navigate in the solution (chromosomal) space.
• Evolutionary programming. These are EC techniques applied to the automated creation and optimization of a sequence of commands (operators) that constitute a program (or an algorithm) to solve a given problem (Koza 1992).

The theory of GA and the other EC techniques includes different methods for the selection of individuals from a population, different crossover techniques and different mutation techniques. Selection is based on fitness. A common approach is proportional fitness, i.e. 'if I am twice as fit as you, I have twice the probability of being selected'. Roulette wheel selection gives chances to individuals according to their fitness evaluation (see Table 6.1).

Table 6.1. Selection of individuals from a population based on fitness: each individual is assigned an evaluated fitness number and then either the top few are selected for crossover, or every one is given a probability (chance) - the roulette strategy

Individual   1     2     3     4     5     6     7     8     9     10
Fitness      2.0   1.8   1.6   1.4   1.2   1.0   0.8   0.6   0.4   0.2
Selection    0.18  0.16  0.15  0.13  0.11  0.09  0.07  0.06  0.03  0.02
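Roulette wheel selection over the fitness values of Table 6.1 can be sketched as follows: the probability of picking individual i is its fitness divided by the total fitness. This is an illustrative sketch, not code from the book.

```python
import random

def roulette_select(fitness_values, rng):
    """Pick one individual; individual i is chosen with probability
    fitness_i / total fitness, as in the Selection row of Table 6.1."""
    r = rng.uniform(0, sum(fitness_values))    # spin the wheel
    cumulative = 0.0
    for i, f in enumerate(fitness_values):
        cumulative += f
        if r <= cumulative:
            return i
    return len(fitness_values) - 1             # guard against rounding
```

Over many spins, individual 1 (fitness 2.0 out of a total of 11.0) is selected roughly 18% of the time, matching the table.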
Other selection techniques include tournament selection (at every selection step the roulette wheel is turned twice, and the individual with the higher fitness is selected), rank ordering, and so on (Fogel et al. 1990). An important feature of the selection procedure is that fitter individuals are more likely to be selected. The selection procedure can also involve keeping the best individuals from the previous generation (if Nature used this principle, Michelangelo would still be alive nowadays, as he is one of the greatest artists ever, with the best genes in this respect). This operation is called elitism. After the best individuals are selected from a population, a crossover operation is applied between these individuals. Different crossover operations can be used:
• One-point crossover;
• Three-point crossover or more (Fig. 6.2), etc.
Fig. 6.2. Three-point crossover operation, where two individuals cross over by exchanging their genes in 4 sections, based on 3 (usually randomly selected) points
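The three-point crossover of Fig. 6.2 can be sketched as follows: both parents are cut at the same randomly chosen points, and the resulting segments are alternated between the two offspring (function names are illustrative).

```python
import random

def multi_point_crossover(parent1, parent2, n_points, rng):
    """Cut both parents at n_points random positions and alternate the
    segments between the two offspring."""
    cuts = sorted(rng.sample(range(1, len(parent1)), n_points))
    child1, child2, swap, prev = [], [], False, 0
    for cut in cuts + [len(parent1)]:
        a, b = (parent2, parent1) if swap else (parent1, parent2)
        child1 += a[prev:cut]       # segments alternate between parents
        child2 += b[prev:cut]
        swap, prev = not swap, cut
    return child1, child2
```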
Mutation can be performed in the following ways:
• For a binary string, just randomly 'flip' a bit.
• For a more complex structure, randomly select a site, delete the structure associated with this site, and randomly create a new sub-structure.

Some EC methods use only mutation and no crossover, e.g. evolutionary strategies. Normally, however, mutation is used to search in a "local search space" by allowing small changes in the genotype (and therefore, hopefully, in the phenotype).
6.4 EC for Model and Parameter Optimization

EC has been widely used in bioinformatics for optimization purposes (see de Jong 2002, Fogel and Corne 2003). Here, a general procedure for a GA optimization of a feature set and a model is presented and illustrated in Figs. 6.3 and 6.4.
1. Starting with an initial feature set Gm and a model M, create a population of K chromosomes (models), each having a different feature subset from Gm and slightly different parameter values from those of the model M. The chromosome contains a binary part, where a feature is present (1) or not present (0) in a model, and a part of continuous values - the parameters of the model. If, for example, the model is an ECF ECOS (see Chap. 5), the parameters are: Rmax, Rmin, the number of fuzzy membership functions, and the number of iterations of training the ECF model.
2. FOR J = 1 to P generations DO
   a. Select randomly from the data set S a subset Stst for testing and the rest Str for training.
   b. Train all K models on Str and test them on Stst.
   c. Select the best models (e.g. maximum accuracy).
   d. Apply crossover and mutation to the chromosomes to create the next generation of K new models.
   END (FOR)
3. Select the final best model (the model with the best accuracy) that has an optimized feature set and optimized model parameter values.

6.4.1 Example

Fig. 6.3 shows a GA optimization of the parameters and the set of input variables (features) of an ECOS model for the classification of samples into
two classes - a class of controls and a class of cancer for the CNS data (see Pomeroy et al. 2002). The best ECOS model, after 20 generations of populations of 20 individuals, has an accuracy of almost 94% when tested in a 3-fold cross validation procedure. The model uses the following input variables: 1, 3, 4, 7, 9, 11 and 12; variables 2, 5, 6, 8 and 10 are not used. Optimal values of the ECF parameters (Rmax, Rmin, m-of-n, epochs) are shown in the figure.
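Step 1 of the procedure above, a mixed chromosome coupling a binary feature mask with continuous model parameters, together with a matching mutation operator, can be sketched as follows; the encoding details and names are illustrative assumptions, not the actual implementation behind Fig. 6.3.

```python
import random

def init_population(n_features, param_ranges, pop_size, rng):
    """Step 1: each chromosome couples a binary feature mask (gene used or
    not) with continuous model parameters drawn from their allowed ranges."""
    return [([rng.randint(0, 1) for _ in range(n_features)],
             [rng.uniform(lo, hi) for lo, hi in param_ranges])
            for _ in range(pop_size)]

def mutate(chromosome, param_ranges, rate, rng):
    """Flip feature bits and jitter parameters (clamped to their ranges),
    each with probability `rate`."""
    mask, params = chromosome
    mask = [1 - b if rng.random() < rate else b for b in mask]
    params = [min(hi, max(lo, p + rng.gauss(0, 0.1 * (hi - lo))))
              if rng.random() < rate else p
              for p, (lo, hi) in zip(params, param_ranges)]
    return mask, params
```

With 12 candidate genes and, say, two ECF parameter ranges, the fittest chromosome after several generations would encode both which of the 12 features to keep and the optimized parameter values.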
Fig. 6.3. GA optimization of the parameters and the set of input variables (features) of an ECOS model for the classification of CNS cancer samples into two classes - a class of survivors and a class of non-responders to treatment (see Pomeroy et al. 2002). The best ECOS model, after 20 generations of populations of 20 individuals, has an accuracy of almost 94% when tested in a 3-fold cross validation procedure. The model has the following input variables: 1, 3, 4, 7, 9, 11, 12 (represented in a lighter color); variables 2, 5, 6, 8 and 10 are not used (represented in a darker color). Optimal values of the ECF parameters (Rmax, Rmin, m-of-n, epochs) are shown in the figure. See Color Plate 6
After the optimal set of features and model parameters is selected (optimized) using the GA (see Fig. 6.3), the final model is created using the optimal parameters and only 7 features out of 12 (see Fig. 6.4). Fig. 6.5 presents the clusters (profiles) of the expression of the 7 genes for class 1 (patients not responding to treatment) and Fig. 6.6 for class 2 (CNS cancer surviving children).
Fig. 6.4. The optimal parameter values and input gene variables are used to derive the final ECOS model that has 22 clusters (rule nodes). This figure shows aggregated profiles for the two classes while the individual cluster profiles for each class are shown in Fig. 6.5 and Fig. 6.6, respectively. See Color Plate 7
Fig. 6.5. Individual cluster profiles for class 1 (not responding to treatment) obtained using the 7 genes selected through the GA optimization shown in Fig. 6.3. See Color Plate 7
Fig. 6.6. Individual cluster profiles for class 2 (cancer survivors) obtained using the 7 genes selected through the GA optimization shown in Fig. 6.3. See Color Plate 8
6.5 Summary

This chapter introduces the main principles of EC and GA and shows how these methods can be applied to optimize the parameters of models and the input features. Parameter estimation is a very difficult task in inferring GRN models, mainly because of the lack of observation data relative to the number of genes involved. In this respect, evolutionary computation (EC), which is a robust, global optimization method, becomes an important tool to accurately infer and optimize GRNs. EC methods are inspired by Darwin's theory of evolution and have been used for parameter estimation and optimization in many engineering applications. Unlike classical derivative-based (e.g. Newton-type) optimization methods, EC is more robust against noise and high dimensionality in the search space. In addition, EC does not require derivative information about the objective function and is thus applicable to complex, black-box problems. These characteristics make EC highly suitable for identifying the parameters of GRNs, in particular because derivative information about the underlying model is usually not available and the data is scarce and noisy, thus requiring a robust, global optimization algorithm that is not easily misled by these drawbacks. Finally, qualitative inference of parameters is difficult with a small number of observations compared to the large number of genes involved. For these reasons, GA and EC are used in Chap. 7 and Chap. 8 for optimizing GRNs and CNGMs.
7 Gene/Protein Interactions - Modeling Gene Regulatory Networks (GRN)
This chapter presents background knowledge from Bioinformatics on gene and protein information processing in a biological cell, with an emphasis on their dynamic interaction. In a cell, and in a neuron in particular, DNA, RNA and proteins interact continuously, affecting the functioning of the whole cell and the phenotype of the organism. Here, the main principles of these interactions are presented. Some of these interactions are subject to modeling in a CNGM when related to the output signals/functions of a cell/neuron.
7.1 The Central Dogma of Molecular Biology

With the completion of the first draft of the human genome and the genomes of some other species, the task is now to be able to process this vast amount of ever growing dynamic information and to create intelligent systems for prediction and knowledge discovery at different levels of life, from the cell to whole organisms and species. The DNA (deoxyribonucleic acid) is a chemical chain, present in the nucleus of each cell of an organism, which consists of pairs of small chemical molecules (bases), ordered in a double helix: Adenine (A), Cytosine (C), Guanine (G) and Thymine (T), linked together by a deoxyribose sugar phosphate nucleic acid backbone. The RNA (ribonucleic acid) has a similar structure to the DNA, but here Thymine (T) is substituted by Uracil (U) (see Fig. 7.1). The central dogma of molecular biology (see Fig. 1.2) states that the DNA is transcribed into RNA, which is translated into proteins. This process is shown in more detail in Fig. 7.2. The DNA contains millions of base pairs, but only 5% or so is used for the production of proteins, and these are the segments of the DNA that contain genes. Each gene is a sequence of base pairs that is used in the cell to produce proteins. Genes have lengths of hundreds to thousands of bases.
Fig. 7.1. Structure of DNA and RNA
Fig. 7.2. The central dogma of molecular biology - a schematic representation
Each gene consists of two types of segments: exons, which are the segments translated into proteins, and introns, segments that are considered redundant and do not take part in the protein production. Removing the introns and ordering only the exon parts of the genes in a sequence is called splicing, and this process results in the production of messenger RNA (mRNA) sequences. mRNAs are directly translated into proteins. Each protein consists of a sequence of amino acids, each of them defined by an RNA base triplet, called a codon. From one DNA sequence many copies of mRNA are produced; the presence of a certain gene in all of them defines the level of expression of the gene in the cell, and can indicate what, and how much of, the corresponding protein will be produced in the cell. The above description of the central dogma of molecular biology is very much simplified, but it should help to understand the rationale behind using connectionist and other information models in bioinformatics (Brown, Shreiber et al. 2000). Genes are complex chemical structures and they cause dynamic transformations of one substance into another during the whole life of an individual, as well as during the life of the human population over many generations. When genes are "in action", the dynamics of the processes in which a single gene is involved are very complex, as this gene interacts with many other genes and proteins, and is influenced by many environmental and developmental factors. Modeling these interactions, learning about them and extracting knowledge is a major goal for Bioinformatics. Bioinformatics is concerned with the application of the methods of the information sciences for the analysis, modeling and knowledge discovery of biological processes in living organisms (Brown, Grundy et al. 2000, Baldi and Brunak 2001). The whole process of DNA transcription, gene translation and protein production is continuous, and it evolves over time. Proteins have 3D structures that unfold over time, governed by physical and chemical laws. Proteins make some genes express and may suppress the expression of other genes. The genes in an individual may mutate, slightly change their code, and may therefore express differently the next time.
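The translation step of the central dogma can be illustrated with a toy example: reading an mRNA string in codons from the first start codon (AUG) until a stop codon, and emitting one amino-acid letter per codon. The codon table below is a small subset of the standard genetic code, included for illustration only.

```python
# A small subset of the standard genetic code (codon -> amino-acid letter);
# None marks the stop codons.
CODON_TABLE = {
    "AUG": "M",                                # start codon (Methionine)
    "UUU": "F", "GGC": "G", "AAA": "K", "GCU": "A",
    "UAA": None, "UAG": None, "UGA": None,
}

def translate(mrna):
    """Read codons (base triplets) from the first AUG until a stop codon
    (or a codon missing from this toy table)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                              # no start codon, no protein
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE.get(mrna[i:i + 3])
        if aa is None:                         # stop codon ends translation
            break
        peptide.append(aa)
    return "".join(peptide)
```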
So, genes may change, mutate, and evolve in a life time of a living organism. Modeling these processes is extremely complex task. The more new information is made available about DNA, gene expression, protein creation, metabolism, the more accurate the information models will become. They should adapt to the new information in a continuous way. The process of biological knowledge discovery is also evolving in terms of data and information being created continuously. Through evolutionary processes (evolution) genes are slowly modified through many generations of populations of individuals and selection processes (e.g. natural selection). Proteins provide the majority of the structural and functional components of a cell. The area of molecular biology that deals with all aspects of
7 Gene/Protein Interactions - Modeling Gene Regulatory Networks (GRN)
proteins is called proteomics. So far, about 30,000 proteins have been identified and labeled, but this is considered to be a small part of the total set of proteins that keep our cells alive. The mRNA is translated by ribosomes into proteins. A protein is a sequence of amino acids, each of them defined by a group of three nucleotides (codons). There are 20 amino acids altogether, denoted by the letters A, C-H, I, K-N, P-T, V, W and Y. The chemical formula of each amino acid is shown in Fig. 7.3.
Fig. 7.3. The chemical structure of the amino acids, grouped into amino acids with hydrophobic side groups (valine, leucine, isoleucine, methionine, phenylalanine), amino acids with hydrophilic side groups (asparagine, glutamic acid, glutamine, histidine, lysine, arginine, aspartic acid), and amino acids that are in between (glycine, alanine, cysteine, threonine, serine, proline, tyrosine, tryptophan)
The length of a protein, in number of amino acids, ranges from tens to several thousands. Each protein is characterized by properties such as (Brown and Botstein 1999):
• Structure
• Function
• Charge
• Acidity
• Hydrophilicity
• Molecular weight
An initiation codon, a particular triplet of bases, defines the start position of a gene in an mRNA, where the translation of the mRNA into protein begins. A stop codon defines the end position. Proteins with a high similarity are called homologous. Homologues that have identical functions are called orthologues. Similar proteins that have different functions are called paralogues. Proteins have complex structures that include:
• Primary structure (a linear sequence of the amino acids);
• Secondary structure (3D, defining functionality);
• Tertiary structure (high-level folding and energy-minimization packing of the protein). Polypeptides with a tertiary level of structure are usually referred to as globular proteins, since their shape is irregular and globular in form;
• Quaternary structure (interaction between two or more protein molecules).
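The role of the initiation and stop codons in translation can be illustrated with a short sketch (a minimal example; only a small, hand-picked fragment of the standard codon table is included, and the mRNA sequence is made up for illustration):

```python
# Translate an mRNA sequence into a one-letter amino-acid string.
# Only a small fragment of the standard codon table is included here.
CODON_TABLE = {
    "AUG": "M",  # initiation codon (methionine)
    "UUU": "F", "CUG": "L", "GCU": "A", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def translate(mrna: str) -> str:
    """Scan for the initiation codon, then translate codon by codon
    until a stop codon or the end of the sequence is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE.get(mrna[i:i + 3], "?")  # '?' = codon not in this table
        if aa == "*":          # a stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGGCUAAAUUUUAA"))  # -> MAKF
```

The initiation codon fixes the reading frame; shifting the start by one base would produce an entirely different codon sequence.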
7.2 Gene and Protein Expression Data Analysis and Modeling The recent advent of cDNA microarray and gene-chip technologies means that it is now possible to simultaneously interrogate thousands of genes. The potential applications of this technology are numerous and include identifying markers for classification, diagnosis, disease outcome prediction, therapeutic responsiveness, and target identification. Microarray analysis might not identify unique markers (e.g. a single gene) of clinical utility for a disease because of the heterogeneity of the disease. Prediction of the biological state/disease is likely to be more accurate by identifying clusters of gene expressions (gene expression profiles).
Each point (pixel, cell) in a microarray matrix represents the expression level of a single gene. The five principal steps of the microarray technology, shown in Fig. 7.4, are: tissue collection; RNA extraction; microarray gene expression evaluation; scanning and image processing; and data analysis.
Fig. 7.4. (a) Steps in microarray gene expression data collection, analysis and modeling: RNA extraction and amplification; reverse transcription to cDNA and labeling; scanning and image processing; normalization and preprocessing; data analysis. (b) A gene expression microarray contains in each cell the expression level of one gene in one sample, or the ratio of its expression between two samples (e.g. normal and diseased). The level of gene expression in each pixel is encoded on a black and white scale, with darker cells denoting lower gene expression and lighter cells denoting higher gene expression. See Color Plate 8
One of the contemporary directions in the search for efficient drugs for many terminal illnesses, such as cancer or HIV, is the creation of gene profiles of these diseases and, subsequently, the finding of targets for treatment through gene expression regulation. A gene profile is a pattern of expression of a number of genes that is typical for all, or for some, of the known samples of a particular disease. Having such profiles for a particular disease makes it possible to set up an early diagnostic test: a sample can be taken from a patient, the data related to the sample processed, and a profile of the sample obtained. This profile can then be matched against existing gene profiles and, based on similarity, it can be predicted with a certain probability whether the patient is in an early phase of the disease or at risk of developing the disease in the future.

A usual way of processing gene (and also protein) expression data S of genes G consists of the following steps:
1. FOR i = 1 to N (where N is the number of available samples; in particular, this could be the leave-one-out method) DO
   a. Select a data subset Sv,i from S for validation.
   b. From the rest of the data S0,i, select the most discriminative subset of genes Gi, where Card(Gi) << Card(G).
   c. Create a model Mi based on the data S0,i with the gene set Gi as input variables.
   d. Validate Mi on the validation set Sv,i and calculate the error Ei.
   END (FOR)
2. Calculate the average error from all Ei.
3. Define a set Gm of the most frequently selected genes over all N iterations.
4. Create a final model M based on the whole data set S and the gene set Gm.

After the above steps, a model M is created using the genes that appear most frequently over all N iterations, the gene set Gm. The gene set Gm, though, may not be optimal in terms of representing a minimum set of genes that through their interaction "cover" all clusters of the problem space. The parameters of the model M derived above were not optimized and may not be optimal either. A GA optimization algorithm for the optimization of both the gene set and the model parameter values, as an additional procedure to the one above, is given in Chap. 6.

7.2.1 Example

An example is shown in Fig.
7.5, where class profiles of 14 types of cancer are extracted from an EFuNN (see Chap. 5) trained on 399 previously extracted gene expression variables and 14 output classes, using publicly available data from (Ramaswamy et al. 2001), http://www-genome.wi.mit.edu/MPR/GCM.html. The profiles of each class can be
modified through a threshold, tuned for each individual class, that defines the membership degree above which a gene should be either over-expressed or under-expressed in all rules of this class in order for this gene to appear in the profile. The last profile is of the CNS cancer. The profiles show that different genes are highly expressed in different types of cancer. The analysis is performed with the use of a proprietary methodology and software system, SIFTWARE (www.peblnz.com) (Kasabov 2001).
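The N-step gene-selection procedure described above can be sketched in code (a minimal illustration on synthetic data; the mean-difference gene scoring and the nearest-centroid classifier are hypothetical stand-ins for whatever selection method and model Mi are actually used):

```python
import random

random.seed(0)

# Synthetic data: N samples, each with the expression of n_genes genes
# and a class label (0 or 1). Genes 0-9 are made informative on purpose.
N, n_genes, top_k = 12, 50, 5
samples = [([random.gauss(cls, 1.0) if g < 10 else random.gauss(0, 1.0)
             for g in range(n_genes)], cls)
           for cls in (0, 1) for _ in range(N // 2)]

def select_genes(data, k):
    """Score each gene by the absolute difference of its class means
    and keep the k most discriminative ones (step 1b)."""
    scores = []
    for g in range(n_genes):
        m0 = sum(x[g] for x, c in data if c == 0) / sum(1 for _, c in data if c == 0)
        m1 = sum(x[g] for x, c in data if c == 1) / sum(1 for _, c in data if c == 1)
        scores.append((abs(m1 - m0), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def classify(x, data, genes):
    """Nearest-centroid classifier restricted to the selected genes (model Mi)."""
    def dist(cls):
        members = [v for v, c in data if c == cls]
        return sum((x[g] - sum(v[g] for v in members) / len(members)) ** 2
                   for g in genes)
    return 0 if dist(0) < dist(1) else 1

errors, counts = 0, {}
for i in range(N):                                   # leave-one-out loop (step 1)
    held_out = samples[i]
    rest = samples[:i] + samples[i + 1:]
    genes_i = select_genes(rest, top_k)
    for g in genes_i:
        counts[g] = counts.get(g, 0) + 1
    errors += classify(held_out[0], rest, genes_i) != held_out[1]   # step 1d

G_m = sorted(counts, key=counts.get, reverse=True)[:top_k]          # step 3
print("LOO error rate:", errors / N)
print("most frequently selected genes G_m:", G_m)
```

A final model would then be trained on the whole data set S using only the genes in G_m (step 4).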
Fig. 7.5. Class profiles of 14 types of cancer extracted from a trained EFuNN on 399 inputs (gene expression values) and 14 outputs using data from (Ramaswamy et al. 2001). The profiles of each class can be modified through a threshold tuned for each individual class that defines the membership degree above which a gene should be either over-expressed (lighter sign) or under-expressed (darker sign) in all rules of this class in order for this gene to appear in the profile. The last profile is of the CNS cancer. See Color Plate 9
Among the CNS cancer group there are three clusters that have different gene expression profiles, as detected by the trained EFuNN ECOS system from Fig. 7.5 and shown in Fig. 7.6. The analysis is again performed with the software system SIFTWARE (www.peblnz.com). The system allows accessing a selected gene in GenBank. For example, the highly expressed
gene (a darker line) in cluster 1 is the gene with accession number AA363338, which is not expressed in clusters 2 and 3 of the same CNS cancer data.
Fig. 7.6. Among the CNS cancer group there are 3 clusters that have different gene expression profiles, as detected by an EFuNN ECOS trained system from Fig. 7.5. The highly expressed genes (lighter lines) in cluster 1, 2 and 3 of CNS cancer data are different. See Color Plate 9
7.3 Modeling Gene/Protein Regulatory Networks (GPRN)

The aim of computational systems biology is to understand complex biological objects in their entirety, i.e. at a system level. It involves the integration of different approaches and tools: computer modeling, large-scale data analysis, and biological experimentation. One of the major challenges of systems biology is the identification of the logic and dynamics of gene-regulatory and biochemical networks. The most feasible application of systems biology is to create a detailed model of cell regulation to provide system-level insights into mechanism-based drug discovery. System-level understanding is a recurrent theme in biology and has a long history. The term "system-level understanding" denotes a shift of focus towards understanding a system's structure and dynamics as a whole, rather than
the particular objects and their interactions. System-level understanding of a biological system can be derived from insight into four key properties (Dimitrov et al. 2004):
1. System structures. These include the gene regulatory network (GRN) and biochemical pathways. They can also include the mechanisms by which interactions modulate the physical properties of intracellular and multicellular structures.
2. System dynamics. System behavior over time under various conditions can be understood by identifying the essential mechanisms underlying specific behaviors, through various approaches depending on the system's nature: metabolic analysis (finding a basis of elementary flux modes that describe the dominant reaction pathways within the network); sensitivity analysis (the study of how the variation in the output of a model can be apportioned, qualitatively or quantitatively, to different sources of variation); and dynamic analysis methods such as phase portraits (the geometry of the trajectories of the system in state space) and bifurcation analysis. Bifurcation analysis traces time-varying changes in the state of the system in a multidimensional space where each dimension represents a particular system parameter (concentration of a biochemical factor involved, rate of reactions/interactions, etc.). As parameters are varied, changes may occur in the qualitative structure of the solutions for certain parameter values; these changes are called bifurcations and the parameter values at which they occur are called bifurcation values.
3. The control method. Mechanisms that systematically control the state of the cell can be modulated to change system behavior and to optimize potential therapeutic targets of the treatment.
4. The design method. Strategies to modify and construct biological systems having desired properties can be devised based on definite design principles and simulations, instead of blind trial-and-error.
As mentioned above, in reality the analysis of system dynamics and the understanding of the system structure are overlapping processes. In some cases the analysis of the system dynamics can give useful predictions about the system structure (new interactions, additional members of the system). Different methods can be used to study the dynamical properties of the system:
• Analysis of steady states allows finding the states of the system in which there are no dynamical changes in the system components.
• Stability and sensitivity analyses provide insights into how system behavior changes when stimuli and rate constants are modified to reflect dynamic behavior.
• Bifurcation analysis, in which a dynamic simulator is coupled with analysis tools, can provide a detailed illustration of dynamic behavior.

The choice of the analytical methods depends on the availability of the data that can be incorporated into the model and on the nature of the model. It is important to know the main properties of the complex system under investigation, such as robustness. Robustness is a central issue in all complex systems and is essential for understanding the functioning of a biological object at the system level. Robust systems exhibit the following phenomenological properties:
• Adaptation, which denotes the ability to cope with environmental changes.
• Parameter insensitivity, which indicates a system's relative insensitivity (to a certain extent) to specific kinetic parameters.
• Graceful degradation, which reflects the characteristic slow degradation of a system's functions after damage, rather than catastrophic failure.
All the above features are present in many of the CI methods and techniques, which makes them very suitable for modeling complex biological systems. Revealing these characteristics of a complex living system helps in choosing an appropriate method for its modeling, and also constitutes an inspiration for the development of new CI methods that possess these features. Modeling living cells in silico (in a computer) has many implications; one of them is testing new drugs through simulation rather than on patients. According to recent statistics, human trials fail for 70-75% of the drugs that enter them. Modeling gene regulatory networks (GRN) is the task of creating a dynamic interaction network between genes that defines the next-time expression of the genes based on their previous levels of expression. A simple GRN of four genes is shown in Fig. 7.7. Each node in Fig. 7.7 represents either a single gene/protein or a cluster of genes that have a similar expression over time, as illustrated in Fig. 7.8. A detailed discussion of the methods for GRN modeling can be found in (Dimitrov et al. 2004). Models of GRN, derived from gene expression RNA data, have been developed with the use of different mathematical and computational methods, such as statistical correlation techniques, evolutionary computation, ANN, differential equations (both ordinary and partial), Boolean models, kinetic models, state-based models, and others.
Fig. 7.7. A simplified gene regulatory network where each node represents a gene/protein (or a group of them) and the arcs represent the connections between them, either excitatory (+) or inhibitory (-)
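A network of the kind shown in Fig. 7.7 can be given a minimal dynamic interpretation: a weight matrix holds the excitatory (+) and inhibitory (-) interactions, and the expression vector at the next time step is a saturating function of the weighted sum of the current expression levels. This is only an illustrative discrete-time sketch, with invented weights, not a method advocated in the text:

```python
import math

# Interaction matrix W[i][j]: influence of gene j on gene i
# (positive = excitatory, negative = inhibitory); values invented for illustration.
W = [[ 0.0,  0.8, -0.5,  0.0],
     [ 0.0,  0.0,  0.6, -0.7],
     [ 0.9,  0.0,  0.0,  0.0],
     [-0.4,  0.3,  0.0,  0.0]]

def step(g):
    """One discrete time step: g(t+dt) = sigmoid(W g(t)),
    keeping each expression level in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-sum(W[i][j] * g[j] for j in range(4))))
            for i in range(4)]

g = [0.2, 0.9, 0.1, 0.5]   # initial expression levels of the four genes
for t in range(5):
    g = step(g)
print([round(x, 3) for x in g])
```

Richer formalisms (differential equations, Boolean networks, kinetic models) mentioned above refine this same idea of state-to-state transition.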
Fig. 7.8. A cluster of genes that are similarly expressed over time (17 hours). See Color Plate 10
In (Kasabov et al. 2004) a simple GRN model of five genes is derived from time course gene expression data of the leukemia cell line U937 treated with retinoic acid, with two phenotype states, positive and negative. The
model, derived from time course data, can be used to predict future activity of genes, as shown in Fig. 7.9.
Fig. 7.9. The time course data of the expression of four genes (#33, 8, 27, 21) from the cell line used in (Kasabov et al. 2004). The first four time points are used for training; the rest are the values of the genes predicted by the model for future times
Another example of GRN extraction from data is presented in (Chan et al. 2006), where the human fibroblast response to serum data is used (Fig. 7.10) and a GRN is extracted from it (Fig. 7.11).
Fig. 7.10. The time course data of the expression of genes in the Human fibroblast response to serum data. See Color Plate 10
Fig. 7.11. A GRN obtained with the use of the method from (Chan et al. 2006) on the data from Fig. 7.10
Despite the variety of methods used so far for modeling GRN, and for systems biology in general, there is no single method that suits all the requirements for modeling a complex biological system, especially the requirements for adaptation, robustness and information integration.
7.4 Evolving Connectionist Systems (ECOS) for GRN Modeling
7.4.1 General Principles
Microarray data can be used to evolve an ECOS whose inputs are the expression levels of a certain number of selected genes (e.g. 100) and whose outputs are the expression levels of the same genes at the next time moment, as recorded in the data. After an ECOS is trained on time course gene expression data, rules are extracted from it and linked to each other in the time order of their creation in the model, thus representing the GRN. The rule nodes in an ECOS capture clusters of input genes that are related to the output genes at the next time moment. The rules extracted from an EFuNN model, for example (see Chap. 5, Sect. 5.2), represent the relationship between the gene expression of a
group of genes G(t) at a time moment t and the expression of the genes at the next time moment G(t+dt), e.g.:

IF g13(t) is High (0.87) and g23(t) is Low (0.9)
THEN g87(t+dt) is High (0.6) and g103(t+dt) is Low        (7.1)
Through modifying a threshold for rule extraction, one can extract stronger or weaker patterns of dynamic relationships. Adaptive training of an ECOS makes possible the incremental learning of a GRN, as well as the adding of new inputs/outputs (new genes) to the GRN. A set of DENFIS models (see Chap. 5, Sect. 5.4) can be trained, one for each gene gi, so that the input vector is the expression vector G(t) and the output is a single variable gi(t+dt). DENFIS allows for a dynamic partitioning of the input space. Takagi-Sugeno fuzzy rules, which represent the relationship between gene gi and the rest of the genes, are extracted from each DENFIS model, e.g.:

IF g1 is (0.63, 0.70, 0.76) and g2 is (0.71, 0.77, 0.84) and g3 is (0.71, 0.77, 0.84) and g4 is (0.59, 0.66, 0.72)
THEN g5 = 1.84 - 1.26 g1 - 1.22 g2 + 0.58 g3 - 0.03 g4        (7.2)

7.4.2 A Case Study on Small GRN Modeling with the Use of ECOS
Here we used the same data from the U937 cell line treated with retinoic acid (Dimitrov et al. 2004) as shown in Fig. 7.9. The results are taken from (Kasabov and Dimitrov 2002). Retinoic acid and other reagents can induce differentiation of cancer cells, leading to a gradual loss of proliferation activity and, in many cases, death by apoptosis. Elucidation of the mechanisms of these processes may have important implications not only for our understanding of the fundamental mechanisms of cell differentiation, but also for the treatment of cancer. We studied the differentiation of two subclones of the leukemic cell line U937 induced by retinoic acid. These subclones exhibited highly differential expression of a number of genes, including c-Myc, Id1 and Id2, that were correlated with their telomerase activity: the PLUS clones had about 100-fold higher telomerase activity than the MINUS clones. It appears that the MINUS clones are in a more "differentiated" state. The two subclones were treated with retinoic acid and samples were taken before treatment (time 0) and then at 6 h, 1, 2, 4, 7 and 9 days for the PLUS clones, and until day 2 for the MINUS clones because of their apoptotic death. The gene expression in these samples was measured by Affymetrix gene chips that contain probes for 12,600 genes. To specifically address the question of telomerase regulation, we selected a subset of those genes that were implicated in telomerase regulation and used ECOS for their analysis. The task is to find the gene regulatory network G = {g1, g2, g3, grest-, grest+} of the three genes g1 = c-Myc, g2 = Id1, g3 = Id2, while taking into account the integrated influence of the rest of the changing genes over time, denoted as grest- and grest+, representing respectively the group of genes whose expression level decreases over time (negative correlation with time) and the group of genes whose expression increases over time (positive correlation with time). The groups of genes grest- and grest+ were formed for each experiment on the PLUS and MINUS cell lines, forming four groups of genes altogether. For each group of genes, the average expression level of all genes at each time moment was calculated to form a single aggregated variable grest. Two EFuNN models, one for the PLUS cells and one for the MINUS cells, were trained on five input variables, the expression levels of the genes G(t) at time moment t, and five outputs, the expression levels G(t+1) of the same genes recorded at the next time moment. Rules that describe the transitions between the gene states in the problem space were extracted from the trained structure. The rules are given as a transition graph in Fig. 7.12a and 7.12b.
Fig. 7.12. (a) The genetic regulatory network extracted from an EFuNN trained on time course gene expression data of genes related to telomerase of the PLUS leukemic cell line U937. Each point represents a state of the five genes used in the model, with the arrows representing (rule) transitions between the states. (b) The regulatory network over three time steps for the MINUS cell line, represented in the 2D space of the expression levels of the first two genes, c-Myc and Id1
Using the extracted rules, which form a gene regulatory network, one can simulate the development of the cell from an initial state G(t=0) through time moments in the future, thus predicting a final state of the cell.
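Iterating extracted rules of the form (7.1) to project a cell state forward can be sketched as follows (the rules, gene names and thresholds here are invented placeholders for illustration, not rules extracted from the U937 data):

```python
# Each hypothetical rule maps a condition on the current state G(t)
# to new expression levels for some genes in G(t+dt).
rules = [
    (lambda G: G["g1"] > 0.7 and G["g2"] < 0.3, {"g3": 0.8}),
    (lambda G: G["g3"] > 0.5,                   {"g1": 0.2, "g2": 0.6}),
]

def simulate(G, steps):
    """Apply all matching rules at each step; genes not mentioned
    by any firing rule keep their previous level."""
    trajectory = [dict(G)]
    for _ in range(steps):
        updates = {}
        for condition, consequent in rules:
            if condition(G):
                updates.update(consequent)
        G = {**G, **updates}
        trajectory.append(dict(G))
    return trajectory

traj = simulate({"g1": 0.9, "g2": 0.1, "g3": 0.0}, steps=3)
print(traj[-1])  # final predicted state of the cell
```

The sequence of visited states corresponds to the transition graph of Fig. 7.12, with each rule firing drawing one arrow between states.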
7.5 Summary

This chapter gave some background information on gene and protein interactions in cells and neurons as GRN. These interactions were linked to phenotype processes, such as cancer development in a cell (the CNS cancer data) or the proliferation of a cell line (also leading to a cancerous cell). Each gene interacts with many other genes in the cell, inhibiting or promoting, directly or indirectly, the expression levels of messenger RNAs and thus the amounts of the corresponding proteins. Transcription factors are an important class of regulating proteins, which bind to the promoters of other genes to control their expression. Thus, transcription factors and other proteins interact in a manner that is very important for the determination of cell function. A major problem is to infer an accurate model of such interactions between the important genes in the cell. To build models of gene regulatory networks it is important to identify the relevant genes. The abundant gene expression microarray data can be analyzed by clustering procedures to extract and model these regulatory networks. We have exemplified some methods of GRN discovery for a large number of genes from multiple time series of gene expression observations over irregular time intervals. One method integrates a genetic algorithm (GA), to select a small number of genes, and a Kalman filter, to derive the GRN of these genes (Chan et al. 2006). The GA is applied to search for smaller subsets of genes that are likely to form a GRN, using the model likelihood as an optimization objective. After GRNs of a smaller number of genes are obtained, these GRNs may be integrated in order to create the GRN of a larger group of genes of interest. The method is designed to deal effectively with irregular and scarce data collected from a large number of variables (genes).
GRNs are modeled as discrete-time approximations of first-order differential equations, and a Kalman filter is applied to estimate the true gene trajectories from the irregular observations and to evaluate the likelihood of the GRN models. The next chapter links a GRN to the functioning (e.g. spiking) of a neuron, and then to the functioning of a whole ANN model that can be compared with targeted behavior, e.g. using brain data, thus creating a more complex CNGM.
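The combination of a discrete-time approximation of first-order dynamics with Kalman filtering over irregular observations can be illustrated with a toy one-gene example (a minimal sketch, not the implementation of (Chan et al. 2006); the decay rate, noise levels and observation times are invented):

```python
import random

random.seed(1)
a, q, r = -0.3, 0.01, 0.04     # decay rate, process noise, observation noise

# Irregularly spaced observation times and simulated noisy measurements
# of a gene trajectory following x(t+dt) ~ x(t) + dt * a * x(t).
times = [0.0, 0.5, 1.2, 1.4, 2.5, 3.1, 4.0]
true_x, obs = 1.0, []
for i in range(1, len(times)):
    dt = times[i] - times[i - 1]
    true_x += dt * a * true_x                        # first-order dynamics
    obs.append(true_x + random.gauss(0, r ** 0.5))   # noisy observation

# Scalar Kalman filter over the irregular time grid
x_est, P = 1.0, 1.0
estimates = []
for i in range(1, len(times)):
    dt = times[i] - times[i - 1]
    F = 1.0 + dt * a                                 # discrete-time transition
    x_pred, P_pred = F * x_est, F * P * F + q * dt   # predict
    K = P_pred / (P_pred + r)                        # Kalman gain
    x_est = x_pred + K * (obs[i - 1] - x_pred)       # update with the observation
    P = (1.0 - K) * P_pred
    estimates.append(x_est)

print([round(e, 3) for e in estimates])
```

In the multi-gene case the scalars a, x and P become the interaction matrix, the expression vector and a covariance matrix, and the filtered likelihood can score candidate gene subsets proposed by the GA.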
8 CNGM as Integration of GPRN, ANN and Evolving Processes
This chapter presents a methodology for CNGM that integrates gene regulatory networks with models of artificial neural networks in order to model different functions of the neural system. The properties of all cell types, including neurons, are determined by the proteins they contain (Lodish et al. 2000). In turn, the types and amounts of proteins are determined by the differential transcription of different genes in response to internal and external signals. Eventually, the properties of neurons determine the structure and dynamics of the whole neural network they are part of. The interaction of genes in neurons affects the dynamics of the whole neural network model through the neuronal parameters, which are no longer constant but change as a function of gene expression. Through optimization of the gene interaction network, the initial gene/protein expression values and the neuronal parameters, particular target states of the neural network operation can be achieved, and meaningful relationships between genes, proteins and neural functions can be extracted. One particular instance where the time scale of gene expression matches, and in fact determines, the time scale of neural behavior is the circadian rhythm. A circadian rhythm is a roughly 24-hour cycle in the physiological processes of plants and animals. The circadian rhythm partly depends on external cues such as sunlight and temperature, but otherwise it is determined by the periodic expression patterns of the so-called clock genes (Lee et al. 1998, Suri et al. 1999). Smolen et al. (Smolen et al. 2004) have developed a computational model to represent the regulation of the core clock component genes in Drosophila (per, vri, Pdp-1, and Clk). To model the dynamics of gene expression, differential equations and first-order kinetics equations were employed for modeling the control of genes and their products.
The model illustrates the ways in which negative and positive feedback loops within the gene regulatory network cooperate to generate oscillations of gene expression. The relative amplitudes and phases of the simulated oscillations of gene expression resemble empirical data in most of the simulated situations. The model is based on the transcriptional regulation of per, Clk (dclock), Pdp-1, and vri (vrille). The model postulates that histone acetylation kinetics make transcriptional activation a nonlinear function of
[CLK]. Simulations suggest that the two positive feedback loops involving Clk are not essential for oscillations, because oscillations of [PER] were preserved when Clk, vri, or Pdp-1 expression was fixed. However, eliminating positive feedback by fixing vri expression altered the oscillation period. Eliminating the negative feedback loop, in which PER represses per expression, abolished oscillations. Simulations of per or Clk null mutations, of per overexpression, and of vri, Clk, or Pdp-1 heterozygous null mutations altered the model behavior in ways similar to experimental data. The model simulated a photic phase-response curve resembling experimental curves, and its oscillations entrained to simulated light-dark cycles. Temperature compensation of the oscillation period could be simulated if temperature elevation slowed PER nuclear entry or PER phosphorylation. The model of Smolen et al. (Smolen et al. 2004) shows that it is possible to develop detailed models of the gene control of neural behavior, provided enough experimental data is available to adjust the model. Models of particular gene networks need to be based on measured values of biochemical parameters, like the kinetics of activation or expression of the relevant transcription factors. Use of parameter values that do not describe the in vivo situation can lead to erroneous predictions of genetic and neural dynamic behaviors (Smolen et al. 2000). In this chapter we envisage CNGM for any brain function, namely by formulating: (1) how to model internal gene/protein dynamics; (2) how to link the parameters of a neuron model to the activities of genes/proteins; (3) which genes/proteins are to be included in the model; (4) how to optimize the CNGM parameters; (5) how to validate CNGM on real brain data; (6) how to discover new knowledge from CNGM; and finally (7) how to integrate CNGM with bioinformatics.
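The central role of the negative feedback loop, in which a gene product represses its own transcription, can be illustrated with a generic three-stage loop (a Goodwin-style sketch, not the Smolen et al. model; such a loop is known to sustain oscillations when the repression is sufficiently steep, and all parameter values are invented):

```python
# Generic delayed negative feedback loop: mRNA M is translated into
# protein P, which enters the nucleus as PN and represses M's transcription.
def step(M, P, PN, dt=0.01):
    dM = 1.0 / (1.0 + PN ** 9) - 0.1 * M   # steeply repressed transcription
    dP = M - 0.1 * P                        # translation and decay
    dPN = P - 0.1 * PN                      # nuclear entry and decay
    return M + dt * dM, P + dt * dP, PN + dt * dPN

M, P, PN = 0.1, 0.1, 0.1
trace = []
for i in range(60000):                      # forward-Euler integration
    M, P, PN = step(M, P, PN)
    if i % 300 == 0:
        trace.append(round(M, 3))
print(trace[-10:])   # last sampled mRNA levels
```

Removing the repression (making dM independent of PN) corresponds to eliminating the negative feedback loop and, as in the simulations described above, abolishes any oscillation.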
8.1 Modeling Genetic Control of Neural Development

The majority of existing models of neural development are molecular and biochemical models that do not take into account the role and dynamics of genes (see e.g. (van Ooyen 2003)). Computational models have been developed for early neural development, early dendritic and axonal morphogenesis, the formation of dendritic branching patterns, axonal guidance and gradient detection by growth cones, activity-dependent neurite outgrowth, etc. Although these models could be taken one step further by linking proteins to genes, this step was actually performed only by Marnellos and Mjolsness (Mjolsness et al. 1991, Marnellos and Mjolsness 2003), Storjohann and Marcus (Storjohann and Marcus 2005), and Thivierge and Marcus (Thivierge and Marcus 2006).
Mjolsness et al. (Mjolsness et al. 1991) and Marnellos and Mjolsness (Marnellos and Mjolsness 2003) have introduced a modeling framework for the study of development, including neural development, based upon genes and their interactions. Cells in the model are represented as overlapping cylinders in a 2-dimensional hexagonal lattice, where the extent of overlap determines the strength of interaction between neighboring cells. Model cells express a small number of genes corresponding to genes that are involved in differentiation. Genes in broad terms can correspond to groups of related genes, for instance proneural genes or epithelial genes, etc. Abstracting from biochemical detail, genes interact as nodes of a recurrent network. They sum up activating and inhibitory inputs from other genes in the same cell at any given time t, the overall sum being denoted ga(t):
g_a^i(t) = \sum_b T^{ab} p_b^i(t)    (8.1)
where genes are indexed by a and b, T^{ab} is the interaction between genes a and b within cell i, and p_b^i(t) are gene product levels within that cell. The developmental model also includes interactions from neighboring cells, such that

g_a^i(t) = \sum_b T^{ab} p_b^i(t) + \sum_{j \neq i} \sum_b Z^{ab} p_b^j(t)    (8.2)
where Z^{ab} is the interaction between genes a and b in neighboring cells, and p_b^j(t) are gene product levels in the neighboring cell j. The neighborhood of a cell consists of the six surrounding cells. Thus, genes in a cell interact as nodes in a fully recurrent network with connection weights depending on the kind of the interaction. Two kinds of interaction are allowed: an intracellular one and an inter-cellular one. A gene a sums inputs from genes in the same cell and from the neighboring cells at time t. The level (concentration) p_a^i(t) of the product of gene a then changes according to

\frac{dp_a^i(t)}{dt} = R_a \sigma\left( g_a^i(t) + h_a \right) - \lambda_a p_a^i(t)    (8.3)
where R_a is the rate of production of gene a's product, \lambda_a is the rate of decay of gene a's product, and h_a is the threshold of activation of gene a. The function \sigma(x) \in (0, 1) is a sigmoid function defined as
\sigma(x) = 0.5\left( 1 + \frac{x}{\sqrt{1 + x^2}} \right)    (8.4)
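As a concrete illustration, the single-cell dynamics of Eqs. 8.1, 8.3 and 8.4 can be integrated with a simple Euler scheme. The two-gene network, rates, thresholds and step size below are illustrative choices, not values from the developmental model:

```python
import numpy as np

def sigma(x):
    # Sigmoid of Eq. 8.4; its values lie in (0, 1)
    return 0.5 * (1.0 + x / np.sqrt(1.0 + x**2))

def step_gene_products(p, T, R, lam, h, dt=0.01):
    """One Euler step of Eq. 8.3 for a single cell (neighbor terms of Eq. 8.2 omitted).

    p   -- vector of gene product levels p_a
    T   -- intracellular interaction matrix T[a, b]
    R   -- production rates R_a
    lam -- decay rates lambda_a
    h   -- activation thresholds h_a
    """
    g = T @ p                        # summed regulatory input (Eq. 8.1)
    dp = R * sigma(g + h) - lam * p  # production minus decay (Eq. 8.3)
    return p + dt * dp

# Hypothetical two-gene network: gene 1 activates gene 0's repressor loop.
T = np.array([[0.0, -2.0],
              [2.0,  0.0]])
p = np.array([0.1, 0.1])
for _ in range(1000):
    p = step_gene_products(p, T, R=np.ones(2), lam=np.ones(2), h=np.zeros(2))
print(p)
```

With R = lambda = 1, the product levels settle inside (0, 1), mirroring the bounded sigmoid drive of the model.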
8 CNGM as Integration of GPRN, ANN and Evolving Processes
As the authors of the developmental model state (Marnellos and Mjolsness 2003), levels of gene products should be viewed as corresponding to gene product activities rather than actual concentrations, and gene interactions should be viewed as corresponding more to genetic rather than specific biochemical (transcriptional, etc.) interactions. The gene network allows cell transformations in the model. For instance, cells may change their state (i.e., the levels of gene products or other state variables), change type or strength of interaction, can give birth to other cells, or die. These transformations are represented by a set of grammar rules, the L-grammar as in Lindenmayer systems. Rules are triggered according to the internal state of each cell (or other cells as well) and are of two kinds: discrete (leading to abrupt changes) and continuous (leading to smooth changes). A set of binary variables C keeps track of what rules are active in any particular cell at any given time, thus representing the influence of a meta-rule for the constraints as to what rules may be active in a cell at a time. If C_i^r = 1, the corresponding rule is active; if C_i^r = 0, the rule is inactive. The vector g^i for cell i is therefore described more accurately by the next equation:
g^i = \sum_r C_i^r T^r p^i + \sum_r C_i^r \sum_{j \neq i} \Lambda^{ij} \hat{T}^r p^j    (8.5)
where T^r is the interaction strength matrix for one-cell rule r, p^i is the state variable (gene product level) vector for cell i, and \hat{T}^r is the interaction strength matrix for two-cell rule r. The variable r stands as a label for a particular rule, which can be, for instance, mitosis, cell death, interphase, and so on. \Lambda^{ij} is a factor that modifies the influence of cell j on cell i. Models using the gene network framework can be formulated as optimization tasks that look for the model parameters such that the model optimally fits biological data or behaves in a certain desired manner. Optimization seeks the minimum of the objective (or error) function E(p), which depends on the state variable values. An example of the objective function is the least-squares error function

E(p) = \sum_{i,a,t} \left( p_a^{i,\mathrm{MODEL}}(t) - p_a^{i,\mathrm{DATA}}(t) \right)^2    (8.6)
which is the squared difference between gene product levels in the model and those in the data, summed over all cells (i), over all gene products (a), and over all times (t) for which data are available. The objective functions in gene network models typically have a large number of variables and parameters, are highly nonlinear, and cannot be solved analytically or readily optimized with deterministic methods. Therefore the more
appropriate methods for optimization are stochastic optimization methods like simulated annealing (Cerny 1985) or evolutionary computation (Goldberg 1989). What is actually being optimized is the set of adjustable parameters of the gene regulatory network, that is, the gene interaction weights, activation thresholds, protein production and decay rates, etc. The gene network framework has been applied to modeling the development of the Drosophila embryo at the blastoderm stage (Reinitz et al. 1995). This model included a well-characterized hierarchy of regulatory genes that control the early events of Drosophila embryogenesis by setting up their expression patterns along the embryo's length and dividing it into segments. The model yielded predictions and interpretations of experimental observations. Marnellos and Mjolsness applied this approach to modeling early neurogenesis in Drosophila and constructed models to study and make predictions about the dynamics of how neuroblasts and sensory organ precursor cells differentiate from proneural clusters (Marnellos and Mjolsness 2003). The gene interaction strengths were optimized in order to fit gene expression patterns described in the experimental literature. The objective function was the least-squares one and optimization was done by means of simulated annealing. The Drosophila developmental model made predictions about how the interplay of factors such as proneural cluster shape and size, gene expression levels, and strength of cell-cell signaling determines the timing and position of neuroblasts and sensory organ precursor cells. The model also made predictions about the effect of various perturbations in gene product levels on cell differentiation. Optimization found optimal values for model parameters so that the system evolved from the initial state to the desired final one that matched experimental findings on gene expression data and developmental phenomena in Drosophila.
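A minimal sketch of this fitting procedure, assuming a toy one-cell gene network and synthetic target data: the hidden matrix, step sizes, iteration count and annealing schedule below are all illustrative choices, not values from the Drosophila model.

```python
import numpy as np
rng = np.random.default_rng(0)

def simulate(T, p0, steps=50, dt=0.1):
    # Forward-run a toy one-cell gene network (Eqs. 8.1, 8.3, 8.4),
    # with R_a = lambda_a = 1 and h_a = 0 for simplicity.
    sig = lambda x: 0.5 * (1 + x / np.sqrt(1 + x**2))
    traj, p = [p0.copy()], p0.copy()
    for _ in range(steps):
        p = p + dt * (sig(T @ p) - p)
        traj.append(p.copy())
    return np.array(traj)

def objective(T, data, p0):
    # Least-squares error of Eq. 8.6, summed over genes and time points
    return float(np.sum((simulate(T, p0) - data) ** 2))

def anneal(data, p0, n_genes=2, iters=2000, temp0=1.0):
    T = rng.normal(size=(n_genes, n_genes))
    e = objective(T, data, p0)
    best_T, best_e = T, e
    for i in range(iters):
        temp = temp0 * (1 - i / iters) + 1e-3           # cooling schedule
        cand = T + rng.normal(scale=0.1, size=T.shape)  # random perturbation
        e_cand = objective(cand, data, p0)
        # Metropolis rule: always accept improvements, sometimes worse moves
        if e_cand < e or rng.random() < np.exp((e - e_cand) / temp):
            T, e = cand, e_cand
        if e < best_e:
            best_T, best_e = T, e
    return best_T, best_e

# Target data generated from a hidden matrix, then recovered by annealing.
p0 = np.array([0.2, 0.8])
T_hidden = np.array([[0.0, -1.5], [1.5, 0.0]])
data = simulate(T_hidden, p0)
T_fit, err = anneal(data, p0)
print(err)
```

The recovered interaction matrix T_fit plays the role of the "hidden parameter values" discussed next: it is the quantity the optimization infers from observable expression trajectories.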
This is a novel contribution of computational neurogenetic modeling, where the optimization leads to optimal hidden parameter values, like the interactions between genes, that constitute the main prediction of the model. Construction of the hidden gene regulatory network enables predictions about the consequences of gene mutations. Another example of a neurodevelopmental process that is dependent upon gene expression is the formation of topographic maps in the brains of vertebrates. Topographic maps transmit visual, auditory, and somatosensory information from sensory organs to cortex and between the cortical hemispheres (Kaas 1997). Experimental evidence suggests that topographic organization is maintained also in sensory neural structures where learning occurs; in other words, tactile information is stored within the spatial structure of maps (Diamond et al. 2003). It is known that topographic map formation depends on activity-independent (genetic) and activity-dependent processes (learning or activity-dependent synaptic plasticity) (Willshaw and Price 2003). To study the interplay between these processes a novel platform called INTEGRATE is under development (Thivierge and Marcus 2006). It is similar in nature to NeuroGene, a novel computational programming system for integrated simulation of neural biochemistry, neurodevelopment and neural activity within a unifying framework of genetic control (Storjohann and Marcus 2005). NeuroGene is designed to simulate a wide range of neurodevelopmental processes, including gene regulation, protein expression, chemical signaling, neural activity and neuronal growth. Central to NeuroGene is a computational model of genes, which allows protein concentrations, neural activity and cell morphology to affect, and be affected by, gene expression. Using this system, the authors have developed a novel model for the formation of topographic projections from retina to the midbrain, including the activity-dependent developmental processes which underlie receptive field refinement and ocular dominance column formation. Neurons are controlled by the genes, which are evaluated in all cell components. Regulation of gene transcription and translation is simulated through the use of queries. During the evaluation of a gene within a given cell component, the gene queries the cell component, retrieving information about the biochemical, neural or morphological state of the cell component or its immediate environment. This information is used to determine the expression rate of the gene in that cell component, according to the gene's regulation section. It is the state of the individual cell component (not the cell as a whole) which determines the expression rate of the gene. Effects of the gene, including protein production, apply to the cell component, such as dendrites, postsynaptic sites and growth cones. The expression of a gene can thus be limited to certain cell component types.
The properties of simulated proteins are defined as part of the corresponding gene definition. In nature, genes' influence on cellular behavior, morphology and neural properties is mediated through molecular interactions involving proteins and other molecules. In the NeuroGene programming language, this relationship is modeled by actions of genes. The actions are only invoked when and where the gene is expressed (i.e., the expression rate is greater than zero), reflecting the causal relationship between gene expression and cellular changes. NeuroGene can thus represent genetic control over cellular biochemistry, morphology and neural activity. Gene expression within a particular cell component can depend on extracellular protein concentrations, concentration gradients and/or the average concentrations of membrane-bound proteins bound to neighboring cell components. Neural activity can affect gene expression through queries. This can be used to model genes which are expressed in response to neural activity.
A case study of modeling projection formation from retina to tectum involves genes encoding the properties and expression profiles of known proteins (ephrins and Eph receptors), genes encoding postulated proteins such as retinal and tectal cell markers, and genes causing morphological change, including growth cone formation (Storjohann and Marcus 2005). The authors also implemented the learning rule introduced by Elliott and Shadbolt (Elliott and Shadbolt 1999) to model the competition among presynaptic terminals for the postsynaptic protein. The learning rule is encoded entirely in simulated genes. NeuroGene simulations of activity-dependent remodeling of synapses in topographic projections had two results in accordance with experimental data. First, retino-tectal arbors, which initially form connections to many tectal cells over a large area, become focused so that each retinal ganglion cell connects to only one or a few tectal cells. This improves the topographic ordering of the projection. Second, the tectum, which receives overlapping topographic projections from both eyes, becomes subdivided into domains (known as ocular dominance columns) which receive neural input exclusively from one or the other eye. In addition, NeuroGene successfully modeled the EphA knockin experiment in which the retinal EphA level was increased and the resulting retino-tectal projections were specifically disrupted (Brown, Yates et al. 2000). NeuroGene can be considered a neurogenetic model even though it does not include interactions between genes. Genes follow known expression profiles, and these can be changed as a consequence of mutation, gene knockout or knockin; thus the model can be used for predictions of some neurodevelopmental disorders of the visual tract in vertebrates.
8.2 Abstract Computational Neurogenetic Model

This methodology was first introduced in (Kasabov and Benuskova 2004, 2005). In general, we consider two sets of genes: a set G_gen that relates to proteins of general cell functions and a set G_spec that codes for specific neuronal information-processing proteins (e.g. receptors, ion channels, etc.). The two sets together form a set G = {G_1, G_2, ..., G_n} that constitutes a gene regulatory network (GRN) interconnected through a matrix of gene interaction weights W (see Fig. 8.1). Proteins that mediate general cellular or specific information-processing functions in neurons are usually complex molecules comprised of several subunits, each of them coded by a separate gene (Burnashev and Rozov 2000). We assume that the expression level of each gene g_j(t + \Delta t) is a nonlinear function of the expression levels
of all the genes in G. The relationship can be expressed in a discrete form (Weaver et al. 1999, Wessels et al. 2001), i.e.:

g_j(t + \Delta t) = w_{j0} + \sigma\left( \sum_{k=1}^{N_G} w_{jk} g_k(t) \right)    (8.7)
where: N_G is the total number of genes in G, w_{j0} \geq 0 is the basal level of expression of gene j, and the gene interaction weight w_{jk} represents the interaction weight between genes j and k. A positive interaction, w_{jk} > 0, means that upregulation of gene k leads to the upregulation of gene j. A negative interaction, w_{jk} < 0, means that upregulation of gene k leads to the downregulation of gene j. We can work with normalized gene expression values in the interval g_j(t) \in (0, 1). Initial values of gene expressions can be small random values, i.e. g_j(0) \in (0, 0.1). It is a common practice to derive the gene interaction matrix W = {w_{jk}} (see Fig. 8.1) based on all gene expression data being collected at the same time intervals \Delta t (Kasabov et al. 2004). In a living cell, gene expression, i.e. the transcription of DNA to messenger RNA followed by translation to protein, occurs stochastically, as a consequence of the low copy number of DNA and mRNA molecules involved. It has been shown at the cell level that protein production occurs in bursts, with the number of molecules per burst following an exponential distribution (Cai et al. 2006). However, in our approach, we take into account the average gene expression levels and average levels of proteins taken over the whole population of cells and over the whole relevant time period. We assume a linear relationship between protein levels and gene expression levels. The linear relationship in the next equation is based on findings that protein complexes, which have clearly defined interactions between their subunits, have levels highly correlated with mRNA expression levels (Jansen et al. 2002, Greenbaum et al. 2003). Subunits of the same protein complex show significant co-expression, both in terms of similarities of absolute mRNA levels and expression profiles, e.g., subunits of a complex have correlated patterns of expression over a time course (Jansen et al. 2002).
This implies that there should be a correlation between mRNA and protein concentration, as these subunits have to be available in stoichiometric amounts for the complexes to function (Greenbaum et al. 2003). Thus the protein level p_j(t + \Delta t) reads

p_j(t + \Delta t) = z_{j0} + \sum_{k=1}^{N_{p_j}} z_{jk} g_k(t)    (8.8)
where: N_{p_j} is the number of subunits of protein j, z_{j0} \geq 0 is the basal concentration (level) of protein j, and z_{jk} \geq 0 is the coefficient of proportionality between subunit gene k and protein j (subunit k content). The time delay \Delta t corresponds to the time interval at which protein expression data are gathered. Determining protein levels requires two stages of sample preparation: all proteins of interest are separated using 2-dimensional electrophoresis, followed by identification using mass spectrometry (MacBeath and Schreiber 2000). Thus in our current model the delays \Delta t represent the time points of gathering both gene and protein data.
Fig. 8.1. What are the coefficients of the gene interaction matrix W? Which genes and which gene interactions lead to a neural spiking activity with particular characteristics? This is the main question which we will ask in our research. For simplicity we illustrate only a small GRN. Solid (dashed) lines denote positive (negative) interactions between genes, respectively
Some protein levels are directly related to the values of neuronal parameters P, such that
P_j(t) = P_j(0)\, p_j(t)    (8.9)
where: P_j(0) is the initial value of the neuronal parameter at time t = 0, and p_j(t) is the protein level at time t. In such a way the gene/protein dynamics is linked to the dynamics of the artificial neural network (ANN). The CNGM model of Eqs. 8.7 to 8.9 is a general one and can be integrated with any neural network model, depending on what kind of neural activity one wants to model. In the presented model we have made several simplifying assumptions: • Each neuron has the same GRN, i.e. the same genes and the same gene interaction matrix W.
• Each GRN starts from the same initial values of gene expressions. • There is no direct feedback from neuronal activity or any other external factors to gene expression levels or protein levels. This generic neurogenetic model can be run continuously over time in the following way:
1. Set the initial expression values of the genes G, G(t = 0), in the neuron and the matrix W of the GRN, the basal levels of all genes and proteins, and the initial values of the neuronal parameters P(t = 0), if that is possible.
2. Run the GRN and calculate the next vector of expression levels of the gene set G(t + \Delta t) using equation (8.7).
3. Calculate the concentration levels of proteins that are related to the set of neuronal parameters using equation (8.8).
4. Calculate the values of neuronal parameters P from the gene state G using equation (8.9).
5. Update the activity of the neural network based on the new values of parameters (taking into account all external inputs to the neural network).
6. Go to step 2.
The biggest challenge of our approach, and the key to the predictions of CNGM, is the construction of the GRN transition matrix W, which determines the dynamics of the GRN and consequently the dynamics of the ANN. There are several ways to obtain W:
1. Ideally, the values of the gene interaction coefficients w_{ij} are obtained from real measurements through reverse engineering performed on the microarray data (Kasabov and Dimitrov 2002, Kasabov et al. 2004).
2. The values of the elements of W are iteratively optimized from initial random values, for instance with the use of a genetic algorithm (GA), to obtain the desired behavior of the ANN. The desired behavior of the ANN can simulate certain brain states like epilepsy, schizophrenic hypofrontality, learning, etc. This behavior would be used as a "fitness criterion" in the GA to stop the search process for an optimal interaction matrix W.
3. The matrix W is constructed heuristically based on some assumptions and insights into what result we want to obtain and why. For instance, we can use the theory of discrete dynamic systems to obtain a dynamic system with fixed point attractor(s), limit cycle attractors or strange attractors (Katok and Hasselblatt 1995).
4. The matrix W is constructed from databases and literature on gene-protein interaction.
5. The matrix W is constructed with the use of a mix of the above methods.
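The six-step simulation loop above can be sketched as follows. The network sizes, the logistic sigmoid, and the random matrices W and Z are illustrative assumptions, and the ANN update of step 5 is left as a placeholder:

```python
import numpy as np
rng = np.random.default_rng(1)

n_genes, n_params = 4, 2
W = rng.normal(scale=0.5, size=(n_genes, n_genes))  # gene interaction matrix W
w0 = np.zeros(n_genes)                              # basal expression levels w_j0
Z = np.abs(rng.normal(size=(n_params, n_genes)))    # subunit coefficients z_jk
z0 = np.zeros(n_params)                             # basal protein levels
P0 = np.array([1.0, 0.5])                           # initial neuronal parameters P_j(0)

def sigmoid(x):
    # An illustrative choice of the nonlinearity in Eq. 8.7
    return 1.0 / (1.0 + np.exp(-x))

g = rng.uniform(0, 0.1, n_genes)    # step 1: small random initial gene expressions
for t in range(100):
    g = w0 + sigmoid(W @ g)         # step 2: GRN update (Eq. 8.7)
    p = z0 + Z @ g                  # step 3: protein levels (Eq. 8.8)
    P = P0 * p                      # step 4: neuronal parameters (Eq. 8.9)
    # step 5 would update an ANN with parameters P; here we only track P
print(P)
```

Steps 2-4 are one line each precisely because Eqs. 8.7-8.9 are matrix-vector operations; the expensive part of a real CNGM is the ANN update of step 5.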
The above method 2 of obtaining the coefficients of W allows us to investigate and discover relationships between different GRNs and ANN states even in the case when gene expression data are not available. An optimization procedure to obtain this relationship can read:
1. Generate a population of CNGMs, each with randomly generated values of the coefficients of the GRN matrix W, initial gene expression values g(0), and initial values of the ANN parameters P(0);
2. For each set of parameters run the CNGM over a period of time T and record the activity of the neurons in the associated ANN;
3. Evaluate characteristics of the ANN behavior (e.g. connectivity, level of activity, spectral characteristics of the LFP, etc.);
4. Compare the ANN behavior characteristics to the characteristics of the desired ANN state (e.g. normal wiring, hypoactivity, etc.);
5. Repeat steps (1) to (4) until a desired GRN and ANN model behavior is obtained. Keep the solution if it fulfils the criterion;
6. Analyze all the obtained optimal solutions of the GRN and the ANN parameters for significant gene interaction patterns and parameter values that cause the target ANN model behavior.
In step 1, which is the generation of the population of CNGMs, we can apply the principles of evolutionary computation (see e.g. Chap. 6 in this book) with the operations of crossover and mutation of parameter values. In such a way we can simulate the process of evolution that has led to the neural GRN with the gene interactions underlying the desired ANN behavior.
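A minimal sketch of this evolutionary search, assuming a toy GRN and a scalar stand-in for the ANN behavior: here the mean gene-driven excitation level is matched against a hypothetical target of 0.8, in place of a real characteristic such as the LFP spectrum. Population size, generations and mutation scale are illustrative choices:

```python
import numpy as np
rng = np.random.default_rng(2)

def mean_activity(W, steps=50):
    # Stand-in for steps 2-3: run the GRN (Eq. 8.7) and report a scalar
    # "ANN behavior" -- here simply the mean gene-driven excitation level.
    g = np.full(W.shape[0], 0.05)
    for _ in range(steps):
        g = 1.0 / (1.0 + np.exp(-(W @ g)))
    return g.mean()

def fitness(W, target=0.8):
    # Step 4: negative distance between model behavior and the desired state
    return -abs(mean_activity(W) - target)

def evolve(n_genes=3, pop_size=20, generations=40):
    pop = [rng.normal(size=(n_genes, n_genes)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # selection: keep the fittest half
        parents = pop[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.choice(len(parents), size=2, replace=False)
            mask = rng.random((n_genes, n_genes)) < 0.5           # uniform crossover
            child = np.where(mask, parents[a], parents[b])
            child = child + rng.normal(scale=0.05, size=child.shape)  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

W_best = evolve()
print(mean_activity(W_best))
```

The matrices that survive selection are the "optimal solutions of the GRN" of step 6; inspecting their common sign patterns is what yields candidate gene interaction hypotheses.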
8.3 Continuous Model of Gene-Protein Dynamics

Instead of the discrete gene-protein dynamics introduced in the previous section on the abstract CNGM, we can use a system of continuous equations. Let us formulate a set of general equations for the gene-protein dynamic system. As a first gross simplification, we will again assume that every neuron has the same gene/protein regulatory network (GPRN) - that is, interactions between genes and proteins are governed by the same rules in every neuron. This assumption is partly justified by the fact that gene and protein expression data are usually average data obtained from a pool of cells, rather than from individual cells. The following set of nonlinear delay differential equations (DDEs) was inspired by (Chen and Aihara 2002), who derived the general conditions of their local stability and bifurcation under some simplifying assumptions. Particular terms on the right-hand side
of the equations were inspired by the "rough" network models from (Wessels et al. 2001). The underlying GPRN is illustrated in Fig. 8.2.
Fig. 8.2. Schematic illustration of the gene/protein regulatory network (GPRN). Description of symbols is in the text. Protein properties and concentrations are linked to neuronal parameters, like for instance the magnitude of excitation and inhibition, etc. The GPRN illustration was inspired by Fig. 1 in (Chen and Aihara 2002)
We will consider the dynamics of genes and proteins to be continuous - that is, we can describe their changes in continuous time. We will represent the mRNA levels of all the relevant genes with the vector m = (m_1, m_2, ..., m_n) and the corresponding protein levels with the vector p = (p_1, p_2, ..., p_n). The components of the mRNA vector change as:

\frac{dm_i}{dt} = A_{m_i} \sigma_{m_i}\left( \sum_{j=1}^{n} w_{ij} p_j(t - \tau_{p_j}) + \sum_{k=1}^{K} v_{ik} x_k(t - \tau_{x_k}) \right) + b_{m_i} - \lambda_{m_i} m_i(t)    (8.10)
where:
- m_i(t) is the overall level of mRNA for the i-th gene at time t;
- \sigma_{m_i} is a nonlinear sigmoid regulation-expression (activation) function for the i-th gene;
- A_{m_i} is the amplitude of the activation function for the i-th gene;
- w_{ij} are the interaction coefficients between the i-th and j-th gene (while the interaction itself is mediated by proteins activating corresponding transcription factors);
- p_j is the level of the j-th protein or protein subunit;
- \tau_{p_j} is the delay with which the j-th protein influences the transcription of the i-th gene;
- v_{ik} is the influence of the k-th external factor upon gene i (influence of hormone, drug, etc.);
- x_k is the concentration of the k-th external factor;
- \tau_{x_k} is the delay with which the k-th external factor influences the transcription of the i-th gene;
- b_{m_i} is the bias, i.e., the basal expression level of the i-th gene group;
- \lambda_{m_i} is the degradation rate of the mRNA of the i-th gene group.
Analogically, protein levels change as:

\frac{dp_i}{dt} = A_{p_i} \sigma_{p_i}\left( m_i(t - \tau_{m_i}) + \sum_{k=1}^{K'} u_{ik} y_k(t - \tau_{y_k}) \right) + b_{p_i} - \lambda_{p_i} p_i(t)    (8.11)

where:
- p_i(t) is the level of a protein (or protein subunit) coded for by the i-th gene (by the level of protein we mean the concentration of a fully functional protein);
- \sigma_{p_i} is a nonlinear sigmoid synthesis function for the i-th protein (note, we consider that one protein is coded for by only one gene);
- A_{p_i} is the amplitude of the synthesis function for the i-th protein;
- m_i is the overall level of mRNA for the i-th gene;
- \tau_{m_i} is the delay from initiation of transcription of the i-th gene family till the end of synthesis of the i-th protein (on the order of tens of minutes);
- u_{ik} is the influence of the k-th external factor upon the protein (hormone, drug, but also the concentration of free ribosomes, etc.);
- y_k is the concentration of the k-th external factor;
- \tau_{y_k} is the delay with which the k-th external factor influences the protein level;
- b_{p_i} is the bias, i.e., the basal level of the i-th protein;
- \lambda_{p_i} is the degradation rate of the i-th protein.
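Eqs. 8.10 and 8.11 can be integrated numerically by keeping history buffers for the delayed terms. The sketch below reduces the system to one gene and one protein whose product represses its own transcription; the sigmoid, delays, amplitudes and decay rates are illustrative choices, and a plain Euler scheme replaces a dedicated DDE solver:

```python
import numpy as np

def integrate_gene_protein(t_end=200.0, dt=0.1, tau_p=5.0, tau_m=10.0):
    """Euler integration of a 1-gene, 1-protein version of Eqs. 8.10-8.11:

        dm/dt = A_m * sigma(w * p(t - tau_p)) + b_m - lam_m * m(t)
        dp/dt = A_p * sigma(m(t - tau_m))     + b_p - lam_p * p(t)

    Past values needed by the delay terms are read from the stored
    trajectory; before t = 0 the history is the constant initial state.
    """
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    A_m, b_m, lam_m = 1.0, 0.0, 0.1     # illustrative parameter choices
    A_p, b_p, lam_p = 1.0, 0.0, 0.1
    w = -2.0                            # the protein represses its own gene
    n = int(t_end / dt)
    dp_lag, dm_lag = int(tau_p / dt), int(tau_m / dt)
    m = np.full(n + 1, 0.5)
    p = np.full(n + 1, 0.5)
    for i in range(n):
        p_del = p[i - dp_lag] if i >= dp_lag else p[0]   # p(t - tau_p)
        m_del = m[i - dm_lag] if i >= dm_lag else m[0]   # m(t - tau_m)
        m[i + 1] = m[i] + dt * (A_m * sigma(w * p_del) + b_m - lam_m * m[i])
        p[i + 1] = p[i] + dt * (A_p * sigma(m_del) + b_p - lam_p * p[i])
    return m, p

m, p = integrate_gene_protein()
print(m[-1], p[-1])
```

Because the production terms are bounded sigmoids, both trajectories stay within [0, A/lambda]; delayed negative feedback of this kind is exactly the mechanism that can produce sustained gene-protein oscillations.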
The linear difference form of Eq. 8.10 without decay was used by (D'Haeseleer et al. 1999) to model mRNA levels during rat CNS development and injury. An equation similar to Eq. 8.11 for the protein (gene product) dynamics was used by (Marnellos and Mjolsness 2003) to model early neurogenesis in Drosophila. Eq. 8.11 is a more general form of the commonly used mathematical descriptions of the protein mass balance (McAdams and Arkin 1998). However, it is less complex than the detailed model of Mehra et al., which takes into account: the ratio of total mRNA to total ribosomes, the product of average ribosomal binding affinity and total mRNA concentration, the percentage of free ribosomes, the distribution of active ribosomes over the polysomes, the ratio of the irreversible initiation or elongation rate constant to the termination rate constant for each mRNA, and the ratio of the protein degradation rate constant to the translation termination rate constant (Mehra et al. 2003). This model was developed to provide insight into a reported lack of correspondence between mRNA levels and protein levels (Lee et al. 2003). There are presumably at least three reasons for the poor correlation generally reported in the literature between the level of mRNA and the level of protein: (1) complicated and varied post-translational processes, (2) variable degradation of proteins, and (3) error and noise in both protein and mRNA measurements (Greenbaum et al. 2003). However, this lack of correlation does not hold for all classes of proteins/mRNAs. In fact, protein complexes that have clearly defined interactions between their subunits have levels highly correlated with mRNA expression levels (Jansen et al. 2002, Greenbaum et al. 2003, Fraser et al. 2004). Subunits of the same protein complex show significant co-expression, both in terms of similarities of absolute mRNA levels and expression profiles, e.g., subunits of a complex have correlated patterns of expression over a time course (Jansen et al. 2002).
This implies that there should be a correlation between mRNA and protein concentration, as these subunits provide a special case as they have to be available in stoichiometric amounts of proteins for the complexes to functions (Greenbaum et al. 2003). And this is exactly the case of proteins in our model, which are receptors and ion channels, comprised of exact ratios of subunits. Thus, the use of Eq. 8.2 is justified in our model. Eq. 8.10 and Eq. 8.11 are the so-called delay differential equations. Delay differential equations are similar to ordinary differential equations, but their evolution involves past values of the state variable. The solution of delay differential equations therefore requires knowledge of not only the current state, but also of the state a certain time previously (Weisstein 1999-2006). For DDEs we must provide not just the value of the solution at the initial point, but also the history, that is the solution at times prior to the initial point (Drager and Layton 1997). As of MATLAB 6.5 (Release 13), the DDEs solver dde23 is part of the official MATLAB release. The next step is to link parameter values to protein concentrations. Let P, denotes the j'h parameter of a model neuron. If Pj E (0, 1) is the normal-
8.3 Continuous Model of Gene-Protein Dynamics
169
ized level of protein concentration, then the value of parameter P, is directly proportional to the concentration of protein Pt- in such a way that P/t)
=
P j (t)(Pt
ax
-
Pt
n
) - Pj
ffiin
(8.12)
where P_j^{max} and P_j^{min} are the maximal and minimal values of the j-th parameter, respectively. Other relations between proteins and parameters are also possible. The introduced system of equations allows for investigation of how deleted or mutated genes can alter the activity of a model neural network. Fig. 8.3 summarizes the temporal evolution of activity at three system levels, i.e. three dynamic systems that we want to integrate into one integrated dynamic system of CNGM. The first dynamic system is the ANN with its measurable output, for instance the level of electrical activity. Below that we have changes in protein levels. Changes in protein concentrations lag behind the changes in gene expression levels by the order of tens of minutes, even hours, due to the processes of transcription, translation, and posttranslational modification (Lodish et al. 2000). Protein levels are directly related to parameters of neuronal signaling, like fast and slow excitation (inhibition), etc. Gene expression levels (expressed as mRNA levels) change slowly - on the order of hours - and so do protein levels and parameter values. Ideally we would know all the parameters in Eqs. 8.10 and 8.11, that is, ideally we would know all the delays, biases, shapes of activation functions, amplitudes, degradation rates, and, last but not least, the interaction matrix W = {w_{ij}}. Then we could simulate the integrated CNGM and obtain Fig. 8.3 for real time. Unfortunately, the situation is far from ideal. In reality, we would have to proceed in the following steps:
1. Obtain gene expression data (mRNA levels) from the relevant neural system for discrete sampling intervals, let us say for every hour.
2. Then use an extrapolation technique (for instance the extended Kalman filter (Kasabov et al. 2004)) to extrapolate the missing values of the mRNA levels, to obtain the curves for all m_i(t). In such a way, we can obtain the values of m_i for, let us say, every minute.
3. Then we need to calculate the levels of proteins according to Eq. 8.11 and to estimate the values of parameters according to Eq. 8.12.
4. For predictions on gene interactions we need to infer the values of the interaction coefficients w_{ij} between the pairs of the i-th and j-th genes by means of reverse engineering (Kasabov et al. 2004).
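Step 2 can be illustrated with a minimal interpolation sketch. The hourly mRNA values below are hypothetical, and plain linear interpolation stands in for the extended Kalman filter:

```python
import numpy as np

# Hypothetical hourly mRNA measurements for one gene (arbitrary units)
t_hours = np.array([0, 1, 2, 3, 4, 5])
mrna    = np.array([0.10, 0.35, 0.60, 0.55, 0.30, 0.20])

# Step 2: fill in per-minute values between the hourly samples.
t_min = np.arange(0, 5 * 60 + 1) / 60.0          # minute grid, in hours
mrna_min = np.interp(t_min, t_hours, mrna)       # linear interpolation

print(len(mrna_min), mrna_min[90])               # value at t = 1.5 h
```

The per-minute curve mrna_min is what would then be fed into Eq. 8.11 to compute protein levels, and into Eq. 8.12 to update the neuronal parameters at the ANN's time scale.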
Fig. 8.3. Summary of three dynamic systems within one integrated dynamic system - the computational neurogenetic model: (a) abstract ANN activity, like average spiking frequency, level of glucose consumption, etc. (measurable output over time in hours); (b) protein dynamics (protein levels changing over tens of minutes to hours); (c) gene (mRNA) dynamics, with a sampling rate of every hour
The most important of these steps is the extrapolation, which will enable us to work with values of mRNAs, and consequently parameter values, in minute intervals. Different methods can be used, for instance the Kalman filter (Kasabov et al. 2004), evolutionary optimization (Whitehead et al. 2004), state space equations (Wu et al. 2004), etc. Then the biggest challenge is to estimate the delays from the initiation of transcription of the gene families till the end of synthesis of the proteins. Alternatively, protein levels can be directly determined by mass spectrometry (MacBeath and Schreiber 2000). We need the temporal courses of protein concentrations for updating the neuronal parameters to simulate the ANN. Proper updating of the neuronal parameters is crucial for explaining why and when changes in the ANN output occur. We can make rough qualified estimates of the protein-synthesis delays using information on the length of the relevant proteins and the time needed for their genes' transcription and the
subsequent protein translation and posttranslational modifications (Lodish et al. 2000). Alternatively, instead of simulating the ANN for the whole time course of the evolving gene-protein dynamics, we suggest simulating the ANN only at particular interesting intervals of the gene-protein dynamics, as illustrated in Fig. 8.4.
Fig. 8.4. Sampling of the ANN output at interesting time intervals, based on some heuristics. The average level of activity can be measured as the number of spikes, level of glucose consumption, etc.
Interesting intervals for sampling the ANN output can be based on some kind of heuristics, such as the knowledge that something happened at that particular time instant (for instance, an animal fell asleep). Otherwise this sampling can occur at intervals where the parameters reach their extreme values, at intersections of values, etc.
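The heuristics just described can be sketched numerically: sample the ANN wherever a parameter time course has a local extreme or two parameter curves intersect. The function names and sample data are illustrative assumptions, not part of the authors' model.

```python
def local_extrema(x):
    """Indices where a sampled series has a local maximum or minimum,
    detected as a sign change of the discrete slope."""
    return [i for i in range(1, len(x) - 1)
            if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]

def crossings(a, b):
    """Indices i where curve a crosses curve b between samples i and i+1."""
    d = [ai - bi for ai, bi in zip(a, b)]
    return [i for i in range(len(d) - 1) if d[i] * d[i + 1] < 0]

# Two hypothetical parameter time courses
p1 = [0.1, 0.4, 0.9, 0.6, 0.3]
p2 = [0.8, 0.7, 0.5, 0.7, 0.9]
sample_points = sorted(set(local_extrema(p1) + local_extrema(p2) + crossings(p1, p2)))
```

The ANN would then be simulated only around the time indices collected in `sample_points`.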
8.4 Towards the Integration of CNGM and Bioinformatics

Is there a genetic basis for the interaction matrix W between many genes, which is at the heart of our abstract CNGM? In this section we argue that such a genetic basis does indeed exist. We have investigated, through sequence analysis, the gene groups of the excitatory and fast inhibitory receptors (i.e., AMPAR, NMDAR and GABRA) (Benuskova et al. 2006).
Table 8.1. List of subunit proteins for AMPAR, GABRA and NMDAR. Glu stands for glutamate; GABA is γ-aminobutyric acid
gi number       Human protein                                                Gene      Sequence length
gi|1169959|     Glu AMPAR 1; (GluR-1) (GluR-A) (GluR-K1)                     GRIA1     906 aa
gi|23831146|    Glu AMPAR 2; (GluR-2) (GluR-B) (GluR-K2)                     GRIA2     883 aa
gi|1169961|     Glu AMPAR 3; (GluR-3) (GluR-C) (GluR-K3)                     GRIA3     894 aa
gi|1346142|     Glu AMPAR 4; (GluR-4) (GluR-D)                               GRIA4     902 aa
gi|1496971|     Glu NMDAR 1 isoform NR1-1 precursor                          GRIN1     885 aa
gi|14285603|    Glu NMDAR subunit epsilon 1 precursor; (NR2A) (NMDAR2A)      GRIN2     1464 aa
gi|14548162|    Glu NMDAR subunit epsilon 2 precursor; (NR2B) (NMDAR2B)      GRIN3     1484 aa
gi|2492629|     Glu NMDAR subunit epsilon 3 precursor; (NR2C) (NMDAR2C)      GRIN4     1233 aa
gi|18201966|    Glu NMDAR subunit epsilon 4 precursor; (NR2D) (NMDAR2D)      GRIN5     1336 aa
gi|38327554|    GABAA receptor alpha-1 subunit precursor                     GABRA1    456 aa
gi|1346078|     GABAA receptor alpha-2 subunit precursor                     GABRA2    451 aa
gi|4557603|     GABAA receptor alpha-3 subunit precursor                     GABRA3    492 aa
gi|1346079|     GABAA receptor alpha-4 subunit precursor                     GABRA4    554 aa
gi|399519|      GABAA receptor alpha-5 subunit precursor                     GABRA5    462 aa
gi|23831128|    GABAA receptor beta-1 subunit precursor                      GABRB1    474 aa
gi|455946|      GABAA receptor beta-2 subunit                                GABRB2    474 aa
gi|120773|      GABAA receptor beta-3 subunit precursor                      GABRB3    473 aa
gi|27820121|    GABAA receptor gamma-1 subunit precursor                     GABRG1    465 aa
gi|38788155|    GABAA receptor gamma-2 isoform 1 precursor                   GABRG2    475 aa
gi|13959689|    GABAA receptor gamma-3 subunit precursor                     GABRG3    467 aa
We focused on these mainly because they play a major role in most mental disorders, through their direct or indirect interactions with several other genes/proteins and through specific parameter functions (e.g.,
excitation and inhibition). Each of these receptor proteins comprises several subunits, and each subunit is coded by a separate gene (see Table 8.1). The particular subunit content of these receptor proteins can vary depending on the brain region (Burnashev and Rozov 2000); we therefore performed the bioinformatics analysis on all subunit genes. Initially, information related to the expression of these subunit genes, their mutations, etc. was collected through a survey of the relevant literature, followed by sequence retrieval from the NCBI database. We then performed a preliminary search for common motifs (patterns that occur repeatedly in a group of related protein or DNA sequences) among these subunits. Through this initial analysis we observed that all detected motifs belonged to similar protein families, like the ligand-gated ion channel, the receptor family ligand-binding region, and the neurotransmitter-gated ion channel ligand-binding domain. This observation motivated us to perform a detailed inspection of the residues. For our investigation we employed a standard bioinformatics procedure, comparative analysis based on similarity measures, which is a widely accepted approach among neuroscientists. The method is known as multiple sequence alignment (MSA). The scope of this technique is twofold: (a) to identify shared regions of homology; (b) to determine the consensus sequence of several aligned sequences. Many applications implementing it are available free to academic users; in our case we used the CLUSTALW package (http://align.genome.jp/) maintained by Koichi Ohkubo (GenomeNet); for details see Chenna et al. (2003). Specific reasons for picking this software were that it is the most widely cited and that its output is a more readable representation than others in the field.
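The second part of the MSA analysis, finding fully conserved columns, can be illustrated on already-aligned sequences. The toy sequences below are made up for illustration (they are not the actual subunit sequences), and the function name is an assumption; the sketch shows the kind of column-wise conservation screen that, on the real alignment, singles out residues such as F269 and L353.

```python
def conserved_columns(aligned):
    """Return (position, residue) for alignment columns where every
    sequence carries the same residue and no gap character."""
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned), "sequences must be aligned"
    result = []
    for i in range(length):
        column = {s[i] for s in aligned}
        if len(column) == 1 and "-" not in column:
            result.append((i, column.pop()))
    return result

# Three short, invented aligned fragments
cons = conserved_columns(["IGFYVL", "MGFYIL", "IGFWVL"])
```

On this toy alignment the G, F and L columns come out as fully conserved, mirroring the kind of result reported below for the 20 receptor subunits.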
Sequence similarity between all subunits was visualized; we then used the BoxShade program V3.21 (http://www.ch.embnet.org/software/BOX_form.html) to format our multiple-alignment results for presentation. Fig. 8.5 highlights the most important observation: comparison of the sequences of the 20 subunit proteins revealed a number of conserved residues. It is known that the number of structurally conserved residues increases with the size of the binding-site contact, so we expect a similar function for these amino acids in this case as well. We observed the extent to which these conserved residues are preserved, which is generally the case in most channel proteins and neuroreceptors, but the most interesting finding is the consistent conservation of phenylalanine (F at position 269) and leucine (L at position 353) in all 20 proteins, with no mutations. We expect these residues to play the role of binding centers for the interaction of these proteins with other
genes/proteins that have a regulatory effect upon these receptors. Therefore, we expect that the gene interaction observations which we infer from our abstract CNGM may be justified by such truly conserved residues. However, we must say that all such hypotheses remain unproven, and these predictions need to be tested through laboratory experimentation. For these neuronal proteins we also obtained a dendrogram (Fig. 8.6) that represents the evolutionary closeness of the subunit proteins. More closely related pairs of neuronal protein sequences align more readily with each other than do more divergent pairs.
Fig. 8.5. Multiple alignment of all 20 subunits of the 3 neuronal information-processing proteins, showing consistent conservation of phenylalanine (F) at position 269 and leucine (L) at position 353
Fig. 8.6. Phylogenetic tree representing 20 subunit proteins of AMPA, GABAA and NMDA receptors
By means of bioinformatics analysis of the subunit proteins of the AMPA, GABAA and NMDA neuroreceptors, new information about preserved protein segments has been extracted. In particular, the evolutionary linkage of the protein families indicates that most of the neuronal subunits are very closely related to each other, which further supports the hypothesis that these neuroreceptor subunits are all required for the proper and complete functioning of their receptor protein, and thus their expression should be coordinated within one gene group. Based on the analysis, which clearly showed the extent of conservation of many amino-acid residues among all investigated receptor subunits, we assume that these regions can be the basis for mutual interactions, not only between the subunit genes of one receptor but also between different neuroreceptors. In fact, mutual interaction between different receptors (i.e. AMPAR, NMDAR and GABRA) has recently been confirmed experimentally to occur at the genetic level (Salonen et al. 2006).
8.5 Summary

Complex interactions between genes and proteins in neurons affect the dynamics of neural networks in the brain. Gene expression values may change due to the internal dynamics of a GPRN or due to external factors, like hormones, electrical activity, etc. We can expect that different initial gene expression values, and even different gene interactions, can lead to the same outcome in terms of neuronal activity. In the diseased brain, however, altered initial expression values, mutated genes and/or altered interactions within the GPRN lead to abnormalities in network activity. Biological sequence analysis and the representation of biological properties of the interacting molecules, like discovered conserved residues and motifs, the location of a gene on a chromosome, mutations, etc., improve the chances of better tuning the gene interaction network used in the CNGM experiments, and also help in assigning the initial gene/protein expression values through which different states of neural network operation can be achieved.

Protein function usually imposes tight constraints on the evolution of specific regions of protein structure. Thus, proteins with a common evolutionary history and function can frequently be identified from the occurrence of clusters of conserved residues in their amino acid sequences. Residues directly or indirectly involved in function are often clustered in a short sequence motif (signature, pattern, or fingerprint) that is conserved across the different proteins sharing that function. Therefore molecular sequence analysis through a bioinformatics approach can strongly enhance the biological plausibility of neurogenetic models. In our case study, sequence analysis was done on each specific subunit protein of the AMPA, GABAA and NMDA receptors. Each subunit is encoded by an individual gene. The particular subunit content of these receptor proteins can vary depending on the brain region (Burnashev and Rozov 2000); we therefore performed the bioinformatics analysis on all subunit genes. Based on this analysis we hypothesize that the conserved regions can be the basis for mutual interactions between these subunits, both at the level of their composite receptor and between different receptors.

In real neural networks, the neuronal parameters that define the functioning of a neural network depend on genes and proteins in a complex way. Gene expression values change due to the internal dynamics of the gene/protein regulatory network, the initial conditions of the genes, and external conditions. All this may affect, gradually or quickly, the functioning of the neural network as a whole. Realistic models of gene networks within neural networks should account for these processes. Future research will concentrate on animal data and around the following issues:

1. "What-if" analysis. What happens if one or a few particular genes are erased or mutated (i.e. data are collected with gene knock-out technology)? What happens if interactions within the GRN change? What happens if external factors are included? In this way our approach can serve as a noninvasive test system.
2. Introduction of learning rule(s) into an ANN model and of the corresponding genes into the GRN. First steps are presented in the next chapter of this book.
3. Exploration of the possibilities of modeling genetically caused brain disorders such as epilepsy, Parkinson's disease, etc. The goal is to make predictions about gene interactions to aid experimental research on gene interactions in various states and conditions of the brain.
Basic genetic data are presented and some ideas are conceived in the last chapter of this book.
9 Application of CNGM to Learning and Memory
Before we introduce a tentative CNGM of learning and memory at the cellular level, we need to consider the relevant rules of synaptic plasticity, which represent knowledge about the basic mechanisms of cellular and molecular memory storage. Then we need to enhance these rules with connections to the gene/protein regulatory network (GPRN) of genes and proteins that are relevant to learning and memory. Finally, a provisional CNGM of learning and memory will be constructed. In what follows, we will discuss only the activity-dependent plasticity of excitatory synaptic connections of the brain, because these are thought to mediate information storage within biological neural networks (Kandel et al. 2000).
9.1 Rules of Synaptic Plasticity and Metaplasticity

At present, it is widely accepted that the origins and targets of brain synapses are determined genetically, as are the properties of neurotransmission. However, the efficacy of signal transfer at synapses, and even the number of synapses, can change throughout life as a consequence of learning (Kandel et al. 2000). Bidirectional changes in the strength of excitatory synaptic weights are currently thought to be fundamental to information storage within neuronal networks. To unravel the cellular and molecular mechanisms of synaptic plasticity, which are the basis of learning and memory, is an ambitious goal of neurobiology and neurogenetics. Donald Hebb predicted a rule of synaptic plasticity driven by the correlation of pre- and postsynaptic activity. He postulated that repeated firing of one neuron by another, across a particular synapse, increases its strength (Hebb 1949). But more than 50 years later, there are still many open questions as to how this happens, as well as how the weakening of synapses occurs. Synaptic plasticity is a process in which synapses change their efficacy as a consequence of their previous activity. Sometimes it is also called activity- or experience-dependent synaptic plasticity. Synaptic efficacy (synaptic weight, synaptic strength) is directly proportional to the magnitude of the postsynaptic potential (PSP) on the postsynaptic membrane, which
arises as a consequence of a defined unit stimulation of the presynaptic terminal of that synapse (Benuskova 1988). Synaptic efficacy is a measure of the synapse's contribution to the summary somatic postsynaptic potential, which determines the time and frequency of the postsynaptic spike train generated after the excitation threshold of the neuron is exceeded. Synaptic weight (the PSP after unit stimulation of the synapse) depends on two groups of factors:

• Presynaptic factors: the released amount of neurotransmitter.
• Postsynaptic factors: the number of postsynaptic receptors, receptor types and properties, and the input electric impedance (which depends on the geometry of the dendritic input surface and its electric properties).

A change of these synaptic properties leads to a change of synaptic strength. This change can be short- or long-lasting, negative or positive. In many regions of the brain, long-term synaptic potentiation (LTP), a long-lasting increase in synaptic efficacy, is produced by high-frequency stimulation (HFS) of presynaptic afferents (Bliss and Lomo 1973) or by pairing presynaptic stimulation with postsynaptic depolarization (Markram et al. 1997). Long-term synaptic depression (LTD), a long-lasting decrease in the strength of synaptic transmission, is produced by low-frequency stimulation (LFS) of presynaptic afferents. The majority of synapses in many brain regions and in many species that express LTP also express LTD. Thus, the regulation of synaptic strength by activity is bidirectional (Bear 1995, Castellani et al. 2001). Molecules of the glutamate neurotransmitter released during afferent activity bind to AMPA, NMDA and metabotropic glutamate (mGlu) receptors to produce the postsynaptic response.
The intracellular Ca2+ level varies as Ca2+ enters the neuron via the voltage-gated calcium channels (VGCCs) and the NMDA receptor-channel complex, and as Ca2+ is released from internal storage sites as a result of the mGlu receptor-mediated G-protein cascade (Abraham and Tate 1997, Zucker 1999). When the postsynaptic membrane is depolarized by the actions of the non-NMDA (AMPA) receptor channel (as occurs during high-frequency stimulation), the depolarization relieves the Mg2+ blockage of the NMDA channel. High-frequency afferent activity results in high levels of intracellular Ca2+, which preferentially activate protein kinases (CaMKII, PKC, the tyrosine kinase fyn); low-frequency afferent activity results in low levels of Ca2+, which preferentially activate protein phosphatases (calcineurin, PP1). The induction of LTP and LTD appears to depend on the relative activity of kinases and phosphatases (Elgersma and Silva 1999, Mayford and Kandel 1999). With both enzymes present, predominant kinase activity leads to LTP (via phosphorylation of various substrates) while predominant phosphatase activity leads to LTD (via dephosphorylation of various substrates). The intracellular calcium concentration [Ca2+]i is the principal trigger for the induction of LTD/LTP (Artola and Singer 1993, Zucker 1999, Shouval, Bear et al. 2002). Calcium influx through NMDARs plays a crucial role in the induction of LTD/LTP. The NMDAR is unique because its activation requires the presynaptic release of glutamate to occur within a certain time window of postsynaptic depolarization (the NMDAR is both a receptor- and a voltage-gated channel). Thus, the NMDAR serves as a molecular coincidence detector for the two simultaneous presynaptic and postsynaptic events, thereby implementing Hebb's rule at synapses (Paulsen and Sejnowski 2000, Tsien 2000). The conventional view of LTP induction is that a strong synaptic input (i.e. many synchronously active afferent fibers) produces a local depolarization that unblocks NMDARs, and concurrently released glutamate provides the necessary Ca2+ signal through NMDARs. An alternative model is based on backpropagating action potentials (Stuart and Sakmann 1994). Backpropagated APs (triggered by large EPSPs or by other inputs) can provide additional postsynaptic depolarization, on top of the AMPA-mediated depolarization, for the voltage-dependent relief of the Mg2+ block of NMDARs when they bind glutamate, so that more Ca2+ pours in (Magee and Johnston 1997, Koester and Sakmann 1998, Linden 1999). This important role of backpropagating APs in switching between LTD and LTP is also supported by computational modeling studies (Shouval, Bear et al. 2002, Shouval, Castellani et al. 2002) (Fig. 9.1).
Postsynaptic spike occurs too early to coincide with EPSP
Postsynaptic spike occurs in proper time window to coincide with EPSP
Fig. 9.1. Illustration of spike timing-dependent plasticity. Ions of sodium and calcium enter the postsynaptic spine through the NMDA-receptor gated ion channels. When the postsynaptic neuron fires, postsynaptic spikes backpropagate to the spines. When their timing coincides with the EPSP, more calcium enters the spines, and thus there is a bigger chance of achieving LTP
Experimental evidence indicates that long-term modification of synaptic efficacy indeed depends on the timing of pre- and postsynaptic APs (spike timing-dependent plasticity, STDP) (Song and Abbott 2001). It has been shown that the temporal order of the synaptic input and the postsynaptic spike determines whether LTP or LTD is elicited. Repeated postsynaptic spiking after presynaptic spikes results in a larger Ca2+ influx and LTP (the EPSP coincides with the backpropagating AP), whereas postsynaptic spiking before the presynaptic spike (the EPSP follows the AP) leads to a small Ca2+ transient and LTD (Markram et al. 1997, Bi and Poo 1998, Froemke et al. 2005). Such a rule holds for cortical pyramidal neurons and for excitatory neurons in the hippocampus (Abbott and Nelson 2000). This temporally asymmetric Hebbian synaptic plasticity supports sequence learning because it tends to wire together neurons that form causal chains (Paulsen and Sejnowski 2000).
Fig. 9.2. STDP rule of synaptic plasticity. Presynaptic spikes that precede (follow) postsynaptic spikes within a certain time window produce long-term strengthening (weakening) of synapses, respectively. (a) Illustration of the quantitative relationships for cortical neurons (Song et al. 2000); (b) illustration of experimentally measured points versus theoretical curves
Thus, the synaptic change Δw(Δt), where Δt = tpost − tpre, due to a single pre- and postsynaptic spike pair is determined by the following equation (Song et al. 2000, Izhikevich and Desai 2003) (see Fig. 9.2):

Δw+ = A+ exp(−Δt/τ+)   if Δt > 0
Δw− = A− exp(Δt/τ−)    if Δt < 0        (9.1)
Parameters A+ and A− determine the amplitude of the synaptic change that occurs when Δt is close to zero, while τ+ and τ− determine the time windows over which synaptic changes occur. There are different implementations of the STDP rule, i.e. considering all presynaptic-postsynaptic spike pairs or just the nearest neighbors, before and/or after a given presynaptic spike (Izhikevich and Desai 2003). We employ the nearest-neighbor additive implementation of STDP, that is, for each presynaptic spike, only two postsynaptic spikes are considered: the one that occurs before and the one that occurs after the presynaptic spike, i.e.:

w(t + δt) = w(t)(1 + Δw+ − Δw−)        (9.2)
where δt is the time step of the weight calculation. This implementation, as well as the nearest-spike implementation, was shown to faithfully reproduce the STDP experimental data from hippocampal and cortical preparations (van Rossum et al. 2000, Sjostrom et al. 2001). Moreover, the nearest-neighbor and nearest-spike implementations of STDP have been shown to lead to the Bienenstock, Cooper and Munro (BCM) rule of synaptic plasticity (Izhikevich and Desai 2003). Besides the precise timing of pre- and postsynaptic spikes, another factor determining the sign and magnitude of synaptic change is the frequency of presynaptic firing (Bear 1995, Abraham and Bear 1996). Experimental data from the developing visual cortex led to the formulation of a synaptic modification rule known as the Bienenstock-Cooper-Munro (BCM) rule (Bienenstock et al. 1982). The model has two main features. First, it postulates that a neuron possesses a synaptic modification threshold (the LTP/LTD threshold, θM), which dictates whether the neuron's activity at any given instant will lead to strengthening or weakening of its input synapses. Thus, the modification threshold θM determines the direction of the change in synaptic efficacy. Synaptic modification varies as a nonlinear (parabolic) function φ of postsynaptic activity c, i.e. φ(c) = c(t)(c(t) − θM(t)) (see Fig. 9.3). In the original BCM formulation, activity means either the average or the instantaneous firing rate rather than individual spikes. Although the firing rate of a neuron c(t) depends in a nonlinear fashion on the postsynaptic potentials, BCM theory considers that the region between the excitation threshold and saturation may be reasonably approximated by a linear input-output relationship of the model neuron (Benuskova, Rema et al. 2001), thus c(t) is defined as the product of the presynaptic activity vector (x) and the synaptic efficacy vector (w), i.e. c(t) = w(t) · x(t).
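Returning to Eqs. 9.1-9.2, the nearest-neighbor additive STDP update can be sketched as follows. The parameter values and function name are illustrative assumptions, not experimentally fitted constants.

```python
import math

def stdp_update(w, t_pre, post_spikes, A_plus=0.01, A_minus=0.012,
                tau_plus=20.0, tau_minus=20.0):
    """One application of w <- w(1 + dw+ - dw-) (Eq. 9.2): only the
    nearest postsynaptic spikes before and after the presynaptic
    spike at t_pre contribute (all times in ms)."""
    after = [t for t in post_spikes if t > t_pre]
    before = [t for t in post_spikes if t < t_pre]
    # Eq. 9.1, dt = t_post - t_pre > 0: dw+ = A+ exp(-dt/tau+)
    dw_plus = A_plus * math.exp(-(min(after) - t_pre) / tau_plus) if after else 0.0
    # Eq. 9.1, dt < 0: dw- = A- exp(dt/tau-)
    dw_minus = A_minus * math.exp((max(before) - t_pre) / tau_minus) if before else 0.0
    return w * (1.0 + dw_plus - dw_minus)

w_pot = stdp_update(1.0, 100.0, [102.0])   # post follows pre -> potentiation
w_dep = stdp_update(1.0, 100.0, [97.0])    # post precedes pre -> depression
```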
The function φ(c) changes sign at a particular value of c that is equal to the value of the modification threshold θM. θM is the point of crossover from LTD to LTP. If postsynaptic activity is below θM (c < θM), but above baseline, then φ(c) < 0 and synaptic efficacies are weakened. Conversely, if c exceeds θM, the modification function φ(c) > 0, and active synapses potentiate. The synaptic weight vector (w) changes according to Hebb's learning rule, which requires correlated pre- and postsynaptic activity at the synapse, i.e.

Δw = η φ x        (9.3)

where η > 0 is the learning rate constant.
Fig. 9.3. The sliding LTP/LTD modification threshold θM in the BCM theory. Synaptic strengthening/weakening depends on the current value of θM, which is not constant but instead changes in proportion to the average of past postsynaptic activity c. In the ABS theory, θ− and θ+ denote the thresholds for LTD and LTP, respectively
The second important feature of the BCM rule is that the value of θM is not fixed but instead changes according to a nonlinear function of the average output of the cell (Cooper et al. 2004). θM varies according to a (running) time average of prior postsynaptic activity, i.e. θM is a sliding modification threshold. The current value of θM changes in proportion to the square of the neuron's activity averaged over some recent past τM, i.e.

θM(t) = α ⟨c²(t)⟩        (9.4)

θM slides as a function of the prior history of the postsynaptic cell, and its effective position depends upon the parameter α and the averaging function, which in turn depend on the levels of neuromodulators like acetylcholine, dopamine, etc., and on the overall number of NMDARs (Cooper 1987, Benuskova, Rema et al. 2001). The sliding LTP/LTD threshold is a homeostatic mechanism which keeps the modifiable synapses within a useful dynamic
range (Abraham et al. 2001). It acts against the Hebbian positive-feedback process, in which effective synapses are strengthened, making them even more effective, and ineffective synapses are weakened, making them less so. This tends to destabilize postsynaptic firing rates, increasing or decreasing them excessively. BCM stabilizes Hebbian plasticity by negative feedback, i.e. the LTP/LTD threshold increases (elevates, slides rightward) if the postsynaptic neuron is highly active, making LTP more difficult and LTD easier to induce. The opposite occurs when the activity of the postsynaptic neuron is overly reduced, as in cases of sensory deprivation (Cooper et al. 2004). Thus the sliding of θM guarantees synaptic stability without imposing subtractive or multiplicative controls on synaptic weights (Miller and MacKay 1994). In other words, the modification threshold θM, which regulates the ability to undergo LTP/LTD, is itself regulated. The term "metaplasticity" has been introduced to describe changes in the neuron's ability to undergo LTP and LTD, because it has become apparent from experimental data that synaptic plasticity can indeed be altered by previous synaptic activity or by neuromodulators (Abraham and Bear 1996). Metaplasticity is the plasticity of synaptic plasticity (plasticity at a higher level). For example, previous stimuli may make it easier or harder to induce LTP, as with the sliding BCM modification threshold. Metaplasticity mechanisms do not change synaptic efficacy but instead affect synaptic physiology in such a way that subsequent attempts to induce synaptic plasticity will be modified (Bear 1995, Abraham and Tate 1997). Artola, Brocher and Singer later formulated a similar synaptic plasticity rule (called the ABS rule), which defines two modification thresholds, θ− for LTD induction and θ+ for LTP induction (Artola et al. 1990).
According to the ABS rule, the direction of the synaptic gain change again depends on the membrane potential of the postsynaptic cell, or on the concentration [Ca2+]i. If the first threshold θ− is reached, a mechanism is activated that leads to LTD. If the second threshold θ+ is reached, another process is triggered that leads to LTP. Experimental evidence supports the existence of the LTD threshold (Mockett et al. 2002), so the original BCM theory has been slightly extended to incorporate this and other findings, resulting in the so-called calcium control hypothesis (Shouval, Bear et al. 2002, Shouval, Castellani et al. 2002). According to the calcium control hypothesis, the temporal derivative of the j-th synaptic weight depends on the calcium level, so that:

dwj(t)/dt = η([Ca2+]j) Ω([Ca2+]j)        (9.5)
where [Ca2+]j is the calcium concentration at synapse j, the learning rate η is supposed to be calcium-dependent and to increase monotonically (sigmoidally) with the calcium level, and the Ω-function is defined in the following way: when 0 < [Ca2+]j < θ−, the synaptic weight does not change; when θ− < [Ca2+]j < θ+, the synaptic weight is depressed (LTD); and when [Ca2+]j > θ+, the synaptic weight is potentiated (LTP). Both modification thresholds are whole-cell properties and can change as a function of average past cell activity. Let us go back to the LTP/LTD modification threshold θM. The BCM theory makes several explicit assumptions about how synapses modify, and there appears to be a good correspondence with experimental results. First, θM exists: there is evidence in both the hippocampus and the visual cortex that a change in the frequency of afferent activity generates a bidirectional pattern of LTD and LTP induction very similar to that predicted by the BCM model (Dudek and Bear 1993, Rick and Milgram 1996, Cho et al. 2001). Second, θM slides: the threshold θM does indeed appear to be adjustable, depending on the prior history of postsynaptic activity (Kirkwood et al. 1996, Rick and Milgram 1996, Abraham and Tate 1997). Third, θM is a whole-cell property and a function of previous postsynaptic cell firing (Holland and Wagner 1998, Wang and Wagner 1999, Abraham et al. 2001). Thus, it seems that the BCM theory indeed captures a real feature of a biological learning rule. Recently, Toyoizumi et al. (2005) asked what would be the optimal synaptic update rule to guarantee that a spiking neuron transmits as much information as possible. Under the (realistic) assumption of Poisson firing statistics, the synaptic rule exhibits all of the features of the BCM rule, in particular regimes of synaptic potentiation and depression separated by a sliding threshold.
The learning rule is found by maximizing the mutual information between the presynaptic and postsynaptic spike trains under the constraint that the postsynaptic firing rate stays close to some target firing rate. Thus, the BCM rule has been extended to the spiking neuron, but is still formulated in terms of an instantaneous rate and not spikes (see Eq. 9.3). Another way of merging the BCM rule with the spiking nature of neurons is to proceed along the lines suggested by Izhikevich and Desai (2003). They started with the STDP rule expressed by Eq. 9.1 and Eq. 9.2, and derived the expected magnitude of synaptic modification per presynaptic spike, when the postsynaptic spike train is a Poisson process with firing rate x, i.e.:

Δw(x) = x [ A+/(1/τ+ + x) − A−/(1/τ− + x) ]        (9.6)
with the meaning of symbols as in Eq. 9.1. Derivation of the LTD/LTP threshold (the zero crossing of Δw(x)) leads to

θM = −(A+/τ− + A−/τ+) / (A+ + A−)

(9.7)
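Equations 9.6 and 9.7 can be checked numerically with the following sketch. The amplitudes and time constants are illustrative choices, not values from the cited studies; they satisfy A+ + A− > 0 and A+τ+ < |A−|τ−, which are needed for depression at low rates and potentiation at high rates (A− is taken negative, as in Eq. 9.1).

```python
A_PLUS, A_MINUS = 1.0, -0.8          # LTP/LTD amplitudes (A_MINUS < 0), assumed
TAU_PLUS, TAU_MINUS = 0.020, 0.040   # time constants in seconds, assumed

def dw_rate(x):
    """Eq. 9.6: expected weight change per presynaptic spike when the
    postsynaptic train is a Poisson process with rate x (Hz)."""
    return x * (A_PLUS / (1.0 / TAU_PLUS + x) + A_MINUS / (1.0 / TAU_MINUS + x))

def theta():
    """Eq. 9.7: zero crossing of dw_rate, i.e. the LTD/LTP threshold."""
    return -(A_PLUS / TAU_MINUS + A_MINUS / TAU_PLUS) / (A_PLUS + A_MINUS)

th = theta()
# Below the threshold the expected change is depression, above it potentiation.
print(th, dw_rate(th / 2) < 0, dw_rate(2 * th) > 0)
```

With these numbers the threshold comes out at 75 Hz; shrinking A+ or growing |A−| slides it, which is the BCM-like behavior the text describes.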
Equation 9.7 can be used to relate the sliding of θM to biophysical and biochemical processes underlying LTD and LTP, either through dynamic changes of the time constants for LTP and LTD, as suggested in (Izhikevich and Desai 2003), and/or through dynamic changes of their amplitudes. For instance, the amplitudes of LTD and LTP can change in the following way (Benuskova and Abraham 2006):
A+(t) = A+(0) / θM(t)   and   A−(t) = A−(0) θM(t)

(9.8)
where A±(0) are the initial (constant) values and θM is the modification threshold. The time average of postsynaptic activity for θM in Eq. 9.4 can be calculated as in (Benuskova, Rema et al. 2001), that is, by numeric integration of the following integral:

⟨c(t)²⟩τM = (1/τM) ∫_{−∞}^{t} c(t')² exp(−(t − t')/τM) dt'

(9.9)
If the postsynaptic activity c(t) is defined as c(t) = 1 if there is a postsynaptic spike at time t, and c(t) = 0 otherwise, then of course there is no need for the second power. In the following sections, we will relate the synaptic plasticity rules, i.e. STDP and the BCM sliding modification threshold, to relevant proteins and genes and to the gene/protein regulatory network (GPRN), in order to propose a CNGM of learning and memory.
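A minimal numerical sketch of Eqs. 9.8 and 9.9; the averaging time constant and the initial amplitudes are illustrative assumptions, not fitted values.

```python
import math

TAU_M = 60.0   # averaging time constant of Eq. 9.9 (seconds, assumed)

def update_average(avg, c, dt, tau=TAU_M):
    """One exact step of the exponentially weighted average of c(t)^2
    (Eq. 9.9), assuming c is constant over the step dt."""
    decay = math.exp(-dt / tau)
    return avg * decay + (1.0 - decay) * c * c

def amplitudes(theta_m, a_plus0=0.01, a_minus0=0.01):
    """Eq. 9.8: as theta_M rises, the LTP amplitude A+ shrinks and the
    LTD amplitude A- grows, and vice versa."""
    return a_plus0 / theta_m, a_minus0 * theta_m

# Sustained postsynaptic activity (c = 1) drives the average toward 1,
# raising theta_M and thereby favouring LTD over LTP.
avg = 0.0
for _ in range(600):                 # 600 one-second steps
    avg = update_average(avg, 1.0, 1.0)
print(avg, amplitudes(2.0))
```

The exponential-decay step is the exact solution of the averaging integral for piecewise-constant c, so the sketch is stable for any step size.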
9.2 Toward a GPRN of Synaptic Plasticity

Information revealed in this section will serve as a basis for the construction of links between synaptic plasticity variables from the theoretical equations and particular proteins and genes within neurons that are involved in learning and memory formation. Then we will attempt to build a GPRN relevant to the molecular mechanisms of learning and memory. The induction of associative LTP/LTD depends on Ca2+-dependent phosphorylation and dephosphorylation of various proteins, respectively (Mayford and Kandel 1999), and upon insertion and removal of AMPARs into and from the postsynaptic membrane (Carroll et al. 2001, Sheng and
Lee 2001). Long-term potentiation (LTP) in the mouse can be either short-lasting (1-3 hours) or long-lasting (> 24 hours), depending on the number of high-frequency stimulation trains administered to presynaptic inputs. However, in both cases, the induction of LTP requires Ca2+ influx through the N-methyl-D-aspartate (NMDA) glutamate receptors. The short-lasting form of LTP, or early E-LTP, requires the participation of Ca2+/calmodulin-dependent protein kinase II (CaMKII). CaMKII in the basal state is completely dependent upon Ca2+/calmodulin for its activity, but upon activation it can rapidly convert to a Ca2+-independent kinase by autophosphorylation, which is necessary for the induction of E-LTP. A molecular basis of E-LTP is the insertion of new AMPARs into the postsynaptic membrane (Shi et al. 1999). AMPARs mediate most of the excitatory postsynaptic response in glutamatergic synapses; therefore changing their number (and/or properties) is a powerful way to control the strength of synaptic transmission. Insertion of new AMPARs occurs through exocytosis, i.e. fusion of storage vesicles containing new AMPARs with the postsynaptic membrane (Lledo et al. 1998). Ca2+ and CaMKII play a key role in this process (Sudhof 1995). Localization of excitatory synapses on spines in turn provides for the induction of a strong intra-spine electric field that can electrophoretically drive vesicles to fuse with the postsynaptic membrane (Benuskova 2000). The long-lasting form of LTP, or late L-LTP, requires the activation of the cAMP signaling pathway through PKA (cAMP-dependent protein kinase, protein kinase A), ERK/MAPK (extracellular signal-regulated protein kinase / mitogen-activated protein kinase), and RSK2 (ribosomal S6 kinase 2). Activation of gene transcription via the CREB (cAMP-responsive element-binding protein) pathway follows (Mayford and Kandel 1999). Constitutively active CaMKII activates a related kinase, CaMKIV, that contributes to early Ca2+-stimulated CREB phosphorylation (Wu et al. 2001).
cAMP is produced by Ca2+/calmodulin-dependent adenylyl cyclase. (Alternatively, a neurotransmitter or neuromodulator like dopamine binds to G-protein-coupled receptors; the G-protein activates adenylyl cyclase, which catalyzes production of cAMP.) cAMP then activates PKA. Ca2+ also activates MAPK and ERK. Facilitated by cAMP, both CaMKII and CaMKIV translocate to the cell nucleus along with PKA and ERK/MAPK to activate gene transcription via phosphorylation of CREB. Translocation of ERK to the nucleus requires activation of PKA (Poser and Storm 2001). Activation of the ERK/MAPK pathway may trigger nuclear translocation of RSK2 and thus phosphorylation and transactivation of CREB (Impey et al. 1998). Transition from E-LTP to L-LTP also requires phosphorylation by PKA of inhibitor-1 (I-1), which then inhibits and turns off protein phosphatase 1, PP1 (Mayford and Kandel 1999). The action on inhibitor-1 is opposed by calcineurin, which activates the phosphatase cascade and leads to LTD, the mirror opposite of LTP. In fact, PKA and calcineurin phosphorylate and dephosphorylate the same residue on inhibitor-1. Calcineurin has this role in LTD because it has a high affinity for Ca2+, even higher than that of CaMKII. At low-frequency stimulation, the amount of Ca2+ coming into the cell through the NMDARs is small. This will activate calcineurin but not CaMKII. Activation of calcineurin leads to a removal of AMPARs from the postsynaptic membrane through the endocytosis of membranous vesicles (Beattie et al. 2000). In addition, calcineurin dephosphorylates inhibitor-1, which in turn disinhibits and activates PP1, which also participates in the mechanisms of LTD. The processes leading to LTD and LTP described above are summarized in Fig. 9.4.
Fig. 9.4. Illustration of postsynaptic molecular events leading to the induction of LTD or LTP after NMDAR stimulation. +P denotes phosphorylation and -P dephosphorylation of the target protein. Meanings of all other symbols are explained in the text
Some researchers argue that there may sometimes be an intermediate phase between E-LTP and L-LTP, which is dependent upon local protein synthesis in spines based upon pre-existing mRNA (Bortolotto and Collingridge 1998). In particular, the activation of metabotropic glutamate receptors (mGluRs) before or during tetanus can trigger transcription-independent local protein synthesis and thus prolong the duration of LTP (Raymond et al. 2000). There is a local machinery for protein synthesis at spines and dendrites consisting of polyribosomes, tRNAs, initiation factors and mRNAs for glutamate receptors, structural proteins and kinases like CaMKII (Steward 1997). It has been shown that both sensory experience and synaptic stimulation lead to translation of α-CaMKII (Wu et al. 1998), the kinase that is crucially involved in the induction of LTP. Now we want to turn our attention to the transition from E-LTP to L-LTP, especially the process of initiation of gene transcription, nuclear protein synthesis and its consequences. The neuronal nucleus has many different types of input (multiple converging signal-transduction cascades) and generates many different types of output (multiple genes that can be expressed in different patterns and at different levels). Some of these genes are expressed in response to inputs derived from synaptic stimuli. Extensive studies of stimulus-dependent gene expression in CNS neurons have shown that nuclear signaling pathways are able to discriminate between features of the electrical stimuli, such as their frequency, intensity, duration or pattern of repetition (Bito et al. 1997). As we have seen, such discrimination may well involve Ca2+ signaling mechanisms. In principle, Ca2+ influx could link up to Ca2+ targets with different Ca2+ sensitivity, thereby providing discrimination between Ca2+ signals of varying amplitudes, as we have seen happen for calcineurin and CaMKII (Fig. 9.4).
In the construction of our learning and memory GPRN we have to address several questions:

1. What are the various molecular pathways that link patterns of synaptic activity with gene expression?
2. What are the specific transcription factors involved in stimulus-transcription coupling?
3. What are the specific downstream genes initiated by synaptic activity and how is their expression modified?
4. How do these genes give rise to activity-induced changes of neuronal synapses and what is the nature of these synaptic changes?
5. How is the expression of the transcription factors themselves modified?

The answer to the first question is summarized in Fig. 9.4. Upon repeated activation of NMDARs (and/or G-protein-coupled receptors) adenylyl cyclase catalyzes production of cAMP. In turn, cAMP activates
PKA. A prolonged Ca2+ increase activates PKA and ERK/MAPK. CaMKII activates CaMKIV, and all these kinases translocate to the nucleus to activate gene transcription via CREB, which is the main transcription factor linking synaptic stimulation with gene expression (Mayford and Kandel 1999). CREB is activated by phosphorylation of Ser133. Phospho-CREB (pCREB) induces specific immediate-response genes via CRE (cAMP-responsive element) sites in their promoter regions. Negative regulation of CREB in hippocampal neurons has been found to occur through calcineurin-dependent regulation of nuclear PP1. Sustained, but not transient, elevation of nuclear CREB phosphorylation is required for efficient stimulus-transcription coupling. Prolongation of the synaptic input on the time scale of minutes, and the activity-induced inactivation of the calcineurin pathway, greatly extends the period over which pCREB levels are elevated, thus affecting the induction of downstream genes (Bito et al. 1997). The correlation between the duration of LTP and CREB phosphorylation was tested by quantifying changes in CREB phosphorylation throughout the induction and maintenance of L-LTP in pyramidal neurons in mature hippocampal-entorhinal cortex slices (Leutgeb et al. 2005). High-frequency stimulation of Schaffer collaterals consisted of 2 stimulus trains of 100 pulses at 100 Hz with a 10 min intertrain interval. LTP was recorded after increasing time intervals, and lasted for 4 hours. At the same time intervals, measuring the extent of CREB phosphorylation at single-cell resolution using confocal microscopy revealed that the pCREB/CREB ratio was significantly increased already at 30 min after LTP induction, and continued to increase until the end of measurement, i.e. after 4 hours. In another study, done in freely moving rats, LTP was evoked in the hippocampal dentate gyrus and phosphorylation of CREB was measured by immunocytochemistry (Schulz et al. 1999).
CREB phosphorylation occurred in a biphasic manner, with a first short-lasting peak at 30 min, an almost zero minimum at 1 hr, and a second long-lasting peak beginning 2 hr after tetanic stimulation and lasting for at least 24 hr. In (Schulz et al. 1999), only synaptic stimulation that generated non-decremental L-LTP, but not stimulation that produced decremental E-LTP, promoted a sustained hyper-phosphorylation of CREB. Strong stimulation consisted of 20 trains of 15 impulses at a frequency of 200 Hz with an intertrain interval of 5 s. Weak stimulation consisted of only 2 trains. Animals that received a weak stimulation showed (if at all) a small and transient rise in CREB phosphorylation within 2 hr after stimulation, and after 6 hr, none of the animals showed any significant CREB phosphorylation. This study is in accordance with the discovery that different protein kinases that phosphorylate CREB do so with different temporal courses (Wu et al. 2001). In hippocampal neurons, a fast CaMK-dependent pathway is followed by a slower MAPK pathway. In vitro, pCREB formation in cultures of dissociated CA3/CA1 pyramidal neurons was overwhelmingly dominated by CaMK phosphorylation during the first 0-10 min, with a peak at around 5 min, and then decreasing over the next hour. MAPK-driven CREB phosphorylation slowly increased, with a peak at 60 min (the end of measurement). At 30 min poststimulus, both kinases contributed about equally to CREB phosphorylation. Thus, several waves of CREB phosphorylation and subsequent gene induction can follow synaptic stimulation. Now we would like to know what the downstream genes regulated by pCREB are, or in other words, which proteins are synthesized due to the action of CREB. In Aplysia, the result of transcription of CREB-induced immediate-response genes (with CRE in their promoter regions) is two proteins (Mayford and Kandel 1999): (1) ubiquitin hydrolase recruits the ubiquitin proteasome, which cleaves the inhibitory subunit of PKA and thus causes its persistent activity and prolongation of facilitation up to 12-24 hrs; (2) transcription factor C/EBP binds to the CAAT region of DNA and initiates transcription of proteins that are coded for by late genes and are responsible for growth of new synapses and a prolonged stabilization of synaptic transmission. Long-term plasticity of the central nervous system (CNS) in mammals involves induction of a whole set of genes whose identity and purpose are not completely characterized. Attempts to identify candidate plasticity-related genes (CPGs) in the hippocampal dentate gyrus (DG) yielded 362 upregulated CPGs and 41 downregulated transcripts (dCPGs) (Hevroni et al. 1998). Of these, 66 CPGs and 5 dCPGs are known genes that encode a variety of signal transduction proteins, transcription factors, trophic factors, various enzymes (kinases, phosphatases, etc.) and structural proteins. To test relevance to neural plasticity, 66 CPGs were tested for induction by stimuli producing long-term potentiation (LTP).
Approximately one-fourth of the genes examined were upregulated by LTP. These results indicate that an extensive genetic response is induced in the mammalian brain after glutamate receptor activation, and imply that a significant proportion of this activity is coinduced by LTP. Based on the identified CPGs, it is conceivable that multiple cellular mechanisms in fact underlie long-term plasticity of the nervous system. In a more recent study, a powerful gene-identification method, differential analysis of primary cDNA library expression (DAzLE), and cDNA microarrays from primary cortical neurons were used to identify plasticity-induced genes (PLINGs) (Hong et al. 2004). 661 out of 1,152 arrayed genes were found to be consistently upregulated between 1 and 24 hr, with differential expression patterns, after NMDA receptor activation. This indicates that there is a broad and dynamic range of long-lasting neuronal responses that occur through NMDA
receptor activation. Based on their different temporal courses of expression after NMDAR activation, 6 groups of late plasticity-induced genes were distinguished. Based on their functional categorization, 20 gene categories were identified, among them: neurotransmitter related, cellular receptors, cell survival, proteolysis, microtubule function related, proteasome components, RNA processing/stability, ion pumps/transporters, protein folding/chaperones, cell adhesion/matrix, actin cytoskeleton regulation, protein/vesicle trafficking, energy metabolism, transcription regulators, cellular oxidation-reduction, intracellular signaling, and finally genes with an unknown function. The next step is to use gene knockout mice to learn which of these genes are crucial for learning and memory formation. Mutant mice lacking the α- and δ-isoforms of CREB display intact short-term memory but deficient long-term memory in three independent learning tasks concomitantly, while L-LTP in hippocampal CA1 is also impaired. In contrast to the striking phenotype of the CREB-deficient mice, knockouts of the immediate early transcription factor zif/268 (also known as NGFI-A), NFAT (nuclear factor of activated T cells) and cAMP-response element modulator (CREM), all of which are induced by NMDAR stimulation, have not yet yielded remarkable behavioral changes. Deletions of c-fos and fosB cause behavioral abnormalities, but their neurobiological basis is not understood (Bito et al. 1997). These results are surprising, since immediate-early genes for transcription factors like c-fos, fosB, zif/268 and CREM were among the first believed to bring about long-term plastic changes. More work is needed to discover which genes and proteins related directly to the long-term changes in synaptic weights are being expressed due to the action of CREB.
However, it has recently been discovered that 24 hours of activation of CaMKIV and CREB results in the generation of new synapses that contain both new NMDARs and new AMPARs in CA1 pyramidal cells (Marie et al. 2005). CREB is probably involved in the generation of so-called silent synapses, i.e. synapses that contain only NMDARs. CaMKIV is probably involved in AMPAR insertion. Silent synapses are unsilenced during LTP by insertion of new AMPARs into the postsynaptic membrane. The process of new spine growth requires about 20 min (Maletic-Savatic et al. 1999); thus the rate-limiting processes should be related to the initiation of gene transcription, subsequent gene translation and protein synthesis. To complete the plasticity GPRN, we must also consider how the expression of CREB itself is regulated (Mayford and Kandel 1999, Smolen et al. 2000). The CREB activator is called CREB-1a, and it too enters the active state by phosphorylation. CREB2 and CREB-1b are endogenous repressors of CREB-1a. CREB also induces ICER (inducible cAMP early repressor). ICER then binds to CREs and represses its own transcription as well as
that of CREB. Also, a positive feedback loop appears to exist, with CREB binding to the CREs of its own gene and activating its own transcription. Phospho-CREB binds to the CREB-binding protein (CBP), which has been shown to be a co-activator of transcription not only for CREB, but also for a number of other transcription factors, including (but not necessarily restricted to) activator protein 1 (AP1), the early region 1A (E1A) protein, nuclear hormone receptors (NHRs), and STATs (signal transducers and activators of transcription) (Bito et al. 1997). The convergence of signals onto CBP is all the more remarkable in light of evidence suggesting an additional role for CBP as a histone acetyltransferase critical for transcriptional initiation. Taken together, the multiplicity of signaling mechanisms acting on CREB and CBP provides a rich array of possibilities for input-specific patterns of gene expression. Figure 9.5 summarizes the core gene/protein regulatory network around CREB, and the long-term synaptic changes resulting from CREB activation.
Fig. 9.5. Gene regulatory network for CREB, the major transcription factor of learning and memory, which when activated by signal-transduction pathways from stimulated synapses leads to new protein synthesis and new synapse formation. This network is by no means exhaustive
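The qualitative interactions around CREB described above can be sketched as a toy discrete-time GPRN. This is purely illustrative: all numerical weights are invented, and CREB2/CREB-1b are lumped into one constant repressor level; only the signs of the interactions follow the text (CREB-1a is driven by synaptic stimulation and its own positive feedback, and repressed by CREB2/CREB-1b and by ICER, which CREB-1a itself induces).

```python
import math

def sig(x):
    """Logistic squashing function mapping activity to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def step(state, stimulus):
    """One qualitative update of the core CREB network of Fig. 9.5.
    All weights are invented; only the interaction signs follow the text."""
    creb1a, icer = state
    repressors = 0.8  # assumed constant level of CREB2 / CREB-1b
    new_creb1a = sig(4.0 * stimulus + 2.0 * creb1a
                     - 2.0 * repressors - 3.0 * icer)
    new_icer = sig(3.0 * creb1a - 2.0 * icer - 1.0)
    return new_creb1a, new_icer

state = (0.1, 0.1)
for _ in range(50):          # sustained synaptic stimulation
    state = step(state, 1.0)
stimulated = state[0]        # CREB-1a activity stays high
for _ in range(50):          # stimulus removed; repressors and ICER win
    state = step(state, 0.0)
print(stimulated, state[0])
```

Even this crude sketch reproduces the qualitative behavior: active CREB-1a during stimulation, and a return to a low baseline once the repressive loops dominate.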
9.3 Putative Molecular Mechanisms of Metaplasticity

Next we have to consider molecules involved in metaplasticity, the plasticity of synaptic plasticity itself (Abraham and Bear 1996). In the theoretical equations, the processes of LTP and LTD are metaplastically modulated by the variable(s) called modification threshold(s), which depend on the previous history of cell activity and upon other inborn and age/state-related factors. Although NMDARs mediate LTD and LTP, they are regulated separately from AMPARs (Turrigiano and Nelson 2000, Sheng and Lee 2001). Therefore, changes in the numbers/properties of NMDARs belong rather among the several candidate mechanisms underlying metaplasticity, as we will see below. In general, any molecular action that affects the characteristics of temporal changes in [Ca2+]i during and after NMDAR stimulation in turn influences both plasticity and metaplasticity (Abraham and Tate 1997). We will mention the main metaplasticity candidates, while not claiming the list is exhaustive:

1. NMDARs, since they play a critical role in postsynaptic calcium entry, which in turn plays a fundamental role in bidirectional synaptic modification. As such, changes in NMDAR number/function will dramatically alter the properties of experience-dependent synaptic plasticity and metaplasticity. NMDARs are heteromeric ion channels, composed of NR1 and NR2 subunits. Each of the four subtypes of the NR2 subunit (2A-2D) confers distinct functional properties on the receptor. NMDARs composed of NR1 and NR2B subunits mediate long-duration currents (i.e. larger [Ca2+]i), whereas inclusion of the NR2A subunit results in NMDARs with faster kinetics (i.e. lower [Ca2+]i). NMDARs composed of NR1 and NR2B are observed in the neocortex at birth, and over the course of development there is an increase in the NR2A/NR2B ratio (Bliss 1999). Thus age can set the effective position of θM, expressed by the constant α (Eq. 9.4).
In fact, in previous computer modeling studies, α was introduced as a constant, and for the immature cortex it was about ten times smaller than α for the mature cortex (Clothiaux et al. 1991, Benuskova et al. 1994). That means that in computer simulations of cortical plasticity, the effective position of the modification threshold was higher for the adult cortex than for the immature cortex. Another critical factor for the effective position of θM is the number of NMDARs. When the NMDAR number is reduced to one half, as in the neocortex exposed to alcohol during prenatal development, the value of α in the model that successfully simulates the impaired synaptic plasticity doubles compared to the model of the normal neocortex
(Benuskova, Rema et al. 2001). Thus, the value of the constant α (Eq. 9.4) can depend on inborn and/or genetic factors. In addition, NMDARs can also contribute to the changes in the temporal average of activity expressed by ⟨c(t)²⟩τM. In the immature cortex, the composition and thus the function of synaptic NMDARs can be acutely and bidirectionally modified by severely altered input sensory activity on the time scale of hours (sensory activation) and days (sensory deprivation) (Philpot et al. 2001). Accordingly, θM changed over hours in the biophysical model of the BCM rule (Castellani et al. 2001). Thus changes in the NMDAR subunit content can contribute to a slow change of θM(t).

2. Neuromodulators. It appears that the effective position of θM, expressed by the function φ(t), may be affected by various modulatory factors like neurotransmitters, neuromodulators, hormones, etc. For instance, binding of corticosteroids to the glucocorticoid receptor results in a shift of the LTP modification threshold to the right, and a shift of the LTD modification threshold to the left (Kim et al. 1996). Stress and glucocorticoids appear to exert a metaplastic effect through the modulation of Ca2+ levels. These data indicate that θ−, postulated by Artola, Brocher and Singer (Artola et al. 1990), may be modifiable as well as θM (θ+), and that a single stimulus such as glucocorticoid receptor activation may shift these two thresholds in opposite directions. These effects can be either temporary or permanent, depending on whether the alterations in the levels of neuromodulators are temporary or permanent. Actually, it seems that certain levels of neuromodulatory substances like acetylcholine (ACh), and maybe dopamine (DA) and norepinephrine (NE) as well, are necessary for synaptic plasticity to occur at all. For instance, when the cholinergic basal forebrain fibers are completely destroyed, synaptic plasticity in the somatosensory cortex is completely abolished as well (Sachdev et al. 1998).
It has been shown that application of ACh and/or NE or their agonists facilitates the expression of both LTD and LTP in cortical preparations, as if the thresholds θ− and θM (θ+) were both shifted to the left (Brocher et al. 1992, Kirkwood et al. 1999). Thus, different neuromodulators can have different effects on the position of θ−, θM (θ+), or both.

3. CREB/CRE-induced gene expression. Stimulation of NMDARs leads to CREB/CRE-induced gene expression that further facilitates the maintenance of LTP (Schulz et al. 1999, Wu et al. 2001, Leutgeb et al. 2005). In the next section we will demonstrate how to incorporate the dynamics of CREB phosphorylation into the calculation of the modification threshold.

4. The most promising candidate mechanism for the relatively fast postsynaptic activity-dependent sliding of θM (expressed by ⟨c(t)²⟩τM in Eq. 9.4) is the
dynamics of CaMKII. The temporal integration window for the neuron's memory of past activity was set to 22 min in the modeling study of the visual cortex (Clothiaux et al. 1991), and from seconds to minutes in the modeling studies of the somatosensory cortex (Benuskova et al. 1994, Benuskova, Rema et al. 2001). Increased intracellular Ca2+ results in autophosphorylation of CaMKII, thereby converting the enzyme to a Ca2+-independent (autonomous) form (Abraham and Tate 1997). It has been proposed that the level of Ca2+-independent CaMKII sets the current value of θM (Bear 1995). It appears that a higher level of autonomous CaMKII shifts the LTP threshold to the right, so that a subsequent incoming signal may be less effective in inducing LTP. Phosphorylated CaMKII binds Ca2+ and calmodulin more tightly and limits the availability of Ca2+ and calmodulin for other enzymes essential for LTP to be established. This may result in a metaplastic state such that a second burst of synaptic activity might lead to an elevation in Ca2+ concentration that is now in the range for selective activation of specific phosphatases. In this way the induction of LTD may be facilitated (Abraham and Tate 1997). It has been shown that the effect of tetanus on the kinase and its activity is confined to both synapses and the neuronal soma. In addition, in dendrites the increase of phosphorylated CaMKII is accompanied by an increase of nonphosphorylated CaMKII (Ouyang et al. 1997), probably due to local synthesis (Wu et al. 1998). To sum up, it seems that θM comprises a constant portion, dependent upon genetically based or inborn factors (like the number of NMDARs, overall levels of neuromodulators, and so on), that sets the effective position of θM on the x-axis. Age-related factors can fall into this constant component too, because they can be considered constant over the time of measurement.
Then there is a slowly changing component, varying over hours or days, that can depend for instance upon the composition of NMDARs, other activity-related slow influences like motivation- or attention-related chemicals, and CREB/CRE-induced gene expression. Last but not least, there is a fast component that adjusts θM in a matter of minutes. This latter component can depend, for instance, on the phosphorylation state of CaMKII and perhaps also on other fast mechanisms and molecular changes. Perhaps the choice of factors entering the calculation of θM depends on the experimental situation that we want to model. Many other influences will be hidden in the proportionality constants and scaling factors. In the next section we step toward the construction of a CNGM of the synaptic plasticity that underlies learning and memory formation.
9.4 A Simple One Protein-One Neuronal Function CNGM

We will formulate a set of rules and equations that can be used in any spiking neuron model (Maass and Bishop 1999). For better clarity, we will repeat those equations that were introduced earlier and that are relevant to our CNGM of learning and memory. The main difference from the previous interpretations will be that some of the parameters are governed by the dynamics of CREB/CRE-induced gene expression, which has been proven to be critical for long-term memory formation (Poser and Storm 2001). Thus, we will assume that each cortical excitatory synaptic weight changes according to the additive nearest-neighbor STDP rule (former Eq. 9.2), i.e.:

w(t + δt) = w(t)(1 + Δw+ − Δw−)
(9.10)
where δt is the time step of weight updating, let us say 1 ms. The synaptic change comprises contributions from only the two nearest spikes. That is, for each presynaptic spike, only two postsynaptic spikes are considered: the one that occurs right before and the one that occurs right after the given presynaptic spike, respectively, so that

Δw+ = A+ exp(−Δt/τ+) for Δt > 0

(9.11)

Δw− = A− exp(Δt/τ−) for Δt < 0
where Δt = tpost − tpre is the time difference between the post- and presynaptic spikes. The novelty of our approach is that the amplitudes of positive and negative synaptic change, A+ and A−, respectively, are no longer constant; instead they depend on the dynamic synaptic modification threshold θM in the following way (Benuskova and Abraham 2006):
A+(t) = A+(0) / θM(t)   and   A−(t) = A−(0) θM(t)

(9.12)
where A±(0) are the initial (constant) values and θM is the modification threshold. Thus, when θM increases, A− increases and A+ decreases, and vice versa. If c(t) = 1 when there is a postsynaptic spike, and c(t) = 0 otherwise, the rule for the sliding modification threshold θM reads

θM(t) = α φ(t) ⟨c(t)⟩τM

(9.13)
where φ(t) depends on the slowly changing level of pCREB, the time average ⟨c(t)⟩τM depends on some fast activity-integration process, which for instance involves the dynamics of available Ca2+-sensitive CaMKII (Bear 1995, Benuskova, Rema et al. 2001), and α is a scaling constant. The time average of postsynaptic activity can be calculated as in (Benuskova, Rema et al. 2001), that is, by numeric integration of the following integral:

⟨c(t)⟩τM = (1/τM) ∫_{−∞}^{t} c(t') exp(−(t − t')/τM) dt'

(9.14)
where τM can be on the order of minutes. Thus, θM will have a fast component changing in a matter of minutes and a slow component φ(t) that changes over hours, as the level of pCREB does after NMDAR stimulation (Schulz et al. 1999, Wu et al. 2001, Leutgeb et al. 2005). pCREB induces gene expression together with the co-activator factor CBP (see e.g. Fig. 9.5). It has been shown that CBP production reaches its maximum within the first hour after NMDAR stimulation and remains highly elevated up to 24 hr afterwards ((Hong et al. 2004), supporting Table 1, item 633, group 2 genes). Thus, the rate-limiting factor for stimulation-induced genes is actually pCREB, which changes in a biphasic manner after NMDAR stimulation (Schulz et al. 1999). Since θM determines the ease of LTP induction, the function φ(t) will be the inverse of the pCREB formation curve, i.e.:

φ(t) = 1 / [pCREB(t)]

(9.15)
where [pCREB(t)] is the concentration of phosphorylated CREB in the postsynaptic neuron. Early CaMK-dependent CREB phosphorylation occurs after any high-frequency stimulation, and the later, PKA-dependent phase of CREB phosphorylation occurs when the presynaptic stimulation lasts longer than 1 min (Schulz et al. 1999, Leutgeb et al. 2005). Thus, the duration of presynaptic high-frequency stimulation (HFS) provides a threshold for the switch between the first phase of CREB phosphorylation and its second phase. In a more detailed biophysical model this switch should arise from the kinetics of postsynaptic enzymatic reactions. Our CNGM is thus more abstract and highly simplified, but therefore perhaps more suitable for simulating larger networks of artificial neurons. First, however, we would like to demonstrate its feasibility on a single neuron by reproducing the actual experimental results from (Schulz et al. 1999).
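The weight-update rules of this section (Eqs. 9.10-9.13) can be sketched as follows. The time constants, the initial amplitudes A±(0), and the value of α are illustrative assumptions, not the values used in the authors' simulations.

```python
import math

TAU_PLUS = 20.0    # ms, LTP window (assumed)
TAU_MINUS = 20.0   # ms, LTD window (assumed)
A_PLUS0 = 0.005    # A+(0), assumed
A_MINUS0 = 0.005   # A-(0), assumed

def theta_m(alpha, phi, c_avg):
    """Eq. 9.13: sliding threshold = alpha * phi(t) * <c(t)>_tauM."""
    return alpha * phi * c_avg

def stdp_update(w, t_pre, t_post_before, t_post_after, theta):
    """Eqs. 9.10-9.12: nearest-neighbor STDP with amplitudes scaled by the
    current modification threshold `theta`. t_post_before/t_post_after are
    the nearest postsynaptic spike times around t_pre (None if absent)."""
    a_plus = A_PLUS0 / theta            # Eq. 9.12: LTP amplitude shrinks...
    a_minus = A_MINUS0 * theta          # ...and LTD amplitude grows with theta
    dw_plus = dw_minus = 0.0
    if t_post_after is not None:        # post after pre: potentiation
        dw_plus = a_plus * math.exp(-(t_post_after - t_pre) / TAU_PLUS)
    if t_post_before is not None:       # post before pre: depression
        dw_minus = a_minus * math.exp((t_post_before - t_pre) / TAU_MINUS)
    return w * (1.0 + dw_plus - dw_minus)   # Eq. 9.10

# Pre at 100 ms, nearest post at 105 ms: potentiation, weaker when theta is high.
print(stdp_update(1.0, 100.0, None, 105.0, theta_m(1.0, 1.0, 1.0)))
```

In a full simulation, φ would follow the (biphasic) pCREB curve via Eq. 9.15 and ⟨c(t)⟩τM would be updated online via Eq. 9.14, so that θM couples the gene-level and synapse-level dynamics.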
9 Application of CNGM to Learning and Memory
9.5 Application to Modeling of L-LTP

In this study we will employ the simple spiking neuron model of Izhikevich (Izhikevich 2003). Let the variable v (mV) represent the membrane potential of the neuron and u a membrane recovery variable, which accounts for the activation of K+ ionic currents and inactivation of Na+ ionic currents, and thus provides negative feedback to v. The dynamics of these two variables is described by the following set of differential equations:
dv/dt = 0.04v² + 5v + 140 − u + I    (9.16)

du/dt = a(bv − u)    (9.17)
Synaptic inputs are delivered via the variable I. After the spike reaches its apex (AP = 55 mV), the membrane voltage and the recovery variable are reset according to the equation

if v ≥ AP, then { v ← c; u ← u + d }    (9.18)
Values of the dimensionless parameters a, b, c, d differ for different types of neurons, i.e. regularly spiking, fast spiking, bursting, etc. (Izhikevich 2003). We will assume that the total synaptic input is

I(t) = Σ_j w_j(t)    (9.19)
where the sum runs over all active inputs and w_j(t) is the value of the synaptic weight of synapse j at time t. In order to reproduce experimental data from (Schulz et al. 1999) we construct a simple spiking model of a hippocampal dentate granule cell (GC), in which we ignore the effect of inhibitory neurons. For a schematic illustration of the hippocampal formation see Fig. 9.6. The model GC has three excitatory inputs: two of them represent the ipsilateral medial and ipsilateral lateral perforant paths, mpp and lpp, respectively, and one excitatory input comes from the contralateral entorhinal cortex (cEC) (Amaral and Witter 1989). Mpp and lpp are two separate input pathways coming from the ipsilateral entorhinal cortex (EC) and terminating on separate but adjacent distal dendritic zones of the hippocampal dentate granule cells (McNaughton et al. 1981). Together they form the ipsilateral perforant pathway input (pp). Input from the contralateral entorhinal cortex (cEC) terminates on the proximal part of the granule cell dendritic tree (Amaral and Witter 1989).
As the neuron model we employ the simple spiking neuron model introduced by Izhikevich (Izhikevich 2003), with parameter values corresponding to a regularly spiking cell, i.e. a = 0.02, b = 0.2, c = −69 mV, d = 2, and a firing threshold equal to 24 mV (McNaughton et al. 1981). The model is simulated in real time with a time step of 1 ms. The total synaptic input corresponding to variable I reads:
I(t) = h_mpp(t) w_mpp(t) I_mpp + h_lpp(t) w_lpp(t) I_lpp + h_cEC(t) w_cEC(t) I_cEC    (9.20)
where w_mpp (w_lpp, w_cEC) is the weight of the mpp (lpp, cEC) input, and I_mpp (I_lpp, I_cEC) is the intensity of the electric stimulus delivered to mpp (lpp, cEC), respectively. The functions h_mpp(t), h_lpp(t) and h_cEC(t) are equal to 1 when a presynaptic spike occurs at the respective input at time t, and 0 otherwise. In our simulations, the spontaneous, testing and training intensities are the same and equal to I_mpp = I_lpp = I_cEC = 100. The interpretation of stimulus intensity in the model is the number of input fibers within a given pathway that are engaged by stimulation. Initial values of the synaptic weights were w_mpp(0) = w_lpp(0) = w_cEC(0) ≈ 0.05, so that when the three input pathways were stimulated simultaneously or in close temporal succession, a postsynaptic spike followed.
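The single-cell dynamics of Eqs. 9.16 through 9.18 can be sketched as follows. This is a minimal Euler-integration illustration, assuming the 1 ms time step and the parameter values quoted above (a = 0.02, b = 0.2, c = −69, d = 2, apex AP = 55 mV); the function name and the constant input current standing in for the three weighted pathways of Eq. 9.20 are our own.

```python
def simulate_izhikevich(i_input, t_max_ms=1000,
                        a=0.02, b=0.2, c=-69.0, d=2.0, apex=55.0):
    """Euler integration (1 ms step) of Eqs. 9.16-9.18; returns spike times (ms)."""
    v, u = c, b * c                     # start at the reset potential
    spikes = []
    for t in range(t_max_ms):
        v += 0.04 * v * v + 5.0 * v + 140.0 - u + i_input   # Eq. 9.16
        u += a * (b * v - u)                                # Eq. 9.17
        if v >= apex:                                       # reset, Eq. 9.18
            spikes.append(t)
            v, u = c, u + d
    return spikes
```

With a sufficiently strong sustained input the cell fires tonically, while with zero input it settles near its resting potential without spiking.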
Fig. 9.6. Schematic illustration of hippocampal pathways and GC inputs
To simulate synaptic plasticity, we employed the STDP rule expressed by Eqs. 9.10 - 9.11, with the sliding BCM threshold incorporated through the amplitudes of synaptic changes (Eqs. 9.12 - 9.15), with these parameter values: A+(0) = 0.02, A−(0) = 0.01, τ+ = 20 ms, τ− = 100 ms, τM = 30 s, α = 3000. We simulate the experimental situation in which, to induce
LTP in the dentate gyrus (round cells in Fig. 9.6), electrical stimulation was delivered to the perforant pathway, which is a mixture of lpp and mpp fibers. Nondecremental long-lasting LTP was induced by stimulating the perforant pathway with 20 trains of impulses. Each train consisted of 15 pulses. The frequency within the train was 200 Hz for high-frequency stimulation (HFS). The distance between trains was 5 s. Nondecremental LTP or L-LTP lasted for at least 24 hours (Schulz et al. 1999). In computer simulations, spontaneous spiking input from EC (ipsilateral and contralateral) was generated randomly (Poisson train) with an average frequency of 8 Hz to simulate the spontaneous theta modulation (Frank et al. 2001). That led to a postsynaptic spontaneous activity of granule cells of ~1 Hz (Kimura and Pavlides 2000). Spontaneous input has to be synchronous between the inputs so that their weights keep approximately the same value. There is an anatomical basis for such a synchronization within EC (Biella et al. 2002). Decorrelated random spontaneous activity of frequency < 1 Hz can be superimposed upon all three input weights with no effect. The model GC received spontaneous spikes all the time. HFS of 20 pulse trains was delivered to pp at t = 2 hours. During the HFS of the perforant pathway, there was an ongoing 8 Hz spontaneous input activity from the cEC input. During the 5 s intertrain intervals all inputs received uncorrelated spontaneous activity with a frequency of 8 Hz. After the pp HFS, correlated 8 Hz spontaneous spikes at all three inputs resumed again. In the following figures we summarize the results of our computer simulation. All presented simulated curves are averages from 6 measurements, similarly to (Schulz et al. 1999). Fig. 9.7 shows the results of the simulation of induction and maintenance of nondecremental LTP in granule cells. The magnitude and duration of the fEPSP change (i.e. 24 hours) in our computer simulation are the same as in the experimental study (Schulz et al. 1999). The percentage change in the field EPSP was calculated as a dimensionless linear sum of the mpp and lpp weight changes for the pp input, i.e. ΔfEPSP = Δw_mpp + Δw_lpp, or for the contralateral input as ΔfEPSP = Δw_cEC. As we can see in Fig. 9.7a, HFS of pp consisting of 20 trains leads to homosynaptic LTP of pp and heterosynaptic LTD of the cEC input. Since the induction of LTD of the cEC pathway was not tested in the simulated experiments of Schulz et al. (Schulz et al. 1999), it can be considered a model prediction. However, this prediction of the model is in accordance with the experimental data of Levy and Steward (Levy and Steward 1983), in which HFS of the ipsilateral pp depressed the contralateral pathway when the latter was not receiving a concurrent HFS, which is the case in our study.
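Since Eqs. 9.10 - 9.11 are not reproduced in this section, the sketch below assumes the standard exponential form of the STDP window, using the amplitude and time-constant values quoted above (A+ = 0.02, A− = 0.01, τ+ = 20 ms, τ− = 100 ms). With fixed amplitudes this yields a curve of the general shape recorded in Fig. 9.8; in the full model the amplitudes themselves slide with θM. The function name is hypothetical.

```python
import math

def stdp_weight_change(dt_ms, a_plus=0.02, a_minus=0.01,
                       tau_plus=20.0, tau_minus=100.0):
    """Weight change as a function of (post - pre) spike-time difference (ms).

    Assumes the standard exponential STDP window: potentiation when the
    presynaptic spike precedes the postsynaptic one (dt > 0), depression
    otherwise.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    else:
        return -a_minus * math.exp(dt_ms / tau_minus)
```

Note the asymmetry of the two time constants: depression extends over a much wider window (τ− = 100 ms) than potentiation (τ+ = 20 ms).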
Fig. 9.7b shows the temporal course of [pCREB(t)] that accompanies the induction and maintenance of L-LTP and has the same course and amplitude as in the experimental study (Schulz et al. 1999). Fig. 9.7c depicts the temporal evolution of the modification threshold θM in our computer simulations. Synaptic weights and therefore θM change slowly in dependence on [pCREB(t)] and quickly in dependence on the time average of postsynaptic spiking activity over the last τM = 30 s. To conclude, we would like to note that in the experimental study (Schulz et al. 1999), decremental or early E-LTP was also induced and [pCREB] measured, but the paper does not provide sufficient details (such as the amplitude and detailed time course of [pCREB]) for setting up our model for that situation.
Fig. 9.7. (a) Temporal evolution of fEPSP in our computer simulation of L-LTP. PP means perforant path, cEC means contralateral entorhinal cortex. Nondecremental L-LTP lasts throughout the 24 hours of simulation; (b) Biphasic course of [pCREB(t)] that accompanies the induction and maintenance of L-LTP as measured in the experiment (Schulz et al. 1999); (c) Evolution of the modification threshold θM in the model
9.6 Summary and Discussion

Our computer simulations faithfully reproduce the results of the experimental study of L-LTP (Schulz et al. 1999). In our model, we have linked the temporal changes in the levels of pCREB, as measured in the experiment, to the dynamics of the BCM synaptic modification threshold θM that determines the magnitude of synaptic potentiation and depression in STDP, which is a novel and original contribution of this chapter. The learning rule introduced in this chapter, which we have used to model the experimental data on hippocampal synaptic plasticity, leads to the following picture of relative synaptic changes during the course of the model simulation (see Fig. 9.8).
Fig. 9.8. Positive and negative weight changes (above and below zero on the y-axis, respectively) recorded during the course of the model GC simulation as a function of the temporal difference between the postsynaptic and presynaptic spike timing
Changeable windows for the induction of LTD and LTP are crucial factors for the successful reproduction of the experimental results. Whether these windows are indeed changeable, and whether their dynamics depends on the average past activity of the postsynaptic cell, remains to be proven. So far, it is one of the predictions of our model. Another prediction of our model
is that the width of the temporal windows for the induction of LTD/LTP also depends on particular long-term processes, such as the time course of the concentration of biochemical factors that are necessary for the induction and maintenance of LTP. In our particular study of hippocampal L-LTP we have employed the time course of the elevation of the concentration of phosphorylated CREB, currently the best-known transcription factor involved in the induction and maintenance of late LTP. It is to be expected that there are many factors influencing the LTD/LTP windows. In fact, we have made a preliminary list of them in Section 9.3 on putative mechanisms of metaplasticity. It may be a worthy exercise to incorporate these other factors into our learning rule to model different kinds of experimental situations. In this chapter, we have given the neurogenetic basis and have introduced the construction of a CNGM of learning and memory at the cellular level. It should be mentioned that this is not the only way to construct a CNGM of learning and memory. Rather, this theory and methodology should serve as a recipe for how to proceed in setting up such a model and how to verify it on real data. As an experimental study we have modeled synaptic plasticity phenomena in the hippocampal dentate gyrus (Schulz et al. 1999). This brain structure is often studied in relation to the mechanisms of learning and memory. Our CNGM is of course based on present knowledge of the molecular mechanisms of learning and memory. These may be considerably elaborated in future years, although hopefully some of the present knowledge will be retained. At the same time, the methodology of the current CNGM can be used with any particular knowledge of the molecular mechanisms of learning and memory to bring more insight into the basis of learning and memory in the brain.
10 Applications of CNGM and Future Development
This chapter presents information on neurogenetic causes of brain diseases and points to future directions for building CNGM of these diseases. With the advancement of molecular research technologies, more and more data and information are available about the genetic basis of neuronal functions and diseases (see Table A.1 in Appendix 1). This information can be utilized to create models of brain functions and diseases that include models of gene and protein interactions, i.e. computational neurogenetic models (CNGM) (Kasabov and Benuskova 2004, 2005). This research has many open questions, some of which, listed below, we will attempt to address in this chapter:

1. Which real neuronal parameters are to be included in an ANN model of a particular brain condition, and how are they to be linked to activities of genes/proteins?
2. In turn, which genes/proteins are to be included in the model, and how should gene interaction over time be represented within each neuron?
3. How to integrate in time the activity of genes, proteins and neurons in an ANN model?
4. How to integrate internal and external variables in a CNGM (e.g., genes and neuronal parameters with external signals acting on the brain), and how to treat various perturbations?
5. How to create and validate a CNG model in the presence of scarce data?
6. How to measure brain activity and the CNGM activity in order to validate the model?
7. What is the effect of connectivity? How does it interact with the effects of the GPRN?
8. What useful information can be derived from CNGM?

These and probably many more questions remain to be addressed in the future. We will suggest answers to some of them, or at least directions in which the answers may lie. In the present chapter we will suggest the construction of CNGM for these diseases and processes: epilepsy, schizophrenia, mental retardation, brain aging and Alzheimer disease, and Parkinson
disease. We conclude the final chapter with an introduction to the brain-gene ontology project.
10.1 CNGM of Epilepsy

Epilepsy is a disorder characterized by the occurrence of at least two unprovoked seizures (Cavazos and Lum 2005). Seizures are the manifestation of abnormal hypersynchronous discharges of neurons in the cerebral cortex. The clinical signs or symptoms of seizures depend on the location and extent of the propagation of the discharging cortical neurons. The prevalence of active epilepsy is about 1%, which, however, means that about 50 million people worldwide are affected (Cavazos and Lum 2005). People at any age can develop epilepsy. However, it has been noted that around half of the people developing epilepsy do so before the age of 15 years. Recent epidemiological evidence suggests, however, that increasing numbers of patients are developing epilepsy in old age: this is partly because of demographic changes in the population (with an increasing proportion of the population in the elderly age range) and also due to an increasing incidence of degenerative cerebrovascular disease in old age. Most, but not all, studies have found a slight male preponderance (Sander 2003). Seizures are often a common, nonspecific manifestation of neurologic injury and disease, which should not be surprising because the main function of neurons is the transmission of electrical impulses. A genetic contribution to etiology has been estimated to be present in about 40% of patients with epilepsy (Gardiner 2003). Pure Mendelian epilepsies, in which a single major locus can account for segregation of the disease trait, are considered to be rare, and probably account for no more than 1% of patients (Gardiner 2003). The common familial epilepsies tend to display complex inheritance, in which the pattern of familial clustering can be accounted for by the interaction of several loci together with environmental factors.
10.1.1 Genetically Caused Epilepsies

The following Table 10.1 is devoted to different types of genetically caused epilepsies, associated brain pathologies, symptoms, and putative mutated genes to be included in future CNGM. This table is far from complete, since new data on the genes involved in the so-called idiopathic epilepsies (i.e. those with unknown cause) emerge every day.
Table 10.1. Epilepsies caused genetically, putative mutated genes, and affected functions of brain neurons in humans

Autosomal dominant nocturnal frontal lobe epilepsy (ADNFL)
- Mutated genes / chromosome location: α4 subunit of the nicotinic AChR (CHRNA4) / 20q; β2 subunit of the same receptor (CHRNB2) / 1p
- Brain abnormality: Reduced nAChR channel opening time and reduced conductance, leading to hyperexcitability.
- Symptoms: Partial seizures during the night that may generalize, arising from the frontal lobes; motor, tonic, postural type.
- References: (Gardiner 1999, Meisler et al. 2001, Steinlein 2004)

Benign familial neonatal convulsions (BFNC1 and BFNC2)
- Mutated genes / chromosome location: EBN1 (K+ channel gene KCNQ2) / 20q; EBN2 (K+ channel gene KCNQ3) / 8q
- Brain abnormality: Alteration of the gating properties of the K+ channel, leading to poor control of repetitive firing.
- Symptoms: Generalized epilepsy of the newborn; seizures are frequent and brief; episodes resolve within a few days.
- References: (Gardiner 1999, Meisler et al. 2001, George 2004, Steinlein 2004)

Childhood absence epilepsy (CAE)
- Mutated genes / chromosome location: γ2 subunit gene of the GABA_A receptor, GABRG2 / 5q; gene CLCN2 / 3q
- Brain abnormality: Fast and part of slow GABAergic inhibition is reduced; voltage-gated Cl− channel function is impaired.
- Symptoms: Absence seizures (consciousness impaired) up to 200 times a day; bilateral 2-4 Hz spike-and-slow-wave EEG.
- References: (Crunelli and Leresche 2002, Marini et al. 2003, Steinlein 2004, Segan 2005)

Generalized epilepsy and febrile seizures plus (GEFS+)
- Mutated genes / chromosome location: β1 subunit of the Na+ channel, gene SCN1B / 19q; α1 and α2 subunits, genes SCN1A and SCN2A / 2q; GABRG2 / 5q
- Brain abnormality: Normal inactivation kinetics of the Na+ channel are reduced, causing persistent Na+ influx and hyperexcitability; reduced function of the GABA_A receptor.
- Symptoms: Childhood onset of febrile seizures, with febrile and afebrile generalized seizures continuing beyond 6 yrs of age.
- References: (Gardiner 1999, Meisler et al. 2001, Steinlein 2004)

Intractable childhood epilepsy
- Mutated genes / chromosome location: α1 subunit of the Na+ channel, gene SCN1A / 2q
- Brain abnormality: Rapid recovery of the Na+ channel from inactivation, or very slow inactivation.
- Symptoms: Frequent intractable generalized tonic-clonic seizures.
- References: (George 2004)

Juvenile absence epilepsy (JAE)
- Mutated genes / chromosome location: α1 / 5q, α5 / 15q and γ2 / 5q subunit genes of the GABA_A receptor; gene CLCN2 / 3q
- Brain abnormality: Fast and part of slow GABAergic inhibition is reduced; voltage-gated Cl− channel function is impaired.
- Symptoms: Similar to CAE, but the seizures start after the age of 10; seizures may be less frequent and last longer than a few seconds.
- References: (George 2004, Segan 2005)

Juvenile myoclonic epilepsy (JME)
- Mutated genes / chromosome location: α7 subunit of the nicotinic AChR (CHRNA7) / 15q; gene CLCN2 / 3q; β4 subunit of the Ca2+ channel (CACNB4) / 19p
- Brain abnormality: Reduced function of the nicotinic AChR; voltage-gated Cl− channel and voltage-gated Ca2+ channel have reduced conductance.
- Symptoms: Myoclonic jerks or seizures shortly after awakening, generalized tonic-clonic seizures, and sometimes absence seizures.
- References: (Gardiner 1999, Cavazos and Lum 2005)

Dravet syndrome, severe myoclonic epilepsy of infancy (SMEI)
- Mutated genes / chromosome location: α1 subunit of the Na+ channel, gene SCN1A / 2q
- Brain abnormality: Complete loss of activity of the Na+ channel.
- Symptoms: Both generalized and localized seizures; clonic and myoclonic seizure types.
- References: (George 2004, Steinlein 2004)

Lafora disease (progressive myoclonus epilepsy)
- Mutated genes / chromosome location: laforin gene EPM2A / 6q24; malin gene EPM2B / 6p22.3
- Brain abnormality: Presence of Lafora bodies (granules of accumulated carbohydrates).
- Symptoms: Myoclonic jerking, ataxia, mental deterioration leading to dementia.
- References: (Ganesh et al. 2006)
10.1.2 Discussion and Future Developments
There are many computational models of temporal-lobe-like epileptic seizures based on neural network dynamics (Traub et al. 1987, Biswal and Dasgupta 2002, Wendling et al. 2002, Kudela et al. 2003). We distinguish these models from models of thalamocortical slow spike-and-wave seizures, as in CAE (Lytton et al. 1997, Destexhe 1998, Robinson et al. 2002). However, none of the existing models takes into account the influence of genes and their internal networks upon neural dynamics, which is the goal of CNGM. Using the information about the genetic causes of epilepsies listed in Table 10.1, existing neural network or dynamical models can be enhanced to incorporate this information and thus lead to future CNGM. There is a quite high percentage of pharmacoresistant epilepsies, i.e. 30-40% (Fisher et al. 2003); therefore, it is important to seek new ways of modeling epilepsy to gain more insight into the mechanisms of seizure spread and maintenance in seizure-prone cortex. The aim in developing new computational models is to better understand the mechanisms of drug resistance. In humans with intractable temporal lobe epilepsy (TLE), many of the surviving inhibitory interneurons lose their PV content or PV immunoreactivity (Wittner et al. 2005). It has been proposed that efficient Ca2+ buffering by PV, and its high concentration in PV-expressing inhibitory cells, is a prerequisite for the proficient inhibition of cortical networks (DeFelipe 1997). To investigate this hypothesis, Schwaller and coworkers used mice lacking PV (PV-/-), which had previously been produced by homologous recombination. These mice show no obvious abnormalities and do not have epilepsy (Schwaller et al. 2004). However, the severity of generalized tonic-clonic seizures induced by pentylenetetrazole (PTZ) was significantly greater in PV-/- than in PV+/+ animals.
Extracellular single-unit activity recorded from over 1000 neurons in vivo in the temporal cortex revealed an increase in units firing regularly and a decrease in cells firing in bursts. In addition, control animals showed a lesser degree of synchronicity and mainly high-frequency components above 65 Hz in the LFP spectrum compared to PV-/- mice. On the other hand, PV-/- mice were characterized by increased synchronicity and by an abnormally high proportion of frequencies below 40 Hz (Villa et al. 2005). In the hippocampus, PV deficiency facilitated the GABA_A-ergic current reversal induced by high-frequency stimulation, a mechanism implicated in the generation of epileptic activity (Vreugdenhil et al. 2003). Through an increase in inhibition, the absence of PV facilitates hypersynchrony through the depolarizing action of GABA (Schwaller et al. 2004). In setting up the future CNGM, for modeling the generation of the LFP we can use the SNN model of cerebral cortex introduced at the end of Chapter
4, or we can use more sophisticated simulation environments such as GENESIS (Bower and Beeman 1998) and NEURON (Carnevale and Hines 2006). Behind each of the neuron's parameters there is a particular protein, be it an ion channel, a postsynaptic receptor or an enzyme. Proteins in neurons such as receptors and ion channels are complex proteins comprised of several subunits, each of which is coded for by a separate gene. These genes are expressed in a coordinated manner, so we can treat them as one gene group G_j with an overall normalized expression level, or we can treat each subunit individually. During the time interval of the LFP measurement, we can assume that the gene expression level is constant, but at the same time it depends on the expression levels of all genes in the selected set of genes, such that

g_j(t + Δt) = σ( Σ_k w_jk g_k(t) )    (10.1)

where σ is the sigmoid function between 0 and 1, g_k(t) is the expression level of gene k at time t, and w_jk ∈ (−1, 1) is a coefficient of an abstract gene interaction matrix W. The neuron's parameter value P_j(t) is proportional to the gene group expression level g_j(t), such that

P_j(t) = P_j(0) g_j(t)    (10.2)

where P_j(t) is the value of parameter j at time t, P_j(0) is the initial value of that parameter, and g_j(t) ∈ (0, 1) is the normalized expression level of the j-th gene in the model GRN. The genes and proteins involved in the GRN should include those for ion channels, excitatory and inhibitory receptors with fast and slow kinetics, calcium-buffering proteins, etc. Then we can use the optimization procedures introduced in Chapter 8 to find the coefficients of the gene interaction matrix. The objective function for optimization will be the mouse LFP with particular spectral characteristics. Once we have obtained the gene interaction matrix W, we can simulate gene deletion and mutation and drug effects upon LFP characteristics, spiking activity of neurons, synchronicity, oscillations, etc. Predictions from CNGM can aid experimental research into the causes of idiopathic epilepsies and of resistance to various drugs.
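A minimal numeric sketch of Eqs. 10.1 and 10.2 follows. The 3-gene interaction matrix and initial expression levels are made up purely for illustration; in an actual CNGM the coefficients of W would be found by the optimization procedures of Chapter 8. Function names are our own.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grn_step(g, w):
    """One update of Eq. 10.1: g_j(t+dt) = sigma(sum_k w_jk * g_k(t))."""
    return [sigmoid(sum(w[j][k] * g[k] for k in range(len(g))))
            for j in range(len(g))]

def neuron_parameter(p0, g_j):
    """Eq. 10.2: a neuron parameter scales with its gene group's expression."""
    return p0 * g_j

# Illustrative interaction matrix W with coefficients w_jk in (-1, 1)
w = [[0.0, 0.5, -0.3],
     [0.2, 0.0, 0.4],
     [-0.5, 0.1, 0.0]]
g = [0.5, 0.5, 0.5]     # initial normalized expression levels
for _ in range(10):     # iterate the GRN over ten time steps
    g = grn_step(g, w)
```

The sigmoid keeps every expression level within (0, 1), so the corresponding neuronal parameters in Eq. 10.2 can never exceed their initial values, nor change sign.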
10.2 CNGM of Schizophrenia

It is estimated that there are about 50 different neurotransmitters acting in the human brain. The three major categories of substances that act as neurotransmitters are (1) amino acids (primarily glutamate, GABA, aspartic acid and glycine), (2) peptides (vasopressin, somatostatin, neurotensin,
etc.) and (3) monoamines (norepinephrine, dopamine and serotonin) plus acetylcholine. There are also other categories, such as opioids, tachykinins, and so on. The vast majority of neurotransmitters are produced in evolutionarily older subcortical nuclei. It can be said that almost every subcortical nucleus has its own neurotransmitter(s) (it may synthesize more than just one). These different subcortical groups of neurons, together with their axons, which are sent and spread all over the brain including the cortex, constitute the different neurotransmitter systems. Serotonin (5-HT, 5-hydroxytryptamine), dopamine (DA), and noradrenaline (norepinephrine) are produced in three subcortical nuclei (Fig. 10.1).
Fig. 10.1. Schematic illustration of diffuse projections that originate in the noradrenergic (NA), serotonergic (5-HT) and dopaminergic (DA) nuclei in the brain stem, and spread all over the cerebral cortex (projections into other brain parts are not illustrated)
On the other hand, the major excitatory neurotransmitter of cortical excitatory neurons is glutamate. The major inhibitory neurotransmitter in the cortex is GABA (γ-amino-butyric acid). In effect, however, many neurotransmitters act upon every neuron within the cortex. The action of different neurotransmitters causes the influx of different ions and the activation of different second messengers. Thus, the information-processing properties of neurons are subject to complex short- and long-term influences. Different neurotransmitter systems, and thus different subcortical and cortical brain parts, are related to different mental processes and functions. Extraordinarily low or high activity of neurotransmitter systems leads to disorders of mental functions
(Kaplan and Sadock 2000). Thus, neurotransmitters themselves are endogenous psychoactive drugs. Since the levels of neurotransmitter production in every nucleus are pre-programmed genetically, a gene error or gene mutation can lead to a permanently altered production of a particular neurotransmitter, its receptor(s), or any molecule in the second messenger systems. The neuronal genetic program itself is subject to such influences as hormones, stress, aging, and, as we now know, also the activity of neurons themselves. The so-called psychoactive substances, that is, drugs and medicines used to treat mental disorders, act in synapses upon different types of receptors. They are psychoactive because their chemical structure resembles that of endogenous neurotransmitters. Thus, they can chemically bind to their receptors. Neurons react as if a natural neurotransmitter were present. Biochemical and neuroscientific research finds systematic and reproducible changes in the brain that are typical of particular mental disorders such as depression and schizophrenia (Kaplan and Sadock 2000). There are dozens of studies which demonstrate that schizophrenia has the so-called polygenic type of heredity. Individuals carrying a specific genetic predisposition (i.e. complex tiny genetic mutations) may or may not develop a disorder. It is said that these individuals are vulnerable to a given disease. Individuals carrying these mutations are vulnerable in the sense that various stressful life events may trigger or switch on the disease process. If the endogenous genetic predisposition is too strong, the disease process may start spontaneously, that is, without a triggering external event.

10.2.1 Neurotransmitter Systems Affected in Schizophrenia
Schizophrenic symptoms can be very bizarre and vary a lot from person to person. Statistically, about 1% of the population suffers from schizophrenia, regardless of race, gender, education, family and social status, etc. There are several subtypes of schizophrenia with typical symptoms, different severity of symptoms and different courses of the disease. The most characteristic symptoms are delusions, hallucinations, and various thinking and perceptual disorders. Schizophrenic withdrawal from reality can manifest itself in many peculiar ways. The disorder is accompanied by serious deterioration from the previous level of functioning in such areas as work, social relations, and self-care. Neuropathological research on the brains of schizophrenics has shown that there are specific alterations in some types of neurons within the frontal and temporal cortices (Kaplan and Sadock 2000). These cortical neurons seem to be less mature, with altered and retarded signs of differentiation. These subtle neurodevelopmental morphological abnormalities may be the consequence of defects in major neurotransmitter systems that affect the developing cortex. The major altered neurotransmitter systems seem to be the dopaminergic (DA), glutamatergic (Glu) and serotonergic (5-HT) systems. There are four relatively separated dopaminergic systems in the brain. The first one acts on the hypothalamus and affects neurohormonal secretion. The second one, acting on the basal ganglia, plays a crucial role in executing muscular, particularly involuntary, movements. The third DA system, the so-called mesolimbic system, originates in the ventral tegmentum in the brain stem and innervates the limbic system and the limbic association cortex. It regulates the expression of emotions and feelings of satisfaction, reward and pleasure. The fourth DA system, the so-called mesocortical system, also has its dopaminergic neurons in the ventral tegmentum. These neurons send their axons to the frontal and prefrontal cortices. Release of DA in these cortices regulates motivation, concentration and goal-directed planned behavior, which requires a complex organization of thoughts. Based on experimental data, it is assumed that the imbalance of DA levels in the brain includes its lack in the frontal and prefrontal cortices, and its excess in the limbic and subcortical areas. Reduction of DA levels in the frontal cortex results in the so-called hypofrontality, that is, unusually reduced levels of activity in the frontal cortex, revealed by means of brain imaging techniques (DelaTorre et al. 2005). Reduced frontal activity is hypothesized to result in such cognitive deficits as a specific "emptiness", an absence of cognitive and emotional contents in thoughts and motivation. On the other hand, increased levels of DA in the emotional and subcortical centers may lead to an impaired filtration and discrimination of stimuli, thus in turn leading to delusions in thinking and hallucinations in perception. For instance, long-term abuse of amphetamines, i.e.
drugs which increase levels of DA in the brain, results in the development of paranoid delusions. The second major system altered in schizophrenia is the glutamatergic system. Glutamate is released from the axonal terminals of cortical excitatory neurons, which make synapses with other cortical neurons and with subcortical neurons in the brain stem nuclei, to which they relay their feedback influence. Neurochemical research has revealed a decrease in the number of glutamate receptors and of glutamate itself in the frontal cortices of schizophrenics. Decreased glutamatergic transmission in the frontal cortex can also contribute to the hypofrontality phenomenon. The importance of the role played in schizophrenia by the reduced action of glutamate is related to the fact that it is a neurotransmitter of learning. Long-term changes of synaptic weights occur in glutamate synapses through NMDA receptors. Thus, reduced levels of glutamate and NMDA receptors in the frontal cortex may lead to altered synaptic plasticity during learning (adaptation). The drug PCP (phencyclidine, "angel dust"), which blocks postsynaptic glutamate receptors and increases levels of dopamine and serotonin, causes psychic experiences similar to those in schizophrenia, e.g. hallucinations, disorders of thinking and cognition, emptiness. Neurons that produce serotonin are located in the brain stem in the nucleus raphe. They send their axons all over the cerebral cortex, limbic system, basal ganglia, thalamus and hypothalamus. Target neurons possess different types of serotonin receptors; thus in some places serotonin acts as an excitatory NT and elsewhere as an inhibitory NT. Serotonin regulates sleep, appetite, and libido. It also suppresses aggressive behavior - it can be said that it is a neurotransmitter of "mental comfort". Antipsychotic medicines (previously called neuroleptics), used in psychiatry to alleviate schizophrenic symptoms, block postsynaptic receptors for DA. Differences between the effects of different antipsychotics depend on which subtypes of DA receptors they affect. Recently, it has been found that antipsychotics that reduce not only the DA synaptic transmission but also serotonergic synaptic transmission have more desirable effects upon schizophrenic symptoms and can restore the activity in frontal areas of the brain (Honey et al. 1999). The involvement of the 5-HT system is not surprising, since it has been known for a long time that LSD (lysergic acid diethylamide) enhances serotonergic transmission and produces gross perceptual and thinking alterations. Treatment of schizophrenia is usually a lifelong process. Since medicines still do not cure the original biochemical defect, they can only help the shattered neurotransmitter systems to regain a more or less stable balance.
10.2.2 Gene Mutations in Schizophrenia

Twin and adoption studies demonstrated that susceptibility to schizophrenia is strongly heritable, even if children are reared apart from their biological parents. When one twin has schizophrenia, the risk of schizophrenia in the co-twin is greater in monozygotic twins (45%) than in dizygotic twins (15%). However, 40% of the monozygotic co-twins of a person with schizophrenia are clinically normal (Cloninger 2002). Furthermore, the risk of illness decreases with the degree of genetic relationship more rapidly than can be explained by a single gene or by the sum of the effects of several such genes. Thus, the inheritance pattern of schizophrenia suggests that multiple genes, each of small effect, interact nonlinearly with one another and with environmental factors to influence susceptibility to schizophrenia (Cloninger 2002). This prediction has been confirmed by more than 20 genome-wide linkage scans in more than 1,200 families of schizophrenics. These studies found evidence for several genes of small effect; that is, genes that modify susceptibility but are neither necessary nor sufficient to cause the disorder. However, no evidence was found for any gene with a large individual effect, such as a Mendelian subtype of schizophrenia. Linkage studies for susceptibility genes produced a list of regions of interest that includes target regions on 1q21-q22 (Brzustowicz et al. 2000), 6q21-q22.3 (Cao et al. 1997), and 13q34. Chumakov and colleagues (Chumakov et al. 2002) describe a new human gene, G72, on chromosome 13q34 that interacts with the gene for D-amino acid oxidase (DAAO) on 12q24 to regulate glutamatergic signaling through the N-methyl-D-aspartate (NMDA) receptor pathway. Using traditional positional cloning techniques of linkage and linkage disequilibrium, they showed that both of these genes are associated with increased susceptibility to schizophrenia. This is therefore the first discovery of a specific gene that also provides a pathogenic molecular mechanism that can account for the major symptoms of a psychiatric disorder. Similarly, two other groups reported that the gene dysbindin on 6p22.3 (Straub et al. 2002) and the gene neuregulin 1 on 8p (Stefansson et al. 2002) also influence susceptibility to schizophrenia and may operate via the same NMDA mechanism. Each of these gene discoveries came from association analysis targeting chromosomal regions first identified by linkage analysis. The linkage of schizophrenia to the 15q14 locus of the α7 nicotinic receptor has also been replicated (Leonard et al. 2002). A gene called COMT (catechol-O-methyltransferase) had long been suspected of being involved, because it codes for an enzyme that breaks down dopamine after it is secreted into the synapse. The COMT gene is located on chromosome 22q11.2. Egan et al.
have linked a gene variant that reduces dopamine activity in the prefrontal cortex to poorer performance and inefficient functioning of the prefrontal cortex during working memory tasks, and to a slightly increased risk for schizophrenia (Egan et al. 2001). The finding emerged from an ongoing study of people with schizophrenia and their siblings. Brain imaging studies had revealed that both well siblings and patients falter on tasks of working memory and show reduced activation of the prefrontal cortex, which is required for this function. Studies have shown that the chemical messenger dopamine plays a pivotal role in tuning the activity of the prefrontal cortex during working memory tasks. People inherit two copies of COMT (one from each parent), each in either of two forms. The first variant, Val, reduces prefrontal dopamine activity, while the second form, Met, increases it. Egan et al. found that those who had inherited two copies of Val performed, on average, worse than those with only one copy. Those with two copies of Met performed best in the
working memory task. At the same time, people with two copies of Val had the lowest prefrontal activity. Brain activity of those with one copy of each variant was more efficient, while the activity of siblings who inherited two copies of Met showed the highest brain efficiency, on average. Among 104 pairs of parents studied, the investigators discovered that the Val form of COMT was transmitted to offspring who eventually developed schizophrenia more often than would be expected by chance: 75 times for Val, compared to 51 times for Met. Inheriting two copies of the Val form accounts for a 1.5-fold increased risk for schizophrenia. It is not yet known exactly how the COMT Val variant impairs prefrontal efficiency. The researchers suspect that COMT's effect, while modest, may be amplified through interaction with other susceptibility genes and environmental factors. The COMT Val allele is certainly not a necessary or sufficient causative factor for schizophrenia. However, its biological effect on prefrontal function, and the relevance of prefrontal function for schizophrenia, implicate a mechanism by which it increases liability for the disorder. Although some insights into the etiology of schizophrenia have been gained from these studies, an understanding of the disease at the molecular level remains elusive. Efforts to identify molecular aberrations associated with the disease may be confounded by the subtle structural and cellular changes that occur and by the polygenic nature of schizophrenia. Neuropathological and brain imaging studies suggest that schizophrenia may result from neurodevelopmental defects. Cytoarchitectural studies indicate cellular abnormalities suggestive of a disruption in neuronal connectivity in schizophrenia, particularly in the dorsolateral prefrontal cortex (Arnold and Trojanowski 1996).
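The transmission bias reported by Egan et al. can be quantified with the standard transmission disequilibrium test (TDT), a McNemar-type chi-square on transmitted versus untransmitted alleles from heterozygous parents. The sketch below uses the 75 Val versus 51 Met transmission counts quoted above; the function name and code are illustrative, not taken from the original study.

```python
import math

def tdt_chi2(transmitted, untransmitted):
    """Transmission disequilibrium test: McNemar chi-square (1 df)
    on how often the test allele was and was not transmitted
    from heterozygous parents."""
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Val transmitted 75 times, Met 51 times (counts from the study)
chi2, p = tdt_chi2(75, 51)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these counts the statistic is (75 − 51)² / (75 + 51) ≈ 4.57, significant at the 0.05 level, consistent with preferential transmission of the Val allele.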
Cellular aberrations, such as decreased neuronal size, increased cellular packing density, and distortions in neuronal orientation, have been observed in immunocytochemical and ultrastructural studies. Yet the molecular mechanisms underlying these findings remain unclear. To identify molecular substrates associated with schizophrenia, DNA microarray analysis was used to assay gene expression levels in the post-mortem dorsolateral prefrontal cortex of schizophrenic and control subjects (Hakak et al. 2001). DNA microarray analysis is a technique that allows the quantitative measurement of the transcriptional expression of several thousand genes simultaneously. Genes determined to have altered expression levels in schizophrenics relative to controls are involved in a number of biological processes, including synaptic plasticity, neuronal development, neurotransmission, and signal transduction. Most notable was the differential expression of myelination-related genes, suggesting a disruption of oligodendrocyte function in schizophrenia. Oligodendrocytes increase neuronal conduction velocity through their insulating properties and provide extrinsic trophic factors that promote neuronal maturation and axonal survival. Myelination of the prefrontal cortex has been observed to occur in late adolescence and early adulthood, which is typically the age of onset of schizophrenia (Benes 1989). The consequence of abnormal myelination can also be related to abnormal timing in signal transfer between the two brain hemispheres, as reported for instance in (Barnett et al. 2005). Genes involved in schizophrenia that can be used in the future CNGM are listed in Table 10.2.

Table 10.2. Genes involved in GPRN of schizophrenia (SCH) for future CNGM

Location / Gene symbol                         Presumed function
1q21-q22                                       Myelination-related genes. Abnormal myelination leads to abnormal signal transfer between neurons
6q21-q22.3 / QKI                               Splicing and expression of myelin-related genes
13q34 / G72                                    G72 gene interacts with the gene for DAAO to regulate glutamatergic transmission through NMDARs
12q24 / DAAO                                   D-amino acid oxidase (DAAO), involved in NMDAR mechanism
6p22.3 / dysbindin                             Involved in NMDAR mechanism
8p / neuregulin                                Involved in NMDAR mechanism
15q14 / α7 subunit of nicotinic ACh receptor   Acetylcholine (ACh) is involved in such cognitive functions as learning and memory
22q11.2 / COMT                                 COMT is an enzyme that breaks down dopamine after it has been released into the synapse. The SCH gene variant seems to reduce activity in the prefrontal cortex
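The logic of a microarray screen of the kind used by Hakak et al. — rank genes by the difference in expression between patient and control groups — can be caricatured in a few lines. Everything below (the synthetic data, the gene names, the thresholds) is illustrative and not taken from the original study.

```python
import math
import random
import statistics as st

random.seed(0)
N = 10  # samples per group

def expression(mean, sd, n):
    """Synthetic expression values for one gene in one group."""
    return [random.gauss(mean, sd) for _ in range(n)]

# GENE_A is made two-fold over-expressed in the "patient" group;
# GENE_B and GENE_C do not differ between groups.
genes = {
    "GENE_A": (expression(200, 20, N), expression(100, 10, N)),
    "GENE_B": (expression(100, 10, N), expression(100, 10, N)),
    "GENE_C": (expression(50, 5, N), expression(50, 5, N)),
}

def log2_fold_change(patients, controls):
    return math.log2(st.mean(patients) / st.mean(controls))

def welch_t(a, b):
    """Welch t-statistic for two independent samples."""
    se = (st.variance(a) / len(a) + st.variance(b) / len(b)) ** 0.5
    return (st.mean(a) - st.mean(b)) / se

# Flag genes as differentially expressed under illustrative thresholds.
hits = [name for name, (sz, ctrl) in genes.items()
        if abs(log2_fold_change(sz, ctrl)) > 0.5 and abs(welch_t(sz, ctrl)) > 3.0]
print(hits)  # only GENE_A should pass both thresholds
```

Real microarray analyses apply the same fold-change and test-statistic logic to thousands of genes at once and must additionally correct for multiple testing, which this toy example omits.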
10.2.3 Discussion and Future Developments

Computational models of neural networks have been applied to modeling neurological diseases and mental illnesses, including schizophrenia (Reggia et al. 1999). In a model of aberrant prefrontal cortex (PC) function, the neural network approach helps to elucidate how multiple pathologies can lead to PC dysfunction, be it low or high levels of dopamine, a reduction or increase in D1 receptor function, or a loss or increase in global GABA inhibition (Reid and Willshaw 1999). The authors developed a spiking network model of the PC and simulated the situation in which neurotransmitter/receptor levels are altered. The reason for the alteration can be either a gene mutation with subsequently altered protein biosynthesis or a psychoactive drug. Changes in neurotransmitter/receptor action led to the disruption of the coherence of firing patterns of neurons in the model PC. Another model has targeted neurodevelopmental impairment in schizophrenia and its consequences for the functioning of neural networks (Hoffman and McGlashan
1999). The modeling goal in this work was to simulate how pruning of corticocortical synaptic connections in the speech perception system can lead to hallucinated voices, a frequent schizophrenic symptom. The effect of myelination impairment on the function of the corpus callosum and on interhemispheric information transfer was computationally investigated in (Chhabra et al. 1999). The model shows that impaired performance of the brain can be a consequence of an imbalance in interhemispheric excitation/inhibition and also of a loss of transmission of specific information between the left and right hemispheres via the corpus callosum. Computational and neuroengineering models such as those mentioned above provide a conceptual bridge between molecular/cellular pathology and cognitive performance. On the other hand, computational neurogenetic models as introduced in this book can offer insights into the gene/protein interactions within neurons that underlie a particular brain activity. For instance, mutations of genes coding for proteins influencing the function of DA, NMDARs and myelination can be incorporated into the neural GRN together with the other genes/proteins they can influence. Then a hierarchical model of LFP/EEG generation can be used to optimize the coefficients of the GRN interaction matrix W. The target signal can be, for instance, an EEG recorded from a schizophrenic patient. There are several reports that the spectral characteristics of the EEG of schizophrenic patients show abnormalities (Fenton et al. 1980, Manchanda et al. 2003). The CNGM can explain these abnormalities by means of mutations in certain genes and/or by means of altered interactions within the GRN. In addition, the CNGM can simulate the effects of antipsychotic drugs on the GRN, the neural network dynamics and the resulting EEG.
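The optimization loop just described needs a forward model that turns a candidate interaction matrix W into a time course of gene expression, whose derived protein levels would then parameterize the neural model producing the simulated LFP/EEG. A minimal sketch of such a discrete-time GRN with sigmoid interactions follows; the update rule and the 3-gene matrix are illustrative assumptions, not the book's exact equations.

```python
import math

def grn_step(g, W):
    """One discrete-time update of gene expression levels g through
    the interaction matrix W; W[i][j] is the influence of gene j on
    gene i, and a sigmoid keeps expression levels in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, g))))
            for row in W]

def simulate_grn(W, g0, steps):
    """Iterate the GRN from initial expression vector g0."""
    trajectory = [list(g0)]
    for _ in range(steps):
        trajectory.append(grn_step(trajectory[-1], W))
    return trajectory

# Hypothetical 3-gene network with mixed activation (positive entries)
# and repression (negative entries).
W = [[ 0.0, -1.2,  0.5],
     [ 1.0,  0.0, -0.8],
     [-0.5,  1.1,  0.0]]
traj = simulate_grn(W, [0.1, 0.9, 0.5], steps=200)

# In a full CNGM, the fitness of a candidate W would be the distance
# between the power spectrum of the simulated LFP/EEG and that of the
# patient's recorded EEG, minimized by an evolutionary search over W.
```

An optimizer (for example a genetic algorithm) would repeatedly perturb W, rerun the simulation, and keep the matrices whose simulated spectra best match the target EEG.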
Thus, the knowledge extracted from a CNGM of schizophrenia will be qualitatively new and can be used to improve the treatment of this disease on an individual basis by simulating the processes occurring between the genes/proteins and the neural dynamics.
10.3 CNGM of Mental Retardation

Mental retardation (MR) is a developmental deficit, beginning in childhood, which results in significant limitation of intellect and cognition and poor adaptation to the demands of everyday life (Sebastian 2005). MR is defined as an overall intelligence quotient (IQ) lower than 70, associated with functional deficits in adaptive behavior (such as daily living skills, social skills and communication), with an onset before 18 years. MR affects approximately 2-3% of the population. Moderate to severe MR (IQ < 50) is estimated to affect 0.4-0.8% of the population and mild MR (50
< IQ < 70) about 2%. Intellectual disability is the developmental consequence of some pathogenic process. The underlying causes of MR are very heterogeneous. They include non-genetic factors that cause brain injury and act prenatally (such as maternal drug consumption or malnutrition) or during early infancy (such as hypoxia, infections, etc.), as well as established genetic causes. Yet the underlying cause is identified in only about one half of cases (Stevenson et al. 2003).
10.3.1 Genetic Causes of Mental Retardation
Down syndrome is the best-known example of prenatal genetic MR (Sebastian 2005). Although it can have several forms, it is basically caused by trisomy 21, in which the fetus acquires an extra chromosome 21 in all cells, for a total of 47 chromosomes. Another example is cri-du-chat syndrome, which is characterized by a high-pitched voice and is caused by a deletion in chromosome 5p3. Similar microscopic deletions (microdeletions) of DNA have been reported in chromosome 15q11-12 in the Prader-Willi and Angelman syndromes. The Prader-Willi syndrome results when the microdeletion is in the chromosome of paternal origin, and the Angelman syndrome results when it is of maternal origin. Tuberous sclerosis is another example of the disorders in this group that might be associated with mental retardation. It is caused by a mutation in a gene affecting the formation of the ectodermal layer of the embryo. Phenylketonuria is the best known and most common of the metabolically caused forms of MR. The enzymatic defect is diminished activity of phenylalanine hydroxylase, caused by a single mutated gene, which leads to a high serum phenylalanine level, affecting, among other things, myelination of the CNS. Seizures and tremors are common, as are eczema and psychotic manifestations. There is a large group of chromosome X-linked forms of MR (XLMR). Approximately a third of XLMR patients have syndromic forms of XLMR, where MR is associated with recognizable clinical signs such as somatic abnormalities or dysmorphic facial features. MR with fragile X (FRAXA) is the most common form of syndromic XLMR known to date. XLMR forms are categorized as syndromic if associated with characteristic clinical features, as in fragile X syndrome (FRAXA), or as non-specific (non-syndromic) if MR is the only symptom in affected individuals.
Finding the molecular causes of nonsyndromic (NS) XLMR has turned out to be much more difficult because of genetic heterogeneity, which precludes pooling of linkage data from different families. Therefore, mapping intervals have remained comparatively wide. Until 1998, only a single gene, FMR2, could be isolated, because of its association with another fragile site, FRAXE. Analogous to the mechanism leading to inactivation of FMR1 in patients with FRAXA, transcriptional silencing of FMR2 was found to be caused by the expansion and subsequent hypermethylation of a CCG trinucleotide repeat in the 5' noncoding region of the gene. FRAXA or fragile X syndrome is the most common inherited form of syndromic XLMR and, after Down syndrome, its most common genetic form. It shows dominant inheritance with lower penetrance in females, because males are haploid for most genes on the X chromosome. Because of a constriction at location Xq27.3, it appears as if the chromosome were fragile and a part of it were breaking off (Genes and disease 2005). Prepubertal boys with this syndrome look quite normal. They are often restless and hyperactive and have a short attention span. Their developmental milestones, especially speech development, are delayed. After puberty, the characteristic phenotypical features may appear. They include an oblong face, prominent ears and jaw, and macroorchidism (enlarged testicles). Fragile X syndrome results from a repeat-expansion mutation in the FMR1 gene, which encodes the protein named FMRP. In normal individuals, the FMR1 gene is transmitted stably from parent to child. However, in fragile X individuals, there is a mutation in one end of the gene (the 5' untranslated region), consisting of an amplification of a CGG repeat. Patients with fragile X syndrome have 200 or more copies of the CGG motif. The huge expansion of this repeat means that the FMR1 gene is not expressed, so no FMR1 protein is made (Genes and disease 2005). Although the exact function of the FMR1 protein in the cell is unclear, it is known that it is translated near synapses in response to neurotransmitter activation and plays a role in synaptic plasticity and possibly also in maturation (Weiler et al. 1997).
Recently, the search for NS-XLMR genes has gained considerable momentum with the identification of 8 of the 13 genes presently known to play a role in NS-XLMR. Table 10.3 lists genes involved in NS-XLMR together with their location on chromosome X, the coded protein and its presumed function (Carrie et al. 1999, Bienvenu et al. 2002, Kitano et al. 2003, Ropers et al. 2003). These genes can form the basis of the future CNGM of mental retardation. As we have envisaged in previous chapters, changes in synaptic connectivity are likely to be a key mechanism by which nervous system organization is permanently changed by experience. Local translation of some proteins in dendrites is increasingly considered to be important for changes in synaptic plasticity (Raymond et al. 2000, Bradshaw et al. 2003). Certain mRNAs are known to be targeted to dendrites, and are observed in or near dendritic spines, more frequently at newly forming synapses. Dendrites have been shown to be equipped with components necessary for protein
synthesis, such as polyribosomal aggregates. Protein translation induced by metabotropic receptor stimulation has previously been proposed to play a role in LTP (Bradshaw et al. 2003). Protein translation near synapses may play a role in activity-dependent modification of synapses and also in synaptic maturation and stabilization during development.

Table 10.3. Genes involved in X-linked mental retardation for future CNGM

Location / Gene symbol        Protein and presumed function
Xp11.4 / TM4SF2               Tetraspanin protein, interacts with integrins, control of neurite outgrowth?
Xq12 / OPHN1                  RhoGAP involved in regulating actin cytoskeleton dynamics and neuronal morphogenesis
Xq22 / PAK3                   Rac/Cdc42 effector involved in regulating actin cytoskeleton dynamics and neuronal morphogenesis
Xp22.1 / ARX                  Transcription factor
Xp22.1-p21.3 / IL1RAPL1       Unknown
Xp22.3-22.1 / RPS6KA3/RSK2    Serine/threonine kinase
Xq22.3 / FACL4                Fatty acid coenzyme A ligase, involved in vesicle transport, membrane fusion and gene expression
Xq24 / AGTR2                  Angiotensin II receptor, type 2, unknown function
Xq26 / ARHGEF6                Rho guanine nucleotide exchange factor 6, role in integrin-mediated signaling, regulates Rho-GTPases Rac1/Cdc42
Xq28 / FMR2/FRAXE             Transcription factor?
Xq28 / GDI1                   Rab GDP-dissociation inhibitor involved in synaptic vesicle fusion and neuronal morphogenesis
Xq28 / MECP2                  Methyl-CpG-binding protein, involved in chromatin remodeling and gene silencing
Xq28 / SLC6A8                 Solute carrier family 6, member 8, creatine transporter

Weiler and co-workers report that the mRNA for the fragile X mental retardation protein (FMRP) rapidly associates with synaptic polyribosomal complexes in synaptoneurosomes after stimulation by DHPG (3,5-dihydroxyphenylglycine), a specific mGluR1 agonist (Weiler et al. 1997). This association happens within 1 min after stimulation of mGluR1.
Moreover, immunostaining of the synaptosomal proteins at short intervals after stimulation shows increased FMRP expression relative to unstimulated samples, indicating rapid synthesis of FMRP in response to synaptic activation during the first 5 min after stimulation of mGluR1. Huber and colleagues investigated whether mice with a null mutation in the FMR1 gene showed any changes in synaptic plasticity that could account for the effects of the human mutation on the brain (Huber et al. 2002). Previous work had shown that the protein-synthesis-dependent phase of long-term potentiation (LTP) in the hippocampus of the knockout mouse was normal, but Huber and colleagues decided to look at another form of hippocampal plasticity, long-term depression (LTD), following the demonstration that LTD also requires local protein synthesis. Surprisingly, they found that LTD in the knockout mice was enhanced, so that a train of stimulation produced greater depression of synaptic function than in control mice. LTD can also be induced by directly stimulating metabotropic glutamate receptors using the agonist DHPG (3,5-dihydroxyphenylglycine). The knockout mice also showed stronger LTD in response to DHPG application than did wild-type mice. However, it should be noted that a different type of LTD, mediated by NMDA (N-methyl-D-aspartate) receptors rather than by metabotropic glutamate receptors, was not affected. Further insights into the function of FMRP have been obtained using the Drosophila FMR1 homolog, DFXR (also called DFMR1) (Morales et al. 2002). The protein product, DFXR, is an RNA-binding protein capable of interacting with itself and with its mammalian counterparts, implying some degree of functional conservation. The study found that DFXR regulates neurite branching and extension in the developing brain. In wild-type flies, neurites branch in a regular array, forming a stereotypical grid-like structure. In the mutants, small ectopic branches were observed. Gain of DFXR function also affects neurite morphology. High levels of DFXR cause a complete failure of axon extension. This is qualitatively very similar to the loss-of-function mutant, but more severe. These findings suggest a dose-dependent role of DFXR in neurite growth.
Studies in mammalian brains also suggest that FMRP may be involved in dendritic spine maturation, as both MR patients and FMR1 knockout mice exhibit an immature spine morphology (Weiler et al. 1997). Autopsy results indicate that forebrain synapses in fragile X patients exhibit a thin, elongated neurite morphology and a reduced synaptic contact size in electron microscopy. This suggests that synaptically regulated synthesis of FMRP may be involved in dendritic spine maturation, and it is not unreasonable to postulate that this process may be impaired in cases of fragile X syndrome. Although the specific function of FMRP that leads to spine and neurite abnormalities has not been determined, FMRP has been shown to bind to ribosomes. Polyribosomal aggregates are seen in dendritic spines, particularly during development and synaptogenesis, and protein translation occurs in developing synapses in response to neurotransmitter stimulation. It has also been shown that a rat homologue of FMRP is produced at synapses and that its translation is accelerated upon mGluR1 stimulation
(Weiler et al. 1997). Thus FMRP may be involved in regulating the translation of proteins at the synapse, and its absence might impede this synthesis and consequently impair synaptic maturation.

10.3.2 Discussion and Future Developments

It may not be surprising that in genetically caused MR, mutated genes code
for proteins involved in structural synaptic plasticity and neural morphogenesis. These processes are considered to be the material basis of learning and long-term memory, as well as of the behavior of animals and humans. Thus the CNGM of MR can be developed along the lines of the synaptic plasticity models described in chapters 2 and 11 of this book. Both models, one for sigmoid (linear) neurons and the other for a spiking neuron, work with the notion of a movable synaptic modification LTD/LTP threshold. In fact, the theory and model of cortical synaptic plasticity described in chapter 2 and in (Benuskova, Rema et al. 2001) accounts for impaired synaptic plasticity in the cortex of the rat model of mental retardation. The explanatory variable is the synaptic modification LTD/LTP threshold, which is set to high values in the model of the impaired cortex. The same mathematical operation can explain the increase in LTD observed in the FMR1-KO mice (Huber et al. 2002). Although the causes and underlying molecular processes are different (prenatal alcohol versus FMR1 gene deletion), the manifestations at the level of synaptic plasticity are the same, i.e. enhancement of LTD. More detailed dynamical models, in relation to the internal dynamics of the underlying GRN, can concentrate on the design of the complex functional relationship for the fast and slow components of the movable synaptic modification LTD/LTP threshold. Another future CNGM approach to modeling MR can use biological models of neurite outgrowth that are based on the dynamic concentrations of relevant proteins, as developed by (Kiddie et al. 2005). Protein concentrations in this model can be made functions of the gene expressions of the underlying GRN. In such a way the CNGM can lead to an explanation of aberrant neurite extension and branching by means of altered gene/protein dynamics.
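The role of the movable LTD/LTP threshold can be illustrated with a BCM-type modification function, in which postsynaptic activity below the threshold produces depression and activity above it produces potentiation. The sketch below uses a generic BCM form, not the exact equations of chapter 2; setting the threshold high, as assumed for the impaired cortex, turns the same level of activity from LTP into LTD.

```python
def bcm_phi(c, theta):
    """BCM modification function phi(c): negative (LTD) for
    0 < c < theta, positive (LTP) for c > theta."""
    return c * (c - theta)

def bcm_update(w, x, theta, lr=0.01):
    """One synaptic weight update for presynaptic input x, where the
    postsynaptic activity c is taken as the weighted input."""
    c = w * x
    return w + lr * bcm_phi(c, theta) * x

c = 0.5  # the same level of postsynaptic activity...
normal = bcm_phi(c, theta=0.3)    # ...yields LTP in the normal model
impaired = bcm_phi(c, theta=0.8)  # ...but LTD when the threshold is high
print(normal > 0, impaired < 0)  # True True
```

In the full theory the threshold itself moves as a function of the history of postsynaptic activity; in a CNGM, that function would in turn depend on the expression of the relevant genes in the GRN.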
A similar strategy for modeling impaired neural development can be applied to modify the neurogenetic model of normal neural development in Drosophila (Marnellos and Mjolsness 2003). That is, the normally functioning model GRN can be modified to include gene deletions and/or mutations to see what the consequences for neural development would be.
All these models can then be used to seek corrections at the genetic and molecular level that could be applied to alleviate damage to the neural system in MR.
10.4 CNGM of Brain Aging and Alzheimer Disease

The aging of the human brain is the major risk factor for Alzheimer's disease and a cause of cognitive decline in the elderly (Lu et al. 2004). As the world population grows older, understanding the genetic and environmental factors that contribute to aging is becoming very important. Evolutionary biologists have sought to formulate theories as to why organisms age. Current views on aging range from a very complicated genetic picture to the search for one or a few major mechanisms that would explain why we age. Cellular aging is accompanied by a decline in metabolic activity, decreased resistance to stress, gene dysregulation, and decreased rates of protein synthesis and degradation. Attractive hypotheses for a unitary basis of cellular aging include telomere shortening, accumulation of somatic mutations in mitochondria, oxidative stress and cellular damage by free radicals, errors in DNA replication and repair, and altered rates and accuracy of protein synthesis (Mackay 2000). Furthermore, the one simple factor that leads to increased longevity in mammals is dietary restriction (Sinclair and Guarente 2006). Changes in gene expression with advancing age in brain tissue are consistent with inflammatory and oxidative stress responses and are attenuated by caloric restriction. In skeletal muscle, aging results in a gene expression pattern indicative of a marked stress response and lower expression of metabolic and biosynthetic genes. Most of these changes in gene expression are prevented or ameliorated by caloric restriction (Sinclair and Guarente 2006). Thus it is not yet resolved whether there are numerous mechanisms of aging, modulated by hundreds of gene effects depending on gene-gene and gene-environment interactions, or whether there is only a single major mechanism of aging, modulated by only a few major gene effects (Mackay 2000).
Perhaps the most likely answer is that many viewpoints are correct to some extent, and the real question becomes one of determining all of the potential genetic mechanisms and the extent to which each contributes to genetic variation in aging. Understanding the molecular effects of aging in the brain will reveal processes that lead to age-related brain dysfunction, such as Alzheimer's disease. More than 12 million individuals worldwide have Alzheimer's disease (AD), and it accounts for most cases of dementia that are diagnosed
after the age of 60. AD is clinically characterized by a global decline of cognitive function that progresses slowly and leaves end-stage patients bedridden, incontinent and dependent on custodial care. Death occurs, on average, 9 years after diagnosis. All of the currently used drugs are of limited benefit, because they have only modest symptomatic effects. Other drugs are used to manage mood disorder, agitation and psychosis in later stages of the disease, but no treatment with a strong disease-modifying effect is currently available (Citron 2004). A large body of evidence suggests that oxidative stress results in DNA damage that subsequently leads to changes in gene expression and aging of tissues. The time in life when brain aging begins is undefined. Transcriptional profiling of 11,000 genes from the human frontal cortex of individuals ranging from 26 to 106 years of age defined a set of genes with reduced expression after age 40 (Lu et al. 2004). These genes play central roles in synaptic plasticity, vesicular transport and mitochondrial function. This is followed by the induction of stress response, antioxidant and DNA repair genes. DNA damage is markedly increased in the promoters of genes with reduced expression in the aged cortex. DNA damage may reduce the expression of genes involved in learning, memory and neuronal survival, thus initiating a program of brain aging that starts early in adult life. It is interesting to note that the middle-age group, ranging in age from 45 to 71, exhibited much greater heterogeneity in gene expression than the other age groups, with some cases resembling the young group and others resembling the aged group. These results suggest that a genetic signature of human cortical aging may be defined starting in young adult life, and that the rate of age-related change may be heterogeneous among middle-aged individuals.
Genes that play a role in synaptic function and in the plasticity that underlies learning and memory were among those most significantly affected in the aging human cortex (Lu et al. 2004). Several neurotransmitter receptors that are centrally involved in synaptic plasticity showed significantly reduced expression after age 40, including the GluR1 AMPA receptor subunit, the NMDA R2A receptor subunit, and subunits of the GABA-A receptor. Moreover, the expression of genes that mediate synaptic vesicle release and recycling was significantly reduced, notably VAMP1/synaptobrevin, synapsin II, RAB3A and SNAPs. Members of the major signal transduction systems that mediate long-term potentiation (LTP) and memory storage were age-downregulated, notably the synaptic calcium signaling system, with reduced expression of calmodulin 1 and CaM kinase IIα. The major calcium-binding proteins calbindins 1 and 2, the calcium pump ATP2B2, and the calcium-activated transcription factor MEF2C, which promotes neuronal survival, were also significantly reduced.

10 Applications of CNGM and Future Development

Furthermore, multiple members of the protein kinase C (PKC) and Ras-MAP kinase signaling pathways showed decreased expression. In addition, genes involved in vesicular/protein transport showed reduced expression in the aged cortex, including multiple RAB GTPases, sortilin, dynein, and clathrin light chain. Moreover, microtubule-associated proteins (MAP1B, MAP2, tau and kinesin 1B) that stabilize microtubules and promote axonal transport were consistently and robustly reduced. The p35 activator of cyclin-dependent kinase 5 (cdk5), which regulates intraneuronal protein trafficking and synaptic function, was also significantly reduced. A number of genes involved in protein turnover also showed reduced expression in the aged cortex, including ubiquitin-conjugating enzymes, the lysosomal proton pump, and the enzymes D-aspartate O-methyltransferase and methionine adenosyltransferase II, which repair damaged proteins. The aging of the human frontal cortex was also associated with increased expression of genes that mediate stress responses and repair. These included genes involved in protein folding (heat shock protein 70 and α-crystallin), antioxidant defense (non-selenium glutathione peroxidase, paraoxonase and selenoprotein P) and metal ion homeostasis (metallothioneins 1B, 1G and 2A). Genes involved in inflammatory or immune responses, such as tumor-necrosis factor (TNF)-α, were also increased. Increased expression of the base-excision repair enzymes 8-oxoguanine DNA glycosylase and uracil DNA glycosylase is consistent with increased oxidative DNA damage in the aged cortex. Analogous changes were observed in the brain of aging mice (Jiang et al. 2001). Interestingly, genes with a role in learning and memory whose expression was affected by aging were oppositely affected by exposure of mice to an enriched environment.
Brain tissue of AD patients is characterized by two main findings: (1) senile plaques made of fragmented brain cells surrounded by amyloid-family proteins, and (2) tangles of cytoskeleton filaments within cells (Pastor and Goate 2004). Two main disease mechanisms are based on the involvement of two proteins, amyloid-β (Aβ) and tau, in AD pathology. Aβ is the main constituent of senile plaques, one of the key pathological characteristics of AD. Tau is the main component of neurofibrillary tangles, the other hallmark lesion of AD. Genetic and pathological evidence strongly supports the amyloid cascade hypothesis of AD, which states that amyloid-β42 (Aβ42), a proteolytic derivative of the large transmembrane protein amyloid precursor protein (APP), has an early and vital role in all cases of AD (Citron 2004). Aβ42 forms aggregates that are thought to initiate the pathogenic cascade, leading ultimately to neuronal loss and dementia. Although Aβ42 is constitutively produced, for a long time it was assumed that only Aβ that is deposited in compact neuritic plaques is neurotoxic. However, more recent data indicate that soluble oligomeric species also
have toxic properties. Therefore it is now thought that the formation of toxic aggregates is the first truly pathological step in the disease. The amyloid cascade is initiated by the generation of Aβ42. In familial early-onset AD, Aβ42 is overproduced owing to pathogenic mutations. In sporadic AD, various factors can contribute to an increased load of Aβ42 oligomers and aggregates. Amyloid-β oligomers might directly injure the synapses and neurites of brain neurons, in addition to activating microglia and astrocytes. Tau pathology, which contributes substantially to the disease process through cytoskeletal tangles, is triggered by Aβ42. Aβ is generated proteolytically from a large precursor molecule, APP, by the sequential action of two proteases, β-secretase and γ-secretase. A third protease, α-secretase, which competes with β-secretase for the APP substrate, can preclude the production of Aβ by cleaving the peptide in two. This outline immediately points to three strategies to reduce Aβ: inhibition of β-secretase, inhibition of γ-secretase and stimulation of α-secretase. However, these enzymes have other substrate proteins as well, thus the potential side effects of this approach are not known (Citron 2004). After brain aging, family history is the second-greatest risk factor for AD. AD appears to follow an age-related dichotomy: early-onset familial AD (EOFAD) and late-onset AD (LOAD) without obvious familial segregation. Rare and highly penetrant EOFAD mutations in different genes are transmitted in an autosomal dominant fashion, while LOAD is thought to be explained by the common disease/common variant (CD/CV) hypothesis. According to this theory, common disorders are also governed by common DNA variants (such as single nucleotide polymorphisms). These variants significantly increase disease risk but are insufficient to actually cause a specific disorder.
Current empirical and theoretical data support this hypothesis, although there remains great uncertainty as to the number of the underlying risk factors and their specific effect sizes. EOFAD represents only a small fraction of all AD cases (~5%) and typically presents with onset ages younger than 65 years, showing autosomal dominant transmission within affected families. To date, more than 160 mutations in 3 genes have been reported to cause EOFAD. These include the Aβ precursor protein (APP) on chromosome 21, presenilin 1 (PSEN1) on chromosome 14, and presenilin 2 (PSEN2) on chromosome 1 (Bertram and Tanzi 2005). The most frequently mutated gene, PSEN1, accounts for the majority of AD cases with onset prior to age 50. While these AD-causing mutations occur in 3 different genes located on 3 different chromosomes, they all share a common biochemical pathway, i.e., the altered production of Aβ leading to a relative overabundance of the Aβ42 species, which eventually results in neuronal cell death and dementia. An
up-to-date overview of disease-causing mutations in these genes can be found at the Alzheimer Disease & Frontotemporal Dementia Mutation Database (Alzheimer disease & frontotemporal dementia mutation database 2006). LOAD, on the other hand, is classically defined as AD with onset at age 65 years or older and represents the vast majority of all AD cases. While segregation and twin studies conclusively suggest a major role of genetic factors in this form of AD, to date only 1 such factor has been established, the ε4 allele of the apolipoprotein E gene on chromosome 19q13 (APOE) (Bertram and Tanzi 2005). The risk effect of APOE-ε4 has been consistently replicated in a large number of studies across many ethnic groups. In addition to the increased risk exerted by the ε4 allele, a weak, albeit significant, protective effect of the minor allele, ε2, has also been reported in several studies. Unlike the mutations in the known EOFAD genes, APOE-ε4 is neither necessary nor sufficient to cause AD but instead operates as a genetic risk modifier by decreasing the age of onset in a dose-dependent manner. Despite its long-known and well-established genetic association, the biochemical consequences of APOE-ε4 in AD pathogenesis are not yet fully understood but likely encompass Aβ aggregation/clearance and/or cholesterol homeostasis. Numerous additional LOAD loci, and probably also EOFAD loci, remain to be identified, since the 4 known genes together account for probably less than 50% of the genetic variance of AD (Table 10.4). It is currently unclear how many of these loci will prove to be risk factors as opposed to causative factors. As candidates for the former, more than three dozen genes have been significantly associated with AD (Cacabelos et al. 1999). However, no single gene has been shown to be a risk factor with even nearly the same degree of replication or consistency as has APOE-ε4.
An up-to-date overview of the status of these and other potential AD candidate genes, including meta-analyses across published association studies, can be found at the Alzheimer Research Forum genetic database (Alzheimer research forum genetic database of candidate genes 2006). The table can also include genes involved in learning and memory, synaptic function, intraneuronal protein trafficking, protein turnover, genes that mediate stress responses and repair, genes involved in inflammatory and immune responses, etc. (Cacabelos et al. 1999, Jiang et al. 2001, Citron 2004, Bertram and Tanzi 2005). In the future CNGM, each gene and protein can be linked to the corresponding molecular pathway and cellular function using the Kyoto Encyclopedia of Genes and Genomes (KEGG) (KEGG pathway database 2006). The next step will be to link protein levels to neuronal parameters in the neural network model. One brain region that is severely affected by AD processes is the hippocampus. There are models of
hippocampal functioning (Rolls and Treves 1998) that can be utilized to include internal gene networks. Several computational models of AD have been proposed, e.g. (Horn et al. 1996, Hasselmo 1997, Devlin et al. 1998), which can also be exploited for linking mutated proteins to neuronal parameters and for investigating the altered dynamics of gene expression and its consequences for the deterioration of neural functions such as learning and memory formation in AD.

Table 10.4. Main genes involved in Alzheimer's disease (AD) for future CNGM

Location / Gene symbol: Protein and presumed function
21q21 / APP: Aβ precursor protein is mutated, which results in altered production and aggregation of Aβ
14q24 / PSEN1: Altered Aβ production
1q31 / PSEN2: Altered Aβ production
19q13 / APOE: Apolipoprotein E. It is a risk factor, the role of which is unknown. Alterations in Aβ aggregation/clearance and/or cholesterol homeostasis are likely.
10.5 CNGM of Parkinson Disease

Parkinson disease (PD) is the second most common neurodegenerative disease of adult onset. Histopathologically, it is characterized by a severe loss of dopaminergic neurons in the substantia nigra and by cytoplasmic inclusions consisting of insoluble protein aggregates (Lewy bodies). These malformations lead to a progressive movement disorder including the classic triad of tremor, bradykinesia, and rigidity, with an average onset age between 50 and 60 years. Genetics has played a major role in elucidating the causes of nigrostriatal neuronal loss across a wide spectrum of clinically and histopathologically heterogeneous PD cases. As in AD, there appears to be an age-dependent dichotomy: the majority of individuals with an early or even juvenile onset show typical Mendelian inheritance. However, unlike in AD, these cases show a predominantly autosomal-recessive mode of inheritance, and there is an ongoing debate as to whether genetic factors play any substantial role in contributing to disease risk in cases with onset beyond approximately 50 years. Notwithstanding these uncertainties, mutations in at least five genes have now been shown to cause familial early-onset parkinsonism (Bertram and Tanzi 2005). These genes include the genes for α-synuclein (SNCA or PARK1), parkin (PRKN or PARK2), DJ-1 (PARK7), PTEN-induced putative kinase 1 (PINK1 or PARK6), and leucine-rich repeat kinase 2 or dardarin (LRRK2 or
PARK8), with several other linkage regions pending characterization and/or replication. The locus PARK1 on chromosome 4q21 involves the protein that is the major constituent of one of the classic neuropathological hallmarks of the disease, i.e., α-synuclein, which can be found at the core of Lewy bodies. While the exact mechanisms underlying α-synuclein toxicity remain incompletely understood, recent evidence suggests that some SNCA mutations may change normal protein function quantitatively rather than qualitatively, via duplication or triplication of the α-synuclein gene. Another mutation occurs at the LRRK2 locus on chromosome 12q12. While the functional consequences of LRRK2 mutations are still unknown, it has been suggested that they could interfere with the protein's kinase activity. While changes in SNCA and LRRK2 are the leading causes of autosomal-dominant forms of PD, the majority of affected pedigrees actually show a recessive mode of inheritance. The most frequently involved gene in recessive parkinsonism is parkin (PRKN) on chromosome 6q25, which causes nearly half of all early-onset PD cases. Parkin is a ubiquitin ligase that is involved in the ubiquitination of proteins targeted for degradation by the proteasomal system. The spectrum of parkin mutations ranges from amino acid-changing single-base mutations to complex genomic rearrangements and exon deletions, which probably result in a loss of protein function. It has been speculated that this may trigger cell death by rendering neurons more vulnerable to cytotoxic insults, e.g., the accumulation of glycosylated α-synuclein. In addition to parkin mutations, genetic analyses of other pedigrees revealed two independent, homozygous mutations in DJ1 on chromosome 1p36. Both mutations result in a loss of function of DJ-1, a protein that is suggested to be involved in the oxidative stress response.
While several studies have independently confirmed the presence of DJ-1 mutations in other PD cases, the frequency of disease-causing variants in this gene is estimated to be low (~1%). Additional PD-causing mutations were discovered in PINK1. This enzyme is expressed at particularly high levels in the brain, and the first two identified mutations were predicted to lead to a loss of function that may render neurons more vulnerable to cellular stress, similar to the effects of parkin mutations. While Lewy bodies are typically not found in the brains of patients bearing parkin mutations, it is currently unclear whether they are present in PD cases with mutations in DJ1 and PINK1. At least 6 additional candidate PD loci have been described, including putative disease-causing mutations in the ubiquitin carboxy-terminal hydrolase L1 (UCHL1) on chromosome 4p14, and in a nuclear receptor of subfamily 4 (NR4A2 or NURR1) located on 2q22 (Bertram and Tanzi 2005). These genes may actually be susceptibility factors rather
than causal PD genes. Table 10.5 lists the main genes involved in PD for future constructions of CNGM.

Table 10.5. Main genes involved in Parkinson disease (PD) for future CNGM

Location / Gene symbol: Protein and presumed function
4q21 / SNCA: Protein α-synuclein leads to neurotoxicity by aggregation
12q12 / LRRK2: Enzyme leucine-rich repeat kinase 2, function unknown
6q25 / PRKN: Parkin mutation leads to impaired protein degradation via the proteasome
1p36 / DJ1: Protein DJ-1, mutation leads to impaired oxidative stress response
1p36 / PINK1: PTEN-induced putative kinase 1, mutation may lead to mitochondrial dysfunction

While a number of whole-genome screens across several late-onset PD family samples have been performed, only a few overlapping genomic intervals have been identified (Bertram and Tanzi 2005). One of the more extensively studied regions is 17q21, near the gene encoding the microtubule-associated protein tau (MAPT). Previously, it had been shown that rare missense mutations in MAPT lead to a syndrome of frontotemporal dementia with parkinsonism (FTD with parkinsonism linked to chromosome 17), but to date no mutations have been identified as causing parkinsonism without frontotemporal degeneration. Despite the lack of evidence for genetic linkage to chromosome 19q13, variants in APOE have also been tested for a role in PD and related syndromes. Across the nearly 3 dozen different studies available to date, some authors report a significant risk effect of APOE-ε4 for PD, while others see only an association with certain PD phenotypes or even a risk effect of the ε2 allele. In addition to the findings in autosomal-dominant familial PD, there is also some support for a potential role of SNCA variants in the risk for late-onset PD (Bertram and Tanzi 2005). In the future CNGM, each gene and protein can be linked to the corresponding molecular pathway and cellular function using the molecular pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG pathway database 2006).
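As an illustration of how such gene-to-pathway linking could be automated, the sketch below composes queries against the KEGG REST interface at rest.kegg.jp (a web service that postdates this text). The helper names and the selection of gene symbols from Tables 10.4 and 10.5 are our own assumptions for illustration; a real pipeline would fetch these URLs and parse the tab-separated results.

```python
# Sketch: composing KEGG REST queries that link AD/PD genes to molecular
# pathways. Illustrative only; verify gene identifiers before use.

BASE = "https://rest.kegg.jp"

def find_gene_url(symbol):
    """URL that searches KEGG gene entries for a given gene symbol."""
    return f"{BASE}/find/genes/{symbol}"

def gene_pathways_url(kegg_gene_id):
    """URL listing pathways that contain a KEGG gene entry
    (a KEGG id such as 'hsa:<entrez-id>' for a human gene)."""
    return f"{BASE}/link/pathway/{kegg_gene_id}"

# Gene symbols taken from Tables 10.4 and 10.5
for symbol in ["APP", "PSEN1", "APOE", "SNCA", "LRRK2", "PRKN", "PINK1"]:
    print(find_gene_url(symbol))
```

Fetching each `find` URL returns candidate KEGG gene entries for the symbol; the chosen entry id can then be passed to `gene_pathways_url` to enumerate the pathways in which the gene participates.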
In addition, new ways of combining electrophysiological and molecular single-neuron analysis with gene-expression profiles in the same neuron are being developed and are bringing first results (Liss and Roeper 2004). After constructing the gene and protein regulatory network for PD, the next step will be to model the affected neural circuits and functions (Bear et al. 2001). The main brain region affected in PD is the substantia nigra (i.e.
black substance). The substantia nigra (SN) is an elongated nucleus located medial to the basis pedunculi throughout the rostrocaudal extent of the midbrain. Although a component of the brainstem, the SN is considered along with the basal ganglia because it has reciprocal connections with them and is functionally related to them. The SN is divided into two components that have different connections and distinct neurotransmitters. The more ventral part is called the substantia nigra pars reticulata (SNr) and the more dorsal part is called the substantia nigra pars compacta (SNc). Neurons of the SNc use dopamine (DA) as a neurotransmitter and project primarily to the neostriatum. They also contain the black pigment neuromelanin, which, like dopamine, is produced from tyrosine; hence the name black substance. Neurons in the SNr project principally to the thalamus (ventral anterior, ventral lateral, and dorsomedial nuclei) but also to brainstem nuclei (superior colliculus, pedunculopontine nucleus) and use GABA as the neurotransmitter. The SNr also receives striatal input that uses GABA (and substance P) as transmitters and is inhibitory. The basal ganglia, together with the cerebellum, modify movements generated by the motor cortex on a minute-to-minute basis. The motor cortex sends information to both, and both structures send information right back to the cortex via the thalamus. The output of the cerebellum is excitatory, while that of the basal ganglia is inhibitory. The balance between these two systems allows for smooth, coordinated movements, and a disturbance in either system will show up as a movement disorder, such as the PD triad. There are several computational models of the basal ganglia system that can be exploited for future CNGM development, e.g. (Terman et al. 2002, Frank 2005, Leblois et al. 2006).
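To give a flavor of what such models capture, the following deliberately minimal steady-state rate sketch (our own illustration, far simpler than the cited models, with all weights chosen arbitrarily) encodes the standard direct/indirect pathway logic: dopamine facilitates the direct, GPi-inhibiting pathway and suppresses the indirect pathway, so dopamine depletion raises GPi output and throttles the thalamic relay, as in PD bradykinesia.

```python
def thalamic_drive(dopamine, cortex=1.0):
    """Steady-state sketch of basal ganglia gating; dopamine in [0, 1].
    Weights are illustrative assumptions, not fitted values; see
    Terman et al. (2002) or Frank (2005) for grounded models."""
    direct   = cortex * (0.5 + dopamine)          # D1 pathway inhibits GPi
    indirect = cortex * (1.5 - dopamine)          # D2 pathway excites GPi via STN
    gpi      = max(0.0, 1.0 - direct + indirect)  # inhibitory GPi output
    return max(0.0, cortex - 0.5 * gpi)           # thalamic relay to cortex

# Healthy versus dopamine-depleted nigrostriatal input
print(thalamic_drive(0.8), thalamic_drive(0.1))
```

Lowering the dopamine argument reduces the returned thalamic drive, reproducing qualitatively the suppression of movement-related thalamocortical activity after SNc degeneration.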
10.6 Brain-Gene Ontology

The explosion of biomedical data and the growing number of disparate data sources are exposing researchers to a new challenge: how to acquire, represent, maintain and share knowledge from large and distributed databases in the context of rapidly evolving research. In the biomedical domain, for instance, the problem of knowledge discovery from biomedical data, and of making biomedical knowledge and concepts sharable over applications and reusable for several purposes, is both complex and crucial. It is central to supporting decisions in medical practice, as well as to enabling comprehensive knowledge acquisition by medical research communities and molecular biologists involved in biomedical discovery.
Ontology is defined in the artificial intelligence literature as a specification of a conceptualization. An ontology specifies, at a higher level, the classes of concepts that are relevant to the domain and the relations that exist between these classes. An ontology captures the intrinsic conceptual structure of a domain, and for any given domain its ontology forms the heart of the knowledge representation. A cell functions through the use of its genes to produce proteins, and although each cell within an organism usually contains the same set of genes, there are significant differences in how the genes are utilized between cells. The shift of research from structural genomics towards functional genomics calls for new biological knowledge to be made available. The development of reliable automated sequencing techniques made possible the growth of genomics as a commonplace element of modern biology. The genome sequencing projects have provided tremendous information about the central dogma of molecular biology and help in understanding the functioning of organisms much better. Microarray technology makes use of the sequence resources created by the genome projects and other sequencing efforts to answer the question of which genes are expressed in a particular cell type of an organism, at a particular time, under particular conditions. For instance, microarrays allow comparison of gene expression between normal and diseased (e.g., epileptic or schizophrenic) cells. A DNA microarray consists of an orderly arrangement of DNA fragments representing the genes of a particular organism, and transcription profiles at the genomic scale can be assessed using this technology. Microarray analysis permits scientists to detect thousands of genes in a small sample simultaneously and to analyze the expression of those genes.
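To make the normal-versus-diseased comparison concrete, the following sketch computes log2 fold-changes between diseased and control expression values and flags genes beyond a chosen threshold. The expression values are synthetic numbers invented for illustration (the symbols GRIA1, GABRA1 and HSPA1A stand for the GluR1 AMPA subunit, a GABA-A subunit and heat shock protein 70); a real microarray analysis would add normalization, replicates and statistical testing.

```python
import math

# Synthetic expression values (arbitrary units), for illustration only.
control  = {"GRIA1": 120.0, "GABRA1": 95.0, "HSPA1A": 40.0}
diseased = {"GRIA1":  60.0, "GABRA1": 90.0, "HSPA1A": 85.0}

def log2_fold_change(case, ctrl):
    """Positive = up-regulated in disease, negative = down-regulated."""
    return math.log2(case / ctrl)

def differential_genes(case, ctrl, threshold=1.0):
    """Return genes whose |log2 fold-change| meets the threshold
    (threshold 1.0 corresponds to a two-fold change)."""
    hits = {}
    for gene in ctrl:
        fc = log2_fold_change(case[gene], ctrl[gene])
        if abs(fc) >= threshold:
            hits[gene] = round(fc, 2)
    return hits

print(differential_genes(diseased, control))
# → {'GRIA1': -1.0, 'HSPA1A': 1.09}
```

In this toy dataset the synaptic receptor gene is halved and the stress-response gene roughly doubled, echoing the down- and up-regulated gene classes discussed for the aging cortex above.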
By integrating the discovery outputs of different experimental techniques with appropriately extracted biological and medical knowledge, there is scope to discover new metabolic pathways, to model the metabolic and regulatory networks of living organisms, and ultimately to understand the pathogenesis of various diseases. In this context, the biggest problem we are faced with today is an overwhelming array of nomenclature for genes, proteins, drugs and even diseases. The biomedical community thus suffers from a communication problem and from a limited ability to use resources to search the vast sources of information effectively, i.e., to extract the appropriate meanings. Many researchers and databases use (at least partially) idiosyncratic terms and concepts for representing biological information. It has often been observed that terms and definitions differ between groups, with different groups not infrequently using identical terms with different meanings. The concept 'gene', for example, is used with different semantics by the major international genomic databases.
An ontology can provide the conceptual framework and factual knowledge that are necessary to deal critically with the rapidly changing science of biology. An ontology is one way to provide a semantic repository of systematically ordered relevant concepts in molecular biology. Such repositories can be used, for example, to bridge the different notions in various databases by explicitly specifying the meaning of, and relation between, fundamental concepts. Thus, in general, an ontology permits researchers to define and share domain-specific vocabularies. The Gene Ontology (GO), for example, has been used to "produce a controlled vocabulary that can be applied to all organisms even if knowledge of genes and proteins is changing". GO is the basis for systems that address the problem of linking biological knowledge and literature facts. However, in addition to research-based literature, the amount of data produced daily by medical information systems and medical decision support systems is growing at a staggering rate. We must consider that scientific biomedical information can include information stored in the genetic code, but can also include experimental results from various experiments and databases, including patient statistics and clinical data. Large amounts of information and knowledge are available in medicine, and making medical knowledge and medical concepts shared over applications and reusable for different purposes is crucial. Biomedical ontologies should provide conceptual links between data from seemingly disparate fields. This might include, for example, the information collected in clinical patient data for clinical trial design, geographical and demographic data, epidemiological data, drug and therapeutic data, as well as data collected from different perspectives, such as those of nurses, doctors, laboratory experts and research experiments. At the same time, the framework should reuse and integrate as many different ontologies as possible.
The ontologies should integrate terminologies, as well as domain-specific ontologies, such as disease ontologies and GO, in order to support the knowledge discovery process. The Brain-Gene Ontology (BGO) project that runs at KEDRI AUT in New Zealand aims to build a multi-dimensional biomedical informatics ontology able to share knowledge from different sources and experiments undertaken across aligned research communities, in order to connect areas of science seemingly unrelated to the area of immediate interest. BGO is a frame-based ontology developed using the Protégé environment (Protege 2006) (see Fig. 10.2). Protégé includes knowledge-acquisition tools that allow domain experts and ontology engineers to build and refine the knowledge representation while at the same time populating instances in the knowledge base. BGO is primarily focused on the gene-disease relationship. We represent the relationships graphically, in a way that enables visualization and the creation of new relationships. We are planning to integrate the
data mining techniques available in NeuCom (NeuCom 2006), which was also developed at KEDRI AUT. NeuCom is a complete software package for generic data processing with an emphasis on evolving connectionist modeling techniques. It contains a set of tools for data visualization, normalization, analysis, feature extraction, clustering and modeling with cross-validation and genetic algorithms, with a consistent and easy-to-use interface. It is designed to help the user to extract knowledge from any given dataset and, if the user desires, to develop prediction or classification models on it. The NeuCom modules can adapt to new incoming data in an on-line, incremental, lifelong learning mode, and can extract meaningful rules that help people discover new knowledge in their respective fields.
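The on-line, incremental mode of operation can be illustrated with a minimal clustering sketch in the spirit of evolving connectionist systems: each incoming sample either updates the nearest cluster center or, if it lies too far away, spawns a new cluster. This is our own toy illustration of the principle, not the actual NeuCom implementation or its interface.

```python
import math

class EvolvingClusterer:
    """Minimal on-line clustering sketch (illustrative only): samples
    arrive one at a time and the model grows as needed, so the data
    never has to be revisited."""

    def __init__(self, radius=1.0):
        self.radius = radius
        self.centres = []   # list of (centre tuple, sample count)

    def partial_fit(self, x):
        if self.centres:
            i, d = min(
                ((i, math.dist(x, c)) for i, (c, _) in enumerate(self.centres)),
                key=lambda t: t[1],
            )
            if d <= self.radius:
                c, n = self.centres[i]
                # incremental mean update of the winning centre
                new_c = tuple(ci + (xi - ci) / (n + 1) for ci, xi in zip(c, x))
                self.centres[i] = (new_c, n + 1)
                return i
        self.centres.append((tuple(x), 1))   # too far: create a new cluster
        return len(self.centres) - 1

clu = EvolvingClusterer(radius=1.0)
for point in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]:
    clu.partial_fit(point)
print(len(clu.centres))   # → 2 (two well-separated clusters emerge)
```

Because the model only ever touches the current sample and the stored centers, it adapts to new incoming data in exactly the lifelong-learning fashion described above.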
Fig. 10.2. Snapshot from the BGO developed at KEDRI (www.kedri.info). See Color Plate 11
10.7 Summary

The book introduces a novel computational approach to brain neural network modeling that integrates dynamic gene networks with a neural network model. The bridging system is the protein network, in which individual proteins that are coded for by genes are related to neuronal parameters.
Interaction of genes in neurons affects the dynamics of the whole neural network through neuronal parameters, which are no longer constant, but change as a function of gene expression. Through optimization of the gene interaction network, of the initial gene/protein expression values and of the ANN parameters, particular target states of the neural network operation can be achieved. A particular ANN model and its observable output should be chosen based on the concrete neural system and the particular problem one wants to model. Principles of building such models can be found, for instance, in (Rolls and Treves 1998). Although we speak about brain diseases and functions with mammals and higher vertebrates in mind, the approach suggested in this book is also applicable to simpler organisms such as Drosophila, to aid the explanation of the genetic basis of their behavior. Another system that could be used to validate our approach is brain slices or cell cultures taken from normal and genetically modified animal brains.
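The core CNGM idea summarized above can be sketched in a few lines: an internal gene regulatory network evolves in discrete time under an interaction matrix, and one expression level is mapped onto a neuronal parameter, here a firing threshold, so that gene dynamics shape the neuron's output. The three-gene interaction weights, the gene-to-threshold link and the input statistics below are all illustrative assumptions, not fitted values from any real model in the book.

```python
import math
import random

# Hypothetical 3-gene interaction matrix W (illustrative numbers only):
# W[i][j] is the influence of gene j on gene i.
W = [[ 0.0, -0.8,  0.5],
     [ 0.6,  0.0, -0.4],
     [-0.3,  0.7,  0.0]]

def gene_step(g):
    """One discrete-time update of expression levels, squashed to (0, 1)."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, g))))
            for row in W]

def neuron_spikes(threshold, inputs, leak=0.9):
    """Spike count of a leaky integrator whose threshold is gene-controlled."""
    v, spikes = 0.0, 0
    for i in inputs:
        v = leak * v + i
        if v >= threshold:
            spikes += 1
            v = 0.0          # reset after a spike
    return spikes

random.seed(0)
inputs = [random.uniform(0.0, 0.6) for _ in range(200)]

g = [0.5, 0.5, 0.5]          # initial gene expression levels
for _ in range(50):          # let the gene network settle
    g = gene_step(g)

# Suppose (hypothetically) gene 0 codes for a protein raising the threshold.
threshold = 0.5 + 1.5 * g[0]
print(neuron_spikes(threshold, inputs))
```

Changing an entry of W shifts the settled expression levels and hence the firing behavior, which is exactly the kind of gene-to-dynamics linkage that CNGM optimization manipulates to reach a target network state.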
Appendix 1
A.1 Table of Genes and Related Brain Functions and Diseases

Most of the genetic disorders of the human neural system featured in this Appendix are the direct result of a mutation in one gene. Even in those cases, the exact molecular processes leading to disease symptoms may not yet be fully understood, as is the case for Huntington's disease (Cattaneo et al. 2001). However, one of the most difficult problems ahead is to find out how genes contribute to diseases that have a complex pattern of inheritance, such as mental diseases like schizophrenia, depression, Parkinson's disease, etc. In all these cases, no one gene has the decisive power to determine whether a person has a disease or not. It is likely that more than one gene mutation is required before the disease is manifest, and a number of genes may each make a subtle contribution to a person's susceptibility to a disease; genes may also affect how a person reacts to environmental factors such as life stress (Caspi et al. 2003). The following Table A.1.1 contains the list of human brain diseases, mutated genes / location on chromosomes, affected brain cell functions and resulting symptoms. The main source for Table A.1.1 was the public domain chapter The Nervous System in Genes and Disease (Genes and disease 2005). From the Genes and Disease site you can follow the links into many online related resources with free and full access, including the human genome sites, to see the location of the genes implicated in each disorder. You can also find related gene sequences in different organisms. For the very latest information, you can search for complete research articles via PubMed, and look into other books on the NCBI electronic bookshelf.
Table A.1.1. Genes and related brain functions and diseases in humans. Each entry lists: the disease; the mutations of genes identified so far / chromosome location if known (protein role if known); the brain abnormality; the symptoms; and references.

Adrenoleukodystrophy (ALD)
  Genes: ALD gene (affects the function of the fatty acid enzyme) / unknown chromosome
  Brain abnormality: Myelin sheath on nerve fibers in the brain is lost, and the adrenal cortex degenerates.
  Symptoms: Progressive neurological disability and death.
  References: (McGuinness et al. 2003)

Alcoholism
  Genes: CREB gene (role in synaptic plasticity associated with drug-addictive behaviors) / unknown chromosome
  Brain abnormality: Compromised CREB signaling in nucleus accumbens and amygdala.
  Symptoms: Compulsive and uncontrolled pattern of alcohol drinking in spite of the adverse consequences of its abuse.
  References: (Pandey 2004)

Alcoholism-depression
  Genes: CHRM2 gene / 7
  Brain abnormality: Attention, learning, memory and cognition are affected.
  Symptoms: People are at risk of developing both diseases, alcoholism linked to depression.
  References: (Wang, Hinrichs et al. 2004)

Alzheimer disease (AD)
  Genes: PSEN2 (early-onset AD) / 1q31; PSEN1 (early-onset AD) / 14q24; APP gene (early-onset AD) / 21q21; APOE-e4 (late-onset AD) / 19q13
  Brain abnormality: Plaques made of fragmented brain cells surrounded by amyloid-family proteins; tangles of cytoskeleton filaments.
  Symptoms: Progressive inability to remember facts and events and, later, to recognize friends and family.
  References: (Citron 2004, Pastor and Goate 2004, Bertram and Tanzi 2005)
Amyotrophic lateral sclerosis (ALS)
  Genes: SOD1 / 21q22 (prevents attack by superoxide radicals from inside the cell)
  Brain abnormality: Progressive degeneration of motor neuron cells in the spinal cord and brain.
  Symptoms: Loss of motor control which ultimately results in paralysis and death.
  References: (Bertram and Tanzi 2005, Federici and Boulis 2005)

Angelman syndrome (AS)
  Genes: UBE3A / deletion of segment 15q11-q13 on maternally derived chromosome 15 (protein degradation)
  Brain abnormality: Mutations in UBE3A disrupt protein breakdown during brain development.
  Symptoms: Mental retardation, abnormal gait, speech impairment, seizures, frequent laughing, smiling and excitability.
  References: (Buiting et al. 2003, Wang, Liu et al. 2004)

Anorexia nervosa (AN)
  Genes: Susceptibility locus / 1p
  Brain abnormality: Severely restricted eating.
  Symptoms: Obsessive fear of weight gain, extremely low body weight.
  References: (Grice et al. 2002)

Ataxia telangiectasia (A-T)
  Genes: ATM gene / 11 (affects DNA damage response)
  Brain abnormality: Cerebellar degeneration.
  Symptoms: Lack of balance and slurred speech, immunodeficiency, radiosensitivity, predisposition to cancer.
  References: (Watts et al. 2002)

Attention deficit hyperactivity disorder (ADHD)
  Genes: DAT1 / 5p15.3; DRD4 / 11p15.5
  Brain abnormality: Brain dopaminergic neurotransmission is compromised.
  Symptoms: Trouble keeping attention on tasks or play activities, easily distracted and forgetful, excessive senseless aggressive hyperactivity.
  References: (Chen et al. 2003, Langley et al. 2004)
Autism
  Genes: Genes related to GABAergic and serotonergic neurotransmission, early developmental genes affecting embryonic development of the CNS / unknown chromosome(s)
  Brain abnormality: Brain inhibitory and serotonergic transmission is compromised; the immune system is weakened, allowing viral infections to damage the brain during embryonic development.
  Symptoms: Communication problems, inability to read the mind of others, social impairment, and unusual or repetitive behaviors.
  References: (McIntosh 1998, Veenstra-Vanderweele et al. 2004)

Bulimia nervosa (BN)
  Genes: Susceptibility locus on chromosome 10p
  Brain abnormality: Severe disturbances in eating behavior.
  Symptoms: Binge-eating, self-induced vomiting.
  References: (Bulik et al. 2003)

Charcot-Marie-Tooth disease (CMT) (a genetically heterogeneous group of diseases)
  Genes: Type 1A PMP22 / 17; Type 1B Cx32 / X; Type 1C EMP2 / 16
  Brain abnormality: Peripheral neuropathy resulting from insufficient myelin coating of peripheral nerves.
  Symptoms: Slowly progressive degeneration of the muscles in the foot, lower leg, hand and forearm, and a mild loss of sensation.
  References: (Street et al. 2002)

Cockayne syndrome
  Genes: CSA / 5; CSB / unknown chromosome (components of the transcriptional machinery and DNA repair)
  Brain abnormality: Loss of transcription-coupled repair of DNA in all cells of the body.
  Symptoms: Premature aging of the brain and the whole body, short stature, sensitivity to sunlight.
  References: (Spivak 2004)
Creutzfeldt-Jakob disease (CJD)
  Genes: Gene encoding the prion protein PrP (PRNP) / 20p13
  Brain abnormality: Neurodegeneration with spongiosis and amyloid plaques consisting of PrP.
  Symptoms: Rapidly progressing neurodegeneration leading to death.
  References: (Bertram and Tanzi 2005)

Deafness
  Genes: Cx26 / 13
  Brain abnormality: Congenital deafness.
  Symptoms: Deafness is not accompanied by other symptoms.
  References: (Sabag et al. 2005, Yang et al. 2005)

Depression (linked to 19 different genetic regions)
  Genes: 1-2 copies of the short allele of the 5HTT promoter / unknown chromosome
  Brain abnormality: Serotonergic (5-HT) neurotransmission in the brain is compromised.
  Symptoms: Stressful life events (loss, threat, humiliation, defeat) lead to clinical depression.
  References: (Caspi et al. 2003, Zubenko et al. 2003, Zubenko et al. 2004)

Epilepsy (at least 8 forms of epilepsy possessing some genetic basis)
  Genes: Many genes coding for ion channels, neurotransmitter receptors, transcription factors, etc.
  Brain abnormality: Abnormal cell firing in the brain, originating in different parts of the brain.
  Symptoms: Recurring seizures.
  References: (Gardiner 1999, Meisler et al. 2001, George 2004, Steinlein 2004)

Fragile X syndrome
  Genes: Expansion of the CGG motif in FMR1 / X (function unknown; the FMR1 protein binds RNA)
  Brain abnormality: Defects in neurite density and morphology and in synaptic plasticity.
  Symptoms: The most common inherited form of mental retardation.
  References: (Weiler et al. 1997, Huber et al. 2002, Morales et al. 2002)
Frontotemporal dementia (FTD)
  Genes: Microtubule-associated protein tau gene (MAPT) / 17q21
  Brain abnormality: Protein accumulates in neurofibrillary tangles within neurons to cause malfunction.
  Symptoms: First personality and behavioral changes, later forgetfulness, disorientation and confusion, as in AD.
  References: (Zhukareva et al. 2003, Bertram and Tanzi 2005)

Huntington disease (HD)
  Genes: Expansion of CAG in the HD gene for the protein huntingtin / 4p16
  Brain abnormality: Dilatation of ventricles and atrophy of the caudate nucleus.
  Symptoms: Degenerative neurological disease that leads to dementia and death.
  References: (Cattaneo et al. 2001, Bertram and Tanzi 2005, Ralser et al. 2005)

Lesch-Nyhan syndrome (LNS)
  Genes: HPRT1 gene / X (affects recycling of purines)
  Brain abnormality: Accumulation of uric acid.
  Symptoms: Painful deposits of uric acid in the skin, kidney and bladder, self-mutilation, mental retardation, muscle weakness.
  References: (Fairbanks et al. 2002)

Lewy body dementia (LBD)
  Genes: Gene for leucine-rich repeat kinase 2 (LRRK2) / 12q12; α-synuclein gene (SNCA) / 4q21
  Brain abnormality: Presence of cortical and subcortical Lewy bodies, amyloid plaques and neurofibrillary tangles.
  Symptoms: Progressive cognitive impairment, recurrent visual hallucinations, and Parkinsonism.
  References: (Bertram and Tanzi 2005)

Maple syrup urine disease (MSUD)
  Genes: Six gene loci encoding the BCKDH enzyme / unknown chromosome
  Brain abnormality: Disruption of the metabolism of certain amino acids.
  Symptoms: Progressive neurodegeneration leading to death within the first months of life.
  References: (Jouvet et al. 2000)
Menkes syndrome
  Genes: X-linked (cannot transport copper)
  Brain abnormality: Disruption of the cells' ability to absorb copper.
  Symptoms: Severe cerebral degeneration and arterial changes, resulting in death in infancy.
  References: (Pase et al. 2004)

Myotonic dystrophy
  Genes: Gene found on chromosome 19
  Brain abnormality: Muscles contract but have decreasing power to relax.
  Symptoms: Mental deficiency, hair loss and cataracts.
  References: (Ranum and Day 2004)

Narcolepsy
  Genes: Unknown
  Brain abnormality: Low levels of hypocretins (orexins), suggesting the loss of the brain cells that secrete hypocretin.
  Symptoms: Falling into a deep sleep at any time, drowsiness, sudden weakness of the muscles that leads to collapse.
  References: (Mieda et al. 2004)

Neurofibromatosis, type 2 (NF-2)
  Genes: NF2 gene / 22 (tumor-suppressor gene)
  Brain abnormality: Impairment of the brake on cell growth and division.
  Symptoms: Development of malignant brain tumors and benign tumors on both auditory nerves.
  References: (Rong et al. 2004)

Niemann-Pick type C (NP-C)
  Genes: Gene on chromosome 18 (plays a role in cholesterol homeostasis)
  Brain abnormality: Excessive build-up of cholesterol inside cells, impairing its processing.
  Symptoms: Brain and nervous system impairment leading to death.
  References: (Ko et al. 2003, Ko et al. 2005)
Parkinson disease (PD)
  Genes: α-synuclein gene (SNCA) / 4q21; parkin gene (PRKN) / 6q25; DJ1 / 1p36; gene for PTEN-induced putative kinase 1 (PINK1) / 1p36; gene for dardarin (LRRK2) / 12q12
  Brain abnormality: Neurotoxicity by aggregation of α-synuclein, impaired protein degradation via parkin, impaired oxidative stress response via DJ1, mitochondrial dysfunction.
  Symptoms: Presence of inclusion bodies, called Lewy bodies, in the basal ganglia, resulting in tremor, muscular stiffness and difficulty with balance and walking.
  References: (Bertram and Tanzi 2005, Eckert and Eidelberg 2005)

Phenylketonuria (PKU)
  Genes: Both alleles of the gene for phenylalanine hydroxylase (PAH) / 12 (converts phenylalanine to tyrosine)
  Brain abnormality: Concentration of phenylalanine in the body builds up to toxic levels.
  Symptoms: Mental retardation, organ damage, unusual posture.
  References: (Kaufman 1999, Knerr et al. 2005)

Prader-Willi syndrome (PWS)
  Genes: Absence of segment 11-13 on the long arm of the paternally derived chromosome 15 (mRNA processing)
  Brain abnormality: Impaired intermediate step between DNA transcription and protein formation.
  Symptoms: Mental retardation, decreased muscle tone, short stature, emotional lability and an insatiable appetite.
  References: (Wang, Liu et al. 2004, Schule et al. 2005)

Refsum disease
  Genes: PAHX gene / 10 (required for the metabolism of phytanic acid)
  Brain abnormality: Impaired PAHX (phytanic acid hydroxylase).
  Symptoms: Degenerative nerve disease (peripheral neuropathy), ataxia, retinitis pigmentosa (vision disorder), bone and skin changes.
  References: (Brink et al. 2003)
Rett syndrome (RTT)
  Genes: MeCP2 gene / X (controls gene expression)
  Brain abnormality: It is not clear what the mechanism is.
  Symptoms: Loss of purposeful use of hands and speech, reduced muscle tone, wringing hand movements, autistic-like behavior and seizures.
  References: (Zoghbi 2003, Glaze 2005, Zoghbi 2005)

Schizophrenia (SCH) (many genes may be involved)
  Genes: COMT Val allele (decreases prefrontal dopamine); myelination and growth-related genes; G72 / 13q34 (interacts with the gene for D-amino acid oxidase (DAAO) on 12q24; NMDAR signaling)
  Brain abnormality: Impairment of prefrontal cortex function, disruption in oligodendrocyte function, disorders of neural development.
  Symptoms: Personality changes, disorder of thoughts (delusions), perceptual phenomena (hallucinations), degradation of the ability to perform daily functions.
  References: (Egan et al. 2001, Cloninger 2002, Sugai et al. 2004, Barnett et al. 2005)

Spinal muscular atrophy (SMA)
  Genes: Mutation of both alleles of the survival motor neuron gene (SMN1) / 5
  Brain abnormality: Death of spinal motor neurons and subsequent muscle paralysis.
  Symptoms: Neuromuscular disorder that may result in death from respiratory failure (type I), reduced life expectancy (type II), inability to walk (types II and III).
  References: (Federici and Boulis 2005, Gabanella et al. 2005)

Spinocerebellar ataxia
  Genes: Expansion of a CAG triplet in the SCA1 gene / 6 (function unknown)
  Brain abnormality: Degeneration of the spinal cord and the cerebellum.
  Symptoms: Loss of muscle coordination, spasticity.
  References: (Ralser et al. 2005)
Tay-Sachs disease
  Genes: Mutations in both alleles of the HEXA gene / 15 (helps to degrade a lipid called GM2 ganglioside)
  Brain abnormality: Excessive accumulation of the GM2 ganglioside in neurons.
  Symptoms: Paralysis, dementia, blindness and early death; neuron dysfunction and psychosis in the chronic adult form.
  References: (Rajavel and Neufeld 2001, Martino et al. 2005)

Tuberous sclerosis
  Genes: TSC1 / 9; TSC2 / 16 (tumor suppressor functions)
  Brain abnormality: Growth of cells proceeds in an unregulated fashion, resulting in tumor formation.
  Symptoms: Benign, tumor-like nodules of the brain and/or retinas, skin lesions, seizures and/or mental retardation.
  References: (Inoki et al. 2005)

Williams syndrome
  Genes: Deletion of genes for LIM kinase and elastin / 7
  Brain abnormality: Deletion of LIM kinase leads to impairment of visuospatial cognition.
  Symptoms: High competence in language, music and interpersonal relations, with low IQ, heart and vein problems.
  References: (Basalyga et al. 2004, Soosairajah et al. 2005)

Wilson's disease
  Genes: ATP7B / 13 (metabolism of copper)
  Brain abnormality: Copper accumulation and toxicity to the liver and brain.
  Symptoms: Liver and neurological disease; the cornea of the eye can also be affected.
  References: (Cater et al. 2004)
Appendix 2
A.2 A Brief Overview of Computational Intelligence Methods

Computational Intelligence (CI) is the area of developing generic intelligent information-processing methods and systems with wide applications. Most CI methods are inspired by human intelligence. They are characterized by learning, generalization, adaptation, pattern recognition, rule extraction and knowledge representation, which are characteristics of living systems too. The methods of CI include:
• Probabilistic and statistical learning methods (e.g. Bayesian classifiers, data clustering, Markov models, Support Vector Machines (SVM), transductive SVM and SVM trees).
• Rule-based systems (propositional logic dating back to Aristotle) and fuzzy systems (introduced by Zadeh (Zadeh 1965, Zadeh 1979)).
• Artificial neural networks.
• Evolutionary computation, genetic algorithms, particle swarm intelligence and other artificial life approaches.
• Quantum computation and nanotechnology.
• Hybrid systems (e.g. knowledge-based neural networks; neuro-fuzzy systems; neuro-fuzzy-genetic systems; evolving connectionist systems).

A.2.1 Probabilistic and Statistical Methods
These methods are based on the estimation of event probabilities and their statistical analysis. Bayesian methods are among the most popular (Pang 2004). They are based on the Bayes formula that relates the conditional probabilities of two events C and A (Bradley and Louis 1996):
p(A|C) = p(C|A) p(A) / p(C)        (A.2.1)
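As a numerical illustration of the formula, consider the following sketch; the events and all probability values are invented for illustration, not taken from the book:

```python
# Bayes formula: p(A|C) = p(C|A) * p(A) / p(C)
# Illustrative (invented) events: A = "gene mutated", C = "disease present".
p_A = 0.01            # prior probability of the mutation
p_C_given_A = 0.8     # probability of disease given the mutation
p_C_given_notA = 0.05 # probability of disease without the mutation

# Total probability of C (law of total probability):
p_C = p_C_given_A * p_A + p_C_given_notA * (1 - p_A)

# Posterior probability of the mutation given the disease:
p_A_given_C = p_C_given_A * p_A / p_C
print(round(p_A_given_C, 3))
```

Even with a highly informative test (0.8 vs. 0.05), the low prior keeps the posterior modest, which is why prior estimation matters in practice.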
Sometimes, using the Bayes formula involves difficulties, mainly concerning the evaluation of the prior probabilities p(A) and p(C) and the conditional probability p(C|A). Bayesian networks are used to describe relationships between genes in a genetic regulatory network (GRN) (Baldi and Long 2001, Hartemink et al. 2001). Unlike other approaches such as clustering, a Bayesian network can describe arbitrary combinatorial control of gene expression and thus is not limited to pair-wise interactions between genes. Due to their probabilistic nature, Bayesian networks are robust in the face of both imperfect data and imperfect models. Most importantly, the models are biologically interpretable and can be scored rigorously against observational data.

A very popular statistical technique for discovering patterns in data is clustering. Based on a measured distance between instances (objects, points, vectors) from the problem space, groups of close instances can be defined. These groups are called clusters. They are defined by their cluster centers and the membership of the data points in them. A center ci of a cluster Ci is defined as an instance whose mean distance to the instances in the cluster is less than its distance to any other cluster center. Let us have a set X of p data items represented in an n-dimensional space. A clustering procedure results in defining k disjoint subsets (clusters), such that every data item (n-dimensional vector) belongs to only one cluster. A cluster membership function Mi is defined for each of the clusters C1, C2, ..., Ck:
Mi : X → {0,1}        (A.2.2)

Mi(x) = 1 if x ∈ Ci, and Mi(x) = 0 if x ∉ Ci,

where x is a data instance (vector) from X. A significant characteristic of clustering is how the distance between vectors is measured. The distance between two data points x and y in an n-dimensional geometrical space can be measured in several ways, e.g.:
• Hamming distance: D(x,y) = Σi |xi − yi|        (A.2.3)
• Euclidean distance: D(x,y) = √( Σi (xi − yi)² )        (A.2.4)
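The two distance measures and a basic crisp cluster-assignment step can be sketched as follows; `assign_to_clusters` is an illustrative helper of ours, not code from the book:

```python
import math

def hamming(x, y):
    # Sum of absolute coordinate differences (Eq. A.2.3).
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the sum of squared differences (Eq. A.2.4).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def assign_to_clusters(X, centers, dist=euclidean):
    # Crisp membership: each instance belongs to exactly one cluster,
    # the one with the nearest center.
    return [min(range(len(centers)), key=lambda i: dist(x, centers[i]))
            for x in X]

X = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
print(assign_to_clusters(X, centers))   # -> [0, 0, 1, 1]
```

Iterating the assignment step together with a re-estimation of the centers gives the familiar k-means procedure.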
A special type of clustering is called fuzzy clustering, in which clusters may overlap, so that each data instance may belong to each of the clusters to a certain degree (Bezdek 1981). The procedure aims at finding the cluster centers Vi (i = 1,2,...,c) and the cluster membership functions μi, which define to what degree each of the n examples belongs to the i-th cluster. The number of clusters c is either defined a priori (supervised type of clustering) or chosen by the clustering procedure (unsupervised type of clustering). The result of a clustering procedure can be represented as a fuzzy relation μik such that:
1. Σi μik = 1, for each k = 1,2,...,n (the total membership of an instance to all clusters equals 1);
2. Σk μik > 0, for each i = 1,2,...,c (there are no empty clusters).
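A minimal sketch of fuzzy memberships computed from distances, checking the two constraints above; the inverse-distance scheme here is an illustrative choice of ours, not the book's algorithm:

```python
def fuzzy_memberships(X, centers, eps=1e-9):
    # mu[i][k]: degree to which instance k belongs to cluster i,
    # computed here as normalized inverse distance to each center.
    mu = [[1.0 / (abs(x - c) + eps) for x in X] for c in centers]
    # Normalize so memberships of each instance sum to 1 (constraint 1).
    for k in range(len(X)):
        s = sum(mu[i][k] for i in range(len(centers)))
        for i in range(len(centers)):
            mu[i][k] /= s
    return mu

X = [0.1, 0.9, 0.5]
mu = fuzzy_memberships(X, centers=[0.0, 1.0])
# Constraint 1: total membership of each instance equals 1.
assert all(abs(sum(mu[i][k] for i in range(2)) - 1.0) < 1e-9 for k in range(3))
# Constraint 2: no cluster is empty.
assert all(sum(mu[i]) > 0 for i in range(2))
```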
Probabilistic and statistical methods are widely used in bioinformatics tools and systems for biological data analysis and modeling. Regression and discriminant analysis, along with Support Vector Machines (SVM) (Vapnik 1998), are popular techniques for building a classifier function. The idea of SVM is to transform the original data into a higher-dimensional feature space via a kernel computation, and to construct a separating hyperplane with maximum margin between the class samples (see Fig. A.2.1 for an example). The kernel functions can be polynomial, radial basis, linear, etc.
Fig. A.2.1. Support vector machines define vectors (called support vectors) at the border between the samples of two classes
SVM are widely used for gene expression and protein expression data classification and profiling (Leslie et al. 2002, Pang and Kasabov 2004, Wang et al. 2006).
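The kernel computation mentioned above can be illustrated with a radial-basis-function kernel; this is a generic sketch of the kernel trick, not the book's implementation, and the sample points are invented:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2): an implicit mapping into a
    # higher-dimensional feature space, computed without ever forming
    # that space explicitly.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

# Kernel (Gram) matrix for three samples; SVM training operates on
# such pairwise kernel values rather than on explicit feature maps.
samples = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
K = [[rbf_kernel(a, b) for b in samples] for a in samples]

# Nearby samples have kernel values close to 1, distant ones close to 0.
assert K[0][0] == 1.0 and K[0][1] > 0.9 and K[0][2] < 0.01
```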
Stochastic Models

Stochastic models deal with the dynamic history of each object of the model. In other words, for each object the next state must be calculated using a set of probabilistic rules. Each rule gives the probability for the object to change in a particular interval of time, and the probability of reaching each state. So, the change of state in this type of model is probabilistic, not deterministic. Let us assume that an object x in the system has a finite state space with L states (as in a kinetic logic model): {x1, x2, ..., xL}. For each time step tk+1 there is a transition probability P(xk+1|x0, ..., xk), where the chain x0, ..., xk represents the history of the system. The variables xk form a Markov chain if and only if for any k:

P(xk+1|x0, ..., xk) = P(xk+1|xk)        (A.2.5)

In other words, a future state depends only on the current state. All probability values P(xi|xj), representing the probability for the system to move from the j-th to the i-th state, form a transition matrix. Suppose that the system can move to state xi at time tk with a transition rate λi. The probability of the system moving to the i-th state at time tk is then:

Pi = λi / λ,  where  λ = Σj λj  (j = 1, ..., L)        (A.2.6)

The formula for calculating the next time point depends on the distribution of the moves tk+1 − tk; for instance, in the case of an exponential process, the next time point is tk+1 = tk − ln(r)/λ, where r is a random value uniformly distributed in (0,1). Stochastic models are used for modeling gene regulatory networks (GRN) (Baldi and Hatfield 2002, Roberts et al. 2005).
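A minimal sketch of this stochastic scheme with exponential waiting times (Gillespie-style sampling); the two-state system and its rates are invented for illustration:

```python
import math
import random

def step(state, rates, rng):
    # rates[s][s2]: transition rate from state s to state s2.
    out = rates[state]
    lam = sum(out.values())                 # total rate (lambda)
    # Exponential waiting time: dt = -ln(r) / lambda, r ~ U(0, 1).
    dt = -math.log(rng.random()) / lam
    # Choose the next state with probability lambda_i / lambda (Eq. A.2.6).
    r, acc = rng.random() * lam, 0.0
    for s2, lam_i in out.items():
        acc += lam_i
        if r <= acc:
            return s2, dt
    return s2, dt

rng = random.Random(0)
rates = {"off": {"on": 2.0}, "on": {"off": 1.0}}  # toy two-state gene switch
state, t = "off", 0.0
for _ in range(100):
    state, dt = step(state, rates, rng)
    assert dt > 0
    t += dt
```

With rates 2.0 and 1.0 the chain spends, on average, twice as much time in the "on" state as in the "off" state, which is the kind of quantitative behavior Boolean models cannot capture.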
A.2.2 Boolean and Fuzzy Logic Models
Boolean Models

Consider a set of N objects at time tk, {x1, x2, ..., xN}, where each object can be in only one of two states: on/off, 1/0, True/False, etc. For simplicity, let us assume:
xi ∈ {0,1},  i = 1, ..., N        (A.2.7)
The state of the system at a time moment can be described by the states of all objects in this set. The state of a given object at the next time step tk+1 is determined by a Boolean logic function (returning only two values, 0 or 1) that takes as input the current state of the system:

xi(tk+1) = Bi(x1(tk), ..., xN(tk))        (A.2.8)
A Boolean function B = {B1, B2, ..., BN} can be represented as a truth table listing all 2^N possible system states. This function represents the relations between all the system's states and can be represented as a diagram. Boolean functions combine states with operators such as | (logical OR) and ~ (logical NOT), e.g.:

x1(tk+1) = x2(tk) | ~x3(tk)        (A.2.9)

Boolean methods are used in gene regulatory network modeling to represent the expression of a gene (1 expressed, 0 not expressed) and the connections between genes (1 excitatory, -1 inhibitory) (Somogyi et al. 2001). Boolean models are very limited in terms of representing the "grayness" of biological systems and the "smoothness" of the interactions of their elements.
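A Boolean gene regulatory network of this kind can be sketched as follows; the three-gene network and its update rules are a toy example of ours, not a network from the book:

```python
def update(state):
    # One synchronous update step: each gene's next value is a Boolean
    # function of the current state (toy rules for illustration).
    g1, g2, g3 = state
    return (g2 | (1 - g3),   # g1 activated by g2, repressed by g3
            g1,              # g2 simply copies g1
            g1 & g2)         # g3 needs both g1 and g2

state = (1, 0, 0)
trajectory = [state]
for _ in range(5):
    state = update(state)
    trajectory.append(state)

# With N genes there are only 2**N states, so every trajectory
# eventually enters a cycle or a fixed point (an "attractor").
assert trajectory[-1] == update(trajectory[-1])  # reached a fixed point
```

The attractors of such networks are often interpreted as stable gene expression patterns, e.g. cell types.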
Fuzzy Logic Models

Fuzzy logic is a logic system based on fuzzy relations and fuzzy propositions, the latter being defined on the basis of fuzzy sets (Zadeh 1965). A fuzzy set is a set defined by a membership function, whereby each domain value can belong to the set to any degree of membership between 0 and 1, not just 1 (belongs) or 0 (does not belong) as in ordinary sets. A variable that can take as values symbolic concepts, such as Small, Medium and High, each defined by its fuzzy membership function, is called a fuzzy variable. Fuzzy propositions are propositions which contain fuzzy variables with their fuzzy values. The truth value of a fuzzy proposition "X is A" is given by the membership function μA of the fuzzy value A. Fuzzy relations link two fuzzy sets in a predefined manner. Fuzzy relations make it possible to represent ambiguous relationships, such as: "the expressions of the genes in cluster 2 and cluster 3 are similar", or "model A performed more or less better than model B", or "the more gene X is expressed, the higher the risk of cancer".
Fuzzy logic allows for the representation of "grayness", "smoothness", inexactness, flexibility, and mobility of relations and concepts, which is important when dealing with biological data and concepts. Examples are: "high/low activity of a brain segment", "high/low gene expression values", "strong/weak binding between proteins", "more or less similar structures", "fast growing culture", and many more.
Fig. A.2.2. Fuzzy concepts 'short', 'medium' and 'tall' defined by membership functions
A fuzzy model is usually represented as a set of fuzzy rules and an inference algorithm. An example of a set of two general fuzzy rules, where each rule has two fuzzy input variables x1 and x2 and one fuzzy output variable, is:

Rule r1: IF x1 is Short (DI11) AND x2 is Short (DI21) THEN Output is Short (CF1)        (A.2.10)

Rule r2: IF x1 is Tall (DI12) AND x2 is Tall (DI22) THEN Output is Tall (CF2)        (A.2.11)
'Short' and 'Tall' are fuzzy concepts defined by their respective fuzzy membership functions; DI and CF denote degrees of importance (membership) and certainty factors, respectively (see Fig. A.2.2). These rules are implemented in the ANN structure shown in Fig. A.2.3 and explained in the next section.
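A minimal sketch of triangular membership functions and the two-rule scheme above; the membership shapes and all numeric breakpoints are illustrative assumptions, not the book's definitions:

```python
def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside [a, c].
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def short(h):  return tri(h, 30, 30, 170)    # left-shoulder shape
def tall(h):   return tri(h, 130, 250, 250)  # right-shoulder shape

def infer(x1, x2):
    # Rule 1: IF x1 is Short AND x2 is Short THEN Output is Short.
    # Rule 2: IF x1 is Tall  AND x2 is Tall  THEN Output is Tall.
    # AND is taken as min; each rule fires to a degree in [0, 1].
    return {"Short": min(short(x1), short(x2)),
            "Tall":  min(tall(x1), tall(x2))}

out = infer(150, 140)
assert 0.0 <= out["Short"] <= 1.0 and 0.0 <= out["Tall"] <= 1.0
```

Note that both rules can fire simultaneously to different degrees, which is exactly the "grayness" that crisp Boolean rules cannot express.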
Fig. A.2.3. Fuzzy rules (A.2.10) and (A.2.11) implemented in an EFuNN architecture
A.2.3 Artificial Neural Networks
Artificial neural networks (ANN), also called connectionist systems, are computational models that loosely mimic the nervous system in its main functions of adaptive learning and generalization (Amari 1990). They are universal computational models, in the sense that any algorithm or function can be realized as an ANN model. Moreover, ANN can learn functions from data without the type of the function being specified; they are therefore called model-free estimators. ANNs provide a model of computation that is different from traditional algorithms. Typically, they are not explicitly programmed to perform a given task; rather, they learn the task from examples of desired input/output behavior. The networks automatically generalize their processing knowledge to previously unseen situations, and they perform well on noisy, incomplete or inaccurate input data. In general, an artificial neural network is a model consisting of interconnected units evolving in time. A connection between units i and j is usually characterized by a weight denoted wij. Based on their connectivity, there are three important ANN architectures:
• Recurrent (contains direct loops from output units (nodes) back to input nodes).
• Feedforward (contains no direct loops).
• Layered (units are organized into layers and connections run between layers).
The behavior of each unit in time can be described by a time-dependent function, a stochastic process, a Bayesian formula, etc. The i-th unit receives a total input x from the units connected to it and generates a response based on an activation function. When this function is the threshold function

f(x) = 1 if x > 0, and f(x) = 0 if x ≤ 0        (A.2.12)
the unit is called a threshold gate and can generate only binary decisions. ANN can implement different machine learning techniques, hence the variety of ANN architectures. Many of these architectures are known as "black boxes", as they do not facilitate revealing the internal relationships between the input and output variables of the problem in an explicit form. But for the process of knowledge discovery, having a "black box" learning machine is not sufficient. A learning system should also facilitate extracting useful information from data for the sake of a better understanding and the learning of new knowledge. Knowledge-based ANNs (KBANNs) have been developed for this purpose. They combine the strengths of different AI techniques, e.g. ANN and rule-based systems, or fuzzy logic. Evolving connectionist systems (ECOS) have been developed more recently to facilitate both adaptive learning in an evolving structure and knowledge discovery (Kasabov 2003). ECOS are modular connectionist-based systems that evolve their structure and functionality in a continuous, self-organized, on-line, adaptive, interactive way from incoming information; they can process both data and knowledge in a supervised and/or unsupervised way. Learning is based on clustering in the input space and on function estimation for each cluster in the output space. Prototype rules can be extracted to represent the clusters and the functions associated with them. Different types of rules are facilitated by different ECOS architectures, such as evolving fuzzy neural networks (EFuNN) (see Fig. A.2.3), dynamic neuro-fuzzy inference systems (DENFIS), etc. An ECOS structure grows and "shrinks" in a continuous way from input data streams. Both feedforward and feedback connections are used in the architectures. ECOS are not limited in the number and types of inputs, outputs, nodes and connections.
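The threshold gate of Eq. (A.2.12) can be sketched as a simple unit; the weight values below are our own illustrative choice:

```python
def threshold_gate(inputs, weights):
    # Weighted sum followed by the hard threshold f(x): 1 if x > 0 else 0.
    x = sum(i * w for i, w in zip(inputs, weights))
    return 1 if x > 0 else 0

# With suitable weights a single threshold gate computes logical AND:
# both inputs must be 1 for the weighted sum 1*i1 + 1*i2 - 1.5 to exceed 0.
and_weights = [1.0, 1.0, -1.5]
for i1 in (0, 1):
    for i2 in (0, 1):
        out = threshold_gate([i1, i2, 1.0], and_weights)  # last input is bias
        assert out == (i1 and i2)
```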
A simple learning algorithm of a simplified version of EFuNN, called ECF (Evolving Classifier Function), is given in the next section.

Evolving Classifier Function (ECF)
The learning algorithm for the ECF ANN:
1. Enter the current input vector from the data set (stream) and calculate the distances between this vector and all rule nodes already created, using Euclidean distance (by default). If no node has been created yet, create the first one with the coordinates of the first input vector attached as its input connection weights.
2. If all calculated distances between the new input vector and the existing rule nodes are greater than a maximum-radius parameter Rmax, a new rule node is created. The position of the new rule node is the same as the current vector in the input data space, and the radius of its receptive field is set to the minimum-radius parameter Rmin; the algorithm goes to step 1; otherwise it goes to the next step.
3. If there is a rule node with a distance to the current input vector less than or equal to its radius and its class is the same as the class of the new vector, nothing is changed; go to step 1; otherwise:
4. If there is a rule node with a distance to the input vector less than or equal to its radius and its class is different from that of the input vector, its influence field is reduced. The radius of the new field is set to the larger of the two values: the distance minus the minimum-radius, or the minimum-radius. A new node is created as in step 2 to represent the new data vector.
5. If there is a rule node with a distance to the input vector less than or equal to the maximum-radius, and its class is the same as the input vector's, enlarge the influence field by taking the distance as the new radius, but only if the enlarged field does not cover any rule nodes that belong to a different class; otherwise, create a new rule node as in step 2, and go to step 1.
Recall procedure (classification of a new input vector) in a trained ECF:
1. Enter the new input vector into the trained ECF system.
2. If the new input vector lies within the field of one or more rule nodes associated with one class, the vector is classified into this class.
3. If the input vector lies within the fields of two or more rule nodes associated with different classes, the vector belongs to the class of the closest rule node.
4. If the input vector does not lie within any field, take the m rule nodes most highly activated by the new vector and calculate the average distance from the vector to the nodes of each class; the vector belongs to the class with the smallest average distance.
ECOS have been used for different tasks, including gene expression modeling and profile discovery (see the next section), GRN modeling, protein data analysis, brain data modeling, etc. (Kasabov 2003).
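A minimal sketch of the ECF learning loop described above (simplified: Euclidean distance, nearest-node handling only, and our own parameter names; it is not the book's exact implementation):

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ecf_train(data, r_min=0.1, r_max=1.0):
    # Each rule node is [center, radius, class label].
    nodes = []
    for x, cls in data:
        if not nodes:
            nodes.append([x, r_min, cls])          # step 1: first node
            continue
        i = min(range(len(nodes)), key=lambda j: dist(x, nodes[j][0]))
        c, r, ncls = nodes[i]
        d = dist(x, c)
        if d > r_max:
            nodes.append([x, r_min, cls])          # step 2: too far, new node
        elif d <= r and ncls == cls:
            pass                                   # step 3: covered, same class
        elif d <= r and ncls != cls:
            nodes[i][1] = max(d - r_min, r_min)    # step 4: shrink field
            nodes.append([x, r_min, cls])
        elif ncls == cls:
            nodes[i][1] = d                        # step 5: enlarge field
        else:
            nodes.append([x, r_min, cls])
    return nodes

def ecf_classify(nodes, x):
    # Recall, simplified: the nearest rule node decides the class.
    i = min(range(len(nodes)), key=lambda j: dist(x, nodes[j][0]))
    return nodes[i][2]

data = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((3.0, 3.0), "B")]
nodes = ecf_train(data)
assert ecf_classify(nodes, (0.1, 0.1)) == "A"
assert ecf_classify(nodes, (2.9, 3.1)) == "B"
```

Because nodes are created, shrunk and enlarged one example at a time, the structure evolves with the data stream, which is the defining property of ECOS.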
A.2.4 Methods of Evolutionary Computation (EC)
EC methods are inspired by the Darwinian theory of evolution. They search a space of possible solutions for the best solution of a problem defined through an objective function (Goldberg 1989). EC methods have been used for parameter estimation and optimization in many engineering applications. Unlike classical derivative-based (e.g. Newton-type) optimization methods, EC is more robust against noise and multi-modality in the search space. In addition, EC does not require derivative information about the objective function and is thus applicable to complex, black-box problems. Several techniques have been developed as part of the EC area: genetic algorithms (GA), evolutionary strategies, evolutionary programming, particle swarm optimization, artificial life, etc., the GA being the most popular technique so far. A GA is an optimization technique aiming at finding the optimal values of parameters ("genes") for the "best" "individual" according to a pre-defined objective (fitness) function. A GA includes the following steps:
• GA1. Create a population of N individuals, each individual being represented as a "chromosome" consisting of values (alleles) of parameters called "genes".
• GA2. Evaluate the fitness of each individual with respect to the pre-defined objective function. If an individual achieves the desired fitness score, or alternatively the time allotted for running the procedure is over, the GA STOPS.
• GA3. Otherwise, select a subset of the "best" individuals using pre-defined selection criteria (e.g. top-ranked selection, roulette-wheel selection, keeping the best individuals through generations, etc.).
• GA4. Crossover the selected individuals using a crossover ("mating") technique to create a new generation of the population.
• GA5. Apply mutation using a mutation technique. Go to GA2.
A GA is a heuristic, non-deterministic algorithm. It can give a close-to-optimal solution, depending on the time of execution.
For a large number of parameters ("genes" in the chromosome) it is much faster and much more efficient than an exhaustive search. Representing real genes, or other biological variables (proteins, binding strengths, connection weights, etc.), as GA "genes" is a natural way to solve difficult optimization tasks in CI. For this reason GAs are used for several tasks in this book and also in the proposed CNGM.
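The GA1–GA5 loop can be sketched as follows; this is a toy GA maximizing the number of 1s in a bit-string chromosome (the "one-max" problem), with population size, mutation rate and selection scheme chosen by us for illustration:

```python
import random

def ga(fitness, n_genes=10, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    # GA1: random initial population of bit-string "chromosomes".
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # GA2: evaluate fitness; stop early if an optimum is found.
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == n_genes:
            break
        # GA3: selection - keep the top half (truncation selection).
        parents = pop[: pop_size // 2]
        # GA4: crossover - single-point mating of random parent pairs.
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            children.append(a[:cut] + b[cut:])
        # GA5: mutation - flip each gene with small probability, then GA2.
        for ch in children:
            for i in range(n_genes):
                if rng.random() < 0.05:
                    ch[i] ^= 1
        pop = parents + children
    return max(pop, key=fitness)

best = ga(fitness=sum)
assert sum(best) >= 8  # a close-to-optimal solution, as the text notes
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.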
Appendix 3
A.3 Some Sources of Brain-Gene Data, Information, Knowledge and Computational Models
- Allen Institute and the Allen Brain Atlas: http://www.alleninstitute.org
- Alzheimer disease & frontotemporal dementia mutation database: http://www.molgen.ua.ac.be/admutations
- Alzheimer research forum genetic database of candidate genes: http://www.alzforum.org/
- Blue Brain Project: http://bluebrainproject.epfl.ch/index.html
- Brain-Gene Ontology: http://www.kedri.info/
- Brain models at USC: http://www-hbp.usc.edu/Projects/bmw.htm
- Brain models: http://ttb.eng.wayne.edu/brain/
- Cancer gene expression data: http://www-genome.wi.mit.edu/MPR/GCM.html
- eMedicine: http://www.emedicine.com/
- Ensembl Human Gene View: http://www.ensembl.org/Homo_sapiens/index.html
- Epilepsy: http://www.epilepsy.com/epilepsy/epilepsy_brain.html
- European Bioinformatics Institute EBI: http://www.ebi.ac.uk
- ExPASy (Expert Protein Analysis System) Proteomics Server: http://www.expasy.org/
- Genes and disease: http://www.ncbi.nlm.nih.gov/books/
- Gene Expression Atlas: http://expression.gnf.org/cgi-bin/index.cgi
- GeneCards (integrated database of human genes): http://www.genecards.org/index.html
- GeneLoc (presents an integrated map for each human chromosome): http://bioinfo2.weizmann.ac.il/geneloc/index.shtml
- How Stuff Works: http://health.howstuffworks.com/brain1.htm
- KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg/
- MathWorld - A Wolfram Web Resource: http://mathworld.wolfram.com/DelayDifferentialEquation.html
- NCBI GenBank: http://www.ncbi.nlm.nih.gov/Genbank/index.html
- Neural Micro Circuits Software: http://www.lsm.tugraz.at
- Neuro-Computing Decision Support Environment (NeuCom): http://www.aut.ac.nz/research/research_institutes/kedri/research_centres/centre_for_novel_methods_of_computational_intelligence/neucom.htm
- The Brain Guide: http://www.omsusa.org/pranzatelli-Brain.htm
- The National Society for Epilepsy: http://www.e-epilepsy.org.uk/
References
Abbott LF, Nelson SB (2000) Synaptic plasticity: taming the beast. Nature Neurosci 3:1178-1183
Abraham WC, Christie BC, Logan B, Lawlor P, Dragunow M (1994) Immediate early gene expression associated with the persistence of heterosynaptic long-term depression in the hippocampus. Proc Natl Acad Sci USA 91:10049-10053
Abraham WC, Bear MF (1996) Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci 19(4):126-130
Abraham WC, Tate WP (1997) Metaplasticity: a new vista across the field of synaptic plasticity. Prog Neurobiol 52(4):303-323
Abraham WC, Mason-Parker SE, Bear MF, Webb S, Tate WP (2001) Heterosynaptic metaplasticity in the hippocampus in vivo: a BCM-like modifiable threshold for LTP. Proc Natl Acad Sci USA 98(19):10924-10929
Abraham WC, Logan B, Greenwood JM, Dragunow M (2002) Induction and experience-dependent consolidation of stable long-term potentiation lasting months in the hippocampus. J Neurosci 22(21):9626-9634
Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Machine Learning 6:37-66
Al-Rabiai S, Miller MW (1989) Effect of prenatal exposure to ethanol on the ultrastructure of layer V of mature rat somatosensory cortex. J Neurocytology 18:711-729
Albus JS (1975) A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Trans of the ASME: Journal of Dynamic Systems, Measurement, and Control 27:220-227
Alzheimer disease & frontotemporal dementia mutation database (2006), http://www.molgen.ua.ac.be/admutations. Human Genome Variation Society
Alzheimer research forum genetic database of candidate genes (2006), http://www.alzforum.org/
Amaral DG, Witter MP (1989) The three-dimensional organization of the hippocampal formation: a review of anatomical data. Neuroscience 31:571-591
Amari S (1967) A theory of adaptive pattern classifiers. IEEE Trans on Electronic Computers 16:299-307
Amari S (1990) Mathematical foundations of neuro-computing. Proc IEEE 78:1143-1163
Amari S, Kasabov N (eds) (1998) Brain-like computing and intelligent information systems, Springer, Singapore
Arbib M (1972) The metaphorical brain. An introduction to cybernetics as artificial intelligence and brain theory. John Wiley & Sons, New York
Arbib M (1987) Brains, machines and mathematics. Springer, Berlin
Arbib M (ed) (2003) The handbook of brain theory and neural networks, ed 2, MIT Press, Cambridge, MA
Armstrong-James M, Callahan CA (1991) Thalamo-cortical processing of vibrissal information in the rat. II. Spatiotemporal convergence in the thalamic ventroposterior medial nucleus (VPm) and its relevance to generation of receptive fields of S1 cortical "barrel" neurones. J Comp Neurol 303:211-224
Armstrong-James M, Callahan CA, Friedman MA (1991) Thalamo-cortical processing of vibrissal information in the rat. I. Intracortical origins of surround but not centre-receptive fields of layer IV neurones in the rat S1 barrel field cortex. J Comp Neurol 303:193-210
Armstrong-James M, Diamond ME, Ebner FF (1994) An innocuous bias in whisker sensation modifies receptive fields of adult rat barrel cortex neurons. J Neurosci 11(14):6978-6991
Arnold SE, Trojanowski JQ (1996) Recent advances in defining the neuropathology of schizophrenia. Acta Neuropathol (Berl) 92(3):217-231
Artola A, Brocher S, Singer W (1990) Different voltage-dependent threshold for inducing long-term depression and long-term potentiation in slices of rat visual cortex. Nature 347:69-72
Artola A, Singer W (1993) Long-term depression of excitatory synaptic transmission and its relationship to long-term potentiation. Trends Neurosci 16(11):480-487
Bailey CH, Kandel ER, Si K (2004) The persistence of long-term memory: a molecular approach to self-sustaining changes in learning-induced synaptic growth. Neuron 44:49-57
Bak P, Tang C, Wiesenfeld K (1987) Self-organized criticality: an explanation of 1/f noise. Phys Rev Lett 59:381-384
Baldi P, Brunak S (2001) Bioinformatics. The machine learning approach, ed 2. MIT Press, Cambridge, MA
Baldi P, Long AD (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17(6):509-519
Baldi P, Hatfield GW (2002) DNA microarrays and gene expression: from experiments to data analysis and modeling. Cambridge Univ. Press, Cambridge, UK
Barnett KJ, Corballis MC, Kirk IJ (2005) Symmetry of callosal information transfer in schizophrenia: a preliminary study. Schizophr Res 74(2-3):171-178
Basalyga DM, Simionescu DT, Xiong W, Baxter BT, Starcher BC, Vyavahare NR (2004) Elastin degradation and calcification in an abdominal aorta injury model: role of matrix metalloproteinases. Circulation 110(22):3480-3487
Bear MF, Cooper LN, Ebner FF (1987) A physiological basis for a theory of synapse modification. Science 237:42-48
Bear MF (1995) Mechanism for a sliding synaptic modification threshold. Neuron 15(1):1-4
Bear MF, Connors BW, Paradiso MA (2001) Neuroscience: exploring the brain, ed 2. Lippincott Williams & Wilkins, Baltimore, MD
Beattie EC, Carroll RC, Yu X, Morishita W, Yasuda H, vonZastrow M, Malenka RC (2000) Regulation of AMPA receptor endocytosis by a signaling mechanism shared with LTD. Nature Neurosci 3(12):1291-1300
Beierlein M, Fall CP, Rinzel J, Yuste R (2002) Thalamocortical bursts trigger recurrent activity in neocortical networks: layer 4 as a frequency-dependent gate. J Neurosci 22(22):9885-9894
Benes FM (1989) Myelination of cortical-hippocampal relays during late adolescence. Schizophr Bull 15(4):585-593
Bentley PJ (2004) Controlling robots with fractal gene regulatory networks. In: deCastro L, vonZuben F (eds) Recent developments in biologically inspired computing, vol 1. Idea Group Inc, Hershey, PA, pp 320-339
Benuskova L (1988) Mechanisms of synaptic plasticity. Czechoslovak Physiology 37(5):387-400
Benuskova L, Diamond ME, Ebner FF (1994) Dynamic synaptic modification threshold: computational model of experience-dependent plasticity in adult rat barrel cortex. Proc Natl Acad Sci USA 91:4791-4795
Benuskova L (2000) The intra-spine electric force can drive vesicles for fusion: a theoretical model for long-term potentiation. Neurosci Lett 280(1):17-20
Benuskova L, Kanich M, Krakovska A (2001) Piriform cortex model of EEG has random underlying dynamics. In: Rattay F (ed) Proc. World Congress on Neuroinformatics. ARGESIM/ASIM-Verlag, Vienna
Benuskova L, Rema V, Armstrong-James M, Ebner FF (2001) Theory for normal and impaired experience-dependent plasticity in neocortex of adult rats. Proc Natl Acad Sci USA 98(5):2797-2802
Benuskova L, Abraham WC (2006) STDP rule endowed with the BCM sliding threshold accounts for hippocampal heterosynaptic plasticity. J Comp Neurosci (in press)
Benuskova L, Kasabov N, Jain V, Wysoski SG (2006) Computational neurogenetic modelling: a pathway to new discoveries in genetic neuroscience. Intl J Neural Systems 16(3):215-227
Bertram L, Tanzi RE (2005) The genetic epidemiology of neurodegenerative disease. J Clin Invest 115(6):1449-1457
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
Bi G-q, Poo M-m (1998) Synaptic modification in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci 18(24):10464-10472
Biella G, Uva L, Hoffmann UG, Curtis MD (2002) Associative interactions within the superficial layers of the entorhinal cortex of the guinea pig. J Neurophysiol 88(3):1159-1165
Bienenstock EL, Cooper LN, Munro PW (1982) Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J Neurosci 2(1):32-48
Bienvenu T, Poirier K, Friocourt G, Bahi N, Beaumont D, Fauchereau F, Jeema LB, Zemni R, Vinet M-C, Francis F, Couvert P, Gomot M, Moraine C, Bokhoven Hv, Kalscheuer V, Frints S, Gecz J, Ohzaki K, Chaabouni H, Fryns J-P, Desportes V, Beldjord C, Chelly J (2002) ARX, a novel Prd-class-homeobox gene highly expressed in the telencephalon, is mutated in X-linked mental retardation. Human Molecular Genetics 11(8):981-991
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Biswal B, Dasgupta C (2002) Neural network model for apparent deterministic chaos in spontaneously bursting hippocampal slices. Physical Review Letters 88(8):88-102
Bito H, Deisseroth K, Tsien RW (1997) Ca2+-dependent regulation in neuronal gene expression. Curr Opin Neurobiol 7:419-429
Bliss TV, Lomo T (1973) Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of perforant path. J Physiol 232(2):331-356
Bliss TVP (1999) Young receptors make smart mice. Nature 401:25-27
Bortolotto ZA, Collingridge GL (1998) Involvement of calcium/calmodulin-dependent protein kinases in the setting of a molecular switch involved in hippocampal LTP. Neuropharmacology 37:535-544
Bower JM, Beeman D (1998) The book of GENESIS: exploring realistic neural models with the GEneral NEural SImulation System, ed 2. TELOS/Springer, New York
Bradley P, Louis T (1996) Bayes and empirical Bayes methods for data analysis. Chapman & Hall, London
Bradshaw KD, Emptage NJ, Bliss TVP (2003) A role for dendritic protein synthesis in hippocampal late LTP. Eur J Neurosci 18(11):3150-3152
Brink DMvd, Brites P, Haasjes J, Wierzbicki AS, Mitchell J, Lambert-Hamill M, Belleroche Jd, Jansen GA, Waterham HR, Wanders RJ (2003) Identification of PEX7 as the second gene involved in Refsum disease. Am J Hum Genet 72(2):471-477
Brocher S, Artola A, Singer W (1992) Agonists of cholinergic and noradrenergic receptors facilitate synergistically the induction of long-term potentiation in slices of rat visual cortex. Brain Res 573:27-36
Brown A, Yates PA, Burrola P, Ortuno D, Ashish V, Jesselt TM, Pfaff SL, O'Leary DDM, Lemke G (2000) Topographic mapping from the retina to the midbrain is controlled by relative but not absolute levels of EphA receptor signaling. Cell 102:77-88
Brown C, Shreiber M, Chapman B, Jacobs G (2000) Information science and bioinformatics. In: Kasabov N (ed) Future directions of intelligent systems and information sciences. Springer, pp 251-287
Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Jr., Haussler D (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. PNAS 97(1):262-267
Brown PO, Botstein D (1999) Exploring the new world of the genome with DNA microarrays. Nature 21:33-37
Brzustowicz LM, Hodgkinson KA, Chow EWC, Honer WG, Bassett AS (2000) Location of a major susceptibility locus for familial schizophrenia on chromosome 1q21-q22. Science 288(5466):678-682
Buiting K, Gross S, Lich C, Gillessen-Kaesbach G, el-Maarri O, Horsthemke B (2003) Epimutations in Prader-Willi and Angelman syndromes: a molecular study of 136 patients with an imprinting defect. Am J Hum Genet 72(3):571-577
Bulik CM, Devlin B, Bacanu S-A, Thornton L, Klump KL, Fichter MM, Halmi KA, Kaplan AS, Strober M, Woodside DB, Bergen AW, Ganjei JK, Crow S, Mitchell J, Rotondo A, Mauri M, Cassano G, Keel P, Berrettini WH, Kaye WH (2003) Significant linkage on chromosome 10p in families with bulimia nervosa. Am J Hum Genet 72:200-207
Burnashev N, Rozov A (2000) Genomic control of receptor function. Cellular and Molecular Life Sciences 57:1499-1507
Cacabelos R, Takeda M, Winblad B (1999) The glutamatergic system and neurodegeneration in dementia: preventive strategies in Alzheimer's disease. Int J Geriat Psychiatry 14:3-47
Cai L, Friedman N, Xie XS (2006) Stochastic protein expression in individual cells at the single molecule level. Nature 440:358-362
Cao Q, Martinez M, Zhang J, Sanders AR, Badner JA, Cravchik A, Markey CJ, Beshah E, Guroff JJ, Maxwell ME, Kazuba DM, Whiten R, Goldin LR, Gershon ES, Gejman PV (1997) Suggestive evidence for a schizophrenia susceptibility locus on chromosome 6q and a confirmation in an independent series of pedigrees. Genomics 43(1):1-8
Carnevale NT, Hines ML (2006) The NEURON book. Cambridge University Press, Cambridge, UK
Carpenter G, Grossberg S (1991) Pattern recognition by self-organizing neural networks. MIT Press, Cambridge, MA
Carpenter G, Grossberg S, Markuzon N, Reynolds JH, Rosen DB (1991) Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analogue multi-dimensional maps. IEEE Trans on Neural Networks 3(5):698-713
Carrie A, Jun L, Bienvenu T, Vinet M-C, McDonell N, Couvert P, Zemni R, Cardona A, Buggenhout GV, Frints S, Hamel B, Moraine C, Ropers HH, Strom T, Howell GR, Whittaker A, Ross MT, Kahn A, Fryns J-P, Beldjord C, Marynen P, Chelly J (1999) A new member of the IL-1 receptor family highly expressed in hippocampus and involved in X-linked mental retardation. Nature Genetics 23:25-31
Carroll RC, Beattie EC, vonZastrow M, Malenka RC (2001) Role of AMPA receptor endocytosis in synaptic plasticity. Nature Rev Neurosci 2:315-324
Caspi A, Sugden K, Moffitt TE, Taylor A, Craig IW, Harrington H, McClay J, Mill J, Martin J, Braithwaite A, Poulton R (2003) Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 301(5631):386-389
Castellani GC, Quinlan EM, Cooper LN, Shouval HZ (2001) A biophysical model of bidirectional synaptic plasticity: dependence on AMPA and NMDA receptors. Proc Natl Acad Sci USA 98(22):12772-12777
Cater MA, Forbes J, Fontaine SL, Cox D, Mercer JF (2004) Intracellular trafficking of the human Wilson protein: the role of the six N-terminal metal-binding sites. Biochem J 380(Pt 1):805-813
Cattaneo E, Rigamonti D, Zuccatto C, Squittieri F, Sipione S (2001) Loss of normal huntingtin function: new developments in Huntington's disease. Trends Neurosci 24:182-188
Cavalli-Sforza LL (2001) Genes, people and languages. Penguin Books, London
Cavazos JE, Lum F (2005) Seizures and epilepsy: overview and classification. eMedicine.com, Inc., http://www.emedicine.com/neuro/topic415.htm
Cerny V (1985) A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm. Journal of Optimization Theory and Applications 45:41-51
Chalmers DJ (1996) The conscious mind: in search of a fundamental theory. Oxford University Press, Oxford
Chan ZSH, Kasabov N, Collins L (2006) A two-stage methodology for gene regulatory network extraction from time-course gene expression data. Expert Systems with Applications 30(1):59-63
Chen CK, Chen SL, Mill J, Huang YS, Lin SK, Curran S, Purcell S, Sham P, Asherson P (2003) The dopamine transporter gene is associated with attention deficit hyperactivity disorder in a Taiwanese sample. Mol Psychiatry 8(4):393-396
Chen L, Aihara K (2002) Stability analysis of genetic regulatory networks with time delay. IEEE Trans on Circuits and Systems - I: Fundamental Theory and Applications 49(5):602-608
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31(13):3497-3500
Chhabra J, Glezer M, Shkuro Y, Gittens SD, Reggia JA (1999) Effects of callosal lesions in a computational model of single-word reading. In: Reggia JA, Ruppin E, Glanzman DL (eds) Disorders of brain, behavior, and cognition: the neurocomputational perspective. Progress in brain research, vol 121. Springer, New York, pp 219-242
Chin HR, Moldin SO (eds) (2001) Methods in genomic neuroscience. Methods and new frontiers in neuroscience, CRC Press, Boca Raton
Cho K, Aggleton JP, Brown MW, Bashir ZI (2001) An experimental test of the role of postsynaptic calcium levels in determining synaptic strength using perirhinal cortex of rat. J Physiol 532(2):459-466
Chumakov I, Blumenfeld M, Guerassimenko O, Cavarec L, Palicio M, Abderrahim H, Bougueleret L, Barry C, Tanaka H, Rosa PL (2002) Genetic and physiological data implicating the new human gene G72 and the gene for D-amino acid oxidase in schizophrenia. Proc Natl Acad Sci USA 99:13675-13680
Citron M (2004) Strategies for disease modification in Alzheimer's disease. Nature Rev Neurosci 5(9):677-685
Cloete I, Zurada J (eds) (2000) Knowledge-based neurocomputing, MIT Press, Cambridge, MA
Cloninger CR (2002) The discovery of susceptibility genes for mental disorders. Proc Natl Acad Sci USA 99(21):13365-13367
Clothiaux EE, Bear MF, Cooper LN (1991) Synaptic plasticity in visual cortex: comparison of theory with experiment. J Neurophysiol 66(5):1785-1804
Cooper LN (1987) Cortical plasticity: theoretical analysis, experimental results. In: Rauschecker JP, Marler P (eds) Imprinting and cortical plasticity. John Wiley & Sons, New York, pp 117-191
Cooper LN, Intrator N, Blais B, Shouval HZ (2004) Theory of cortical plasticity. World Scientific, Singapore
Corballis MC (2003) From mouth to hand: gesture, speech, and the evolution of right-handedness. Behav Brain Sci 26:199-260
Crick F, Koch C (1995) Are we aware of neural activity in primary visual cortex? Nature 375:121-123
Crunelli V, Leresche N (2002) Childhood absence epilepsy: genes, channels, neurons and networks. Nature Rev Neurosci 3(5):371-382
Cybenko G (1989) Approximation by super-positions of sigmoidal function. Mathematics of Control, Signals and Systems 2:303-314
D'Haeseleer P, Wen X, Fuhrman S, Somogyi R (1999) Linear modeling of mRNA expression levels during CNS development and injury. Proc. Pacific Symposium on Biocomputing, World Scientific, Singapore, pp 41-52
D'Haeseleer P, Liang S, Somogyi R (2000) Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16(8):707-726
Damasio AR (1994) Descartes' error. Putnam's Sons, New York
Darwin C (1859) The origin of species by means of natural selection. John Murray, London
de Jong H (2002) Modeling and simulation of genetic regulatory systems: a literature review. Journal of Computational Biology 9(1):67-102
DeFelipe J (1997) Types of neurons, synaptic connections and chemical characteristics of cells immunoreactive for calbindin-D28K, parvalbumin and calretinin in the neocortex. J Chem Neuroanat 14:1-19
Deisz RA (1999) GABAB receptor-mediated effects in human and rat neocortical neurones in vitro. Neuropharmacology 38:1755-1766
DelaTorre JC, Barrios M, Junque C (2005) Frontal lobe alterations in schizophrenia: neuroimaging and neuropsychological findings. Eur Arch Psychiatry Clin Neurosci 255(4):236-244
Delorme A, Gautrais J, vanRullen R, Thorpe S (1999) SpikeNET: a simulator for modeling large networks of integrate and fire neurons. Neurocomputing 26-27:989-996
Delorme A, Thorpe S (2001) Face identification using one spike per neuron: resistance to image degradation. Neural Networks 14:795-803
Dennett DC (1991) Consciousness explained. Penguin Books, New York
Destexhe A (1998) Spike-and-wave oscillations based on the properties of GABAB receptors. J Neurosci 18:9099-9111
Devlin JT, Gonnerman LM, Andersen ES, Seidenberg MS (1998) Category-specific semantic deficits in focal and widespread brain-damage: a computational account. J Cogn Neurosci 10(1):77-94
Diamond ME, Armstrong-James M, Ebner FF (1993) Experience-dependent plasticity in adult rat barrel cortex. Proc Natl Acad Sci USA 90(5):2082-2086
Diamond ME, Petersen RS, Harris JA, Panzeri S (2003) Investigations into the organization of information in sensory cortex. J Physiol Paris 97(4-6):529-536
Dimitrov D, Sidorov I, Kasabov N (2004) Computational biology. In: Rieth M, Sommers W (eds) Handbook of theoretical and computational nanotechnology, vol 1. American Scientific, Los Angeles
DiPellegrino G, Fadiga L, Fogassi L, Gallese V, Rizzolatti G (1992) Understanding motor events: a neurophysiological study. Experimental Brain Research 91:176-180
Drager LD, Layton W (1997) Initial value problems for nonlinear nonresonant delay differential equations with possibly infinite delay. Electronic Journal of Differential Equations (24):1-20
Duch W, Adamczak R, Grabczewski K (1998) Extraction of logical rules from neural networks. Neural Proc Letters 7:211-219
Dudek SM, Bear MF (1993) Bidirectional long-term modification of synaptic effectiveness in the adult and immature hippocampus. J Neurosci 13(7):1518-1521
Eckert T, Eidelberg D (2005) Neuroimaging and therapeutics in movement disorders. NeuroRx 2(2):361-371
Edelman GM, Tononi G (2000) Consciousness: how matter becomes imagination. Penguin Books, London
Egan MF, Goldberg TE, Kolachana BS, Callicott JH, Mazzanti CM, Straub RE, Goldman D, Weinberger DR (2001) Effect of COMT Val108/158 Met genotype on frontal lobe function and risk for schizophrenia. Proc Natl Acad Sci USA 98(12):6917-6922
Elgersma Y, Silva AJ (1999) Molecular mechanisms of synaptic plasticity and memory. Current Opinion in Neurobiology 9(2):209-213
Elliott T, Shadbolt NR (1999) A neurotrophic model of the development of the retinogeniculocortical pathway induced by spontaneous retinal waves. J Neurosci 19(18):7951-7970
Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, Kitano T, Monaco AP, Paabo S (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869-872
Engel AK, Fries P, Konig P, Brecht M, Singer W (1999) Temporal binding, binocular rivalry, and consciousness. Consciousness and Cognition 8:128-151
Evans PD, Gilbert SL, Mekel-Bobrov N, Vallender EJ, Anderson JR, Vaez-Azizi LM, Tishkoff SA, Hudson RR, Lahn BT (2005) Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science 309(5741):1717-1720
Fahlman C, Lebiere C (1990) The cascade-correlation learning architecture. In: Touretzky DS (ed) Advances in neural information processing systems, vol 2. Morgan Kaufmann, San Francisco, CA
Fairbanks LD, Jacomelli G, Micheli V, Slade T, Simmonds HA (2002) Severe pyridine nucleotide depletion in fibroblasts from Lesch-Nyhan patients. Biochem J 366(Pt 1):265-272
Federici T, Boulis NM (2005) Gene-based treatment of motor neuron diseases, Muscle & Nerve, http://www3.interscience.wiley.com/cgi-bin/fulltext/112117863/HTMLSTART
Fedor P, Benuskova L, Jakes H, Majernik V (1982) An electrophoretic coupling mechanism between efficiency modification of spine synapses and their stimulation. Studia Biophysica 92:141-146
Feng R, Rampon C, Tang Y-P, Shrom D, Jin J, Kyin M, Sopher B, Martin GM, Kim S-H, Langdon RB, Sisodia SS, Tsien JZ (2001) Deficient neurogenesis in forebrain-specific Presenilin-1 knockout mice is associated with reduced clearance of hippocampal memory traces. Neuron 32:911-926
Fenton GW, Fenwick PBC, Dollimore J, Dunn TL, Hirsch SR (1980) EEG spectral analysis in schizophrenia. Brit J Psychiat 136:445-455
Fisher A, Walker MC, Bowery NG (2003) Mechanisms of action of anti-epileptic drugs. The National Society for Epilepsy, http://www.e-epilepsy.org.uk/pages/articles/show_article.cfm?id=111
Fogel D, Fogel L, Porto V (1990) Evolving neural networks. Biol Cybernetics 63:487-493
Fogel DB (1995) Evolutionary computation - Toward a new philosophy of machine intelligence. IEEE Press, New York
Fogel G, Corne D (2003) Evolutionary computation for bioinformatics. Morgan Kaufmann, San Francisco, CA
Frank LM, Brown EN, Wilson MA (2001) A comparison of the firing properties of putative excitatory and inhibitory neurons from CA1 and the entorhinal cortex. J Neurophysiol 86(4):2029-2049
Frank MJ (2005) Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J Cogn Neurosci 17:51-72
Fraser HB, Hirsh AE, Giaever G, Kumm J, Eisen MB (2004) Noise minimization in eukaryotic gene expression. PLOS Biology 2(6):0834-0838
Freeman WJ (2000) Neurodynamics. An exploration in mesoscopic brain dynamics. Springer, London
Freeman WJ (2003) Evidence from human scalp EEG of global chaotic itinerancy. Chaos 13(3):1-11
Fries P, Roelfsema PR, Engel AK, Konig P, Singer W (1997) Synchronization of oscillatory responses in visual cortex correlates with perception in interocular rivalry. Proc Natl Acad Sci USA 94:12699-12704
Frith U (2001) Mind blindness and the brain in autism. Neuron 32:969-979
Fritzke B (1995) A growing neural gas network learns topologies. Advances in Neural Information Processing Systems 7:625-632
Froemke RC, Poo M-m, Dang Y (2005) Spike-timing-dependent synaptic plasticity depends on dendritic location. Nature 434:221-225
Funahashi K (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks 2:183-192
Furuhashi T, Nakaoka K, Uchikawa Y (1994) A new approach to genetic based machine learning and an efficient finding of fuzzy rules. Proc. WWW'94 Workshop, IEEE/Nagoya-University, Nagoya, pp 114-122
Gabanella F, Carissimi C, Usiello A, Pellizzoni L (2005) The activity of the spinal muscular atrophy protein is regulated during development and cellular differentiation. Hum Mol Genet 14(23):3629-3642
Ganesh S, Puri R, Singh S, Mittal S, Dubey D (2006) Recent advances in the molecular basis of Lafora's progressive myoclonus epilepsy. J Hum Genet 51(1):1-8
Gardenfors P (2000) Conceptual spaces. The geometry of thought. MIT Press, Cambridge, MA
Gardiner RM (1999) Genetic basis of human epilepsies. Epilepsy Res 36:91-95
Gardiner RM (2003) Molecular genetics of the epilepsies. The National Society for Epilepsy, http://www.e-epilepsy.org.uk/pages/articles/show_article.cfm?id=44
Geinisman Y, deToledo-Morrell L, Morrell F (1991) Induction of long-term potentiation is associated with an increase in the number of axospinous synapses with segmented postsynaptic densities. Brain Research 566:77-88
Genes and disease (2005), National Centre for Biotechnology Information (NCBI), The Nervous System, http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=gnd.chapter.75
George AL (2004) Inherited channelopathies associated with epilepsy. Epilepsy Currents 4(2):65-70
Gerstner W, Kistler WM (2002) Spiking neuron models. Cambridge Univ. Press, Cambridge, MA
Glaze DG (2005) Neurophysiology of Rett syndrome. J Clin Neurol 20(9):740-746
Gold JI, Bear MF (1994) A model of dendritic spine Ca2+ concentration exploring possible bases for a sliding synaptic modification threshold. Proc Natl Acad Sci USA 91:3941-3945
Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, MA
Granato A, Santarelli M, Sbriccoli A, Minciacchi D (1995) Multifaceted alterations of the thalamo-cortico-thalamic loop in adult rats prenatally exposed to ethanol. Anat Embryol 191:11-23
Gray CM, Konig P, Engel AK, Singer W (1989) Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature 338:334-337
Greenbaum D, Colangelo C, Williams K, Gerstein M (2003) Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biology 4:117.111-117.118
Grice DE, Halmi KA, Fichter MM, Strober M, Woodside DB, Treasure JT, Kaplan AS, Magistretti PJ, Goldman D, Bulik CM, Kaye WH, Berrettini WH (2002) Evidence for a susceptibility gene for anorexia nervosa on chromosome 1. Am J Hum Genet 70:787-792
Grossberg S (1969) On learning and energy-entropy dependence in recurrent and nonrecurrent signed networks. J Stat Phys 1:319-350
Grossberg S (1982) Studies of mind and brain. Reidel, Boston
Hakak Y, Walker JR, Li C, Wong WH, Davis KL, Buxbaum JD, Haroutunian V, Fienberg AA (2001) Genome-wide expression analysis reveals dysregulation of myelination-related genes in chronic schizophrenia. Proc Natl Acad Sci USA 98(8):4746-4751
Hameroff S, Penrose R (1996) Orchestrated reduction of quantum coherence in brain microtubules: a model for consciousness? In: Hameroff SR, Kaszniak AW, Scott AC (eds) Toward a science of consciousness: the first Tucson discussions and debates. MIT Press, Cambridge, MA, pp 507-540
Hartemink AJ, Gifford DK, Jaakkola TS, Young RA (2001) Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Proc. Pacific Symposium on Biocomputing, vol. 6, pp 422-433
Hasselmo MF (1997) A computational model of the progression of Alzheimer's disease. MD Computing: Computers in Medical Practice 14(3):181-191
Hassibi B, Stork DG (1992) Second order derivatives for network pruning: optimal brain surgeon. In: Touretzky DS (ed) Advances in neural information processing systems, vol 4. Morgan Kaufmann, San Francisco, CA, pp 164-171
Hauptmann W, Heesche K (1995) A neural network topology for bidirectional fuzzy-neuro transformation. Proc. FUZZ-IEEE/IFES, IEEE Press, Yokohama, Japan, pp 1511-1518
Hayashi Y (1991) A neural expert system with automated extraction of fuzzy if-then rules and its application to medical diagnosis. In: Lippman RP, Moody JE, Touretzky DS (eds) Advances in neural information processing systems, vol 3. Morgan Kaufmann, San Mateo, CA, pp 578-584
Haykin S (1994) Neural networks - A comprehensive foundation. Prentice Hall, Engelwood Cliffs, NJ
Hebb D (1949) The Organization of Behavior. John Wiley and Sons, New York
Heskes TM, Kappen B (1993) On-line learning processes in artificial neural networks. In: Mathematical foundations of neural networks. Elsevier, Amsterdam, pp 199-233
Hevroni D, Rattner A, Bundman M, Lederfein D, Gabarah A, Mangelus M, Silverman MA, Kedar H, Naor C, Komuc M, Hanoch T, Seger R, Theill LE, Nedivi E, Richter-Levin G, Citri Y (1998) Hippocampal plasticity involves extensive gene induction and multiple cellular mechanisms. J Mol Neurosci 10(2):75-98
Hinton GE (1989) Connectionist learning procedures. Artificial Intelligence 40:185-234
Hinton GE (1990) Preface to the special issue on connectionist symbol processing. Artificial Intelligence 46:1-4
Hoffman RE, McGlashan TH (1999) Using a speech perception neural network simulation to explore normal neurodevelopment and hallucinated 'voices' in Shizophrenia. In: Reggia JA, Ruppin E, Glanzman DL (eds) Disorders of brain, behavior, and cognition: the neurocomputational perspective. Progress in brain research, vo112!. Springer, New York, pp 311-325 Holland JH (1975) Adaptation in natural and artificial systems. Univ Michigan Press, Ann Arbor, MI Holland JH (1998) Emergence. Oxford Univ Press, Oxford Holland LL, Wagner JJ (1998) Primed facilitation of homosynaptic long-term depression and depotentiation in rat hippocampus. J Neurosci 18(3):887-894 Honey GD, Bullmore ET, Soni W, Varathesaan M, Williams SC, Sharma T (1999) Differences in frontal activation by a working memory task after substitution of risperidone for typical antipsychotic drugs in patients with schizophrenia. Proc Nat! Acad Sci USA 96(23):13432-13437 Hong SJ, Li H, Becker KG, Dawson VL, Dawson TM (2004) Identification and analysis of plasticity-induced late-response genes. Proc Nat! Acad Sci USA 101(7):2145-2150 Hom D, Levy N, Ruppin E (1996) Neuronal-based synaptic compensation: a computational study in Alzheimer's disease. Neural Computation 8(6):12271243 Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2:359-366 Huber KM, Gallagher SM, Warren ST, Bear MF (2002) Altered synaptic plasticity in a mouse model of fragile X mental retardation. Proc Nat! Acad Sci USA 99:7746-7750 Hunter L (1994) Artificial intelligence and molecular biology. Canadian Artificial Intelligence 35:10-16 Impey S, Obrietan K, Wong ST, Poser S, Yano S, Wayman G, Deloume JC, Chan G, Storm DR (1998) Cross talk between ERK and PKA is required for Ca2+ stimulation of CREB-dependent transcription and ERK nuclear translocation. Neuron 21:869-883 Inoki K, Ouyang H, Li Y, Guan KL (2005) Signaling by target of rapamycin proteins in cell growth control. 
Microbiol Mol Biol Rev 69(1):79-100
Ishikawa M (1996) Structural learning with forgetting. Neural Networks 9:501-521
Izhikevich EM (2003) Simple model of spiking neurons. IEEE Trans Neural Net 14(6):1569-1572
Izhikevich EM, Desai NS (2003) Relating STDP to BCM. Neural Computation 15:1511-1523
Jang R (1993) ANFIS: adaptive network-based fuzzy inference system. IEEE Trans on Systems, Man, and Cybernetics 23(3):665-685
Jansen R, Greenbaum D, Gerstein M (2002) Relating whole-genome expression data with protein-protein interactions. Genome Research 12(1):37-46
Jedlicka P (2002) Synaptic plasticity, metaplasticity and the BCM theory. Bratislava Medical Letters 103(4-5):137-143
Jensen KF, Killackey HP (1987) Terminal arbors of axons projecting to the somatosensory cortex of adult rats. I. The normal morphology of specific thalamocortical afferents. J Neurosci 7:3529-3543
Jensen O (2001) Information transfer between rhythmically coupled networks: reading the hippocampal phase code. Neural Computation 13:2743-2761
Jiang CH, Tsien JZ, Schultz PG, Hu Y (2001) The effects of aging on gene expression in the hypothalamus and cortex of mice. Proc Natl Acad Sci USA 98(4):1930-1934
Jouvet P, Rustin P, Taylor DL, Pocock JM, Felderhoff-Mueser U, Mazarakis ND, Sarraf C, Joashi U, Kozma M, Greenwood K, Edwards AD, Mehmet H (2000) Branched chain amino acids induce apoptosis in neural cells without mitochondrial membrane depolarization or cytochrome c release: implications for neurological impairment associated with maple syrup urine disease. Mol Biol Cell 11(5):1919-1932
Kaas JH (1997) Topographic maps are fundamental to sensory processing. Brain Res Bull 44(2):107-112
Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science, ed 4. McGraw-Hill, New York
Kaplan BJ, Sadock VA (2000) Kaplan & Sadock's comprehensive textbook of psychiatry, ed 7. Lippincott Williams & Wilkins, New York
Kasabov N (1996a) Foundations of neural networks, fuzzy systems and knowledge engineering. MIT Press, Cambridge, MA
Kasabov N (1996b) Adaptable connectionist production systems. Neurocomputing 13(2-4):95-117
Kasabov N (1996c) Learning fuzzy rules and approximate reasoning in fuzzy neural networks and hybrid systems. Fuzzy Sets and Systems 82(2):2-20
Kasabov N (1998) Evolving fuzzy neural networks - algorithms, applications and biological motivation. In: Yamakawa T, Matsumoto G (eds) Methodologies for the conception, design and application of soft computing. World Scientific, pp 271-274
Kasabov N (2001) Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning.
IEEE Trans Syst Man Cybern Part B Cybern 31(6):902-918
Kasabov N (2002a) Evolving connectionist systems. Methods and applications in bioinformatics, brain study and intelligent machines. Springer, London
Kasabov N (2002b) Evolving connectionist systems for adaptive learning and knowledge discovery: methods, tools, applications. Proc. First International IEEE Symposium on Intelligent Systems, pp 24-28
Kasabov N, Dimitrov D (2002) A method for gene regulatory network modelling with the use of evolving connectionist systems. Proc. ICONIP'2002 - International Conference on Neuro-Information Processing, IEEE Press, Singapore
Kasabov N, Song Q (2002) DENFIS: dynamic, evolving neural-fuzzy inference systems and its application for time-series prediction. IEEE Trans on Fuzzy Systems 10(2):144-154
Kasabov N (2003) Evolving connectionist systems. Methods and applications in bioinformatics, brain study and intelligent machines. Springer, London
Kasabov N, Benuskova L (2004) Computational neurogenetics. Journal of Computational and Theoretical Nanoscience 1(1):47-61
Kasabov N, Chan ZSH, Jain V, Sidorov I, Dimitrov D (2004) Gene regulatory network discovery from time-series gene expression data - a computational intelligence approach. In: Pal NR, Kasabov N, Mudi RK et al. (eds) Neural Information Processing - 11th International Conference, ICONIP 2004 - Lecture Notes in Computer Science, vol 3316. Springer, Calcutta, India, pp 1344-1353
Kasabov N, Benuskova L (2005) Theoretical and computational models for neuro, genetic, and neuro-genetic information processing. In: Rieth M, Schommers W (eds) Handbook of computational and theoretical nanotechnology, vol X. American Scientific, Los Angeles, CA
Kasabov N (2006) Evolving connectionist systems: the knowledge engineering approach. Springer, London
Kasabov N, Bakardjian H, Zhang D, Song Q, Cichocki A, van Leeuwen C (2006) Evolving connectionist systems for adaptive learning, classification and transition rule discovery from EEG data: a case study using auditory and visual stimuli. Intl J Neural Systems (in press)
Katok A, Hasselblatt B (1995) Introduction to the modern theory of dynamical systems. Cambridge Univ Press, Cambridge, MA
Kaufman S (1999) A model of human phenylalanine metabolism in normal subjects and in phenylketonuric patients. Proc Natl Acad Sci USA 96(6):3160-3164
Kecman V (2001) Learning and soft computing: support vector machines, neural networks, and fuzzy logic models (complex adaptive systems). MIT Press, Cambridge, MA
KEGG pathway database (2006), Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/pathway.html
Khan J, Simon R, Bittner M, Chen Y, Leighton S, Pohida T, Smith P, Jiang Y, Gooden G, Trent J, Meltzer P (1998) Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58(22):5009-5013
Kharazia VN, Wenthold RJ, Weinberg RJ (1996) GluR1-immunopositive interneurons in rat neocortex.
J Comp Neurol 368:399-412
Kiddie G, McLean D, van Ooyen A, Graham B (2005) Biologically plausible models of neurite outgrowth. In: van Pelt J, Kamermans M, Levelt CN et al. (eds) Development, dynamics and pathology of neuronal networks: from molecules to functional circuits. Progress in brain research, vol 147. Elsevier, New York, pp 67-79
Kim JJ, Foy MR, Thompson RF (1996) Behavioral stress modifies hippocampal plasticity through N-methyl-D-aspartate receptor activation. Proc Natl Acad Sci USA 93(10):4750-4753
Kimura A, Pavlides C (2000) Long-term potentiation/depotentiation are accompanied by complex changes in spontaneous unit activity in the hippocampus. Journal of Neurophysiology:1894-1906
Kirkwood A, Rioult MC, Bear MF (1996) Experience-dependent modification of synaptic plasticity in visual cortex. Nature 381(6582):526-528
Kirkwood A, Rozas C, Kirkwood J, Perez F, Bear MF (1999) Modulation of long-term synaptic depression in visual cortex by acetylcholine and norepinephrine. J Neurosci 19(5):1599-1609
Kitano T, Schwarz C, Nickel B, Paabo S (2003) Gene diversity patterns at 10 X chromosomal loci in humans and chimpanzees. Mol Biol Evol 20(8):1281-1289
Kleppe IC, Robinson HPC (1999) Determining the activation time course of synaptic AMPA receptors from openings of colocalized NMDA receptors. Biophys J 77:1418-1427
Klinke R, Kral A, Heid S, Tillein J, Hartmann R (1999) Recruitment of the auditory cortex by long-term cochlear electrostimulation. Science 285:1729-1733
Knerr I, Zschocke J, Schellmoser S, Topf HG, Weigel C, Dotsch J, Rascher W (2005) An exceptional Albanian family with seven children presenting with dysmorphic features and mental retardation: maternal phenylketonuria. BMC Pediatr 5(1):5
Ko DC, Binkley J, Sidow A, Scott MP (2003) The integrity of a cholesterol-binding pocket in Niemann-Pick C2 protein is necessary to control lysosome cholesterol levels. Proc Natl Acad Sci USA 100(5):2518-2525
Ko DC, Milenkovic L, Beier SM, Manuel H, Buchanan J, Scott MP (2005) Cell-autonomous death of cerebellar Purkinje neurons with autophagy in Niemann-Pick type C disease. PLoS Genet 1(1):81-95
Koch C, Poggio T (1983) A theoretical analysis of electrical properties of spines. Proc Roy Soc Lond B 218:455-477
Koch C, Crick F (1994) Some further ideas regarding the neuronal basis of awareness. In: Koch C, Davis JL (eds) Large-scale neuronal theories of the brain. MIT Press, Cambridge, MA, pp 93-111
Koch C (1996) Towards the neuronal substrate of visual consciousness. In: Hameroff SR, Kaszniak AW, Scott AC (eds) Towards a science of consciousness: the first Tucson discussions and debates. MIT Press, Cambridge, MA, pp 247-258
Koch C, Hepp K (2006) Quantum mechanics in the brain.
Nature 440:611-612
Koester HJ, Sakmann B (1998) Calcium dynamics in single spines during coincident pre- and postsynaptic activity depend on relative timing of backpropagating action potentials and subthreshold excitatory postsynaptic potentials. Proc Natl Acad Sci USA 95(16):9596-9601
Koetter R (2003) Neuroscience databases. Kluwer Academic, Norwell, MA
Kohonen T (1984) Self-organization and associative memory. Springer, Berlin
Kohonen T (1990) The self-organizing map. Proc IEEE 78:1464-1497
Kohonen T (1997) Self-organizing maps, ed 2. Springer, Heidelberg
Konig P, Engel AK, Singer W (1996) Integrator or coincidence detector? The role of the cortical neuron revisited. Trends Neurosci 19:130-137
Koza J (1992) Genetic programming. MIT Press, Cambridge, MA
Kudela P, Franaszczuk PJ, Bergey GK (2003) Changing excitation and inhibition in simulated neural networks: effects on induced bursting behavior. Biol Cybernetics 88(4):276-285
Kurkova V (1991) Kolmogorov's theorem is relevant. Neural Computation 3:617-622
Langley K, Marshall L, Bree MVD, Thomas H, Owen M, O'Donovan M, Thapar A (2004) Association of the dopamine D(4) receptor gene 7-repeat allele with neuropsychological test performance of children with ADHD. Am J Psychiatry 161(1):133-138
Leblois A, Boraud T, Meissner W, Bergman H (2006) Competition between feedback loops underlies normal and pathological dynamics in the basal ganglia. J Neurosci 26:3567-3583
LeCun Y, Denker JS, Solla SA (1990) Optimal brain damage. In: Touretzky DS (ed) Advances in neural information processing systems. Morgan Kaufmann, San Francisco, CA, pp 598-605
Lee C, Bae K, Edery I (1998) The Drosophila CLOCK protein undergoes daily rhythms in abundance, phosphorylation, and interactions with the PER-TIM complex. Neuron 21:857-867
Lee KS, Schottler F, Oliver M, Lynch G (1980) Brief bursts of high-frequency stimulation produce two types of structural change in rat hippocampus. J Neurophysiol 44(2):247-258
Lee PS, Shaw LB, Choe LH, Mehra A, Hatzimanikatis V, Lee KH (2003) Insights into the relation between mRNA and protein expression patterns: II. Experimental observations in Escherichia coli. Biotechnology and Bioengineering 84(7):834-841
Leonard S, Gault J, Hopkins J, Logel J, Vianzon R, Short M, Drebing C, Berger R, Venn D, Sirota P, Zerbe G, Olincy A, Ross RG, Adler LE, Freedman R (2002) Association of promoter variants in the a7 nicotinic acetylcholine receptor subunit gene with an inhibitory deficit found in schizophrenia. Arch Gen Psychiatry 59:1085-1096
Leslie C, Eskin E, Noble WS (2002) The spectrum kernel: a string kernel for SVM protein classification. Proc. Pacific Symposium on Biocomputing, vol 7, pp 566-575
Leutgeb JK, Frey JU, Behnisch YT (2005) Single cell analysis of activity-dependent cyclic AMP-responsive element-binding protein phosphorylation during long-lasting long-term potentiation in area CA1 of mature rat hippocampal-organotypic cultures.
Neuroscience 131:601-610
Levy WB, Steward O (1983) Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus. Neuroscience 8(4):791-797
Liao D, Hessler NA, Malinow R (1995) Activation of postsynaptically silent synapses during pairing-induced LTP in CA1 region of hippocampal slice. Nature 375:400-404
Libet B (1985) Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences 8:529-566
Libet B (1999) Do we have free will? Journal of Consciousness Studies 6(8-9):47-57
Linden DJ (1999) The return of the spike: postsynaptic APs and the induction of LTP and LTD. Neuron 22(4):661-666
Liss B, Roeper J (2004) Correlating function and gene expression of individual basal ganglia neurons. Trends Neurosci 27(8):475-481
Livingstone M, Hubel D (1988) Segregation of form, color, movement, and depth: anatomy, physiology, and perception. Science 240:740-749
Lledo P-M, Zhang X, Sudhof TC, Malenka RC, Nicoll RA (1998) Postsynaptic membrane fusion and long-term potentiation. Science 279:399-403
Llinas RR, Ribary U (1994) Perception as an oneiric-like state modulated by senses. In: Koch C, Davis JL (eds) Large-scale neuronal theories of the brain. MIT Press, Cambridge, MA, pp 111-125
Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J (2000) Molecular cell biology, ed 4. W.H. Freeman & Co., New York
Lu T, Pan Y, Kao S-Y, Li C, Kohane I, Chan J, Yankner BA (2004) Gene regulation and DNA damage in the ageing human brain. Nature 429:883-891
Lytton WW, Contreras D, Destexhe A, Steriade M (1997) Dynamic interactions determine partial thalamic quiescence in a computer network model of spike-and-wave seizures. J Neurophysiol 77(4):1679-1696
Maass W, Bishop CM (eds) (1999) Pulsed neural networks. MIT Press, Cambridge, MA
MacBeath G, Schreiber S (2000) Printing proteins as microarrays for high-throughput function determination. Science 289(5485):1760-1763
Mackay TFC (2000) Aging in the post-genomic era: simple or complex? Genome Biology 1(4)
Magee JC, Johnston D (1997) A synaptically controlled associative signal for Hebbian plasticity in hippocampal neurons. Science 275:209-213
Maletic-Savatic M, Malinow R, Svoboda K (1999) Rapid dendritic morphogenesis in CA1 hippocampal dendrites induced by synaptic activity. Science 283:1923-1927
Mamdani E (1997) Application of fuzzy logic to approximate reasoning using linguistic synthesis. IEEE Trans on Computers 26(12):1182-1191
Manchanda R, Malla A, Harricharran R, Cortese L, Takhar J (2003) EEG abnormalities and outcome in first-episode psychosis.
Can J Psychiatry 48(11):722-726
Maquet P (2001) The role of sleep in learning and memory. Science 294:1048-1052
Marcus G (2004a) The birth of the mind: how a tiny number of genes creates the complexity of the human mind. Basic Books, New York
Marcus GF, Fisher SE (2003) FOXP2 in focus: what can genes tell us about speech and language? Trends in Cognitive Science 7(6):257-262
Marcus GF (2004b) Before the word. Nature 431:745
Marie H, Morishita W, Yu X, Calakos N, Malenka RC (2005) Generation of silent synapses by acute in vivo expression of CaMKIV and CREB. Neuron 45:741-752
Marini C, Harkin LA, Wallace RH, Mulley JC, Scheffer IE, Berkovic SF (2003) Childhood absence epilepsy and febrile seizures: a family with a GABAA receptor mutation. Brain 126:230-240
Markram H, Lubke J, Frotscher M, Sakmann B (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275(5297):213-215
Marnellos G, Mjolsness ED (2003) Gene network models and neural development. In: van Ooyen A (ed) Modeling neural development. MIT Press, Cambridge, MA, pp 27-48
Martino S, Marconi P, Tancini B, Dolcetta D, Angelis MGD, Montanucci P, Bregola G, Sandhoff K, Bordignon C, Emiliani C, Manservigi R, Orlacchio A (2005) A direct gene transfer strategy via brain internal capsule reverses the biochemical defect in Tay-Sachs disease. Hum Mol Genet 14(15):2113-2123
Massimini M, Ferrarelli F, Huber R, Esser SK, Singh H, Tononi G (2005) Breakdown of cortical effective connectivity during sleep. Science 309:2228-2232
Maviel T, Durkin TP, Menzaghi F, Bontempi B (2004) Sites of neocortical reorganization critical for remote spatial memory. Science 305(5680):96-99
Mayeux R, Kandel ER (1991) Disorders of language: the aphasias. In: Kandel ER, Schwartz JH, Jessell TM (eds) Principles of neural science, ed 3. Appleton & Lange, Norwalk, pp 839-851
Mayford M, Kandel ER (1999) Genetic approaches to memory storage. Trends Genet 15(11):463-470
McAdams HH, Arkin A (1998) Simulation of prokaryotic genetic circuits. Ann Rev Biophys Biomol Struct 27:199-224
McGuinness MC, Lu JF, Zhang HP, Dong GX, Heinzer AK, Watkins PA, Powers J, Smith KD (2003) Role of ALDP (ABCD1) and mitochondria in X-linked adrenoleukodystrophy. Mol Cell Biol 23(2):744-753
McIntosh H (1998) Autism is likely to be linked to several genes. The APA Monitor online 29(11):http://www.apa.org/monitor/nov98/gene.html
McNaughton BL, Barnes CA, Andersen P (1981) Synaptic efficacy and EPSP summation in granule cells of rat fascia dentata studied in vitro. J Neurophysiol 46(5):952-966
Mehra A, Lee KH, Hatzimanikatis V (2003) Insight into the relation between mRNA and protein expression patterns: I. Theoretical considerations.
Biotechnology and Bioengineering 84(7):822-833
Meisler MH, Kearney J, Ottman R, Escayg A (2001) Identification of epilepsy genes in humans and mouse. Annu Rev Genetics 35:567-588
Mekel-Bobrov N, Gilbert SL, Vallender EJ, Anderson JR, Hudson RR, Tishkoff SA, Lahn BT (2005) Ongoing adaptive evolution of ASPM, a brain size determinant in Homo sapiens. Science 309(5741):1720-1722
Melzack R (1999) Phantom limb. In: Wilson RA, Keil F (eds) The MIT Encyclopedia of the Cognitive Sciences. MIT Press, Cambridge, MA, pp 636-638
Mendel JM (2001) Uncertain rule-based fuzzy logic systems: introduction and new directions. Prentice Hall, New York
Mieda M, Willie JT, Hara J, Sinton CM, Sakurai T, Yanagisawa M (2004) Orexin peptides prevent cataplexy and improve wakefulness in an orexin neuron-ablated model of narcolepsy in mice. Proc Natl Acad Sci USA 101(13):4649-4654
Miller KD, MacKay DJC (1994) The role of constraints in Hebbian learning. Neural Computation 6(1):98-124
Miller MW, Dow-Edwards DL (1988) Structural and metabolic alterations in rat cerebral cortex induced by prenatal exposure to ethanol. Brain Res 474:316-326
Miltner WHR, Braun C, Arnold M, Witte H, Taub E (1999) Coherence of gamma-band EEG activity as a basis for associative learning. Nature 397:434-436
Mitchell MT, Keller R, Kedar-Cabelli S (1997) Explanation-based generalization: a unified view. Mach Learn 1(1):47-80
Mitra S, Hayashi Y (2000) Neuro-fuzzy rule generation: survey in soft computing framework. IEEE Trans on Neural Networks 11(3):748-768
Mjolsness E, Sharp DH, Reinitz J (1991) A connectionist model of development. J Theor Biol 152:429-453
Mockett B, Coussens C, Abraham WC (2002) NMDA receptor-mediated metaplasticity during the induction of long-term depression by low-frequency stimulation. Eur J Neurosci 15(11):1819-1826
Mogilner A, Grossman JAI, Ribary U, Joliot M, Volkmann J, Rapaport D, Beasley RW (1993) Somatosensory cortical plasticity in adult humans revealed by magnetoencephalography. Proc Natl Acad Sci USA 90:3593-3597
Morales J, Hiesinger PR, Schroeder AJ, Kume K, Verstreken P, Jackson FR, Nelson DL, Hassan BA (2002) Drosophila fragile X protein, DFXR, regulates neuronal morphology and function in the brain. Neuron 34:961-972
NeuCom (2006), Neuro-Computing Decision Support Environment, http://www.aut.ac.nz/research/research_institutes/kedri/research_centres/centre_for_novel_methods_of_computational_intelligence/neucom.htm
Ouyang Y, Kantor D, Harris KM, Schuman EM, Kennedy MB (1997) Visualization of the distribution of autophosphorylated calcium/calmodulin-dependent protein kinase II after tetanic stimulation in the CA1 area of the hippocampus. J Neurosci 17(14):5416-5427
Pandey SC (2004) The gene transcription factor cyclic AMP-responsive element binding protein: role in positive and negative affective states of alcohol addiction.
Pharmacol Ther 104(1):47-58
Pang S (2004) Data approximation for Bayesian network modelling. Intl J Computers, Systems and Signals 5(2):36-43
Pang S, Kasabov N (2004) Inductive vs transductive inference, global vs local models: SVM, tSVM, and SVMT for gene expression classification problems. Proc. Intl. Joint Conf. Neural Net., IJCNN, IEEE Press, Budapest
Pase L, Voskoboinik I, Greenough M, Camakaris J (2004) Copper stimulates trafficking of a distinct pool of the Menkes copper ATPase (ATP7A) to the plasma membrane and diverts it into a rapid recycling pool. Biochem J 378(Pt 3):1031-1037
Pastor P, Goate AM (2004) Molecular genetics of Alzheimer's disease. Curr Psychiatry Rep 6(2):125-133
Paulsen O, Sejnowski TJ (2000) Natural patterns of activity and long-term synaptic plasticity. Current Opinion in Neurobiology 10(2):172-179
Penrose R (1994) Shadows of the Mind: A Search for the Missing Science of Consciousness. Oxford Univ. Press, Oxford
Pevzner PA (2000) Computational molecular biology: an algorithmic approach. MIT Press, Cambridge, MA
Philpot BD, Sekhar AK, Shouval HZ, Bear MF (2001) Visual experience and deprivation bidirectionally modify the composition and function of NMDA receptors in visual cortex. Neuron 29(1):157-169
Ping TY, Shimizu E, Dube G, Rampon C, Kerchner G, Zhuo M, Guosong L, Tsien J (1999) Genetic enhancement of learning and memory in mice. Nature 401:63-69
Pollard KS, Salama SR, Lambert N, Lambot M-A, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, Kern AD, Dehay C, Igel H, Manuel Ares J, Vanderhaeghen P, Haussler D (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature Advance Online Publication (doi:10.1038/nature05113):1-6
Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, et al. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):426
Poser S, Storm DR (2001) Role of Ca2+-stimulated adenylyl cyclase in LTP and memory formation. Int J Devl Neurosci 19:387-394
Protege (2006), http://protege.stanford.edu/
Rajavel KS, Neufeld EF (2001) Nonsense-mediated decay of human HEXA mRNA. Mol Cell Biol 21(16):5512-5519
Ralser M, Nonhoff U, Albrecht M, Lengauer T, Wanker EE, Lehrach H, Krobitsch S (2005) Ataxin-2 and huntingtin interact with endophilin-A complexes to function in plastin-associated pathways. Hum Mol Genet 14(9):2893-2909
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. PNAS 98(26):15149
Ranum LP, Day JW (2004) Myotonic dystrophy: RNA pathogenesis comes into focus. Am J Hum Genet 74(5):793-804
Raymond CR, Thompson VL, Tate WP, Abraham WC (2000) Metabotropic glutamate receptors trigger homosynaptic protein synthesis to prolong long-term potentiation.
J Neurosci 20(3):969-976
Reggia JA, Ruppin E, Glanzman DL (eds) (1999) Disorders of brain, behavior, and cognition: the neurocomputational perspective. Progress in brain research, Springer, New York
Reid A, Willshaw D (1999) Modeling prefrontal cortex delay cells: the role of dopamine in schizophrenia. In: Reggia JA, Ruppin E, Glanzman DL (eds) Disorders of brain, behavior, and cognition: the neurocomputational perspective. Progress in brain research, vol 121. Springer, New York, pp 351-373
Reinitz J, Mjolsness E, Sharp DH (1995) Model for cooperative control of positional information in Drosophila by Bicoid and maternal Hunchback. J Exp Zool 271:47-56
Rema V, Ebner FF (1999) Effect of enriched environment rearing on impairments in cortical excitability and plasticity after prenatal alcohol exposure. J Neurosci 19(24):10993-11006
Rhawn J (1996) Neuropsychiatry, neuropsychology, and clinical neuroscience: emotion, evolution, cognition, language, memory, brain damage, and abnormal behavior, ed 2. Lippincott Williams & Wilkins, Baltimore
Ribary U, Ionnides K, Singh KD, Hasson R, Bolton JPR, Lado F, Mogilner A, Llinas R (1991) Magnetic field tomography of coherent thalamocortical 40-Hz oscillations in humans. Proc Natl Acad Sci USA 88:11037-11041
Rick JT, Milgram NW (1996) Frequency dependence of long-term potentiation and depression in the dentate gyrus of the freely moving rat. Hippocampus 6:118-124
Rieke F, Warland D, Steveninck RRdRv, Bialek W (1996) Spikes - exploring the neural code. MIT Press, Cambridge, MA
Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cognitive Brain Research 3:131-141
Rizzolatti G, Arbib MA (1998) Language within our grasp. Trends Neurosci 21:188-194
Roberts AC, Robbins TW, Weiskrantz L (1998) The prefrontal cortex. Oxford Univ. Press, Oxford
Roberts S, Dybowski R, Husmeier D (2005) Probabilistic modeling in bioinformatics and medical informatics. Springer, London
Robins A (1996) Consolidation in neural networks and the sleeping brain. Connection Science 8(2):259-275
Robinson PA, Rennie CJ, Rowe DL (2002) Dynamics of large-scale brain activity in normal arousal states and epileptic seizures. Phys Rev E 65(4):19-24
Rodriguez E, George N, Lachaux J-P, Martinerie J, Renault B, Varela FJ (1999) Perception's shadow: long-range synchronization of human brain activity. Nature 397:434-436
Roelfsema PR, Engel AK, Konig P, Singer W (1997) Visuomotor integration is associated with zero time-lag synchronization among cortical areas. Nature 385:157-161
Rolls ET, Treves A (1998) Neural networks and brain function. Oxford University Press, New York
Rong R, Tang X, Gutmann DH, Ye K (2004) Neurofibromatosis 2 (NF2) tumor suppressor merlin inhibits phosphatidylinositol 3-kinase through binding to PIKE-L. Proc Natl
Acad Sci USA 101(52):18200-18205
Ropers H-H, Hoeltzenbein M, Kalscheuer V, Yntema H, Hamel B, Fryns J-P, Chelly J, Partington M, Gecz J, Moraine C (2003) Nonsyndromic X-linked mental retardation: where are the missing mutations? Trends Genet 19(6):316-320
Rosenblatt F (1962) Principles of neurodynamics. Spartan Books, New York
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 1. MIT Press / Bradford Books, Cambridge, MA, pp 318-363
Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Cambridge University Engineering Department
Saad D (ed) (1999) On-line learning in neural networks. Cambridge Univ. Press, Cambridge, MA
Sabag AD, Dagan O, Avraham KB (2005) Connexins in hearing loss: a comprehensive overview. J Basic Clin Physiol Pharmacol 16(2-3):101-116
Sachdev RS, Lu SM, Wiley RG, Ebner FF (1998) The role of the basal forebrain cholinergic projection in somatosensory cortical plasticity. J Neurophysiol 79:3216-3228
Sahraie A, Weiskrantz L, Barbour JL, Simmons A, Williams SCR, Brammer MJ (1997) Pattern of neuronal activity associated with conscious and unconscious processing of visual signals. Proc Natl Acad Sci USA 94:9406-9411
Salonen V, Kallinen S, Lopez-Picon FR, Korpi ER, Holopainen IE, Uusi-Oukari M (2006) AMPA/kainate receptor-mediated up-regulation of GABAA receptor d subunit mRNA expression in cultured rat cerebellar granule cells is dependent on NMDA receptor activation. Brain Res
Salzberg SL (1990) Learning with nested generalized exemplars. Kluwer Academic, Boston, MA
Sander JW (2003) The incidence and prevalence of epilepsy. The National Society for Epilepsy, http://www.eepilepsy.org.uk/pages/articles/show_article.cfm?id=26
Sankar A, Mammone RJ (1993) Growing and pruning neural tree networks. IEEE Trans Comput 42(3):291-299
Savage-Rumbaugh S, Lewin R (1994) Kanzi: the ape at the brink of the human mind. John Wiley & Sons, New York
Schaal S, Atkeson C (1998) Constructive incremental learning from only local information. Neural Computation 10:2047-2084
Schnapp BJ, Reese TS (1986) New developments in understanding rapid axonal transport. Trends Neurosci 9:155-162
Schratt GM, Tuebing F, Nigh EA, Kane CG, Sabatini ME, Kiebler M, Greenberg ME (2006) A brain-specific microRNA regulates dendritic spine development. Nature 439:283-289
Schule B, Albalwi M, Northrop E, Francis DI, Rowell M, Slater HR, Gardner RJ, Francke U (2005) Molecular breakpoint cloning and gene expression studies of a novel translocation t(4;15)(q27;q11.2) associated with Prader-Willi syndrome.
BMC Med Genet 6(6):18
Schulz S, Siemer H, Krug M, Hollt V (1999) Direct evidence for biphasic cAMP responsive element-binding protein phosphorylation during long-term potentiation in the rat dentate gyrus in vivo. J Neurosci 19(13):5683-5692
Schwaller B, Tetko IV, Tandon P, Silveira DC, Vreugdenhil M, Henzi T, Potier M-C, Celio MR, Villa AEP (2004) Parvalbumin deficiency affects network properties resulting in increased susceptibility to epileptic seizures. Mol Cell Neurosci 25:650-663
Searle J (2002) Consciousness and Language. Cambridge Univ. Press, Cambridge, MA
Sebastian CS (2005) Mental retardation. eMedicine, Inc., http://www.emedicine.com/med/topic3095.htm
Segan S (2005) Absence seizures. eMedicine, http://www.emedicine.com/NEURO/topic3.htm
Seri B, Garcia-Verdugo JM, McEwen BS, Alvarez-Buylla A (2001) Astrocytes give rise to new neurons in the adult mammalian hippocampus. J Neurosci 21(18):7153-7160
Shadlen MN, Newsome WT (1998) The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. J Neurosci 18:3870-3896
Sheng M, Lee SH (2001) AMPA receptor trafficking and the control of synaptic transmission. Cell 105:825-828
Shi SH, Hayashi Y, Petralia RS, Zaman SH, Wenthold RJ, Svoboda K, Malinow R (1999) Rapid spine delivery and redistribution of AMPA receptors after synaptic NMDA receptor activation. Science 284:1811-1816
Shouval HZ, Bear MF, Cooper LN (2002) A unified model of NMDA receptor-dependent bidirectional synaptic plasticity. Proc Natl Acad Sci USA 99(16):10831-10836
Shouval HZ, Castellani GC, Blais BS, Yeung LC, Cooper LN (2002) Converging evidence for a simplified biophysical model of synaptic plasticity. Biol Cybernetics 87:383-391
Siegel JM (2001) The REM sleep-memory consolidation hypothesis. Science 294:1058-1063
Silvanto J, Cowey A, Lavie N, Walsh V (2005) Striate cortex (V1) activity gates awareness of motion. Nature Neurosci 8:143-144
Sinclair DA, Guarente L (2006) Unlocking the secrets of longevity genes. Scientific American 294(3):48-57
Singer W (1994) Putative function of temporal correlations in neocortical processing. In: Koch C, Davis JL (eds) Large-scale neuronal theories of the brain. MIT Press, Cambridge, MA, pp 201-239
Singer W (1999a) Neuronal synchrony: a versatile code for the definition of relations? Neuron 24:49-65
Singer W (1999b) The observer in the brain. In: Riegler A, Peschl M, Stein Av (eds) Understanding representation in the cognitive sciences. Kluwer Academic/Plenum, New York
Sjostrom PJ, Turrigiano GG, Nelson SB (2001) Rate, timing, and cooperativity jointly determine cortical synaptic plasticity.
Neuron 32:1149-1164
Smith LS, Hamilton A (eds) (1998) Neuromorphic systems: engineering silicon from neurobiology. Progress in Neural Processing, World Scientific, London
Smolen P, Baxter DA, Byrne JH (2000) Mathematical modeling of gene networks. Neuron 26:567-580
Smolen P, Hardin PE, Lo BS, Baxter DA, Byrne JH (2004) Simulation of Drosophila circadian oscillations, mutations, and light responses by a model with VRI, PDP-1, and CLK. Biophys J 86(May):2786-2802
Somogyi R, Fuhrman S, Wen X (2001) Genetic network inference in computational models and applications to large-scale gene expression data. In: Bower JM, Bolouri H (eds) Computational modeling of genetic and biochemical networks. MIT Press, Cambridge, MA, pp 119-157
Song Q, Kasabov N (2006) TWNFI - Transductive weighted neuro-fuzzy inference system and applications for personalised modelling. Neural Networks
Song S, Miller KD, Abbott LF (2000) Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience 3:919-926
Song S, Abbott LF (2001) Cortical development and remapping through spike timing-dependent plasticity. Neuron 32(2):339-350
Soosairajah J, Maiti S, Wiggan O, Sarmiere P, Moussi N, Sarcevic B, Sampath R, Bamburg JR, Bernard O (2005) Interplay between components of a novel LIM kinase-slingshot phosphatase complex regulates cofilin. EMBO J 24(3):473-486
Spacek J, Harris KM (1997) Three-dimensional organization of smooth endoplasmatic reticulum in hippocampal CA1 dendrites and dendritic spines of the immature and mature rat. J Neurosci 17:190-203
Spivak G (2004) The many faces of Cockayne syndrome. Proc Natl Acad Sci USA 101(43):15273-15274
Stefansson H, Sigurdsson E, Steinthorsdottir V, Bjornsdottir S, Sigmundsson T, Ghosh S, Brynjolfsson J, Gunnarsdottir S, Ivarsson O, Chou TT (2002) Neuregulin 1 and susceptibility to schizophrenia. Am J Hum Genet 71(4):877-892
Steinlein OK (2004) Genetic mechanisms that underlie epilepsy. Nature Rev Neurosci 5:400-408
Stevenson RE, Procopio-Allen AM, Schroer RJ, Collins JS (2003) Genetic syndromes among individuals with mental retardation. Am J Med Genet 123A:29-32
Steward O (1997) mRNA localization in neurons: a multipurpose mechanism? Neuron 18:9-12
Stickgold R, Hobson JA, Fosse R, Fosse M (2001) Sleep, learning, and dreams: off-line memory reprocessing. Science 294:1052-1057
Storjohann R, Marcus GF (2005) NeuroGene: integrated simulation of gene regulation, neural activity and neurodevelopment. Proc. Intl. Joint Conf. Neural Net., IJCNN 2005, IEEE, Montreal, Canada, pp 428-433
Straub RE, Jiang Y, MacLean CJ, Ma Y, Webb BT, Myakishev MV, Harris-Kerr C, Wormley B, Sadek H, Kadambi B, Cesare AJ, Gibberman A, Wang X, O'Neill FA, Walsh D, Kendler KS (2002) Genetic variation in the 6p22.3 gene DTNBP1, the human ortholog of the mouse dysbindin gene, is associated with schizophrenia. Am J Hum Genet 71(2):337-348
Street VA, Goldy JD, Golden AS, Tempel BL, Bird TD, Chance PF (2002) Mapping of Charcot-Marie-Tooth disease type 1C to chromosome 16p identifies a novel locus for demyelinating neuropathies. Am J Hum Genet 70(1):244-250
Stuart GJ, Sakmann B (1994) Active propagation of somatic action potentials into neocortical pyramidal cell dendrites. Nature 367(6458):69-72
Sudhof TC (1995) The synaptic vesicle cycle: a cascade of protein-protein interactions. Nature 375:645-653
Sugai T, Kawamura M, Iritani S, Araki K, Makifuchi T, Imai C, Nakamura R, Kakita A, Takahashi H, Nawa H (2004) Prefrontal abnormality of schizophrenia revealed by DNA microarray: impact on glial and neurotrophic gene expression. Ann N Y Acad Sci 1025(Oct):84-91
Suri V, Lanjuin A, Rosbash M (1999) TIMELESS-dependent positive and negative autoregulation in the Drosophila circadian clock. The EMBO Journal 18:675-686
Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans on Systems, Man, and Cybernetics 15:116-132
Taylor JG (1999) The race for consciousness. MIT Press, Cambridge, MA
Terman D, Rubin JE, Yew AC, Wilson CJ (2002) Activity patterns in a model for the subthalamopallidal network of the basal ganglia. J Neurosci 22:2963-2976
Theiler J (1995) On the evidence for low-dimensional chaos in an epileptic electroencephalogram. Phys Lett A 196:335-341
Thivierge J-P, Marcus GF (2006) Computational developmental neuroscience: exploring the interactions between genetics and neural activity. Proc. Intl. Joint Conf. Neural Net., IJCNN 2006, IEEE, Vancouver, Canada, pp 438-443
Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381:520-522
Thorpe SJ, Fabre-Thorpe M (2001) Seeking categories in the brain. Science 291:260-262
Tikovic P, Voros M, Durackova D (2001) Implementation of a learning synapse and a neuron for pulse-coupled neural networks. Journal of Electrical Engineering 52(3-4):68-73
Tononi G, Edelman GM (1998) Consciousness and complexity. Science 282:1846-1851
Towell GG, Shavlik JW, Noordewier M (1990) Refinement of approximate domain theories by knowledge-based neural networks. Proc. 8th Natl. Conf. AI, AAAI Press/MIT Press, Boston, MA, pp 861-866
Towell GG, Shavlik JW (1993) Extracting refined rules from knowledge-based neural networks. Mach Learn 13(1):71-101
Towell GG, Shavlik JW (1994) Knowledge based artificial neural networks.
Artificial Intelligence 70(4):119-166
Toyoizumi T, Pfister J-P, Aihara K, Gerstner W (2005) Generalized Bienenstock-Cooper-Munro rule for spiking neurons that maximizes information transmission. Proc Natl Acad Sci USA 102(14):5239-5244
Traub RD, Miles R, Wong RK (1987) Models of synchronized hippocampal bursts in the presence of inhibition. I. Single population events. J Neurophysiol 58(4):739-751
Traub RD, Whittington MA, Stanford IM, Jefferys JGR (1996) A mechanism for generation of long-range synchronous fast oscillations in the cortex. Nature 383:621-624
Tsien JZ (2000) Linking Hebb's coincidence-detection to memory formation. Current Opinion in Neurobiology 10(2):266-273
Tsuda I (2001) Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systems. Behav Brain Sci 24:793-847
Turrigiano GG, Nelson SB (2000) Hebb and homeostasis in neuronal plasticity. Curr Opin Neurobiol 10:358-364
Ultsch A, Siemon HP (1990) Kohonen's self-organizing feature maps for exploratory data analysis. Proc. Intl. Neural Networks Conf., INNC'90, Kluwer Academic, Paris, pp 305-308
van Ooyen A (ed) (2003) Modeling neural development, MIT Press, Cambridge, MA
van Rossum MCW, Bi GQ, Turrigiano GG (2000) Stable Hebbian learning from spike timing-dependent plasticity. The Journal of Neuroscience 20(23):8812-8821
Vapnik V (1998) Statistical learning theory. John Wiley & Sons, New York
Veenstra-VanderWeele J, Christian SL, Cook EH Jr (2004) Autism as a paradigmatic complex genetic disorder. Annu Rev Genomics Hum Genet 5:379-405
Villa AEP, Asai Y, Tetko IV, Pardo B, Celio MR, Schwaller B (2005) Cross-channel coupling of neuronal activity in parvalbumin-deficient mice susceptible to epileptic seizures. Epilepsia 46(Suppl. 6):359
Vreugdenhil M, Jefferys JGR, Celio MR, Schwaller B (2003) Parvalbumin-deficiency facilitates repetitive IPSCs and related inhibition-based gamma oscillations in the hippocampus. J Neurophysiol 89:1414-1423
Wang H, Wagner JJ (1999) Priming-induced shift in synaptic plasticity in the rat hippocampus. J Neurophysiol 82:2024-2028
Wang H, Fu Y, Sun R, He S, Zeng R, Gao W (2006) An SVM scorer for more sensitive and reliable peptide identification via tandem mass spectrometry. Proc. Pacific Symposium on Biocomputing, vol. 11, pp 303-314
Wang JC, Hinrichs AL, Stock H, Budde J, Allen R, Bertelsen S, Kwon JM, Wu W, Dick DM, Rice J, Jones K, Nurnberger J, Tischfield J, Porjesz B, Edenberg HJ, Hesselbrock V, Crowe R, Schuckit M, Begleiter H, Reich T, Goate AM, Bierut LJ (2004) Evidence of common and specific genetic effects: association of the muscarinic acetylcholine receptor M2 (CHRM2) gene with alcohol dependence and major depressive syndrome.
Hum Mol Genet 13(17):1903-1911
Wang NJ, Liu D, Parokonny AS, Schanen NC (2004) High-resolution molecular characterization of 15q11-q13 rearrangements by array comparative genomic hybridization (array CGH) with detection of gene dosage. Am J Hum Genet 75(2):267-281
Watts JA, Morley M, Burdick JT, Fiori JL, Ewens WJ, Spielman RS, Cheung VG (2002) Gene expression phenotype in heterozygous carriers of ataxia telangiectasia. Am J Hum Genet 71(4):791-800
Watts M, Kasabov N (1998) Genetic algorithms for the design of fuzzy neural networks. In: Usui S, Omori T (eds) Proc. 5th Intl. Conf. Neural Inf. Processing, vol 2. IOS Press, Kitakyushu, pp 793-796
Weaver DC, Workman CT, Stormo GD (1999) Modeling regulatory networks with weight matrices. Proc. Pacific Symposium on Biocomputing, World Scientific, pp 112-123
Weiler IJ, Irwin SA, Klintsova AY, Spencer CM, Brazelton AD, Miyashiro K, Comery TA, Patel B, Eberwine J, Greenough WT (1997) Fragile X mental retardation protein is translated near synapses in response to neurotransmitter activation. Proc Natl Acad Sci USA 94:5395-5400
Weisstein EW (1999-2006) Delay differential equations. Wolfram Research, MathWorld - A Wolfram Web Resource http://mathworld.wolfram.com/DelayDifferentialEquation.html
Wendling F, Bartolomei F, Bellanger H, Chauvel P (2002) Epileptic fast activity can be explained by a model of impaired GABAergic dendritic inhibition. Eur J Neurosci 15:1499-1508
Werbos P (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 87:10-15
Wessels LFA, van Someren EP, Reinders MJT (2001) A comparison of genetic network models. Proc. Pacific Symposium on Biocomputing, World Scientific, Hawaii, pp 508-519
White JA, Banks MI, Pearce RA, Kopell NJ (2000) Networks of interneurons with fast and slow γ-aminobutyric acid type A (GABAA) kinetics provide substrate for mixed gamma-theta rhythm. Proc Natl Acad Sci USA 97(14):8128-8133
Whitehead DJ, Skusa A, Kennedy PJ (2004) Evaluating an evolutionary approach for reconstructing gene regulatory networks. In: Pollack J, Bedau MA, Husbands P et al. (eds) Proc. 9th International Conference on the Simulation and Synthesis of Living Systems (ALIFE IX), MIT Press, Cambridge, MA, pp 427-432
Willshaw D, Price D (2003) Models for topographic map formation. In: van Ooyen A (ed) Modeling neural development. MIT Press, Cambridge, MA, pp 213-244
Wittenberg GM, Sullivan MR, Tsien JZ (2002) Synaptic reentry reinforcement based network model for long-term memory consolidation. Hippocampus 12:637-647
Wittenberg GM, Tsien JZ (2002) An emerging molecular and cellular framework for memory processing by the hippocampus. Trends Neurosci 25(10):501-505
Wittner L, Eross L, Czirjak S, Halasz P, Freund TF, Magloczky Z (2005) Surviving CA1 pyramidal cells receive intact perisomatic inhibitory input in the human epileptic hippocampus.
Brain 128:138-152
Wu FX, Zhang WJ, Kusalik AJ (2004) Modeling gene expression from microarray expression data with state-space equations. Proc. Pacific Symposium on Biocomputing, World Scientific, Singapore, pp 581-592
Wu G-Y, Deisseroth K, Tsien RW (2001) Activity-dependent CREB phosphorylation: convergence of a fast, sensitive calmodulin kinase pathway and a slow, less sensitive mitogen-activated protein kinase activity. Proc Natl Acad Sci USA 98(5):2808-2813
Wu L, Wells D, Tay J, Mendis D, Abbott M-A, Barnitt A, Quinlan E, Heynen A, Fallon JR, Richter JD (1998) CPEB-mediated cytoplasmic polyadenylation and the regulation of the experience-dependent translation of α-CaMKII mRNA at synapses. Neuron 21:1129-1139
Yamakawa T, Kusanagi H, Uchino E, Miki T (1993) A new effective algorithm for neo fuzzy neuron model. Proc. Fifth IFSA World Congress, IFSA, Seoul, Korea, pp 1017-1020
Yang JJ, Liao PJ, Su CC, Li SY (2005) Expression patterns of connexin 29 (GJE1) in mouse and rat cochlea. Biochem Biophys Res Commun 338(2):723-728
Yao X (1993) Evolutionary artificial neural networks. Intl J Neural Systems 4(3):203-222
Zadeh L (1979) A theory of approximate reasoning. In: Hayes J, Michie D, Mikulich LI (eds) Machine intelligence, vol 9. Halstead Press, New York, pp 149-194
Zadeh LA (1965) Fuzzy sets. Information and Control 8:338-353
Zador A, Koch C, Brown T (1990) Biophysical model of a Hebbian synapse. Proc Natl Acad Sci USA 87:6718-6722
Zhukareva V, Sundarraj S, Mann D, Sjogren M, Blennow K, Clark CM, McKeel DW, Goate A, Lippa CF, Vonsattel JP, Growdon JH, Trojanowski JQ, Lee VM (2003) Selective reduction of soluble tau proteins in sporadic and familial frontotemporal dementias: an international follow-up study. Acta Neuropathol (Berl) 105(5):469-476
Zoghbi HY (2003) Postnatal neurodevelopmental disorders: meeting at the synapse? Science 302:826-830
Zoghbi HY (2005) MeCP2 dysfunction in humans and mice. J Child Neurol 20(9):736-740
Zubenko GS, Maher BS, Hughes HB, Zubenko WN, Stiffler JS, Kaplan BB, Marazita ML (2003) Genome-wide linkage survey for genetic loci that influence the development of depressive disorders in families with recurrent, early-onset, major depression. Am J Med Genet B Neuropsychiatr Genet 123(1):1-18
Zubenko GS, Maher BS, Hughes HB, Zubenko WN, Stiffler JS, Marazita ML (2004) Genome-wide linkage survey for genetic loci that affect the risk of suicide attempts in families with recurrent, early-onset, major depression. Am J Med Genet B Neuropsychiatr Genet 129(1):47-54
Zucker RS (1999) Calcium- and activity-dependent synaptic plasticity. Current Opinion in Neurobiology 9(3):305-313
Index
action planning, 43
adult cortex, 66
aging, 224
Alzheimer's disease, 224
AMPA receptor, 55, 105
ANN, 81, 253
anosognosia, 25
artificial neural network, 81, 253
auditory cortex, 63
awareness, 46, 49
Bayesian methods, 247
BCM theory, 57, 68, 184
BGO, 234
bifurcation analysis, 146
binding, 38, 41
binocular deprivation, 63
binocular rivalry, 39
blindsight, 43
Boolean methods, 251
brain, 23, 53
brain cancer, 97
brain diseases, 205
brain-gene ontology, 9, 234
Broca's aphasia, 32
CaMKII, 61, 178
cAMP-responsive transcription factor, 186
cerebral cortex, 23, 56
chaos, 50
chromosome, 128
classification, 81, 124
clustering, 96, 122, 248, 249
CNGM, 1, 155, 163, 169, 171, 174, 177, 196, 203, 205
coding, 78
codon, 141
coherence activity, 42, 43, 46, 47
computational intelligence, 247
computational neurogenetic modeling, 1, 155, 163, 169, 171, 174, 177, 196, 203, 205
Computer Tomography, 20
conceptual spaces, 47
conduction aphasia, 32
connectionist, 84, 128
connectionist constructivism, 89
connectionist selectivism, 89
consciousness, 46, 49
cortical column, 74
CREB, 186
CREB phosphorylation, 189
crossover, 129
CT, 20
Darwin, 136, 256
dendrite, 54
dendritic tree, 54
DENFIS, 107, 116, 151
developmental plasticity, 62
dimensionality, 85
distance, 248
DNA, 137
dopamine, 211
dynamic core, 46
Dynamic Evolving Neural-Fuzzy Inference Systems, 107, 116, 151
dynamic synaptic modification threshold, 69
dynamic systems, 169
ECOS, 107
EEG, 20, 99
EFuNN, 108, 152
electroencephalography, 20
epilepsy, 206
evolution, 128
evolutionary computation, 88, 127, 165
evolutionary processes, 127
evolving, 1, 109
evolving connectionist systems, 107
Evolving Fuzzy Neural Network, 108, 152
excitatory, 55, 60, 102, 214
experience-dependent, 61, 64, 79
explicit memory, 27
firing threshold, 103
fitness, 129
fMRI, 22
functional MRI, 22
fuzzy, 97, 109, 119, 249
fuzzy logic, 251
fuzzy set, 251
fuzzy variable, 251
GABA, 211
gamma oscillations, 41
gene, 128, 137
gene control, 156
gene expression, 97, 142, 155, 162, 169, 188, 195
gene profile, 142
gene/protein regulatory network, 147, 165, 250
genes and disease, 237
genetic algorithms, 128
genetic disorders, 237
Gestalt, 37
glutamate, 56, 213
GPRN, 147, 165, 250
gradient descent, 90
Hebbian synaptic plasticity, 180
hemiparalysis, 25
homosynaptic LTP, 200
immediate-early genes, 191
implicit or nondeclarative memory, 28
inhibitory, 55, 102
innate factors, 62
input-output function, 82
ion channels, 55
knowledge, 91
knowledge-based, 254
language, 29, 35
language gene, 34
learning, 56, 84, 86, 120, 177, 247
learning and memory, 25
lifelong learning, 87, 127
long-term memory, 27, 191, 223
long-term synaptic depression, 57, 178
long-term synaptic potentiation, 57, 178, 186
LTD, 57, 178
LTP, 57, 178, 186
Magnetic Resonance Imaging, 21
magnetoencephalography, 20
MEG, 20
memory, 58, 61, 177
mental retardation, 218
mentalization, 45
metaplasticity, 183, 193
microarray data, 150
microarray matrix, 142
mirror neurons, 34, 45
MLP, 98
monocular deprivation, 63
morphogenesis, 156
morphological changes, 61
motor, 75
MRI, 21
MSA, 173
Multilayer Perceptron, 98
multiple sequence alignment, 173
mutation, 129
NeuCom, 235
neural code, 74
neural development, 156
Neural Gas, 89
neural representation, 36
neurogenesis, 28
neuro-information processing, 53
neuron, 53
neurotransmitter, 54
NMDA receptor, 55, 59, 73, 105, 178, 213
NMDAR, 105, 187
non-coding, 61
non-REM sleep, 48
noradrenaline, 211
normalization, 122
ocular dominance, 62
ontology, 233
optimization, 84, 128, 155, 165, 256
orientation selectivity, 62
oscillations, 38, 44
Parkinson disease, 229
PCA, 85
percept, 42
PET, 21
phantom limb, 65
phase, 77
population, 129
Positron Emission Tomography, 21
postsynaptic potential, 55
prediction, 81
prefrontal cortex, 45
prenatal ethanol, 68
Principal Component Analysis, 85
protein, 141
PSP, 55
qualia, 48
rate code, 77
receptors, 55, 59
reflective consciousness, 41
REM sleep, 48
representation, 65
reverse correlation, 77
ribosome, 140
RNA, 137
robustness, 147
schizophrenia, 212
second messengers, 56
selection, 129
Self Organizing Map, 93
self-reflection, 45
sensory activity, 67
sensory awareness, 38, 43
serotonin, 211
short-term memory, 26, 108, 191
similarity, 107
Single-Photon Emission Computed Tomography, 21
SNN, 102
SOM, 93
somatosensory cortex, 64
SPECT, 21
spike, 54
Spike Response Model, 102
spike timing, 77
spike timing-dependent plasticity, 180
Spiking Neural Network, 102
spiking neuron, 198
spine, 54, 59
SRM, 102
STDP, 180
stochastic models, 250
subconscious, 47
subcortical structures, 23
subjective experience, 48
Support Vector Machine, 90, 249
SVM, 90, 249
synaptic modification threshold, 181
synaptic plasticity, 56, 58, 177, 199
synaptic strength, 53
synaptic weight, 53, 68, 82
synchronization, 38, 40, 77
Takagi-Sugeno, 116, 151
thalamocortical noise, 72
thinking, 34
topographic, 161
topography, 64
topological map, 95
Transcranial Magnetic Stimulation, 19
transcription, 139
transductive inference, 121
transition matrix, 164
translation, 139
unsupervised learning, 83
vesicles, 54, 60
visual areas, 38
Wernicke's aphasia, 31
whiskers, 66