GENETICS – RESEARCH AND ISSUES
MOLECULAR POLYMORPHISM OF MAN: STRUCTURAL AND FUNCTIONAL INDIVIDUAL MULTIFORMITY OF BIOMACROMOLECULES
SERGEI D. VARFOLOMYEV AND
GENNADY E. ZAIKOV EDITORS
Nova Science Publishers, Inc. New York
Copyright © 2011 by Nova Science Publishers, Inc. All rights reserved.
LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA
Molecular polymorphism of man : structural and functional individual multiformity of biomacromolecules / [edited by] Sergei D. Varfolomyev and Gennady E. Zaikov.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-61324-929-1 (eBook)
1. Genetic polymorphisms. 2. Phenotypic plasticity. I. Varfolomeyev, S. D. II. Zaikov, Gennadii Efremovich.
[DNLM: 1. Polymorphism, Genetic. 2. Genetics, Population. 3. Genomics. QU 500 M718 2009]
QH447.6.M65 2009
611'.01816--dc22
2009021411
CONTENTS
Foreword
S. D. Varfolomyev and G. E. Zaikov
Chapter 1
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
S. D. Varfolomyev, I. N. Kurochkin and I. A. Gariev
Chapter 2
Polymorphism of Tumor-Suppressor Genes and Genetic Control of Carcinogenesis
M. M. Aslanyan, S. S. Litvinov, E. S. Tsyrendorzhieva and V. A. Tarasov
Chapter 3
Association of Candidate Genes Polymorphism with Asthma in Bashkortostan Republic of Russia
E. K. Khusnutdinova, A. S. Karunas, U. U. Fedorova and I. R. Gilyazova
Chapter 4
Genes and Languages: Are There Correlations between mtDNA Data and Geography of Altay and Ural Languages
E. Khusnutdinova and I. Kutuev
Chapter 5
Common and Special Features of the Human Ribosomal DNA
Natalia S. Kupriyanova and Alexei P. Ryskov
Chapter 6
Ethnic Genomics of the East European Human Populations
S. A. Limborska, D. A. Verbenko, A. V. Khrunin and P. A. Slominsky
Chapter 7
Retroelement Insertion Polymorphism and Modulation of Human Gene Activity
I. Z. Mamedov, S. V. Ustyugova, F. L. Amosova and Y. B. Lebedev
Chapter 8
Biomedical Aspects in Investigations of Biochemical Polymorphism of Actins and Some Actin-Binding Proteins
S. S. Shishkin, L. I. Kovalev, I. N. Krakhmaleva, M. A. Kovaleva, L. S. Eremina and V. O. Popov
Chapter 9
Molecular Mechanisms of Adaptation: Stress and Aggression
A. G. Tonevitsky, N. V. Maluchenko, J. V. Shchegolkova, M. A. Kulikova, M. A. Timofeeva, O. V. Sysoeva, V. A. Shleptsova and A. I. Grigoriev
Chapter 10
Ethnogenomics: The Genetic History of Humans Written in Chromosomal DNA Markers
L. A. Zhivotovsky and E. K. Khusnutdinova
Index
FOREWORD
S.D. Varfolomeev and G.E. Zaikov
"You need not be a groundless optimist to believe that fifty years from now the 'biological code', the chemical encryption of hereditary features, will be decoded and read. From that moment, man will become the absolute sovereign of living matter."
V.A. Engelgardt, 1957
"Man will not become the lord of nature until he becomes the lord of himself."
G. Hegel, 1807
Academician V.A. Engelgardt predicted rather accurately the timing of an outstanding event in the history of mankind: the decoding of the human genome. Considering that this prediction was made just three and a half years after the discovery of DNA as the molecule carrying genetic information, one can only be surprised by his sagacity and scientific intuition. Over fifty years, science made a tremendous experimental and theoretical breakthrough, which by the beginning of the new millennium had yielded the structure of human genomic DNA as well as the genomes of many microorganisms, plants and animals. At present, genomic DNA structures have been determined for more than 500 organisms. This process continues, and its intensity and the volume of information obtained are growing like an avalanche.
Besides deep admiration for the achievements of science and this display of the power of the global human intellect, what does this give to modern society? Decoding of genomes has many consequences, and many of them will be continued and developed in the coming years and decades. Decoding of the human genome creates a qualitatively new state in the development of modern science, technology and medicine. One of the basic results of the Human Genome Project that is already understood is the formation of a basis for investigating the genome of every individual, with detection of differences at the gene and protein levels. A chemical-biological approach based on highly efficient physical methods makes possible detailed molecular genetic typing of the population and investigation of genetic polymorphism and of the individual features of enzymatic and molecular-receptor processes in every person. Achievements in human genomics and proteomics, chemical enzymology,
bioinformatics and medical genetics form the basis of modern investigations and multiple practical uses. The accuracy and efficiency of modern analytical methods make it possible to set the task of obtaining a genetic and proteomic molecular portrait of every individual and of detecting individual differences at the genetic and protein levels.
In the coming decade, post-genomic and proteomic investigations will lead to significant changes in many spheres of social life. A qualitatively new molecular medicine, based on determining the ultimate causes of many diseases, is now being established. Predisposition to many diseases and their development are genetically defined. On the basis of post-genomic and proteomic studies, new branches are appearing in medicine, such as cardiogenomics, oncogenomics, neurogenomics and pharmacogenomics, built on objective appraisal and reduction of the risks of cardiovascular diseases and cancer and on forecasting of neurodegenerative processes and aging. Today this is referred to as the creation and development of individual medicine based on the molecular-genetic and proteomic portrait of a person.
Occupational guidance and the study of personal dispositions in various spheres of activity may be based on molecular-genetic analysis. Molecular genetic typing is the foundation for a reasoned determination of a person's potential occupational abilities. The study of polymorphism of genes defining physical, psychological and intellectual human characteristics seems to be of crucial importance. In the modern post-genomic process, one of the main targets is the creation of a unified platform for genetic analysis and a basis for genotyping of the population. The functional reserves of the human organism are defined to a significant extent by the genotype of the parents. A system for estimating the risks and abilities of children on the basis of the genetic portrait of the parents is being developed at present in developed countries, and will be developed in the near future in Russia.
It is expected that this system will be fully functioning within the coming decade. Genetic forecasting of pathology risk and human abilities, against the background of many social factors, is the material basis for transition to a genetically healthy population. Post-genomic projects have many special applications; in particular, the approaches developed provide full and unambiguous identification of an individual from ultralow trace quantities of biological material.
The molecular genetic approach is becoming the foundation for many human sciences. Analysis of structural features of genomic DNA passed from generation to generation is fundamental to the modern approach to studying the origin and evolution of ethnic groups. New fields of science, ethnogenomics and ethnogeography, have appeared.
Post-genomic development of science touches upon many spheres of life of modern society. Drawing on molecular concepts created by modern physical, chemical and molecular biological methods, and on modern information technologies that use an advanced mathematical apparatus, this field creates socially meaningful products that affect the development of society as a whole.
The problem of studying human molecular polymorphism is interdisciplinary and of interest to investigators in various branches. Recognizing the multilevel and interdisciplinary nature of the problem, in 2006 the Russian Academy of Sciences and M.V. Lomonosov Moscow State University established a joint project aimed at solving many problems coordinated around the study of the multiformity of human biomacromolecules. The present book is the result of the first stage of this project. Researchers from many scientific organizations in Russia took part in the preparation of this book.
Understanding the current methodological level of investigations of the variety of a given gene, at the level of genomic DNA and of expressed protein structures, seems to be of principal importance. The methodology includes various approaches and physical observation methods that allow detection of differences in gene structure, including at the level of single nucleotide substitutions. Detailed analysis and application of modern highly sensitive, precise and high-throughput methods based on mass spectrometry of biomacromolecules are important in this field. A number of key systems, such as the full set of enzymes or the ribosomal RNAs existing in the human organism, require analysis and understanding of the molecular variety that affects human molecular physiology. Questions about the role of structural modifications and single substitutions in the structure and function of macromolecules may be answered at both the experimental and theoretical stages of the investigation. The important role of modern computation and information technologies in analyzing the role of single substitutions that are heritable and stable in the human population should be emphasized.
Human predisposition to some diseases is genetically determined. Drawing on the literature and the authors' own experimental data, this book discusses the role of genetic and proteomic polymorphism in the development of ischemic heart disease, liver diseases, bronchial asthma, recurrent miscarriage and other conditions. Investigations of the genetic control of carcinogenesis and mutagenesis are of the greatest importance and interest. At present, genetic aspects of psychology, the type of psychological behavior, individual personality and psychoemotional behavior are subjects of molecular-genetic and proteomic analysis. Molecular genetic methods of personal identification are of high practical significance.
A special article in this book is devoted to aspects of forensic genetics. Remarkable results have recently been obtained on the genetic history of mankind recorded in certain DNA markers. A polyethnic population structure is a specific feature of Russia. Molecular genetic methods help to clarify complex problems in the interrelations of different ethnic groups in Russia. This hard work lies on the shoulders of Russian researchers in ethnic genomics and genogeography. Current results obtained in this field are presented in the final section of this book.
Thus, the reader receives a book that introduces the world of ideas and results oriented toward the post-genomic development of society. This book is devoted to the molecular polymorphism of man, based on the structural multiformity of biomacromolecules. Achievements of many branches of modern science (physics and chemistry, biology and medicine, mathematics and informatics) and of many human sciences intersect in this field in intricate and unexpected ways. The field is extremely socially significant, because it touches upon every individual. The coming decade of the post-genomic era will bring many interesting and unexpected achievements.
The Editors would like to thank Dr. Valeriya I. Naidich for technical help during the preparation of this volume.
S.D. Varfolomeev, MSU Professor, Corresponding Member of RAS
G.E. Zaikov, Professor, Institute of Biochemical Physics RAS
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 1
HUMAN ENZYMES – GENETIC, PROTEOMIC AND CATALYTIC POLYMORPHISM
S.D. Varfolomeev, I.N. Kurochkin and I.A. Gariev
N.M. Emanuel Institute of Biochemical Physics, RAS
M.V. Lomonosov Moscow State University, Chemical Department, Moscow, Russia
ABSTRACT
Various aspects of the phenomenon of enzyme molecular polymorphism are considered. The state of investigations in the physical and structural chemistry of enzymes is analyzed with respect to the role of particular protein structure elements in catalytic activity and in formation of the tertiary structure of the protein. The capabilities of molecular mechanics in studying the effects of structural variations on a catalytic site, and of quantum-mechanical calculations of elementary acts of the catalytic cycle at relatively small changes in the distances between catalytic groups, are discussed. Bioinformatics approaches to the study of catalytic site structures are considered. Special attention is given to the capabilities of protein databases that provide information on human enzymes at the genetic and structural levels. Databases and databanks of genes, proteins and single amino acid replacements, and a database for the study of associations of genetic polymorphism with diseases, are considered. In addition, theoretical methods for forecasting the effects of single replacements on protein structure and function are considered. An approach is developed that applies the entropic portrait of a family of homologous proteins to detect conservative and variable parts of the polypeptide chain that are important for structure and catalysis. Using several of the most physiologically important enzymes as examples (acetyl- and butyrylcholinesterase, paraoxonase, carboxylesterase, alcohol dehydrogenase, alkaline phosphatase, protein phosphatase, angiotensin-converting enzyme, cyclooxygenase, catalase, peroxidase, superoxide dismutase), the manifestation of molecular polymorphism of these biomacromolecules in three-dimensional structures, in entropic images and in particular functions of the organism is considered.
ABBREVIATIONS
SNP    single nucleotide polymorphism
PON    paraoxonase
HDL    high density lipoproteins
DNA    deoxyribonucleic acid
RNA    ribonucleic acid
ACHE   acetylcholinesterase
BCHE   butyrylcholinesterase
CE     carboxylesterase
ADH    alcohol dehydrogenase
ALPL   alkaline phosphatase
ACE    angiotensin-converting enzyme
COX    cyclooxygenase
MPO    myeloperoxidase
EPX    eosinophilic peroxidase
TPO    thyroid peroxidase
SOD    superoxide dismutase
OPC    organophosphoric compounds
INTRODUCTION
Investigation of human molecular polymorphism is one of the most important and socially valuable modern post-genomic projects. The concentrated effort of the world scientific community culminated in an amazing achievement: decoding of the structure of genomic DNA, the informational molecule that defines the structure and functioning of biomacromolecules and molecular machines in the human organism. One of the expected, but nevertheless exceptionally important, results of the project is the demonstration that mankind has one genome, but every particular individual has different genes. The multiformity of structures of each particular gene at the DNA level, which has arisen and is still arising, and which is heritable and passed on from generation to generation, has led to a giant accumulation of differences between individuals at the molecular level, that is, to human molecular polymorphism. Understanding these molecular features of every individual has many important consequences.
In the multiformity of biomacromolecules, enzymes take a special place.
1. Enzymes form the framework of metabolism. Although rather few in the human genome (about 3,000 among a total of roughly 30,000 identified genes), enzymes determine the rates, directions and stationary concentrations of all chemical reagents that participate in the functioning of the organism. Therefore, changes and variations in the structure of any enzyme, or in the level of its expression, may contribute significantly to the behavior of the whole system. This is of special importance if the enzyme is rate-limiting in a complex sequence of transformations along a metabolic pathway. Structural variations may affect many properties of the enzyme: catalytic activity, stability, membrane activity and targeting within the cell. All these properties are principally important and may significantly affect processes proceeding in the cell. Genetic and proteomic modification of an enzyme is modification of metabolism.
2. At present, enzymes are the best-studied class of proteins. For a wide variety of enzymes, three-dimensional structures have been obtained at atomic resolution, active sites have been identified and well-founded notions of the molecular mechanisms of catalysis have been formed [1]. The structural antinomy of enzymatic catalysis is that the multiformity of enzymes with different primary structures is served by a rather limited number of catalytic sites. Apparently, nature found combinations of atoms with catalytic activity, learned to form their three-dimensional structure from polypeptide chains with various amino acid sequences, and specialized in using them in various enzymes and organisms. The observed structural convergence of proteins to a limited number of active sites creates quite favourable prerequisites for solving the general problem of obtaining comprehensive information about all catalytic sites of enzymes existing in nature.
3. The information and computation technologies developed to date make it possible to raise the question of reconstructing the full three-dimensional structure of a protein from its primary structure. In the case of enzymes, this procedure is essentially unique by virtue of the structural antinomy of enzymatic catalysis and the commonality of active site structures in enzymes from different sources. The methods developed allow reconstruction of protein structure by homology. This is a principally important stage in the study of enzymes, because in modern science the basic structural information is presented by primary sequences, mostly owing to the development of genomic investigations. Thus, the question may now be raised of structurally reconstructing all human enzymes on the basis of genomic data, with detection and analysis of possible molecular polymorphisms.
4. Modern chemical enzymology has developed a rather extensive and reliable set of methods for measuring the catalytic activity of enzymes of all classes. This provides the basis for studying enzyme polymorphism at the post-translational level through functional activity. It is principally important and interesting to investigate the molecular polymorphism of enzymes at the two limiting points of molecular expression of information: at the level of the gene and at the level of its end product. Many interesting observations and surprising findings may be expected.
5. The study of the molecular polymorphism of enzymes will make possible a molecular interpretation of the physiological features of the organism. It is known that the principal and best-known genetic diseases, such as Følling's disease (phenylketonuria), are connected with polymorphism of enzymes. It is obvious that finer distinctions in enzyme structure define the physiological features of one or another individual, such as physical and mental capacity and predisposition to particular diseases. It seems principally important to search for correlations between gene structure, enzyme activity and physiological response. The totality of such data will form a principally new basis for molecular physiology.
6. Enzymes are targets for a great number of modern pharmaceuticals. The study of structural polymorphism of enzymes lays the ground for understanding the mechanisms of action of a particular pharmaceutical on different individuals. Understanding these mechanisms is a necessary stage in the formation of individual medicine and a way to create improved pharmaceuticals.
PHYSICAL AND STRUCTURAL CHEMISTRY OF ENZYMES: FROM THE PRIMARY STRUCTURE TO ELEMENTARY ACTS OF CATALYSIS
Nearly 200 years have passed since 1814, when K.S. Kirchhoff, a natural scientist from Saint Petersburg, discovered the acceleration of a chemical reaction by a biological substance. Over that time, the understanding of biological (enzymatic) catalysis has come a long way, and it now forms an area that is deeply understood at the fundamental level and widely applied in various fields of human activity. In 1836, Jöns Jakob Berzelius, one of the founders of modern chemistry and the father of catalysis, wrote: "In plants and animals thousands of catalytic processes between tissues and fluids proceed, implementing a great many chemical syntheses from a single primary material" [2, 3]. Hence, from the very beginning of the investigations, it was clear that enzymatic catalysis demonstrates a number of absolutely outstanding properties:
1. Catalysis by enzymes is, as a rule, 10^12-10^15 times more effective than catalysis by "classical" chemical catalysts such as the hydrogen ion.
2. Protein molecules, the material carriers of enzymatic activity, are able to "recognize" molecules of reagents and carry out reactions selectively with molecules of a certain structure.
The latter property is exceptionally important in biological systems, because it provides a directed flow of chemical transformations in complex multicomponent biological mixtures of substances.
In the last 20-30 years, historic advances have been made in the understanding of enzymatic catalysis and in the development of multiple applications of enzymes.
These achievements rest on the colossal body of information about protein structure, on progress in studying the kinetics and mechanisms of enzyme-catalyzed reactions, on the creation of methods that allow protein structure to be manipulated through modification of the corresponding gene, and on the use of modern computational and information methods. The last of these should be emphasized. The creation, development and active use of methods for storing and processing large volumes of information, and the possibility of high-volume and high-speed computation, have underpinned success in this field of science and technology. Comprehensive study of enzymes and their application has enabled the origin, establishment, development and success of whole fields of modern natural science. An enzyme is a precise and high-performance tool for carrying out complex reactions that require accuracy in the cleavage and synthesis of particular chemical bonds. Genetic engineering, which provides opportunities for directed genetic modification of organisms and for obtaining proteins with useful properties at the required level, is based on the ability of enzymes to hydrolyze selectively
particular bonds in nucleic acid molecules and synthesize, if necessary, phosphodiester bonds and link separate DNA and RNA chains.
Chemical Kinetics and Structural Investigations – The Basis for Notions About the Mechanisms of Enzymatic Catalysis
Significant progress has now been made in understanding the mechanisms by which enzymes operate. Systematic and many-sided investigation of enzymes has made a systematic approach to this natural phenomenon possible. Current notions of the mechanism of enzyme action rest on two principally important lines of investigation:
1. Investigation of kinetic schemes of the process, with identification of the points at which substrates enter the catalytic cycle, detection and determination of the chemical nature of labile intermediate compounds, and analysis of the various states of the active site.
2. Study of the structure of enzymes and their active sites.
Experimentally grounded notions of the mechanisms of catalysis are built from structural data and from the study of reaction kinetics, with identification of the intermediate compounds participating in the process. Formal kinetic analysis of enzyme-catalyzed reactions has been the subject of intensive investigation in recent decades. The basic principles of the analysis and its main results are presented in a series of monographs, textbooks and manuals [4-8]. This work has resulted in a comprehensive analysis of various kinetic schemes connecting reaction rates with the concentrations of the participants in the catalytic process (substrates, active sites, intermediates), and in a description of how the catalytic process develops over time. In most cases, for both stationary and nonstationary processes, the analysis is greatly simplified by the structurally homogeneous character of the enzyme active site and by the linearity in enzyme concentration of the main equations describing the elementary stages of the catalytic process. This makes formal kinetic description of the reaction easier and allows adequate comparison of theoretical equations with experimental data. The conclusion of formal kinetic analysis is that the catalytic cycle of enzyme action is a multistage chemical and physical process involving a set of labile intermediate compounds, which include both chemical intermediates and conformers of catalytic groups, substrates and intermediates. Formal kinetic analysis of various kinetic schemes, together with the corresponding experimental study in both stationary and nonstationary modes, has yielded detailed information about the number and nature of intermediates and the kinetic characteristics of their formation and transformation [1].
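The linearity in enzyme concentration mentioned above can be illustrated with the simplest textbook kinetic scheme. The sketch below is not taken from this chapter: it assumes the single-intermediate Michaelis-Menten scheme in the stationary state, v = k_cat [E]0 [S] / (K_M + [S]), and all parameter values are hypothetical, chosen only for illustration.

```python
def michaelis_menten_rate(e_total, s, k_cat, k_m):
    """Stationary-state rate v = k_cat * [E]0 * [S] / (K_M + [S]).

    Assumes the simplest one-substrate, one-intermediate scheme;
    real enzymes may follow more complex multistage schemes.
    """
    if e_total < 0 or s < 0 or k_cat < 0 or k_m <= 0:
        raise ValueError("concentrations and constants must be non-negative, K_M > 0")
    return k_cat * e_total * s / (k_m + s)


# Hypothetical parameters (not from the source text).
K_CAT = 100.0   # catalytic constant, 1/s
K_M = 2.0e-4    # Michaelis constant, mol/L
S = 1.0e-3      # substrate concentration, mol/L

v1 = michaelis_menten_rate(1.0e-9, S, K_CAT, K_M)
v2 = michaelis_menten_rate(2.0e-9, S, K_CAT, K_M)

# Doubling the total enzyme concentration doubles the rate,
# because the rate equation is linear in [E]0.
print(v2 / v1)
```

Under this assumed scheme the rate is a saturating function of substrate concentration but strictly proportional to total enzyme concentration, which is the property that simplifies comparison of theoretical equations with experimental data.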
Bioinformatics in the Study of Catalytic Site Structures
For the majority of enzymes, the active site, a collection of functional amino acid side groups, may be divided rather reliably into two principally important functional areas: sorptional and catalytic. The sorptional area forms the complex of the substrate with the enzyme and is responsible for substrate selection and the specificity of the enzyme. The catalytic area, which as a rule comprises acids and bases, metal ions and prosthetic groups, performs the principally important stages of substrate activation and chemical transformation. The interest focused on protein structures in the last decade allows one to state with some assurance that a significant part of the basic catalytic sites existing in nature has now been identified and studied. This assurance is based on extensive experience in chemical enzymology, which operates with a large array of structural data and reveals related catalytic site structures in many enzymes [1]. Hence enzymes of various classes may contain identical or almost identical catalytic site structures. At present, the primary structure has been obtained for more than a million proteins, including proteins encoded in the genomes of man, animals, plants and bacteria. The database of three-dimensional structures includes over 30,000 structures, with 5,000 basic structures among them. A significant part of these proteins are enzymes. Given that the classification of enzymes comprises over 3,730 entries, this supports the statement that the majority of catalytic structures are already known. Bioinformatics methods are highly significant for the study of enzymes and the mechanisms of their action [9-15]. The possibility of comparing the structures of many proteins using modern information technologies provides an opportunity to determine their common, functionally valuable elements, in particular the active sites of enzymes.
This approach is based on the natural phenomenon that the totality of proteins with catalytic activity, nearly infinite in the variety of protein structures, has catalytic sites formed by a rather limited number of structures. Using computer technology, we have developed two methods for identifying the functional groups that compose the active site of an enzyme. The structure of the catalytic site of an enzyme may be determined from data on the primary sequence of amino acids in the protein [11, 12]. On the other hand, given the three-dimensional structure of the protein at atomic resolution, one can also detect a catalytic site even if it has not been identified before [15]. Determination of the catalytic groups of an enzyme from the primary sequence is based on comparing the primary structures of proteins known to carry a given enzymatic activity and detecting the common, or so-called "conservative", positions that contain the same amino acid. Modern computer databases contain significant information on the structure of proteins and enzymes, covering both the primary sequence of amino acids in the polypeptide chain and the structure of proteins at atomic resolution. The largest volume of information concerns primary structures, i.e., the sequences of amino acids in proteins. Investigations of genomes have provided this field with giant volumes of information, which continue to increase exponentially. It is known that the primary sequence defines all levels of the structural hierarchy of the protein. The following scientific problem is obvious: how could we forecast the complete three-dimensional structure of a protein, including the structure of its active site, based on the
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
knowledge of the amino acid sequence? At present, at least two approaches are used to solve this task. The first proceeds from theoretical calculation of the energy of the protein globule, searching for structures with minimal energy [16]. It requires a large amount of computation. Moreover, in most cases the task may not have a unique solution, although progress in this field may be expected to lead to practically valuable results. The second approach is much simpler. It consists in building a structure by homology and relies on the fact that many proteins, especially proteins from different sources that perform the same or a similar function, are rather similar and highly homologous. Since the complete three-dimensional structure is known for many proteins, these structures can be used to build the structure of a homologue, which significantly simplifies the calculations.

To detect the amino acids forming the active site of an enzyme, and the structure-forming amino acids that "unite" the side chains of the catalytic site, an approach based on comparison of amino acid sequences in homologous proteins from a broad family (see below) was used. The procedure of comparing primary structures of proteins with a high degree of homology (30% or higher) is called "alignment." If any protein from the retrieved set is taken as the base, the other homologues can be "strung" onto it by computer in order to compare the identity of each position across the whole selection under study. The alignment algorithm automatically accounts for "insertions" (additional sequence not shared by the other proteins) and "deletions" (missing chain fragments). Alignment yields a matrix of probabilities of occurrence of each amino acid at each position of the polypeptide chain selected as the base.
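The matrix of occurrence probabilities described above can be illustrated with a minimal sketch. It assumes the sequences have already been aligned to equal length and that gaps are written as `-`; the toy sequences, the function names and the 0.9 threshold are hypothetical and are not taken from the authors' software.

```python
from collections import Counter

GAP = "-"  # assumed symbol marking an insertion/deletion in the alignment


def position_frequencies(aligned):
    """For each alignment column, return a dict: residue -> relative frequency."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    matrix = []
    for col in range(length):
        # Gaps are ignored when counting residues in a column.
        counts = Counter(seq[col] for seq in aligned if seq[col] != GAP)
        total = sum(counts.values())
        matrix.append({aa: n / total for aa, n in counts.items()} if total else {})
    return matrix


def conserved_positions(matrix, threshold=0.9):
    """Indices of columns where one residue occurs with frequency >= threshold."""
    return [i for i, freqs in enumerate(matrix)
            if freqs and max(freqs.values()) >= threshold]


# Invented toy alignment (not real protein data).
toy_alignment = ["GDSGG", "GDSAG", "GDS-G", "ADSGG"]
matrix = position_frequencies(toy_alignment)
conserved = conserved_positions(matrix)
```

In this toy case the second, third and fifth columns contain a single residue in every sequence and are therefore reported as conservative; a real analysis would use hundreds of homologues and a threshold chosen for the family in question.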
This approach allows detection of positions at which an amino acid is preserved with high probability, the so-called conservative positions typical of all proteins in the selection. The distribution of functions among conservative amino acids is rather obvious. The carboxyl groups of aspartic and glutamic acids, the imidazole group of histidine and the guanidine group of arginine form three-dimensional catalytic structures that perform coordinated nucleophilic-electrophilic catalysis. The elementary act of catalysis is associated with a relatively small proton displacement along the hydrogen bond line, by 1-1.5 Å, or with bond polarization by an electrophilic agent. As a result, the reactivity of the substrate molecule or of an active site group increases 10⁶-10⁷ times. For example, proton transfer from water, which brings the reagent structure in the transition state closer to that of the hydroxide ion, increases its reactivity in nucleophilic substitution by a factor of 10⁷. If the process proceeds in a concerted manner and each of the substrates is activated in this way, a total acceleration of the reaction by 10¹²-10¹⁵ times may be expected.

Catalytically important amino acids are spread over the whole polypeptide chain of the protein. During folding and formation of the three-dimensional structure, catalytically important functional groups are brought together in space, forming the three-dimensional structure of the catalytic site. Reliable folding of the polypeptide chain is ensured by the presence of structure-forming amino acids (glycine, proline and cysteine) where appropriate. Hence, catalytically important amino acids may be located in a different order along the polypeptide chain.
S. D. Varfolomeev, I. N. Kurochkin and I. A. Gariev
The main structural antinomy of enzymatic catalysis is that the enormous number of enzymatic reactions, carried out by a virtually infinite number of proteins differing in primary structure, relies on a rather limited number of catalytic sites. For example, the largest class of enzymes, the hydrolases, accounts for about one-third of presently known enzymes (about 1,100 of 3,700 enzymes in the classification). Each enzyme is accompanied by biological variety, since in biological systems these proteins are represented by a virtually infinite set of amino acid sequence variants. However, there are only five basic catalytic sites, which activate the molecules and carry out the catalytic cycle [13]. The multiformity of reactions (about 1,100 for hydrolases) is connected with the action of the sorption site, which forms the substrate complex and orients the reactive site of the substrate relative to the catalytic groups of the protein.

Thus, catalytic sites of enzymes are unique structures formed during evolution, apparently as a result of searching through many intermediate structural variants, and genetically fixed as indexed positions of catalytically important amino acids and of structure-forming amino acids that gather the functional groups of the catalytic amino acids at the required place in space with the required steric match.

The structural unity of catalytic site composition also appears when active sites of enzymes that are not hydrolases are compared with those of hydrolases. Analysis shows that active sites of enzymes of other classes are composed of the same nucleophilic-electrophilic elements as active sites of hydrolases. Reactive sites of molecules are activated by the same nucleophilic-electrophilic mechanisms, frequently with the use of the same three-dimensional structures.
The structural unity of enzyme active sites becomes more obvious when active sites of hydrolases are compared with the corresponding structures of synthetases, which form bonds and do not use a water molecule as a reagent. The transition from hydrolytic to synthetic reactions is achieved by replacing the activation of water with activation of, for instance, a carbohydrate hydroxyl group, with full preservation of the active site structure [12].

The identification of catalytic and structure-forming amino acids from the primary sequence of the polypeptide chain, described above, and the structural unity of enzymes of various classes have many consequences. It is obvious that catalytically active proteins may be composed of a number of amino acids much smaller than the twenty existing in nature. Most positions in the polypeptide chain may be substituted by other amino acids without significant change in the catalytic function of the protein. What matters is the presence of critical amino acids in particular positions: either residues with acid-base properties that form the catalytic site, or structure-forming amino acids that gather the catalytic groups in a particular region of space. The traditional methods of modifying enzyme properties are site-directed mutagenesis and directed evolution. The results of the analysis show that free variation of amino acids is possible only in non-conservative positions. Otherwise, modification of the enzyme is very likely to cause loss of its enzymatic activity.
Active Sites of Enzymes: Geometrical Invariants and Observed Values of Parameters

Structural studies of enzymes have by now provided investigators in the field of catalysis with a broad set of atomic-resolution structures. The PDB structural database currently holds more than 30,000 structures, and the amount of data grows by 20% annually. Hence, building structures by combinatorial methods based on homology is becoming a separate approach to obtaining three-dimensional structures of proteins and enzymes. This makes it possible to pose and solve general problems that shape our notions of catalysis as a natural phenomenon.

The task that was solved consisted in designing an automated procedure for detecting active sites as a configuration of atoms of definite structure, together with the associated problem of estimating the permissible variations of interatomic distances and bond angles under which the catalytic function is conserved. In other words, the question of interest is how "strict" the structural requirements on the configuration of atoms are for the catalytic function to be implemented. The answer may be obtained by statistical comparison of the structures of enzymes that possess the same active site but differ in source, specificity and structure of the protein molecule.

Serine hydrolases, whose catalytic site includes a triad of the serine-histidine-aspartic acid series, were selected as the model object. Enzymes of this class are the best studied with respect to catalytic mechanism, specificity and structural organization of the active site [6, 17]. In the PDB database these enzymes are represented by 1,284 proteins; 1,256 structures were obtained by X-ray structural analysis and 28 by proton magnetic resonance.
Preliminary analysis shows that enzymes performing catalysis with the help of the Ser-His-Asp triad may differ considerably in the configuration of the atoms forming the catalytic site. A statistical analysis of the variations observed in actually functioning enzymes was performed, on the one hand, to detect the limits of these variations and, on the other hand, to create an automatic computerized procedure for identifying catalytic sites of this type from three-dimensional protein structure data. The latter implies designing a computer "pattern" with permissible parameter values, which would allow detection of catalytic sites within the complex configuration of atoms of a protein molecule. For the design of the computer "pattern," the simple procedure of comparing structures by root-mean-square deviations of atoms proved unacceptable, because it does not take into account the multitude of specific structural differences associated with the location of atoms in space. A procedure detecting local identity of atoms in protein structures, based on geometrical invariants, has been developed. The geometrical invariants used were the distances between key atoms, planar angles between three atoms belonging to two or three catalytic groups, and planar angles formed by two vectors, each built on atoms of the same catalytic residue. Enzymes with reliably known structure served as the "training" set, and the minimal and maximal values of each parameter were taken as the limits.
This analysis gives important information on the structural features of enzyme active sites, allows determination of permissible values of the geometrical invariants and creation of a procedure for automatic identification of active sites in a protein molecule, and reveals important features of the structural reorganization of the active site during the catalytic cycle.
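The idea of a "pattern" built from geometrical invariants, with limits taken as the minimum and maximum over a training set, may be sketched roughly as follows. This is only an illustration under stated assumptions: the choice of three invariants, the atom labels (`ser_og`, `his_ne2`, `his_nd1`, `asp_od`) and all coordinates are hypothetical toy values, not data from any PDB entry and not the authors' actual parameter set.

```python
import math


def angle(a, b, c):
    """Planar angle (degrees) at atom b formed by atoms a-b-c."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_value = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))


def invariants(site):
    """Compute a small, assumed set of geometrical invariants for one site."""
    return {
        "d_ser_his": math.dist(site["ser_og"], site["his_ne2"]),
        "d_his_asp": math.dist(site["his_nd1"], site["asp_od"]),
        "a_ser_his": angle(site["ser_og"], site["his_ne2"], site["his_nd1"]),
    }


def learn_limits(training_sites):
    """Take the min and max of every invariant over the training set."""
    values = [invariants(s) for s in training_sites]
    return {k: (min(v[k] for v in values), max(v[k] for v in values))
            for k in values[0]}


def matches(site, limits):
    """True if every invariant of the site lies within the learned limits."""
    inv = invariants(site)
    return all(lo <= inv[k] <= hi for k, (lo, hi) in limits.items())


# Invented toy coordinates in angstroms (hypothetical, for illustration only).
training = [
    {"ser_og": (0, 0, 0), "his_ne2": (2.8, 0, 0),
     "his_nd1": (4.0, 1.8, 0), "asp_od": (6.5, 2.5, 0)},
    {"ser_og": (0, 0, 0), "his_ne2": (3.0, 0, 0),
     "his_nd1": (4.2, 2.0, 0), "asp_od": (6.9, 2.5, 0)},
]
limits = learn_limits(training)

candidate = {"ser_og": (0, 0, 0), "his_ne2": (2.9, 0, 0),
             "his_nd1": (4.1, 1.9, 0), "asp_od": (6.7, 2.5, 0)}
outlier = {"ser_og": (0, 0, 0), "his_ne2": (4.5, 0, 0),
           "his_nd1": (5.7, 1.9, 0), "asp_od": (8.3, 2.5, 0)}
```

Because distances and angles do not depend on the orientation of the molecule, such a check needs no prior superposition of structures, which is the practical advantage over a root-mean-square comparison.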
Molecular Mechanics of Protein in the Study of Active Sites

New possibilities have recently appeared in the study of the molecular mechanisms of enzymatic catalysis. They are associated with the use of molecular mechanics and quantum chemistry methods to calculate the behavior of the protein molecule during the catalytic act. The development of effective programs describing proteins within classical mechanics has given investigators the opportunity to calculate the energy properties of a protein molecule and the behavior of protein chains under various conditions [18-21]. Of principal interest are calculations of protein interaction with low-molecular-weight ligands (substrates, inhibitors), i.e., the study of the so-called docking process, as well as computer mutagenesis, the calculation of changes in the protein molecule upon replacement of one amino acid by another. Computer mutagenesis is especially important for the rational design of proteins with directed changes in their properties. Molecular mechanics and computer mutagenesis have helped in studying the behavior of enzyme active sites upon replacement of conservative and non-conservative glycines by other amino acids. It has been shown that replacement of conservative glycines by alanines seriously disturbs the geometry of active sites [22].
Quantum-Chemical Calculations of Elementary Acts of the Catalytic Cycle

A qualitative advance in understanding the elementary acts of enzymatic reactions is associated with the use of quantum mechanics and quantum chemistry methods to describe individual acts of the catalytic cycle. Rather reliable quantum-chemical calculations became possible owing to the isolation of the catalytic site within the protein molecule as a region whose behavior may be described by quantum chemistry methods, while the rest of the protein molecule is described by molecular mechanics methods (the QM/MM approximation) [23-25]. The methods developed provide calculation of multidimensional potential energy surfaces, location and identification of stationary points (global and local minima, transition states), and calculation of energy profiles of chemical reactions. All this forms a new level of understanding of the molecular mechanism of the reaction.

Quantum-chemical calculations have made it possible to solve many problems connected with the detailed understanding of enzymatic reaction mechanisms. For example, since Blow et al. determined the three-dimensional structure of α-chymotrypsin, it has been commonly assumed that the hydroxyl group of serine in the Ser-His-Asp triad is activated by a double proton transfer: serine hydroxyl group → histidine imidazole group → deprotonated carboxyl group of aspartic acid. This mechanism based on the seemingly obvious proton transfer from the positively charged imidazolium ion to the negatively charged
carboxyl group of aspartic acid was assumed to explain the action of the catalytic triad (see above) in serine hydrolases. However, quantum-chemical calculations of energy profiles performed within the QM/MM approximation demonstrated that, when the environment of the carboxyl group is taken into account, the activation of the serine hydroxyl has a single-proton character. The role of the carboxyl group of aspartic acid is to orient the imidazole group in the active site space [25].

Rather well-founded ideas on the nature of enzyme specificity, i.e., the ability of enzymes to "recognize" the structure of an effective substrate or inhibitor, have now been developed. The basic physicochemical model rests on the idea that interaction of the substrate or inhibitor with the sorption area of the active site decreases the free energy of activation at the limiting stage of the process. The decrease of the free energy of activation through strain created on the reacting bond has been validated experimentally. Substrate selection through conformational changes of the active site induced by interaction of a "good," i.e., specific, substrate with active site groups (the so-called "induced fit") has also been validated theoretically and experimentally [1]. The "induced fit" mechanism was suggested by Koshland [26] and is now confirmed by quantum-chemical calculations [27]. For serine hydrolases, the O-N distance in the free enzyme is systematically 0.1-0.2 Å longer than in the enzyme-substrate (enzyme-inhibitor) complex. This suggests that complex formation between the substrate and the active site causes a change in the latter that moves the system along the reaction coordinate. As shown by quantum-chemical calculations, the energy required for the O-N proton transfer in the enzyme-substrate complex is approximately 3 kcal/mol higher compared with the free enzyme.
Since the proton transfer is a component of the activation barrier (the total barrier height is 9.6 kcal/mol), the data obtained show that the substrate induces structural changes in the active site that significantly affect the course of the enzymatic reaction (the reaction rate constant increases approximately 150-fold). Quantum-chemical calculations demonstrate a high "sensitivity" of the catalytic function of the enzyme to relatively small changes in active site geometry. This may be significant for structural polymorphism of the enzyme. A replacement that does not touch the active site directly and is located far from the catalytic locus can change the distances between catalytic groups that are important for catalysis and, therefore, lead to catalytic polymorphism.

One feature of the current state of research in the physical and structural chemistry of enzymatic catalysis should be emphasized. A systems approach to investigating and understanding the origin of enzymatic catalysis has now fully formed. Knowledge of the primary structure, i.e., the amino acid sequence of the polypeptide chain, provides sufficient information for the subsequent hierarchy levels of understanding, down to the elementary acts at the catalytic site. Modern information methods make it possible to detect the functional groups forming the catalytic site and the key structure-forming amino acids. If homologues of a particular protein with atomic-resolution structures are available, quite reliable information on the three-dimensional structure of the protein under study can be obtained. This allows the active site geometry to be determined. Subsequent application of molecular mechanics and
quantum chemistry methods to calculate the interaction between the catalytic groups and the substrate provides information on the energy of elementary acts for various possible reaction mechanisms and participants of the catalytic process and, therefore, gives an idea of the molecular mechanism of catalysis and the physical nature of the acceleration of chemical reactions by enzymes.
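The connection quoted above between an energy difference of about 3 kcal/mol and a roughly 150-fold change in the rate constant can be checked with a short calculation. The sketch assumes an Arrhenius-type (transition state theory) dependence with an unchanged pre-exponential factor and a temperature of 298.15 K; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def rate_ratio(delta_barrier_kcal, temperature_k=298.15):
    """Ratio of rate constants for two barriers differing by delta_barrier_kcal.

    Assumes k ~ exp(-E / RT) with the same pre-exponential factor for both.
    """
    return math.exp(delta_barrier_kcal / (R_KCAL * temperature_k))


# A 3 kcal/mol difference in barrier height at room temperature.
ratio = rate_ratio(3.0)
```

Under these assumptions the ratio comes out on the order of 150, in line with the figure quoted in the text; because the dependence is exponential, even a few tenths of a kcal/mol change the rate constant noticeably, which illustrates the "sensitivity" of catalysis to small changes in active site geometry.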
BANKS AND DATABASES GIVING INFORMATION ON HUMAN ENZYMES AT GENETIC AND STRUCTURAL LEVELS

Two unrelated persons have about one difference per 1,300 nucleotide pairs [28]. The human genome is 3.2×10⁹ pairs long, so two persons differ, on average, at 2.5×10⁶ positions. Calculating the total number of variations that may be observed in the human population is a more complicated task. The less frequent a mutation is, the smaller the chance of detecting it by comparing the genomes of two individuals. An accurate estimate of the total number of SNPs in the human genome requires, firstly, data on the distribution of mutation frequencies and, secondly, a chosen lower bound on the frequency of the variations to be counted. By convention, only mutations with a Minor Allele Frequency (MAF) of 1% or higher are regarded as SNPs [29]. Using estimates of the mutation frequency distribution, one may show that the human population has about 11 million SNPs [30].
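The arithmetic behind these figures, and the 1% frequency criterion, can be written out as a brief sketch. The genome size and the one-difference-per-1,300-pairs figure are taken from the text; the function names and the allele counts in the example are hypothetical, and the frequency calculation assumes a biallelic site.

```python
GENOME_SIZE_BP = 3.2e9      # human genome length in nucleotide pairs (from the text)
BP_PER_DIFFERENCE = 1300    # one difference per 1,300 pairs between two people

# Expected number of positions at which two unrelated persons differ.
expected_differences = GENOME_SIZE_BP / BP_PER_DIFFERENCE


def minor_allele_frequency(allele_counts):
    """MAF for a biallelic site given a dict: allele -> number of observed copies."""
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


def is_snp(maf, threshold=0.01):
    """Apply the conventional criterion: minor allele frequency of 1% or higher."""
    return maf >= threshold


# Invented example counts (hypothetical, for illustration only).
common_variant = minor_allele_frequency({"A": 190, "G": 10})
rare_variant = minor_allele_frequency({"A": 1998, "G": 2})
```

The quotient is about 2.5 million, matching the estimate in the text, and a variant seen in 2 of 2,000 chromosomes falls below the 1% bound and would not be counted as an SNP under this definition.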
BANKS AND DATABASES OF GENES, PROTEINS AND SNP

GenBank¹ [31] is one of the primary and best-known data banks providing information on nucleotide sequences. It was established to give convenient access to experimental sequencing data. Owing to this data bank, modern publications list only identification codes (GenBank Accession Numbers), and the nucleotide sequences themselves can be found by the code. Stored sequences differ greatly in length: they may represent the result of a single sequencing run or the genome of an entire organism. For example, the complete DNA sequence of E. coli is accessible under identification code AE014075. GenBank is maintained by the NCBI (National Center for Biotechnology Information) at the NIH (National Institutes of Health, USA). Since the mid-1990s, three leading data banks, GenBank, EMBL (European Molecular Biology Laboratory) and DDBJ (DNA Data Bank of Japan), have united their efforts and arranged data exchange. Therefore, the content of these banks is now identical.

Genome Database² [32]. The results of sequencing performed within the "Human Genome" Project are stored in GenBank which, by definition, does not analyze the data. The results obtained are annotated in a separate project, Genome Database, maintained at the same NCBI center. Annotation of a sequence includes identification of genes and description of their exon-intron structure, links to protein and RNA sequence databases, and description of local features: the presence of regions enriched with
¹ http://www.ncbi.nlm.nih.gov/Genbank/
² http://www.ncbi.nlm.nih.gov/Genomes/
repeats, the presence of unique STS (Sequence-Tagged Sites), references to homologous sequences of other organisms, and much more. There are analogous projects: Genome Browser³ (University of California, Santa Cruz) and Ensembl⁴ of the European Bioinformatics Institute.

dbSNP⁵ [33]. During the "Human Genome" Project, DNA of several volunteers was sequenced [34]. Therefore, nucleotide sequences from the Genome database do not belong to any particular person; they are a generalized consensus for a group of people. The differences between individuals found during sequencing, first of all SNPs, formed a new database, dbSNP, which is also supported by the NCBI. The database contains records of two types: variations determined experimentally by an individual research group, and variations that merge the reports of several experimental groups (because different groups may describe the same mutation independently). Identification codes of the first type begin with the letters ss (submitted SNP) and those of the second type with the letters rs (reference SNP). For every variation, its location in the chromosome and the flanking sequences (25 nucleotide pairs or longer each) are shown. It is indicated whether the mutation lies in an intergenic interval, an intron or an exon of a gene, and whether it changes the amino acid sequence of the protein. If studies on many people were performed, data on the frequency of occurrence of every allele may be presented. At present (November 2006), the database contains data on 11,961,761 human reference SNPs, and it is still being updated. Besides dbSNP, the largest general-purpose database, there are other projects, for example HGBase⁶ [40] and JSNP⁷ [41], the latter dedicated to detection of SNPs in drug metabolism genes observed among Japanese citizens.

SwissProt⁸ [35] is the largest database on proteins.
The year 2006 marks twenty years of work by the Swiss Institute of Bioinformatics on annotating proteins on the basis of literature data. The project is now carried on jointly by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute, and its name has been changed to UniProt. For each protein described in the database, the name and synonyms, the gene name, the source organism, and the full amino acid sequence with its annotation (post-translational modification sites, disulfide bonds, signal peptides, etc.) are given. Biological activity is shown for proteins and, for enzymes, EC numbers, required cofactors, catalytic site residues and the catalyzed reaction. Finally, cross-references to other databases (over 70) are provided, including GenBank and the Protein Data Bank (if the three-dimensional structure of the protein is known). For more than 2,000 human proteins, data on the diseases with which these proteins are associated and references to the OMIM database are given (see below). Since summer 2006, data on mutations leading to replacement of a single amino acid residue have been included. At present, over 28,500 variations are shown, half of which are associated with diseases or predispositions; for ~30% of the variations, references to dbSNP are given. Each mutation has an annotation page with literature references, the location in the sequence and in the protein structure (if known), and the conservation of the position at which the mutation is observed across the set of homologous proteins.
³ http://genome.ucsc.edu/
⁴ http://www.ensembl.org/
⁵ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp
⁶ http://hgbase.cgr.ki.se/
⁷ http://snp.ims.u-tokyo.ac.jp/
⁸ http://www.expasy.org/sprot/
Protein Data Bank⁹ [36] is the main repository of experimentally obtained three-dimensional structures of proteins and nucleic acids, usually determined by X-ray diffraction and NMR spectroscopy. By November 2006, the bank included 36,642 structures for 15,130 unique proteins (several structures may be obtained for the same protein, and all of them are stored in the bank). On the basis of PDB, the Structure/MMDB (Molecular Modelling DataBase)¹⁰ database was created at the NCBI. Records in this database may differ in the numbering of amino acid residues. Note that to show the position of an SNP in the three-dimensional protein structure, dbSNP refers directly to the Structure database.

MutDB¹¹ [43]. Finally, let us mention a small but convenient web service for SNP visualization. The software on this server allows online viewing of three-dimensional protein structures from PDB with the mutations taken from the SwissProt and dbSNP databases marked on them.
DATABASES FOR THE STUDY OF ASSOCIATIONS OF GENETIC POLYMORPHISM AND DISEASES

As already mentioned, the number of possible SNPs is about 11×10⁶. Determining which SNPs are associated with a particular disease is a highly time-consuming task. In many cases, the set of candidate SNPs may be reduced if the disease is known to be linked to some part of a chromosome or, better still, to a definite gene.

OMIM¹² (Online Mendelian Inheritance in Man) [42] is the oldest database (updated since the 1960s) associating inherited diseases with genes. The database is compiled manually from data published in the medical literature, many of which were obtained by studying the genealogy of families whose members developed the disease. Since the database was created before the discovery of SNPs, as a rule it shows no data on the correlation of diseases with individual SNPs. Moreover, many genetic diseases described in the database are caused not by SNPs but, for example, by the loss of chromosome fragments or by mutations detected in only a few patients (by the definition of an SNP, the frequency of the minor allele must exceed 1%). Database records are descriptive texts and are not suitable for automated analysis. Despite these disadvantages, the database contains substantial information and bibliographical references, and many studies seeking SNPs associated with diseases have used it as a starting point.

Modern databases associating diseases with individual SNPs also exist, for example GAD¹³ (Genetic Association Database) [44] and HGMD¹⁴ (Human Gene Mutation Database) [12]. These databases are built from literature data, which are manually brought into a consistent form. The GAD database shows the names of the gene and the disease, references to the publications from which the data were taken, and the statistical significance of the correlation and the sample size (if they are mentioned in the publication). References to many databases, including PubMed, dbSNP and HapMap, are also presented.
At present, the database contains information on 2,850 genes and 5,633 diseases/phenotypic
⁹ http://www.pdb.org
¹⁰ http://www.ncbi.nlm.nih.gov/Structure/
¹¹ http://www.mutdb.org/
¹² http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
¹³ http://geneticassociationdb.nih.gov/
¹⁴ http://www.hgmd.org/
characteristics. Since summer 2006, the HGMD database has required user registration and charges a fee to non-academic users.

HapMap (Haplotype Map) [37]. For many diseases there is evidence of a hereditary predisposition, but no causal relation to any single gene has been established. Therefore, the only way to detect SNPs associated with such diseases is to analyze all possible SNPs (genome-wide association study). Fortunately, the scope of work may be significantly reduced in this case, too. For many pairs of SNPs a correlation is observed, i.e., the presence of a particular allele at one SNP means, with high probability, the presence of a particular allele at the other SNP. Therefore, the number of SNP sets (haplotypes) observed in the population is much smaller than the number of all possible combinations. A special international project was dedicated to detecting the possible haplotypes, and its results are presented in the HapMap database. It was found that checking about 700,000 SNP markers is enough to account for 95% of all SNPs. If only a single population is considered (Central European or Asian, for example), the number of SNPs to be checked is reduced to 300,000. For example, expression levels of more than 40 genes were analyzed (the RNA quantity was determined in 300 persons) [38]. For more than half of the genes, SNPs were detected that correlated with the gene expression level with significant statistical reliability.

Note that correlation between SNPs complicates interpretation of the association between an SNP and the phenotype. For example, it may turn out that a variation described in the literature does not by itself cause a significant change in protein properties such as stability or catalytic properties. It simply occurs together with another variation, which is the true cause of the phenotype change (a change in stability or RNA expression level, for example).
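The correlation between pairs of SNPs mentioned above can be quantified in several ways. The following sketch uses the squared correlation coefficient r² between alleles on the same chromosome, a common choice that is an assumption here, since the text does not name a specific statistic. Alleles are coded 0/1, and the haplotype lists are invented toy data.

```python
def r_squared(haplotypes):
    """Squared allelic correlation between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, alleles coded 0 or 1,
    one tuple per chromosome in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Toy samples (hypothetical): fully linked alleles vs. independent alleles.
linked = [(0, 0)] * 6 + [(1, 1)] * 4
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

r2_linked = r_squared(linked)
r2_independent = r_squared(independent)
```

When r² is close to 1, genotyping one SNP of the pair is enough to infer the other, which is why a limited set of marker SNPs can represent most of the variation; the same property means that an observed association cannot by itself tell which of the correlated variants is causal.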
Theoretical Forecasting Methods for SNP Effect on the Structure and Function of Protein

Given the large number of possible SNPs, it would be extremely useful to have methods for evaluating the functional significance of an SNP in order to reduce the number of candidate SNPs for experimental verification. Since the factors affecting RNA expression level or stability are not clear enough, the majority of investigators focus on predicting the effect of the relatively small group of nsSNPs (SNPs leading to replacement of amino acid residues) on the functional characteristics of proteins. Three groups of methods can be distinguished: methods that consider the conservation and amino acid composition of the position where the replacement occurred across the set of homologous proteins; methods based on analysis of changes in the three-dimensional structure of the protein; and combined methods, which frequently use software trained on test data (machine-learning methods). The authors of many of the methods described below have analyzed polymorphism data and published the results as their own computer databases.

Methods analyzing amino acid sequences are based on the assumption that an amino acid replacement at a position conserved across homologous proteins of different organisms will be more significant (and unfavorable) than a replacement at a non-conservative position. The methods of different authors differ in how homologous proteins are detected, how multiple alignments are built, and in the statistical functions used to quantify position conservation. For non-conservative positions, amino acid composition is also frequently taken into account;
for instance, a mutation leading to the occurrence of an amino acid residue that exists in some homologue is assumed to be less significant than a mutation leading to a residue absent at this position in all homologous proteins. The best-known methods are SIFT¹⁵ (Sorting Intolerant From Tolerant) [45, 46], PolyPhen¹⁶ (Polymorphism Phenotyping) [47, 48] and Panther¹⁷ [22]. The authors of the latter not only developed a computational method but also checked its predictions experimentally on a culture of human cells with 17 variants of the ABCA1 gene; for 16 of them the theoretical and experimental data coincided.

The general disadvantage of methods based on analysis of the three-dimensional structure of proteins is that the three-dimensional structure is required. On the other hand, these methods make it possible not only to find potentially unfavorable mutations but also to explain their cause: disturbed packing of the hydrophobic core and loss of stability, changes in protein-protein contact sites, etc. Such methods rely on an expert set of rules that identify potentially unfavorable mutations (examples of the rules: replacement of a small amino acid residue inside the protein globule by a bulky one; replacement of a cysteine participating in disulfide bond formation by another residue; loss of a hydrogen bond, etc.). SNPs in proteins with a known three-dimensional structure are then checked against these rules. This approach was used to compile the SNPs3D¹⁸ [50, 51] and LS-SNP¹⁹ [52] databases. Analysis of mutations for which literature data on association with diseases exist indicated that about 83% of the mutations reduce protein stability, whereas only 5% concern residues participating in ligand binding or catalysis [50]. Finally, there are methods that use both structural data and conservation data to predict the role of a mutation [53, 54].
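The sequence-based reasoning described above (a replacement at a conservative position is more significant; at a non-conservative position, a residue already seen in some homologue is less significant than one never seen) can be expressed as a toy rule. This is a deliberately simplified sketch and not the algorithm of SIFT, PolyPhen or Panther; the threshold, labels and example columns are hypothetical.

```python
from collections import Counter


def classify_substitution(column, new_residue, conservation_threshold=0.9):
    """Crude label for a substitution at one alignment position.

    column: residues observed at this position in the homologues (gaps removed).
    new_residue: the residue introduced by the mutation.
    """
    counts = Counter(column)
    top_residue, top_count = counts.most_common(1)[0]
    conserved = top_count / len(column) >= conservation_threshold
    if conserved and new_residue != top_residue:
        return "likely unfavorable"     # replacement at a conservative position
    if new_residue in counts:
        return "likely tolerated"       # residue already occurs in some homologue
    return "possibly unfavorable"       # residue never seen at this position


# Invented example columns (not real alignment data).
strict_column = "SSSSSSSSSS"   # fully conserved serine
variable_column = "AVILAVILAV"  # variable hydrophobic position

label_strict = classify_substitution(strict_column, "A")
label_seen = classify_substitution(variable_column, "L")
label_unseen = classify_substitution(variable_column, "D")
```

Published methods replace such hard thresholds with statistical scores derived from multiple alignments, but the ordering of the three cases follows the same logic.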
An interesting group is formed by methods that use sets of mutations (neutral and experimentally confirmed unfavorable ones) to train machine-learning classifiers, such as support vector machines (SVM) [55, 56], decision trees [57] and random forests [58]. The advantage of training sets composed of actual data is independence from a human-defined set of rules. Moreover, it becomes possible to find out which factors are the most important for discriminating between neutral and unfavorable mutations. For instance, it has been shown [59] that a set of 32 criteria may be reduced to only two without loss of prediction accuracy. The most important were one structural parameter (the solvent-accessible surface area of the amino acid residue) and one of the position conservation parameters based on multiple alignment of the proteins of a superfamily.

Entropic Image of a Family of Homologous Proteins: A Way to Detect Conserved (Important for Structure and Catalysis) and Variable Parts of the Polypeptide Chain

When discussing the structure of enzyme active sites, two structural components of the active site should be distinguished:
1. the sorption subsite, responsible for binding, fixation and orientation of substrates; the properties of this subsite define the specificity of the enzyme;
2. the catalytic subsite, which carries out the chemical transformation of substrate molecules and usually uses general acid-base catalysis for this purpose.

¹⁵ http://blocks.fhcrc.org/~pauline/
¹⁶ http://www.bork.embl-heidelberg.de/PolyPhen/
¹⁷ http://www.pantherdb.org/tools/csnpScoreForm.jsp
¹⁸ http://www.SNPS3D.org
¹⁹ http://www.salilab.org/LS-SNP
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
One may expect that within one superfamily the sorption subsite responsible for enzyme specificity is represented by a broad variation of protein structure matching the variation of substrate structures. At the same time, catalytic sites, the number of types of which is rather limited, appear to be conserved elements of the structure. To test this suggestion, a bioinformatic approach based on comparison of the amino acid sequences of the proteins of one large family was used [10, 11]. Sequence alignments for several large enzyme families represented in the HSSP database²⁰ were analyzed. Enzyme families were selected on the basis of the following criteria:
1. the number of analyzed family members must exceed 100, which provides statistical confidence in the results;
2. families of enzymes from different classes (oxidoreductases, hydrolases, isomerases, etc.) should be selected for analysis;
3. where possible, the selected enzymes should have active site structures and catalytic mechanisms determined with a high degree of confidence.

An alignment is usually presented as large tables obtained by superimposing protein sequences on a sequence taken as the reference. Conserved elements of the sequence are then identified visually, by comparison. Obviously, this method becomes uninformative and extremely unsatisfactory if more than three to five proteins are compared. It may be automated by characterizing quantitatively the conservation of each amino acid position in the sequence. One quantitative criterion of position conservation may be a statistical criterion in the form of the Shannon entropy. In information theory, the Shannon entropy is one of the most important functions. It was introduced as a measure of the uncertainty that characterizes any event with a definite probability. Hence, information may be defined as the amount of uncertainty removed on receipt of a message.
Formally, the quantity of information is given by the difference between the informational entropies before and after the experiment (receipt of the message). The informational entropy (Shannon's entropy) appears to be a highly suitable function for comparing related proteins with different amino acid sequences. Alignment of sequences against a reference protein consists in placing the sequences one above another, fixing the homologous parts, and detecting and eliminating inserts. Thus, by comparing a large number of proteins, the probability of occurrence of one or another amino acid in every position of the protein sequence may be calculated with adequate accuracy. This probability $p_{ij}$ is determined as the relative frequency of occurrence of amino acid $i$ in a particular position $j$. For each position $j$ in the protein sequence, the entropy function may be calculated over all 20 amino acids:

$$H_j = -\sum_{i=1}^{20} p_{ij} \log_2 p_{ij}.$$
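A minimal Python sketch of this calculation follows, assuming the alignment is given as equal-length strings in one-letter code with '-' marking gaps. The function names and the toy alignment are illustrative assumptions, not part of the original work.

```python
import math
from collections import Counter


def position_entropy(column):
    """Shannon entropy (in bits) of one alignment column; gaps ('-') are skipped."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    entropy = 0.0
    for count in counts.values():
        p = count / total              # relative frequency of this amino acid
        entropy -= p * math.log2(p)    # one term of -sum(p * log2 p)
    return entropy


def entropy_profile(alignment):
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [position_entropy(column) for column in zip(*alignment)]


# Toy alignment, invented for illustration only.
toy_alignment = ["GDA", "GDC", "GED", "GEE"]
print(entropy_profile(toy_alignment))  # [0.0, 1.0, 2.0]
```

In the toy alignment the fully conserved first column gives zero entropy, two equally frequent residues give 1 bit, and four equally frequent residues give 2 bits.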
An important feature of the entropy function is that each term $p_{ij}\log_2 p_{ij}$ approaches zero both for events with high ($p_{ij} \to 1$) and with low ($p_{ij} \to 0$) probabilities. Therefore, calculation of the Shannon entropy makes it possible to determine the positions in the protein sequence that are common (absolutely conserved) for a given amino acid across the whole large family of proteins. These are the positions in which the probability of occurrence of this amino acid approaches unity, whereas for the remaining amino acids it approaches zero. High Shannon entropies are typical of positions in the sequence with high variability of amino acids, and low entropies are typical of conserved positions. In the extreme case of $p_{ij} \to 1$ (absolute conservation), $H_j \to 0$.

²⁰ http://www.sander.embl-heidelberg.de/hssp/

Using the Shannon entropy criterion, superfamilies of proteins were analyzed. A family unites proteins of diverse structure and origin. For example, the analyzed trypsin family consists of more than 1,200 proteins, including such enzymes as chymotrypsin, kallikrein, plasmin, hypostatin, neuropsin, coagulation factors IX and X, thrombocyte aggregation proteases, hepatocyte growth factor activator, elastase, transmembrane tryptase, thrombin and many others (see the entropic portraits of enzymes below). It is of interest to consider the most conserved amino acids, for which $H_j = 0$ (or approaches zero). The analysis revealed the following regularities.
1. Amino acids that form the catalytically active site always appear as conserved elements when the amino acid sequences of the enzymes are aligned. It is known that in acid proteases of the pepsin type the catalytic site includes the carboxylic groups of two aspartic acid residues, Asp32 and Asp215 [17]. In the alignment of the amino acid sequences of the pepsin family, these aspartic acids appear as conserved positions with minimal Shannon entropy.
2. When the amino acid sequences of enzymes are compared, glycine and aspartic acid are most frequently observed as absolutely conserved amino acids. The finding that glycine is the most conserved amino acid is somewhat unexpected.
Aspartic acid ranks second; together, glycine and aspartic acid account for ~50% of all conserved amino acids.

Amino acids were rated by how often they appear as conserved elements in the studied families. For this purpose, the frequency of occurrence of each amino acid as a conserved element ($H_j \to 0$) was determined and normalized to the total number of conserved positions for all amino acids in the studied families. Figure 1 shows the resulting rating of amino acid conservation. As follows from the data presented, glycine, aspartic acid, cysteine, proline and histidine are the amino acids that most frequently occur as conserved ones, giving ~70% of all conserved positions in enzymes. Methionine and isoleucine are rarely conserved. It is reasonable to subdivide the most conserved amino acids into two fundamentally different groups:
1. amino acids participating in the elementary acts of substrate molecule activation as acids and bases (aspartic acid and histidine);
2. amino acids forming the architecture of the active site (glycine, cysteine, proline).
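The rating procedure described above can be sketched in Python as follows. This is an illustration under simplifying assumptions: only absolutely conserved columns are counted (a column with a single residue type has zero entropy, so no logarithm is needed), columns containing gaps are skipped, and the function name and toy families are hypothetical.

```python
from collections import Counter


def conserved_residue_shares(alignments):
    """Share of absolutely conserved columns occupied by each amino acid.

    Each alignment is a list of equal-length strings in one-letter code.
    The counts are pooled over all families and normalized to the total
    number of conserved positions.
    """
    tally = Counter()
    for alignment in alignments:
        for column in zip(*alignment):
            residues = set(column)
            if len(residues) == 1 and "-" not in residues:
                tally[column[0]] += 1
    total = sum(tally.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in tally.most_common()}


# Two toy "families", invented for illustration only.
family_a = ["GDAC", "GDSC", "GDTC"]
family_b = ["GHK", "GHR"]
shares = conserved_residue_shares([family_a, family_b])
for amino_acid, share in shares.items():
    print(amino_acid, round(share, 2))
```

With real data, a small entropy threshold instead of strict identity would be a natural relaxation; the threshold value would then be a further assumption.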
Figure 1. Frequency f of occurrence of amino acids as conserved elements in the structure of enzymes (a) and in nature (b).
The bioinformatic approach used demonstrates the outstanding role of aspartic acid and histidine in the functioning of enzyme active sites. The principle underlying the functioning of active sites is the coordinated action of nucleophilic and electrophilic components of the active site, which makes high acceleration of reactions possible. Aspartic acid is clearly of fundamental importance for these processes. The ionized form of its carboxylic group is a powerful nucleophilic reagent in water molecule activation in proton transfer processes (pepsin, lysozyme, α-chymotrypsin). Aspartic acid is also of fundamental importance for the formation of the metal complexes that form the active sites of metal-dependent enzymes. In the protonated form, the carboxylic group of aspartic acid is a proton donor and thus acts as an electrophilic agent.

The role of glycine in the formation and functioning of active sites is not as obvious as that of aspartic acid. Clearly, conserved glycine residues do not play a significant role in the chemical acts of molecule activation in the catalytic cycle. Having no substituents at the Cα atom, glycine lacks a pronounced chemical function. Nevertheless, glycine residues are highly important in protein structure. That conserved glycine residues are of fundamental significance for enzymatic catalysis follows from experiments in which these residues were replaced by other amino acids through site-specific mutagenesis. As a rule, this led to complete loss or a strong decrease of enzyme activity. Conserved glycine residues appear to be fundamentally important for two functions.
1. Being the unique amino acid with the energetically freest rotation around the Cα–N and Cα–C bonds of the peptide chain (the Ramachandran angles φ and ψ), glycine plays the role of a nodal point that makes it possible to change the direction of the polypeptide chain when amino acid residues are "assembled" into the active site. Thus, the presence of conserved glycine residues explains the structural paradox of enzymatic catalysis: absolutely identical active sites are "assembled" from different polypeptide chains. The common feature of these chains is the presence of conserved glycine residues and of factors stabilizing the "assembled" structure, for example disulfide bonds (cysteine, which takes third place in the conservation rating, also shows a high level of conservation). It is interesting to note that conserved glycine residues in enzymes are usually strongly "turned" about the angle ψ (rotation around the Cα–C bond).
2. Conserved glycines may play the role of conformational "hinges" providing mobility of the active site. This is supported by the fact that in many cases conserved glycine residues are found near catalytically active groups. For hydrolases from various families, for example, the following motifs are conserved: Asp215XGly217 (pepsin); Asp170XXGly173 (thermolysin); Asp32XGly34, His63Gly64, Gly119XSer221 (subtilisin); Gly173XSer177 (trypsin); His76Gly77, Ser153XGly155, Gly175XAsp177 (lipases). In the enzymes mentioned, the Asp, Ser and His residues are parts of the active site structure.
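Motifs of the kind listed above, such as Asp-X-Gly, can be located in a sequence with a regular expression. The sketch below is only an illustration: the sequence is invented, the helper name is hypothetical, and finding a motif in a sequence says nothing by itself about whether the residues belong to an active site.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based position, matched text) for every match, overlaps included."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# One-letter codes: D = Asp, G = Gly, H = His; '.' stands for any residue X.
ASP_X_GLY = "D.G"     # Asp-X-Gly, the pattern listed for pepsin and subtilisin
ASP_XX_GLY = "D..G"   # Asp-X-X-Gly, the pattern listed for thermolysin
HIS_GLY = "HG"        # His-Gly, the pattern listed for subtilisin and lipases

toy_sequence = "MKDTGSAHGDAAGL"  # invented for illustration only

print(find_motif(toy_sequence, ASP_X_GLY))   # [(3, 'DTG')]
print(find_motif(toy_sequence, ASP_XX_GLY))  # [(10, 'DAAG')]
print(find_motif(toy_sequence, HIS_GLY))     # [(8, 'HG')]
```

The lookahead form is used so that overlapping occurrences are all reported.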
It is of interest that for some active sites the values of the angles φ and ψ for the amino acids that make up the catalytic site lie outside the energetically "relaxed" range. This follows, for example, from the Ramachandran map for the amino acids forming the active site of α-chymotrypsin (His57, Asp102, Ser195). The active site of this enzyme is conformationally strained (the φ and ψ values fall within the energetically unfavorable area). The transformation of the primary substrate into final products in enzymatic catalysis involves many intermediates whose structure differs from that of the primary substrate. Glycine residues of the active site may play the role of "relaxing" elements that conformationally adjust the site for the subsequent elementary act.

Cysteine and proline (the fourth and the fifth positions in the amino acid conservation rating) play an important role in forming the architecture of the active site. As is known, proline is the only amino acid that turns the polypeptide chain. Apparently, the role of cysteine residues is that the required structure of the active site, composed of different parts of the polypeptide chain that are frequently remote from one another, is "fixed" by formation of a chemical bond in the form of a disulfide bridge. For many enzymes, this completes the formation of the active site architecture.

The methodology developed is fundamentally important for studying the molecular polymorphism of human enzymes. The considerable number of single replacements observed in the structural genes of proteins and enzymes should be reflected in the properties of these proteins.
Molecular Polymorphism of Human Enzymes

Enzymes are the largest and the most highly specialized class of protein molecules. They are the basis of the molecular mechanisms by which genes act. Enzymes catalyze thousands of chemical reactions, which ultimately form cellular metabolism. In this connection, the molecular polymorphism of enzymes has a significant effect on the state of a human being, the features of his behavior and his reactions to external impacts. Taking several of the physiologically most important enzymes as examples, we consider how the molecular polymorphism of these biomacromolecules manifests itself at the level of three-dimensional structures and entropic portraits and at the level of individual functions of the organism.
Acetylcholinesterase

In mammals, acetylcholinesterase (ACHE, EC 3.1.1.7) is found not only in the central nervous system but also in peripheral tissues, such as sympathetic and parasympathetic ganglia, parasympathetic endings in organs, motor endings of effector neurons, and sweat glands. In blood, ACHE is present mostly in erythrocyte membranes. Moreover, depending on the species, different quantities of ACHE may be contained in plasma. The biological role of ACHE is associated with the regulation of cholinergic neurotransmission, as it catalyzes acetylcholine hydrolysis in the synaptic cleft.
Recently, great attention has been paid to physiological functions of ACHE not related to cholinergic transmission [59, 60]. A role of ACHE in stimulating the proliferation of nerve and muscle cells has been shown. ACHE may be a marker of early cell differentiation. Correlations have recently been established between changes in the activity of this enzyme and the development of some widespread diseases, in particular diseases of the cardiovascular system [61-64], atherosclerosis [65], and Parkinson's and Alzheimer's diseases [61, 66].

The human ACHE gene consists of 6 exons. Alternative splicing yields three different polypeptide chains, which define the set of isoforms of this enzyme (ACHE-T, ACHE-H, ACHE-R); they determine its catalytic properties and differ only in tissue distribution. Until 2005, four ACHE gene polymorphisms had been described, and only two of them were associated with clinical implications. The first clinically significant polymorphism is located in the distal promoter of the ACHE gene; it is associated with increased sensitivity to anticholinesterase preparations and, possibly, with manifestation of the Gulf War syndrome. The second polymorphism is associated with the His353Asn replacement and leads to the appearance of an altered form of ACHE on the erythrocyte surface. The 3D structure of ACHE with the position of this amino acid replacement indicated is shown in figure 2. This mutation gives rise to the so-called YT-2 blood group instead of the native YT-1. This circumstance must be taken into account when choosing donor-recipient pairs for blood transfusion.
Figure 2. The structure of human acetylcholinesterase. The position of the His353Asn amino acid replacement resulting from a single mutation in the enzyme gene is indicated.
Table 1. Amino acid (AA) replacements resulting from single mutations in the ACHE gene

| AA replacement (SwissProt numbering) | HBn¹ | Sn² | AA charge in norm³ | HBm¹ | Sm² | AA charge after mutation³ | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| Arg34Gln | -0.59 | 99 | + | -0.91 | 93 | 0 | 1.54 | 0.94 |
| Glu344Gly | -1.22 | 82 | - | -0.67 | 64 | 0 | 0.55 | 0.78 |
| Gly57Arg | -0.67 | 64 | 0 | -0.59 | 99 | + | 0.88 | 1.55 |
| His353Asn | -0.64 | 83 | + | -0.92 | 88 | 0 | 1.44 | 1.06 |
| Pro561Arg | -0.49 | 82 | 0 | -0.59 | 99 | + | 1.20 | 1.21 |
| Pro592Arg | -0.49 | 82 | 0 | -0.59 | 99 | + | 1.20 | 1.21 |

HBn, HBm: hydrophobic property in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
¹ Hydrophobic property in relative units on the OMH scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
Figure 3. Entropic image of human acetylcholinesterase. Red points indicate the positions of the described amino acid replacements resulting from single mutations in the enzyme gene.
The physiological importance of ACHE, combined with the small number of genetic polymorphisms identified for this enzyme, led to the statement that "virtually each mutation in ACHE must be dangerous." In essence, this statement implies that the greater part of the ACHE protein globule is important for its function. In 2005, a large-scale investigation of ACHE gene polymorphism in different ethnic groups was performed in Israel [67], and 13 SNPs, 10 of them new, were detected. In three cases, previously detected mutations were confirmed: Pro592Arg, His353Asn and the synonymous replacement Pro477Pro. Three of the newly detected SNPs are nonsynonymous, i.e., lead to replacement of amino acids in the enzyme structure (Arg34Gln, Gly57Arg, Glu344Gly); four more SNPs occur in the untranslated region and one in intron 2; two mutations lead to synonymous replacements of amino acids. Seventeen haplotypes and 5 ethnospecific alleles were identified. The authors of this investigation hypothesized that the detected mutations manifest themselves under stress and drug exposure. In 2003, an investigation performed in Spain [68] established a relation between decreased ACHE activity and the Pro561Arg replacement in patients with Alzheimer's disease.

Analysis of the positions of the mutations in the 3D structure of ACHE shows that all amino acid replacements found are remote from the active site of the enzyme and have no significant effect on catalytic activity. Figure 3 shows the entropic image of ACHE with four of the six described amino acid replacements indicated. The replacements fall within regions of high Shannon entropy, which is an additional indication of their weak effect on the catalytic activity of the enzyme; that activity is determined by the low-entropy (conserved) amino acids forming the active site structure. Summary table 1 lists the amino acid replacements due to single mutations in the ACHE gene that lead to phenotypic manifestations. Table 1 also gives the values of the hydrophobic property and the probability of occurrence of the amino acids on the protein globule surface in norm and after mutation. It shows that for ACHE all replaced amino acids are hydrophilic, as are the amino acids that replace them. Hence, on replacement the hydrophobic property changes insignificantly (HBm/HBn ratio), as does the probability of contact of the original and mutant amino acids with the solution surrounding the protein (Sm/Sn).
Butyrylcholinesterase

Butyrylcholinesterase (BCHE, EC 3.1.1.8) is found in various mammalian tissues: liver, heart, vascular endothelium, nervous system and blood plasma. At present, there are many hypotheses about the possible biological role of this enzyme in developing and mature organisms. It was shown that BCHE plays a key role in neurogenesis [69]. In mature organisms poisoned by low doses of organophosphorus compounds or carbamates, BCHE presumably performs a protective function [70]: it binds part of the toxicant that has entered the organism and thus decreases its acute toxicity. BCHE participates in the metabolism of a wide spectrum of endogenous and exogenous substrates and in the biotransformation of xenobiotics containing an ester group. For example, the main metabolic pathway of cocaine elimination from the human organism is its hydrolysis by BCHE [71]. Inactive precursors of many drugs are activated metabolically through hydrolysis by BCHE. This effect was demonstrated for the antitumor agent irinotecan [72], the antiasthmatic preparation bambuterol [73], and a series of radioprotective agents, O-acyl serotonin derivatives [74].

Of special clinical interest is BCHE activity in the organism when the neuromuscular relaxant succinylcholine and analogous compounds hydrolyzed by this enzyme are used. A low level of BCHE activity, which may be caused by its inhibition by anticholinesterase substances, in particular during Alzheimer's disease therapy or on contact with insecticides [75], or by the presence of genetic variants of BCHE in man, leads to an anomalous reaction to such relaxants: the duration of their action increases, which leads to asphyxia [76-78].
Thus, the therapeutic effect of many drugs depends on the level of butyrylcholinesterase activity in the organism. The use of each particular preparation and its dosage should be strictly individual. That is why, before treatment with drugs based on carboxylic esters and before a person's first contact with an anticholinesterase preparation, the level of butyrylcholinesterase activity in the organism should be determined; this may prevent undesirable, frequently irreversible and dangerous consequences for the organism.

Besides the usual form (U), BCHE of human blood plasma has about 9 phenotypic isoforms arising from various genetic mutations [79]. The exact number of BCHE isoforms has not been determined, because it has not been confirmed that newly described forms differ from the previously known ones. By now, about 20 phenotypes have been described, but only 10 of them can be identified clearly with standard biochemical methods. They include "atypical," "silent" (S), fluoride-resistant (F), as well as K (Kalow) and J (James), which are homo- and heterozygous variants of BCHE [80]. Qualitatively, they are distinguished by their inhibition by fluoride ions, dibucaine (used in medicine as a local anesthetic) and (2-hydroxy-5-phenylbenzyl)-trimethylammonium bromide (Ro-02-0683) [79-81].

From the clinical standpoint, the most important is the "atypical" isoform of BCHE, for which the Asp98Gly and, in some cases, Asp98His replacement is typical. These mutations are located in the peripheral anionic site of the enzyme and result in a 10-fold decrease in the binding of positively charged substrates. People with this genotype show an anomalous response to injection of the short-acting relaxants succinyldicholine and mivacurium chloride, expressed as 2-5-hour asphyxia (apnea) after relaxant elimination (against 3-5-minute asphyxia in norm) and prolonged paralysis [82].
Moreover, they are more sensitive to anticholinesterase preparations, including toxic organophosphorus compounds, which entails a high risk of acute or delayed neurotoxic effects on contact with such compounds.

For the "fluoride-resistant" BCHE isoforms of the first and the second type, the amino acid replacements Thr271Met and Gly418Val, respectively, were described [59]. In Japan, two more replacements typical of this BCHE type, Leu335Pro and Leu358Ile, were detected [83]. These mutant forms of the enzyme are characterized by high fluoride and low dibucaine numbers. For this BCHE form, the anomalous response to injection of relaxants is weaker than for the "atypical" form and is manifested as 30-minute asphyxia after relaxant elimination.

The "K-form" of BCHE is characterized by the Ala567Thr amino acid replacement and an associated low dibucaine number [84]. This mutation is observed in patients with Alzheimer's disease and other kinds of dementia.

For BCHE in the "J-form," a mutation was detected [85] that leads to the Glu525Val replacement and manifests itself in decreased catalytic activity of the enzyme and atypical responses to injection of relaxants.

Recently, mutations were identified that change BCHE level and stability. For example, the Tyr156Cys amino acid replacement detected in Japan [86] led to a significant decrease of BCHE activity. In India, a BCHE form with the Leu335Pro replacement was identified [87]; it results in a very low level of the enzyme, because destabilization of its structure induces rapid decay of the enzyme in the organism.

The spatial positions of the amino acid replacements due to BCHE gene mutations are shown in figure 4. As for ACHE, all detected amino acid replacements lie in the highly variable part of the entropic image of the enzyme (see figure 5). In this case, however, such transformations of the protein globule lead to noticeable changes in catalytic properties (decreased affinity of the anionic site for positively charged substrates) and regulatory properties (appearance of fluoride resistance, decrease of the dibucaine number), as well as to destabilization of the BCHE structure.

Analysis of the summary table of replacements for BCHE (see table 2) shows that replacements of both hydrophilic and hydrophobic AA are typical of this enzyme. Note that in 7 of 9 cases the AA replacement caused by the gene mutation increases the hydrophobic property. Strong changes in the AA hydrophobic property index (10-fold for Tyr156Cys and 2.5-fold for Leu335Pro) reduce the enzyme stability. The replacing amino acids either bring a positive charge into the protein structure or do not change it.
Figure 4. The structure of human butyrylcholinesterase. The positions of the described amino acid replacements resulting from single mutations in the enzyme gene are indicated.
Table 2. Amino acid (AA) replacements resulting from single mutations in the BCHE gene

| AA replacement (SwissProt numbering) | HBn¹ | Sn² | AA charge in norm³ | HBm¹ | Sm² | AA charge after mutation³ | HBm/HBn | Sm/Sn |
|---|---|---|---|---|---|---|---|---|
| Ala567Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24 |
| Asp98Gly | -1.31 | 85 | - | -0.67 | 64 | 0 | 0.51 | 0.75 |
| Asp98His | -1.31 | 85 | - | -0.64 | 83 | + | 0.49 | 0.98 |
| Glu525Val | -1.22 | 82 | - | 0.91 | 46 | 0 | -0.75 | 0.56 |
| Gly418Val | -0.67 | 64 | 0 | 0.91 | 46 | 0 | -1.36 | 0.72 |
| Leu335Pro | 1.22 | 55 | 0 | -0.49 | 82 | 0 | -0.40 | 1.49 |
| Leu358Ile | 1.22 | 55 | 0 | 1.25 | 40 | + | 1.02 | 0.73 |
| Thr271Met | -0.28 | 77 | 0 | 1.02 | 60 | 0 | -3.64 | 0.78 |
| Tyr156Cys | 1.67 | 85 | 0 | 0.17 | 55 | 0 | 0.10 | 0.65 |

HBn, HBm: hydrophobic property in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
¹ Hydrophobic property in relative units on the OMH scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
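The ratios in Table 2 and the count of replacements that increase the hydrophobic property can be rechecked with a few lines of Python. The HBn, HBm, Sn and Sm values below are transcribed from Table 2; the variable and function names are illustrative.

```python
# (replacement, HBn, HBm, Sn, Sm) as given in Table 2
TABLE_2 = [
    ("Ala567Thr", -0.40, -0.28, 62, 77),
    ("Asp98Gly", -1.31, -0.67, 85, 64),
    ("Asp98His", -1.31, -0.64, 85, 83),
    ("Glu525Val", -1.22, 0.91, 82, 46),
    ("Gly418Val", -0.67, 0.91, 64, 46),
    ("Leu335Pro", 1.22, -0.49, 55, 82),
    ("Leu358Ile", 1.22, 1.25, 55, 40),
    ("Thr271Met", -0.28, 1.02, 77, 60),
    ("Tyr156Cys", 1.67, 0.17, 85, 55),
]


def ratios(row):
    """Return (replacement, HBm/HBn, Sm/Sn), rounded to two decimals."""
    name, hb_n, hb_m, s_n, s_m = row
    return name, round(hb_m / hb_n, 2), round(s_m / s_n, 2)


# Replacements after which the hydrophobic property is higher than in norm.
more_hydrophobic = [name for name, hb_n, hb_m, _, _ in TABLE_2 if hb_m > hb_n]

for row in TABLE_2:
    print(*ratios(row))
print(len(more_hydrophobic), "of", len(TABLE_2), "replacements increase the hydrophobic property")
```

A negative HBm/HBn ratio simply marks a change of sign, that is, a switch between a hydrophilic and a hydrophobic residue.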
Figure 5. Entropic image of human butyrylcholinesterase. Red points indicate the positions of the described amino acid replacements resulting from single mutations in the enzyme gene.
Paraoxonases

An important group of esterases comprises paraoxonases (phosphoric ester hydrolases, organophosphate hydrolases, A-esterases, phosphate triesterases) [59]. Paraoxonase (PON1) is a member of a protein family that also includes PON2 and PON3, whose genes are clustered on the long arm of human chromosome 7 (q21.22). Paraoxonases are highly homologous and share ~65% amino acid identity [88]. Mammals have three paraoxonase genes. They are highly conserved and show 79-95% identity at the amino acid level and 81-95% identity at the nucleotide level between species [88-90], which suggests an important physiological role. PON-like proteins are found in all animal species and even in fungi and bacteria. Figure 6 shows the structure of rabbit PON, which has 84% homology with human paraoxonase.
Figure 6. The structure of rabbit paraoxonase. The positions of the described amino acid replacements resulting from single mutations in the enzyme gene are indicated.
PON1 is synthesized in the liver, from which it is secreted into the plasma, where it is strongly bound to high-density lipoproteins (HDL) [91, 92]. Extremely high paraoxonase activity is observed in the liver and plasma of rats, with over 50% of total paraoxonase being present in the plasma [93].
PON1, the best-studied member of the PON gene family, was originally described as an A-esterase (organophosphate hydrolase). PON1 was named after paraoxon, its first and one of its best-studied substrates. Only PON1 possesses paraoxonase activity, whereas all three PONs possess some arylesterase and lactonase activity. For stability and for manifestation of enzymatic activity, PON1 requires the presence of Ca2+ [94-96]. Recently [97], PON1 activity was studied with over 50 substrates belonging to three different classes: esters, phosphotriesters and lactones. Another enzyme related to the PON family is mammalian diisopropyl fluorophosphatase, for which identity with the aging marker protein-30 has been reported. This enzyme also hydrolyzes soman and sarin, but not paraoxon [90-100].

PON1 plays a key role in the detoxication of organophosphorus compounds (OPC). Serum A-esterases can hydrolyze the active metabolites (oxons) of some OPC (paraoxon, chlorpyrifos-oxon, diazinon-oxon, pirimiphos-methyl-oxon) or phosphoryl compounds (tabun, DFP, sarin, soman, dichlorvos), playing the central role in the detoxication of these compounds and in their toxicity. Compared with animals with high paraoxonase activity (rats and especially rabbits), animals with a low content of this enzyme (birds) are more sensitive to some OPC [94, 101]. Recent animal experiments [95] convincingly demonstrated the main role of paraoxonases in the detoxication of thionophosphorus OPC, which are metabolized via the P450/PON1 pathway. The low paraoxonase level in young animals explains, at least partly, their high sensitivity to OPC [94, 95, 102].

In a series of pathologies associated with atherosclerosis, PON1 activity is decreased. An inverse relationship between paraoxonase activity and the risk of cardiovascular diseases is observed [103-105], which indicates the clinical importance of this enzyme.
PON1 is the protein component of high-density lipoproteins that most likely determines their antioxidant properties [106-108]. Decreased PON1 activity (toward paraoxon and diazoxon) has been shown to be one of the risk factors for the development of some cardiovascular diseases [109].

In human populations, serum paraoxonase shows substrate-dependent polymorphism, and the PON1 level in blood plasma varies widely among individuals. The PON1 polymorphism associated with the Gln192Arg replacement determines different catalytic activity toward some organophosphorus substrates [110] and increases susceptibility to coronary artery disease and mortality among women in the second half of life [111]. Moreover, an effect of this mutation on adaptive capacity and on the rate and quality of aging has been reported. Interestingly, the mutation in the neighboring position, Gln191Arg, has no significant effect on the risk of coronary artery disease but significantly decreases the catalytic activity of PON1 [112]. The polymorphism in position 108 (C/T) of the PON1 gene makes the main contribution to differences in the level of PON1 expression and, apparently, significantly affects PON activity in plasma. These two factors largely determine individual sensitivity to OPC. The Leu55Met and Met54Leu polymorphisms are associated with an increased risk of diabetes and with effects on glucose metabolism and insulin tolerance [113-117]. For PON2, the Cys311Ser polymorphism was detected, which is associated with the risk of Alzheimer's disease and vascular dementia [118].
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
The positions of the identified amino acid replacements on the entropic image of paraoxonases (see figure 7) show that in this case all of them fall within the highly entropic zone. The detected AA replacements are summarized in Table 3.
Figure 7. Entropic image of rabbit paraoxonase, which has 84% homology with human paraoxonase. Red points mark the positions of amino acid replacements described as resulting from single gene mutations of the human enzyme.
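The entropic images used in this chapter (figures 7, 9 and 11) assign a Shannon entropy value to each sequence position. As a minimal sketch of that idea, and assuming that the entropy is computed per column of a multiple alignment of homologous sequences with a base-2 logarithm (the alignment set, the treatment of gaps and the logarithm base are not stated in this section), such a profile could be obtained as follows. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(alignment):
    """Per-position entropy for equally long, already aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment; real input would be aligned homologous sequences.
toy_alignment = ["MKQL", "MKQL", "MRQV", "MKHV"]
profile = entropy_profile(toy_alignment)
for position, h in enumerate(profile, start=1):
    print(position, round(h, 3))
```

Under these assumptions, positions with zero entropy are invariant in the alignment, while higher values mark variable positions, which is the sense of the "highly entropic zone" referred to in the text.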
Carboxylesterases
Mammalian carboxylesterases (CE) (EC 3.1.1.1) represent a large group of enzymes localized in the endoplasmic reticulum and cytosol of cells of many tissues [59]. The highest carboxylesterase activity is observed in liver microsomes; rather high activity is also typical of plasma. CE is also present in the small intestine and colon, stomach, brain, monocytes and macrophages [119]. This group of enzymes catalyzes the hydrolysis of lipophilic ester-, thioester- and amide-containing substrates [120, 121]. The broad substrate specificity of CE enables the cell to process a wide spectrum of ester compounds. CE participate in the detoxication and metabolic activation of various drugs, natural toxicants and carcinogens. A great number of exogenous substances are CE substrates, including cocaine, capsaicin, palmitoyl-CoA, haloperidol, imidapril, salicylates, steroids, etc. [122-126]. For CE of the EST2 type, an Arg206His polymorphism typical of the Japanese population was found; it is classified as affecting human drug metabolism [127].
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
Alcohol Dehydrogenases
Mammalian alcohol dehydrogenases (ADG; alcohol:NAD+ oxidoreductase, EC 1.1.1.1) are dimers consisting of subunits with a molecular weight of about 40,000 and containing Zn2+ [128]. The 3D structure of ADG I (ADG2,2 allele) is shown in figure 8.
Figure 8. Structure of alcohol dehydrogenase 1 (ADG2,2 allele). The positions of amino acid replacements described as resulting from single mutations of the enzyme gene are indicated.
In the presence of nicotinamide adenine dinucleotide (NAD), ADG catalyzes the oxidation of alcohols and acetals to aldehydes and ketones. Every subunit has regions that participate in forming the binding sites for the substrate and the coenzyme NAD. ADG classification was originally based on differences in electrophoretic mobility. At present, six ADG classes are distinguished. The subunits forming the enzyme may be encoded by identical or different genes. For instance, human ADG I is represented by multiple isoforms, which are pairwise combinations of the three basic subunits (α, β and γ), subdivided in turn into the variants α, β1, β2, β3, γ1 and γ2 (ADG1, ADG1,2, ADG2,2, ADG3,2, ADG1,3 and ADG2,3 alleles). ADG of classes II, III and IV consist of pairs of identical subunits, π, χ and σ, respectively. The isoenzyme spectrum of ADG in the liver reflects pathological changes in the organism, which is used for diagnostic purposes. The substrate specificity of ADG differs significantly between classes. First of all, their ability to oxidize ethyl alcohol is assessed. Over a wide range of concentrations, this is the function of ADG I and ADG IV. ADG II makes only a very limited contribution to ethanol oxidation. Of special note is the relatively low ability of ADG I and ADG IV to oxidize methanol; ADG II and ADG III have no such activity at all. The role of ADG II and ADG III in the detoxication of alcohols is mostly associated with the oxidation of long-chain alcohols. An important special function of ADG III is glutathione-dependent formaldehyde oxidation. ADG substrates include many compounds that participate in the synthesis of some endogenous neuroregulators and hormones, as well as their catabolites. In particular, these are catabolites of catecholamines and serotonin, many steroid hormone metabolites, intermediates of cholesterol and bile acid synthesis, and retinol [128].
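The statement that the class I isoforms are pairwise combinations of subunit variants can be made concrete with a short enumeration. The sketch below assumes six variants, labelled here alpha, beta1, beta2, beta3, gamma1 and gamma2 after the usual class I nomenclature, and assumes that any two of them may pair. It therefore gives a combinatorial upper bound, not a list of the isoforms actually found in any one individual. The variable and function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Assumed labels for the six class I subunit variants.
SUBUNIT_VARIANTS = ["alpha", "beta1", "beta2", "beta3", "gamma1", "gamma2"]


def possible_dimers(variants):
    """All unordered pairs of subunits, including pairs of identical subunits."""
    return list(combinations_with_replacement(variants, 2))


dimers = possible_dimers(SUBUNIT_VARIANTS)
homodimers = [d for d in dimers if d[0] == d[1]]
heterodimers = [d for d in dimers if d[0] != d[1]]
print(len(dimers), len(homodimers), len(heterodimers))
```

With six variants this yields 21 conceivable dimers, of which 6 are homodimers and 15 heterodimers.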
The evolution and occurrence of polymorphisms in the ADG structure have attracted a broad range of investigators, not only for fundamental reasons but also because of significant differences in ADG between races and ethnic groups, which are linked to ethanol tolerance in man [128]. Serious differences in ADG I activity were detected between Caucasoid and Mongoloid populations (in Mongoloids, specific highly active isoforms are often detected). Among them is the Arg47His polymorphism of atypical ADG I (ADG2,2 allele) [129-133]. The activity of this enzyme toward ethanol is 50-100 times higher than that of the ADG I isozymes typical of Caucasoids. Of course, this is just a general, statistically reliable pattern of differences; various isoenzyme spectra exist within individual nations and ethnic groups. On the entropic image of this isoenzyme (see figure 9), the amino acid replacement occurs in a region of rather high Shannon entropy. The β3 subunit polymorphism Arg369Cys of human ADG I (ADG3,2 allele) weakens NAD cofactor binding and, possibly, reduces the risk of alcoholism [134]. The presence of the Gly78Stop mutation in the ADG I structure (ADG3,2 allele) increases the risk of Parkinson's disease [135]. It is noteworthy that ADG I, which shows the highest evolutionary variability in the form of new isoenzymes, possesses significant ethanol-oxidizing activity [128]. The amino acid replacements detected for various ADG I allele forms are summarized in Table 4.
Alkaline Phosphatase
Alkaline phosphatase (ALPL) (EC 3.1.3.1) is an enzyme of the hydrolase class that catalyzes the hydrolysis of phosphoric esters in the organism. Its function is to maintain the phosphate level required for various biochemical processes and to transport phosphate into the cell. The enzyme consists of two identical subunits, which function alternately, and contains tightly bound Zn atoms. Its molecular weight is 80,000. The ALPL structure is shown in figure 10. The spatial arrangement of the polypeptide chains is known, and the reaction with the substrate has been found to proceed via a stage of enzyme phosphorylation (footnote 1). Alkaline phosphatase is widespread in human tissues, especially in the intestinal mucosa, osteoblasts, the walls of the biliary ducts in the liver, the placenta and the lactating mammary gland. The highest ALPL concentration is observed in bone tissue (osteoblasts), hepatocytes, renal tubule cells, the intestinal mucosa and the placenta. ALPL participates in processes associated with bone growth; therefore, its activity in the serum of children is higher than in adults. Bone alkaline phosphatase is produced by osteoblasts, large mononuclear cells located on the bone matrix surface at sites of intense bone formation. Apparently, owing to the extracellular location of the enzyme during calcification, a direct relation can be observed between bone disease and the appearance of the enzyme in blood serum. In children, the alkaline phosphatase level remains high until adulthood. Increased alkaline phosphatase activity accompanies rickets of any etiology, Paget's disease and bone changes related to hyperparathyroidism. The enzyme activity rises rapidly in osteosarcoma, bone metastases, myeloma and megakaryoblastoma with bone involvement. Alkaline phosphatase activity increases significantly in cholestasis. In contrast with aminotransferases, the level of alkaline phosphatase remains normal or increases only slightly in viral hepatitis.
Increased alkaline phosphatase activity was observed in one third of patients with icterus and hepatic cirrhosis. Extrahepatic biliary obstruction is accompanied by a sharp increase in the enzyme activity. An increase in alkaline phosphatase activity is observed in 90% of patients with primary liver cancer and with liver metastases. Its activity rises sharply in acute alcohol intoxication against a background of chronic alcoholism. It may also increase during treatment with hepatotoxic drugs (tetracycline, paracetamol, phenacetin, 6-mercaptopurine, salicylates, etc.). In the first week of the disease, about half of patients with glandular fever show increased alkaline phosphatase activity. Women who take contraceptives containing estrogen and progesterone may develop cholestatic jaundice and increased alkaline phosphatase activity. Extremely high activity of this enzyme is observed in women with preeclampsia as a consequence of placental damage. Low alkaline phosphatase activity in pregnant women indicates insufficient development of the placenta. Besides the above-mentioned diseases and states, alkaline phosphatase activity increases in the following cases: increased metabolism in bone tissue (during fracture healing), primary and secondary hyperparathyroidism, osteomalacia, renal rickets caused by vitamin D-resistant rickets combined with secondary hyperparathyroidism, cytomegalovirus infection in children, extrahepatic sepsis, ulcerative colitis, regional ileitis, enteric bacterial infections and thyrotoxicosis.
1 http://obi.img.ras.ru/humbio/endocrinology/emptv/x00e3bd3 .htm#0014b260.htm
Figure 9. Entropic image of alcohol dehydrogenase 1. Red points mark the positions of amino acid replacements described as resulting from single mutations of the enzyme gene.
Reduced enzyme activity is observed in hypothyroidism, scurvy, pronounced anemia and hypophosphatasia, a hereditary disease caused by insufficient ALPL activity, characterized by rickets-like changes in the skeleton and urinary excretion of phosphoethanolamine, and inherited in an autosomal recessive manner. All these diseases are related to mutations of the ALPL gene, which is localized at 1p36.1-34 and comprises 12 exons spread over more than 50 kb (footnote 2). Considering the singular importance of ALPL at the early stages of human development, special attention is paid to obtaining information on molecular polymorphisms of this enzyme starting from the perinatal period. An extended and updated database (184 mutations for children of different ages and adults) is available on a dedicated website (footnote 3). At the same time, the SNPs that investigators have strongly associated with unusual clinical responses should be noted. The Ala179Thr replacement increases the risk of hypophosphatasia (in this case, bone demineralization and premature loss of teeth) in children [136]. Hypophosphatasia in children is also induced by the Arg71Cys and Gly334Asp replacements [137-139]. An increased risk of tooth loss is observed for the Pro108Leu [140, 141] and Ala116Thr replacements. The Tyr263His mutation, observed in Japan [142], raises the risk of bone fragility and osteoporosis in menopausal women. A high frequency of the Glu191Lys mutation, which is related to the risk of moderate hypophosphatasia, is typical of European countries.
2 http://obi.img.ras.ru/humbio/har/00238766.htm
Figure 10. The structure of human alkaline phosphatase.
The positions of the identified amino acid replacements against the background of the entropic image of human ALPL are shown in figure 11. Importantly, mutations leading to a detectable decrease of ALPL activity in blood are observed in both high and low entropy zones. Table 5 summarizes the described amino acid replacements that result from single ALPL gene mutations and correlate with significant phenotypic manifestations.
3 http://www.sesep.uvsq.fr/database_hypo/Mutation.html
Table 3. Amino acid replacements resulting from single mutations in PON genes

AA replacement (SwissProt numbering) | HBn (1) | Sn (2) | Charge in norm (3) | HBm (1) | Sm (2) | Charge after mutation (3) | HBm/HBn | Sm/Sn
Met54Leu (PON1) | 1.02 | 60 | 0 | 1.22 | 55 | 0 | 1.20 | 0.92
Gln192Arg (PON1) | -0.91 | 93 | 0 | -0.59 | 99 | + | 0.65 | 1.06
Leu55Met (PON1) | 1.22 | 55 | 0 | 1.02 | 60 | 0 | 0.84 | 1.09
Cys311Ser (PON2) | 0.17 | 55 | 0 | -0.55 | 78 | 0 | -3.24 | 1.42

HBn, HBm: hydrophobic property in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
1 Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.
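The two ratio columns of Table 3 follow directly from the other columns: HBm/HBn is the hydrophobicity after mutation divided by the value in norm, and Sm/Sn is the corresponding ratio of surface probabilities. The short sketch below recomputes them from the tabulated values as a consistency check. The function name is hypothetical, and the PON2 replacement is labelled Cys311Ser as in the body text.

```python
# (replacement, HBn, HBm, Sn, Sm), values copied from Table 3
TABLE_3 = [
    ("Met54Leu (PON1)", 1.02, 1.22, 60, 55),
    ("Gln192Arg (PON1)", -0.91, -0.59, 93, 99),
    ("Leu55Met (PON1)", 1.22, 1.02, 55, 60),
    ("Cys311Ser (PON2)", 0.17, -0.55, 55, 78),
]


def ratios(hb_norm, hb_mut, s_norm, s_mut):
    """Return (HBm/HBn, Sm/Sn), rounded to two decimals as in the table."""
    return round(hb_mut / hb_norm, 2), round(s_mut / s_norm, 2)


for name, hb_n, hb_m, s_n, s_m in TABLE_3:
    hb_ratio, s_ratio = ratios(hb_n, hb_m, s_n, s_m)
    print(f"{name}: HBm/HBn = {hb_ratio}, Sm/Sn = {s_ratio}")
```

A negative HBm/HBn value, as for the PON2 replacement, indicates that the residue changes from hydrophobic to hydrophilic on this scale, or the reverse.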
Table 4. Amino acid replacements resulting from single mutations in the ADG I gene

AA replacement (SwissProt numbering) (allele) | HBn (1) | Sn (2) | Charge in norm (3) | HBm (1) | Sm (2) | Charge after mutation (3) | HBm/HBn | Sm/Sn
Arg369Cys (ADG3,2) | -0.59 | 99 | + | 0.17 | 55 | 0 | -0.29 | 0.56
Arg47His (ADG2,2) | -0.59 | 99 | + | -0.64 | 83 | + | 1.08 | 0.84
Gly78Stop (ADG3,2) | -0.67 | 64 | 0 | | | | 0.00 | 0.00

HBn, HBm: hydrophobic property in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
1 Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.
Table 5. Amino acid replacements resulting from single mutations in the ALPL gene

AA replacement (SwissProt numbering) | HBn (1) | Sn (2) | Charge in norm (3) | HBm (1) | Sm (2) | Charge after mutation (3) | HBm/HBn | Sm/Sn
Tyr263His | 1.67 | 85 | 0 | -0.64 | 83 | + | -0.38 | 0.98
Pro108Leu | -0.49 | 82 | 0 | 1.22 | 55 | 0 | -2.49 | 0.67
Arg71Cys | -0.59 | 99 | + | 0.17 | 55 | 0 | -0.29 | 0.56
Glu191Lys | -1.22 | 82 | - | -0.67 | 97 | + | 0.55 | 1.18
Gly334Asp | -0.67 | 64 | 0 | -1.31 | 85 | - | 1.96 | 1.33
Ala179Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24
Ala116Thr | -0.4 | 62 | 0 | -0.28 | 77 | 0 | 0.70 | 1.24

HBn, HBm: hydrophobic property in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
1 Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.
Figure 11. Entropic image of human alkaline phosphatase. Red points mark the positions of amino acid replacements described as resulting from single mutations of the enzyme gene.
Protein Phosphatases
Protein phosphatases (EC 3.1.3.48) represent a large group of enzymes that dephosphorylate protein substrates and affect, in diverse ways, the functional activity of other enzymatic systems and the function of cells. Dephosphorylation is as important as phosphorylation and, accordingly, protein phosphatases are integral components of the signaling systems controlled by protein kinases. In the eukaryotic cell, about 30% of proteins are phosphorylated. Thus, the reversible phosphorylation of proteins catalyzed by protein kinases and protein phosphatases regulates many intracellular processes. Protein phosphatases have been actively studied since the early 1980s, and by the mid-1990s about 200 intracellular and 30 receptor phosphatases had been described. It is suggested that about 100-120 catalytic phosphatase subunits and a much larger number of regulatory subunits are encoded in the mammalian genome. The combination of these subunits defines the diversity of phosphatases and testifies to the large number of targets and regulated functions. There are several alternative classifications of protein phosphatases. For instance, two large groups of these enzymes are distinguished [143-146]:
1. intracellular low-molecular-weight phosphatases;
2. high-molecular-weight phosphatases associated with surface receptors.
Among intracellular protein phosphatases, two large classes are distinguished: serine-threonine phosphatases and tyrosine phosphatases. Serine-threonine phosphatases are, in turn, divided into two large classes. Phosphatases of the first class are inhibited by two thermostable and acid-resistant proteins called inhibitor 1 and inhibitor 2 [147-149]. This class includes phosphatase 1A (PP1), which can dephosphorylate the alpha-subunit of phosphorylase kinase (EC 2.7.1.38); this reaction is inhibited by heparin and protamine. Representatives of the second class dephosphorylate the beta-subunit of phosphorylase kinase and are insensitive to protamine and heparin [150, 151]. Phosphatases 2A, 2B and 2C belong to this class. Phosphatases 1A and 2A are affected by common specific inhibitors, okadaic acid and microcystin-LR, which have no effect on phosphatase 2C [152, 153]. Phosphatase PP2B (calcineurin, CaN) is inhibited by the immunosuppressive drugs cyclosporin and FK506 and is a heterodimer consisting of a catalytic subunit A (59 kDa) and a Ca2+-binding regulatory subunit B (19 kDa) [154, 155].
Besides the catalytic domain, the catalytic subunit of the enzyme comprises a calmodulin-binding site [156, 157] and a C-terminal autoinhibitory domain, whose removal leads to permanent activation of calcineurin (CaN) [157, 158]. In 1991, Monkanen et al. [159] isolated a third type of phosphatase, PP3, with a molecular mass of 36 kDa, sensitive to protamine and insensitive to heparin. Tyrosine phosphatases dephosphorylate substrates at tyrosine residues. They are represented by the first discovered phosphatase 1B and T-cell phosphatase. Phosphatase 1B is an ion-dependent, vanadate- and molybdate-sensitive phosphatase [160]. In the literature, the name placental phosphatase is frequently used [161, 162]. This enzyme, with a molecular mass of 37 kDa, contains 321 amino acid residues. In cells, phosphatase 1B blocks the phosphorylation of ribosomal protein S6 by S6 kinase and other insulin-induced effects, i.e., it is an antagonist of the tyrosine protein kinase of the insulin receptor [163]. The phosphatase 1B gene is located on the long arm of chromosome 20; the gene product is a 50 kDa protein, which is converted to the 37 kDa enzyme by proteolysis [164]. This phosphatase is localized in the endoplasmic reticulum and is anchored to the membrane by its C-terminal fragment [165]. T-cell phosphatase (TCP) has a molecular mass of 48 kDa, and its carboxy-terminal sequence (11 kDa) comprises 200 amino acid residues [166]. A TCP substrate is the intracellular protein pp34 (a serine-threonine kinase), which in the resting cell is phosphorylated at Tyr15; its dephosphorylation regulates the onset of mitosis [167]. TCP dephosphorylates synthetic peptides that reproduce the C-terminal phosphorylation site of pp60c-src (Tyr525) and the suggested autophosphorylation sites of pp60c-src (Tyr416) and p51frg (Tyr412), in this manner possibly affecting their functional activity [168]. TCP is homologous to phosphatase 1B: 65% by nucleotide sequence and 74% by amino acid sequence. The TCP gene is localized on chromosome 18 in man and mice [169]. VH1-like phosphatases (of dual specificity). Dual specificity phosphatases can dephosphorylate both tyrosine and serine/threonine residues. They are also homologous to Cdc25, the regulators of the yeast cell cycle. They activate cyclin-dependent kinases (Cdc2/CDK1) by dephosphorylating neighboring threonine and tyrosine residues. A classification of protein phosphatases into three families, PPP phosphatases, PPM phosphatases and PTP phosphatases, has also been described. PPP and PPM phosphatases include phosphoserine- and phosphothreonine-specific enzymes; PPM is the family of magnesium-activated phosphatases. PTP are phosphotyrosine-specific and dual specificity phosphatases. PPP phosphatases. The PPP family comprises protein phosphatases of the PP2A, PP1, PP, PP6, PP2B, PP5 and PP7 types. PPM phosphatases are magnesium (Mg2+)-activated phosphatases. The PPM family comprises phosphoserine- and phosphothreonine-specific enzymes. PTP phosphatases are phosphotyrosine-specific. In contrast to serine-threonine phosphatases, which are oligomers whose subunit composition defines the substrate specificity of the enzyme, all tyrosine phosphatases are monomeric enzymes. They are divided into two groups: transmembrane (receptor-like) and cytosolic. Transmembrane receptor-like PTP phosphatases are classified by the structure of their extracellular domains, which may consist of either extremely short or branched chains. The branched chains are similar to the ligand-binding domains of adhesion molecules (fibronectin type), whose range of physiological functions is very broad. Based on this similarity, it has been suggested that the extracellular domains of phosphatases also act as receptors.
However, neither ligands for these receptors nor a signaling system associated with them have been discovered yet. Cytosolic PTP are classified according to their domain structure. Their substrates are nuclear and cytoskeletal proteins. An important subclass is composed of SHP-1 and SHP-2, which possess SH2 domains. The SH2 domain is a compact globular domain that interacts with proteins containing a phosphorylated tyrosine residue within a definite amino acid sequence. It comprises 100 amino acid residues and is found in the cytoplasmic sequences of the receptors of many growth factors (in the part where the receptor is autophosphorylated at tyrosine residues), in phospholipase C and in GAP protein [170]. The function of this domain is, in particular, to direct the enzyme to its substrate and to facilitate the enzyme-substrate interaction. Other cytosolic phosphatases are characterized by the presence of PEST sequences (Pro-Glu/Asp-Ser/Thr) in the C-terminal half of the molecule. Physiological functions of protein phosphatases. Serine-threonine phosphatases are antagonists of serine-threonine kinases and play a specific role in signal transduction inside the cell. For instance, 10 min after stimulation of T-lymphocytes by the alternative pathway via CD2, dephosphorylation of a cytosolic protein (19 kDa) is observed, but after 2-4 h it is phosphorylated again at serine residues [171-173]. The initial interest in tyrosine protein phosphatases was driven by the hope that they might be antitumor agents, because transforming effects are associated with the activation of tyrosine protein kinases. It has been found, however, that some phosphatases actually amplify protein kinase signals. Phosphatases 1A and 2A inhibit the gene expression induced by one of the transcription factors, AP-1 (activator protein-1) [174, 175]. They are able to inhibit initiation factor-2 and elongation factor-2 [176]. Under the action of okadaic acid, an inhibitor of these phosphatases, the formation of mRNA of the jun, c-fos and fra-1 genes [174] and of the gene encoding interleukin 2 [177] increases significantly. Phosphatases 1A and 2A regulate the action of cytotoxic lymphocytes on target cells [178]. Low doses of okadaic acid intensify, and high doses inhibit, the response of cytotoxic lymphocytes. In activated T-cells, phosphatase PP2B dephosphorylates the nuclear factors of activated T-cells, which leads to their transfer into the nucleus, where they interact with other transcription factors. As was found in the 1990s, the phosphatase PP2B signal is sufficient, and in some cases necessary, for hypertrophic heart growth. Polymorphic forms of protein phosphatases are rather widely described. For example, the following replacements in the structure of the PP1 phosphatase catalytic subunit are typical of Japan: Glu310Stop, Leu1098Stop, Leu1086His, Gln1062Lys, Thr330Asn, Glu674Stop and Pro1017Ala. These replacements are observed in 7% of the population and lead to an increased risk of lung cancer. The Phe229Leu, Gly639Cys and Glu275Val replacements occur in 14% of the Japanese population and increase the risk of ovarian carcinoma and of colon and stomach cancer [179]. The Arg883Ser replacement was observed in 27% of the Pima ethnic group living in Arizona, which is predisposed to obesity and diabetes. A reduced fasting plasma insulin concentration and a high level of insulin-mediated glucose uptake in response to insulin injection are associated with this mutation. In this ethnic group, a polymorphism in the 3'-UTR (untranslated region) of the PP1 phosphatase gene (frequency of occurrence 0.44) was also observed. It leads to a 10-fold decrease in the enzyme expression level and is associated with insulin resistance and type 2 diabetes [180].
The nonsynonymous replacement Asp905Tyr in the amino acid sequence of the catalytic PP1 subunit correlates with changes in insulin secretion and, as a consequence, with the occurrence of insulin resistance in skeletal muscles. Moreover, a noticeable correlation was observed between genotypes characterized by the Asp905Tyr polymorphism and the risk of Alzheimer's disease [181]. Replacements in the structure of the PP2A beta-subunit that increase the risk of breast cancer (Gly90Asp) and colon cancer (Gly15Ala, Leu499Ile, Val498Glu, Val500Gly, Ser365Pro) were also found [182, 183]. Table 6 summarizes the phenotypically significant mutations detected in the PP1 and PP2A genes.
Table 6. Amino acid replacements resulting from single mutations in protein phosphatase genes

AA replacement (SwissProt numbering) | HBn (1) | Sn (2) | Charge in norm (3) | HBm (1) | Sm (2) | Charge after mutation (3) | HBm/HBn | Sm/Sn
PP1 (PPP1R3 gene)
Arg883Ser | -0.59 | 99 | + | -0.55 | 78 | 0 | 0.93 | 0.79
Asp905Tyr | -1.31 | 85 | - | 1.67 | 85 | 0 | -1.27 | 1.00
Gln1062Lys | -0.91 | 93 | 0 | -0.67 | 97 | + | 0.74 | 1.04
Glu275Val | -1.22 | 82 | - | 0.91 | 46 | 0 | -0.75 | 0.56
Glu310Stop | -1.22 | 82 | - | | | | 0.00 | 0.00
Glu674Stop | -1.22 | 82 | - | | | | 0.00 | 0.00
Gly639Cys | -0.67 | 64 | 0 | 0.17 | 55 | 0 | -0.25 | 0.86
Leu1086His | 1.22 | 55 | 0 | -0.64 | 83 | + | -0.52 | 1.51
Leu1098Stop | 1.22 | 55 | 0 | | | | 0.00 | 0.00
Phe229Leu | 1.92 | 50 | 0 | 1.22 | 55 | 0 | 0.64 | 1.10
Pro1017Ala | -0.49 | 82 | 0 | -0.4 | 62 | 0 | 0.82 | 0.76
Thr330Asn | -0.28 | 77 | 0 | -0.92 | 88 | 0 | 3.29 | 1.14
PP2A (PPP2R1B gene)
Gly15Ala | -0.67 | 64 | 0 | -0.4 | 62 | 0 | 0.60 | 0.97
Gly90Asp | -0.67 | 64 | 0 | -1.31 | 85 | - | 1.96 | 1.33
Leu499Ile | 1.22 | 55 | 0 | 1.25 | 40 | 0 | 1.02 | 0.73
Ser365Pro | -0.55 | 78 | 0 | -0.49 | 82 | 0 | 0.89 | 1.05
Val498Glu | 0.91 | 46 | 0 | -1.22 | 82 | - | -1.34 | 1.78
Val500Gly | 0.91 | 46 | 0 | -0.67 | 64 | 0 | -0.74 | 1.39

HBn, HBm: hydrophobic property in norm and after mutation. Sn, Sm: probability of AA occurrence on the globule surface in norm and after mutation.
1 Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
2 The probability that over 5% of the AA molecule surface is in contact with the solution surrounding the protein.
3 Amino acid charge at pH 6-7.

Angiotensin-Converting Enzyme
Angiotensin-converting enzyme (ACE) (EC 3.4.15.1) is a metalloproteinase that contains a zinc atom in the active site and is activated by chloride ions [184]. Determination of the primary structure of ACE from human endothelial cells revealed high internal homology between two large domains (357 amino acid residues each) of a single polypeptide chain (1277-1278 residues) [185]. This ACE form is synthesized in all somatic cells except those of the testes [186, 187]. Each domain contains an active site and a zinc atom; both domains are catalytically active but are not equivalent. The active sites of the domains differ in peptide hydrolysis rates, in the rate of inhibition by specific ACE inhibitors and in the profile of activation by chloride ions [188-190]. In contrast with the N-domain, the activity of the C-domain depends on the chloride ion concentration. In the absence of chloride ions the C-domain loses its activity; its maximal activity is observed at 200-800 mM, depending on the substrate used. The N-domain retains activity in the absence of chloride ions and is fully activated at a rather low concentration (10-15 mM). These differences are suggested to be of significant physiological importance. It is suggested that in the organism the N-domain may perform specific hydrolysis of some physiologically important substrates, such as the negative regulator of hematopoiesis, the peptide AcSDKP [191], and luliberin [190]. Conversion of the enkephalin heptapeptide precursor Tyr1-Gly2-Gly3-Phe4-Met5-Arg6-Phe7 to enkephalin is also performed mostly by the N-domain; most probably, the N-domain also plays the predominant role in the conversion of this heptapeptide to (Met5)-enkephalin in vivo [192]. At the same time, Leu5-enkephalin and Met5-enkephalin are degraded faster by the C-domain. Angiotensin I and bradykinin are hydrolyzed by both domains, angiotensin I being degraded somewhat faster by the C-domain. The specific ACE inhibitors used clinically reduce the activity of both domains [193] but differ somewhat in efficiency, which is due, in general, to differences in the dissociation rate. Depending on which of the two active sites an inhibitor predominantly interacts with, its biological effect as a drug may vary. ACE plays an important role in the regulation of blood pressure and electrolyte balance by hydrolyzing angiotensin I to angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that sustains cardiovascular homeostasis. The effect of ACE on the cardiovascular system is in many ways genetically determined. A connection was observed between ACE gene polymorphism, ACE activity in blood and tissues, and an increased risk of some cardiovascular diseases.
When the ACE gene was cloned, it was found that in intron 16 a DNA fragment (Alu repeat) of 287 base pairs is either present (insertion, I) or absent (deletion, D) [194, 195]. A correlation was observed between the D allele and the ACE level in blood, lymph and tissues. The serum ACE level in healthy men homozygous for the D allele (the DD genotype was observed in approximately 36% of men) was almost twice as high as in those homozygous for the I allele (II genotype, observed in about 17%) and intermediate in heterozygotes (ID genotype, 47%). The ACE level in the human heart was also associated with this gene polymorphism [196]. The D allele of the ACE gene is considered a risk factor for acute myocardial infarction [197], hypertension, coronary vessel spasm [198], left ventricular hypertrophy, extravasations and a high risk of atherosclerosis [199]. In patients homozygous for the D allele, increased tone of vascular smooth muscle is observed [200]. The I allele is associated with increased endurance of athletes under physical load (runners, rowers, climbers) [201]. A genetic predisposition to cardiovascular diseases, including acute ischemia [202], is observed in men who, by common criteria, have low risk factors (the usual risk factors are excessive body mass, hypercholesterolemia, lipoproteinemia, etc.) [198, 203, 204]. A study of 4,773 patients with diabetes demonstrated that the D allele of the ACE gene was associated with the risk of nephropathy as a complication of the main disease, but not with diabetic retinopathy (2,010 patients were examined). This was observed for both achrestic and usual diabetes [205]. Based on an analysis of 145 reports covering examinations of 5,000 patients, a group of investigators from several European countries concluded that the D allele is associated with an increased risk of coronary vessel disease, acute myocardial infarction, extravasation and diabetic nephropathy, especially in atherosclerotic diseases, but not with hypertension [206]. However, this did not apply to malignant hypertension, for which a connection between the D allele and the risk of disease was observed [207]. In the malignant form, the ACE DD genotype was observed three times more frequently than in the benign form. The detected SNP A-239-T in the ACE gene promoter increases the risk of Alzheimer's disease and reduces the risk of acute myocardial infarction among Europeans. The occurrence of G in position 20218042 and A in position 20221743 increases the risk of Alzheimer's disease 45-fold in the Arabian-Israeli group. At the same time, the A-11599-G replacement in exon 7 reduces the risk of this disease for Europeans. The simultaneous presence of the A-262-T and A-11860-G mutations in exon 17 increases the risk of elevated arterial pressure in Africans. The A-240-T replacement in the gene promoter increases the risk of breast cancer in Chinese women.
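The genotype percentages quoted above for healthy men (DD about 36%, ID 47%, II about 17%) determine the frequencies of the D and I alleles, and these can be compared with Hardy-Weinberg expectations. The sketch below is an illustrative calculation only: it treats the rounded percentages as exact genotype frequencies, and because no sample size is given here, no significance test is attempted. The function names are hypothetical.

```python
def allele_frequencies(dd, id_, ii):
    """Frequencies of the D and I alleles from genotype frequencies."""
    total = dd + id_ + ii
    freq_d = (dd + id_ / 2) / total
    return freq_d, 1 - freq_d


def hardy_weinberg(freq_d):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    freq_i = 1 - freq_d
    return {"DD": freq_d ** 2, "ID": 2 * freq_d * freq_i, "II": freq_i ** 2}


# Genotype frequencies as quoted in the text (assumed exact for illustration).
freq_d, freq_i = allele_frequencies(dd=0.36, id_=0.47, ii=0.17)
expected = hardy_weinberg(freq_d)
print(round(freq_d, 3), round(freq_i, 3))
print({genotype: round(value, 3) for genotype, value in expected.items()})
```

Under these assumptions the D allele frequency is close to 0.6, and the expected genotype proportions lie within about one percentage point of the quoted ones.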
Cyclooxygenase
Cyclooxygenase (prostaglandin endoperoxide synthase, prostaglandin H synthase, PG-endoperoxide synthase, PG synthase, PES; EC 1.14.99.1) catalyzes the conversion of polyunsaturated fatty acids to PG-endoperoxide (PGH)¹, which is the common precursor of the other prostaglandins and thromboxane [208, 209]. During this conversion the enzyme performs two catalytic reactions: the cyclooxygenase reaction, in which 15-hydroperoxy-PG-endoperoxide (PGG) is formed, and the peroxidase reaction, in which PGG undergoes two-electron reduction to PGH. Both activities are associated with one protein molecule [210-212]. Cyclooxygenase is an integral membrane protein found mostly in microsomal membranes [213]. Studies of subcellular localization indicate that the enzyme is associated with the endoplasmic reticulum. In cells, cyclooxygenase was also detected in the nuclear fraction (nuclear membranes) and in plasma membranes [214, 215]. Cyclooxygenase is a homodimer with a subunit molecular weight of about 72 kDa [210, 211, 216]. The enzyme contains 2 to 3 oligosaccharides, Man9(GlcNAc)2 and Man6(GlcNAc)2, per protein subunit [211, 217]. The molecular weight calculated from the primary structure, without the oligosaccharide residues, is 65.5 kDa [218-220]. Cyclooxygenase is also a hemoprotein: hemin and the protein subunit form a complex in a stoichiometric ratio of 1:1. The iron atom was shown to be coordinated to the protein via the His309 residue [221, 222]. The Tyr385 residue is important for the catalytic mechanism [223-225]. Blocking of the Tyr residue by a specific agent [223] and site-directed replacement of Tyr by Phe [224] led to complete loss of the cyclooxygenase, but not the peroxidase, activity of PG synthase. Cyclooxygenase exists in two isoforms (COX-1 and COX-2) and plays an important physiological role in homeostatic and compensatory-restorative processes by regulating prostaglandin synthesis from arachidonic acid. Both isoforms are present throughout the body, but they are unequally distributed among organs and tissues and are functionally different [208, 209].
¹ http://obi.img.ras.ru/humbio/endocrinology/00133206.htm
COX-1 is the constitutive enzyme, permanently expressed in cells. COX-1 predominates in the mucosa of the gastrointestinal tract, where it performs a cytoprotective function, in platelets, where it is linked to their aggregation properties, in kidney cells and in some other organs. The constitutive properties of COX-1 are explained by the controlled synthesis of thromboxane A2, prostaglandin E2 and prostacyclin. Insufficient COX-1 activity damages the gastrointestinal mucosa and raises the risk of gastropathies, up to severe bleeding and perforation. Nonsteroidal anti-inflammatory drugs suppress COX-1 activity. Suppression of platelet COX-1 activity, accompanied by reduced platelet aggregation, increases the risk of hemorrhage in clotting disorders and vasculitis, but is desirable in heart diseases and some other pathological states. COX-2 is the inducible enzyme, whose expression may be triggered by inflammation mediators; it dominates in the brain, genital organs, kidneys, and the mononuclear leukocytes of blood (monocytes) and tissues (macrophages). In the kidneys it is one of the important enzymes controlling water and sodium reabsorption and, through it, other functions. COX-2 affects blood circulation by stimulating synthesis of the vasodilator prostacyclin (PGI2). COX-2 was found to promote the spread of malignant neoplasms and the development of inflammatory processes in joints and muscles. A polymorphism of the COX-1 gene promoter, A-842-G, leading to aspirin resistance was found [226]. Eighteen polymorphic variants of the COX-1 gene were detected, 7 of them nonsynonymous (Arg8Trp, Pro17Leu, Arg53His, Lys185Thr, Gly230Ser, Leu237Met, Lys341Arg) [227, 228]. Four nucleotide replacements and one deletion in the intron sequence were additionally identified. Phenotypic manifestations were detected for only three of the identified replacements.
The Leu237Met replacement (for its spatial position in the COX-1 structure see figure 12) increases the risk of colon cancer in Caucasians [227]. At the same time, the Arg8Trp and Pro17Leu replacements in the signal peptide significantly reduce sensitivity to aspirin in European patients [228]. Five of the seven amino acid replacements resulting from COX-1 gene polymorphisms are shown against the entropic image (see figure 13) of a close homologue of human COX-1, rabbit COX-1 (84% identity). The data indicate that the detected mutations may be located in regions of both low and high Shannon entropy. The T-8473-C polymorphism in the COX-2 gene stop codon significantly increases the risk of lung cancer [229]. The Val511Ala mutation is associated with intestinal cancer [230]. Figure 14 shows the spatial position of this replacement in the structure of a close homologue of human COX-2, mouse COX-2 (88% identity). The replacement is located far from the active site, in a region of high Shannon entropy (see figure 15). A COX-2 gene polymorphism (replacement of guanine by cytosine at position 765 of the promoter) reduces COX-2 expression, and its detection allows genetic determination of the risk of acute myocardial infarction and atherothrombotic ischemic stroke [231, 232]. Table 7 summarizes the information on the cyclooxygenase replacements described. It should be noted that the phenotypically significant replacements in COX-1 are replacements by hydrophobic amino acids. In all cases, replacements by hydrophilic amino acids produce no significant phenotypic effects.
Table 7. Amino acid replacements resulting from single mutations in cyclooxygenase genes

Columns: AA replacement type (SwissProt numbering); HBn and HBm, hydrophobic property in norm and after mutation¹; Sn and Sm, probability of AA occurrence on the globule surface in norm and after mutation²; Charge(n) and Charge(m), AA charge in norm and after mutation³.

Replacement HBn     Sn    Charge(n)  HBm     Sm    Charge(m)  HBm/HBn  Sm/Sn
COX-1 (replacements with phenotypic manifestations)
Arg8Trp     -0.59   99    +          0.5     73    0          -0.85    0.74
Leu237Met   1.22    55    0          1.02    60    0          0.84     1.09
Pro17Leu    -0.49   82    0          1.22    55    0          -2.49    0.67
COX-1 (replacements without phenotypic manifestations)
Arg53His    -0.59   99    +          -0.64   83    +          1.08     0.84
Gly230Ser   -0.67   64    0          -0.55   78    0          0.82     1.22
Lys185Thr   -0.67   97    +          -0.28   77    0          0.42     0.79
Lys341Arg   -0.67   97    +          -0.59   99    +          0.88     1.02
COX-2
Val511Ala   0.91    46    0          -0.4    62    0          -0.44    1.35

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² The probability that over 5% of the AA molecule surface contacts the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
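The last two columns of Table 7 are simple ratios of the "after mutation" value to the "in norm" value. A minimal sketch of that arithmetic is given below, assuming the per-residue values are exactly those printed in the table; the dictionary `SCALE` and the function names are hypothetical and serve only as an illustration.

```python
import re

# Residue -> (hydrophobic property HB, surface probability S), copied from Table 7.
SCALE = {
    "Arg": (-0.59, 99), "Trp": (0.50, 73), "Leu": (1.22, 55), "Met": (1.02, 60),
    "Pro": (-0.49, 82), "His": (-0.64, 83), "Gly": (-0.67, 64), "Ser": (-0.55, 78),
    "Lys": (-0.67, 97), "Thr": (-0.28, 77), "Val": (0.91, 46), "Ala": (-0.40, 62),
}

REPLACEMENT = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_replacement(code):
    """Split a code such as 'Arg8Trp' into ('Arg', 8, 'Trp')."""
    match = REPLACEMENT.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized replacement code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def ratios(code):
    """Return (HBm/HBn, Sm/Sn) for one replacement, rounded as in the table."""
    original, _position, mutant = parse_replacement(code)
    hb_n, s_n = SCALE[original]
    hb_m, s_m = SCALE[mutant]
    return round(hb_m / hb_n, 2), round(s_m / s_n, 2)


for code in ("Arg8Trp", "Leu237Met", "Pro17Leu", "Val511Ala"):
    print(code, ratios(code))
```

A negative HBm/HBn ratio means that the replacement changes the sign of the hydrophobic property, that is, a hydrophilic residue is replaced by a hydrophobic one or the reverse.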
Figure 12. Cyclooxygenase-1 structure with indication of Leu237Met amino acid replacement.
Figure 13. Entropic image of cyclooxygenase-1. Red points indicate the positions of the described amino acid replacements resulting from single mutations in the enzyme gene.
Figure 14. Cyclooxygenase-2 structure with indication of the Val511Ala amino acid replacement.
Figure 15. Entropic image of cyclooxygenase-2. The red point indicates the position of the Val511Ala amino acid replacement.
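The entropic images in figures 13 and 15 assign a Shannon entropy value to each residue position. The exact procedure is not restated in this section; the sketch below assumes the usual per-column definition, H = -sum(p_a * log2 p_a) over the residues observed in one column of a multiple sequence alignment. The toy alignment and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the residues found in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Gap characters, if present, are treated here as ordinary symbols.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def entropy_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(column) for column in zip(*alignment)]


# Toy alignment of four homologous fragments (illustrative only).
toy_alignment = ["HYLV", "HYIA", "HFLV", "HYMG"]
profile = entropy_profile(toy_alignment)

for position, value in enumerate(profile, start=1):
    print(position, round(value, 3))
```

A fully conserved column (the first one here) has zero entropy, which corresponds to the low-entropy, conservative positions discussed in the text; variable columns give higher values.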
Catalase
Catalase (EC 1.11.1.6) is an enzyme of the hydroperoxidase group; it catalyzes the redox reaction in which 2 hydrogen peroxide molecules form water and oxygen¹. Catalase is widespread in the cells of animals, plants and microorganisms; it belongs to the chromoproteins, which have oxidized heme as the prosthetic (nonprotein) group. A typical heme catalase has a large molecular mass (250-300 kDa) and extremely high catalytic activity: nearly every collision of its macromolecule with the substrate ends in substrate degradation. The four subunits of the catalase molecule are folded so that the N-terminal segment of the polypeptide chain of each subunit passes through a loop linking the heme-containing domain of one subunit with the domain containing the helical segment of the neighboring subunit. Like the hemoglobin family, heme-containing catalases have their own unique spatial organization, conserved during evolution. All heme catalases studied have the same polypeptide chain packing, called the catalase fold. The specificity of catalase toward the reducing substrate is low; therefore, catalase can catalyze not only H2O2 decomposition but also the oxidation of lower alcohols. The function of catalase is to degrade toxic hydrogen peroxide formed during various oxidative processes in the organism. The enzyme is present in many cells (including erythrocytes and liver cells). Polymorphic variants of the catalase gene increase the risk of some diseases. For example, the replacement of G by A at position 5 of intron 4 creates a risk of acatalasemia (hereditary absence or low level of catalase in blood, leading to recurrent infections and gingivitis (ulitis); this disease, also known as Takahara's disease, is most widespread among the Japanese) [233, 234]. A deletion polymorphism of the catalase gene has been described. It is associated with the risk of aniridia (full or partial hypoplasia of the iris accompanied by cataract, corneal opacity, glaucoma, etc.; the pathology occurs with a frequency of 1 per 64,000 to 1 per 96,000 births) [235]. An SNP at position 844 relative to the start codon of the catalase gene is associated with an increased risk of hypertension in the Chinese [236].
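The catalase reaction described in words at the start of this section can be written as a balanced equation. This is the standard textbook stoichiometry, given here for convenience and not taken from the cited sources:

```latex
2\,\mathrm{H_2O_2} \;\longrightarrow\; 2\,\mathrm{H_2O} + \mathrm{O_2}
```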
Myeloperoxidase
Myeloperoxidase (MPO; EC 1.11.1.7) is the general name of enzymes of the peroxidase subclass contained in blood cells of the myeloid lineage. MPO is a heme-containing protein with a molecular weight of 150 kDa². The MPO molecule consists of two heterodimers, each with heme incorporated in its structure, linked by a disulfide bond. Each heterodimer consists of a 59 kDa and a 13.5 kDa subunit linked by a disulfide bridge. Figure 16 shows the 3D structure of the enzyme.
¹ http://obi.img.ras.ru/humbio/Biochem/0009552f.htm#empty/x0088854.htm
² http://obi.img.ras.ru/humbio/har/0038a3ab.htm
Figure 16. Myeloperoxidase structure.
Hydrogen peroxide, chloride and neutrophil myeloperoxidase form a system producing extremely toxic substances: hypochlorite and molecular chlorine. These substances oxidize and halogenate various components of bacteria and tumor cells, and at very high concentrations may damage tissues. Myeloperoxidase gives pus its green color and, possibly, participates in the suppression of inflammation by inactivating chemoattractants and suppressing the motor activity of phagocytes. Myeloperoxidase deficiency is the most widespread disorder of neutrophil function. The disease is inherited in an autosomal recessive manner. Its frequency is about 1:2000. In the absence of other diseases weakening the protective forces of the organism, primarily uncompensated diabetes, myeloperoxidase deficiency is not manifested. The missing activity is compensated by other antimicrobial systems of phagocytes; for example, hydrogen peroxide formation increases. The bactericidal action of neutrophils is delayed but does not disappear completely. When myeloperoxidase deficiency is combined with diabetes, resistance to infections is significantly reduced. Acquired myeloperoxidase deficiency is observed in acute myeloblastic and acute myelomonoblastic leukemia. The following polymorphisms, Arg569Trp [237], Tyr173Cys [238], Met251Thr [239], Ala166Val [240] and Leu406Trp [240], as well as deletions in exons 3 and 9 [240], lead to various forms of MPO deficiency in cells of the myeloid lineage. The positions of the identified amino acid replacements on the entropic image are shown in figure 17. The G-463-A polymorphism in the MPO gene promoter significantly increases the risk of Alzheimer's disease in men and affects hormone replacement therapy in the development of atherosclerosis [241, 242].
Figure 17. Entropic image of myeloperoxidase. Red points indicate the positions of two of the described amino acid replacements resulting from single mutations in the enzyme gene.
Eosinophilic Peroxidase
Eosinophilic peroxidase (EPX; EC 1.11.1.7) is the general name of enzymes of the peroxidase subclass contained in eosinophils. EPX is a heme-containing dimeric protein with a total molecular mass of 70 kDa, consisting of a light (15 kDa) and a heavy (55 kDa) subunit linked by a disulfide bridge. EPX participates in such pathological processes as acute respiratory distress syndrome (the main cause of postoperative mortality) and the recently described autoimmune syndrome X, in which blood vessels are damaged. EPX gene mutations leading to a deficiency of its activity have been described: a mutation leading to the Arg286His replacement and an insertion at the intron-exon 10 junction [243]. Figures 18 and 19 show the position of the amino acid replacement in the EPX structure and on the entropic image, respectively. Importantly, the replacement lies in a low-entropy (conserved) region, which should significantly affect the catalytic properties of the enzyme.
Figure 18. The structure of eosinophilic peroxidase. The positions of the described amino acid replacements resulting from single mutations in the enzyme gene are indicated.
Figure 19. Entropic image of eosinophilic peroxidase. The red point indicates the position of the Arg286His replacement resulting from a single mutation in the enzyme gene.
Thyroid Peroxidase
Thyroid peroxidase (TPO) belongs to the class of oxidoreductases for which hydrogen peroxide is the electron acceptor. The enzyme is found only in the thyroid gland, which gives this gland a property unique among tissues: the ability to oxidize iodide. TPO is an essential participant in the biosynthesis of thyroid hormones. For this peroxidase, several mutations have been described that lead to amino acid replacements and, as a consequence, to an increased probability of defects in the iodide organification system. These replacements are Ile447Phe [244], Tyr453Asp [245], Gly590Ser [245], Arg648Gln [246], Trp693Arg [247] and Glu799Lys [247], as well as the Tyr/Asp and Glu/Lys substitutions in exons 9 and 14 [245]. Table 8 summarizes the information on amino acid replacements in the structure of human peroxidases that are in some way associated with phenotypic manifestations. It may be noted that for TPO, in 5 cases out of 6, the replacing amino acid is at least as likely to contact the solution surrounding the protein as the original one.
Table 8. Amino acid replacements resulting from single mutations in human peroxidase genes

Columns: AA replacement type (SwissProt numbering); HBn and HBm, hydrophobic property in norm and after mutation¹; Sn and Sm, probability of AA occurrence on the globule surface in norm and after mutation²; Charge(n) and Charge(m), AA charge in norm and after mutation³.

Replacement HBn     Sn    Charge(n)  HBm     Sm    Charge(m)  HBm/HBn  Sm/Sn
Myeloperoxidase (MPO)
Ala166Val   -0.4    62    0          0.91    46    0          -2.28    0.74
Arg569Trp   -0.59   99    +          0.5     73    0          -0.85    0.74
Leu406Trp   1.22    55    0          0.5     73    0          0.41     1.33
Met251Thr   1.02    60    0          -0.28   77    0          -0.27    1.28
Tyr173Cys   1.67    85    0          0.17    55    0          0.10     0.65
Eosinophilic peroxidase (EPX)
Arg286His   -0.59   99    +          -0.64   83    +          1.08     0.84
Thyroid peroxidase (TPO)
Arg648Gln   -0.59   99    +          -0.91   93    0          1.54     0.94
Glu799Lys   -1.22   82    -          -0.67   97    +          0.55     1.18
Gly590Ser   -0.67   64    0          -0.55   78    0          0.82     1.22
Ile447Phe   1.25    40    0          1.92    50    0          1.54     1.25
Trp693Arg   0.5     73    0          -0.59   99    +          -1.18    1.36
Tyr453Asp   1.67    85    0          -1.31   85    -          -0.78    1.00

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² The probability that over 5% of the AA molecule surface contacts the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
Superoxide Dismutase (SOD)
Superoxide dismutase (superoxide:superoxide oxidoreductase, EC 1.15.1.1) is represented by a family of metalloenzymes that catalyze the dismutation of superoxide radicals. Superoxide dismutases are the main enzymes playing the key role in the utilization of free radicals and in protecting the cell from oxidative damage¹. Mammalian organs differ in SOD activity by tens of times. The highest Cu,Zn-SOD and Mn-SOD activity was observed in the liver. High Cu,Zn-SOD activity is observed in erythrocytes, which allows blood to be used as a source for extraction and purification of the enzyme. At present, three superoxide dismutase isoenzymes are known in man. Table 9 briefly characterizes these three enzymes. The physiological function of SOD is the protection of cells against free-radical damage. Under normal metabolic conditions, superoxide dismutases keep the concentration of superoxide radicals at a particular steady level. In the literature, special attention is paid to changes in erythrocyte SOD activity during aging and in some diseases, such as hemolytic anemia, ischemia, and some neurological diseases. Superoxide radicals play a significant role in the development of inflammatory processes. These investigations resulted in the use of SOD as an anti-inflammatory agent (orgotein, peroxynorm).

Table 9. Human superoxide dismutase characteristics

Nomenclature            SOD1            SOD2            SOD3
Cofactor                Cu2+, Zn2+      Mn2+            Cu2+, Zn2+
Localization            Cytoplasm       Mitochondria    Intercellular space
Structure               Homodimer       Homotetramer    Homotetramer
Molecular mass (kDa)    32              84              120
Number of amino acids   153             153             222
Number of exons         5               1               1
Gene                    21q22.1         6q25            4p15.2
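The dismutation of superoxide radicals catalyzed by all three isoenzymes can be written as a single overall equation. This is the standard textbook form of the reaction, added for convenience and not taken from the cited sources:

```latex
2\,\mathrm{O_2^{\bullet -}} + 2\,\mathrm{H^+} \;\longrightarrow\; \mathrm{H_2O_2} + \mathrm{O_2}
```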
SOD-1
SOD-1 has the highest activity among all superoxide dismutases. The activity of this enzyme is independent of the pH of the medium in the range 5-9. Figure 20 shows the 3D structure of SOD-1. The Cu-binding site of the enzyme comprises 4 His residues, and the Zn-binding site 2 His and 1 Asp. Arg-141, which is essential for catalytic activity, and the disulfide bridge between Cys58 and Cys160, the only one in each subunit, are conserved. Each SOD-1 subunit has a barrel-shaped structure (beta-barrel) formed by 8 antiparallel beta-strands and contains 3 protruding external loops. The dimer is a prolate ellipsoid (33 × 67 × 36 Å). About 5% of the external surface of each subunit is occupied by the contact zone, which is formed by the first and the last pairs of beta-strands of the beta-barrel and by regions of two loops (residues 47-82 and 100-112). The beta-barrel is asymmetric: beta-strands 5 to 8 are shorter and have fewer hydrogen bonds than strands 1-4. The loops differ in size and structure. The largest loop contains the disulfide bridge and the Zn-binding region. The disulfide bond covalently links the large loop and the beginning of beta-strand 8. The second loop has a small alpha-helical region.

¹ http://obi.img.ras.ru/humbio/proteins/000fc6e4.htm
Figure 20. Superoxide dismutase-1 structure. The positions of the described amino acid replacements resulting from single mutations in the enzyme gene are indicated.
The distance between the Cu atoms of the two active sites is 33.8 Å. The spatial separation of the two active sites and their apparent identity suggest that the strong dimer interaction provides structural stability of SOD rather than enzymatic function. The amino acid residues His61 and Arg141 are important for the catalytic act. Cu(II) and Zn(II) are located at the bottom of a deep narrow channel, 6.3 Å apart: Zn is fully buried, whereas Cu is more exposed and accessible to the solvent. The His-61 side chain forms a bridge between Cu and Zn. The Cu ligands are His-44, His-46, His-61 and His-118; the Zn ligands are His-61, His-69, His-78 and Asp-81. The positions of the metal-binding residues are stabilized by a network of hydrogen bonds. The molecular surface of the active site channel is formed by 18 amino acid residues.
Table 10. Amino acid replacements resulting from single mutations in human superoxide dismutase genes

Columns: AA replacement type (SwissProt numbering); HBn and HBm, hydrophobic property in norm and after mutation¹; Sn and Sm, probability of AA occurrence on the globule surface in norm and after mutation²; Charge(n) and Charge(m), AA charge in norm and after mutation³.

Replacement HBn     Sn    Charge(n)  HBm     Sm    Charge(m)  HBm/HBn  Sm/Sn
SOD-1
Ala112Gly   -0.4    62    0          -0.67   64    0          1.68     1.03
Ala145Thr   -0.4    62    0          -0.28   77    0          0.70     1.24
Ala4Val     -0.4    62    0          0.91    46    0          -2.28    0.74
Ala95Thr    -0.4    62    0          -0.28   77    0          0.70     1.24
Arg46His    -0.59   99    +          -0.64   83    +          1.08     0.84
Asp90Ala    -1.31   85    -          -0.4    62    0          0.31     0.73
Asp96Asn    -1.31   85    -          -0.92   88    0          0.70     1.04
Cys6Phe     0.17    55    0          1.92    50    0          11.29    0.91
Glu100Gly   -1.22   82    -          -0.67   64    0          0.55     0.78
Glu21Lys    -1.22   82    -          -0.67   97    +          0.55     1.18
Gly12Arg    -0.67   64    0          -0.59   99    +          0.88     1.55
Gly16Ser    -0.67   64    0          -0.55   78    0          0.82     1.22
Gly37Arg    -0.67   64    0          -0.59   99    +          0.88     1.55
Gly41Asp    -0.67   64    0          -1.31   85    -          1.96     1.33
Gly41Ser    -0.67   64    0          -0.55   78    0          0.82     1.22
Gly72Ser    -0.67   64    0          -0.55   78    0          0.82     1.22
Gly85Arg    -0.67   64    0          -0.59   99    +          0.88     1.55
Gly93Ala    -0.67   64    0          -0.4    62    0          0.60     0.97
Gly93Arg    -0.67   64    0          -0.59   99    +          0.88     1.55
Gly93Cys    -0.67   64    0          0.17    55    0          -0.25    0.86
His43Arg    -0.64   83    +          -0.59   99    +          0.92     1.19
His80Ala    -0.64   83    +          -0.4    62    0          0.63     0.75
Ile104Phe   1.25    40    0          1.92    50    0          1.54     1.25
Ile113Thr   1.25    40    0          -0.28   77    0          -0.22    1.93
Ile151Thr   1.25    40    0          -0.28   77    0          -0.22    1.93
Leu106Val   1.22    55    0          0.91    46    0          0.75     0.84
Leu112Stop  1.22    55    0
Leu144Ser   1.22    55    0          -0.55   78    0          -0.45    1.42
Leu38Val    1.22    55    0          0.91    46    0          0.75     0.84
Leu84Phe    1.22    55    0          1.92    50    0          1.57     0.91
Leu84Val    1.22    55    0          0.91    46    0          0.75     0.84
Phe45Cys    1.92    50    0          0.17    55    0          0.09     1.10
Ser134Asn   -0.55   78    0          -0.92   88    0          1.67     1.13
SOD-2
Ala16Val    -0.4    62    0          0.91    46    0          -2.28    0.74
SOD-3
Arg213Gly   -0.59   99    +          -0.67   64    0          1.14     0.65

¹ Hydrophobic property in relative units on the OHM scale; positive values correspond to hydrophobic AA and negative values to hydrophilic AA.
² The probability that over 5% of the AA molecule surface contacts the solution surrounding the protein.
³ Amino acid charge at pH 6-7.
Figure 21. Entropic image of superoxide dismutase-1. Red points indicate the positions of the described amino acid replacements resulting from single mutations in the enzyme gene.
Investigations detected SOD-1 gene mutations in motor neuron diseases: in 14-25% of cases of familial amyotrophic lateral sclerosis [248, 249] and in 5-7% of patients with the sporadic form of amyotrophic lateral sclerosis. In total, over 60 mutations in this gene have been described to date [250]. Among them, 52 are missense mutations, in which the replacement of one nucleotide by another does not change the length of the polypeptide molecule. Among the replacements detected, 33 are nonsynonymous and are presented in summary table 10 [249, 251-275] and on the SOD-1 entropic image (see figure 21). In addition, deletion or insertion mutations located in codons 126-133, as well as a splicing mutation at the 3' end of intron 4, were identified. These mutations lead to a translational frameshift and a change in polypeptide length. Analysis of the locations of the replacements on the entropic image of the enzyme indicates that replacements in both high- and low-entropy zones are typical of SOD-1. Interestingly, a significant decrease in the catalytic activity of the enzyme has not been described for all replacements in the low Shannon entropy zone. Replacement by a negatively charged amino acid is observed in only two of the 33 cases.
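The 33 SOD-1 replacement codes listed in Table 10 can be tallied by the residue that is replaced. The sketch below is a simple count added for illustration, not part of the original analysis; the variable and function names are hypothetical, and the codes are copied from the table as printed.

```python
import re
from collections import Counter

# The 33 SOD-1 replacement codes as listed in Table 10.
SOD1_REPLACEMENTS = """
Ala112Gly Ala145Thr Ala4Val Ala95Thr Arg46His Asp90Ala Asp96Asn Cys6Phe
Glu100Gly Glu21Lys Gly12Arg Gly16Ser Gly37Arg Gly41Asp Gly41Ser Gly72Ser
Gly85Arg Gly93Ala Gly93Arg Gly93Cys His43Arg His80Ala Ile104Phe Ile113Thr
Ile151Thr Leu106Val Leu112Stop Leu144Ser Leu38Val Leu84Phe Leu84Val
Phe45Cys Ser134Asn
""".split()

# Three-letter original residue, position, then the new residue or "Stop".
PATTERN = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]+)")


def original_residue(code):
    """Return the three-letter code of the residue that is replaced."""
    match = PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized replacement code: {code!r}")
    return match.group(1)


original_counts = Counter(original_residue(code) for code in SOD1_REPLACEMENTS)

for residue, count in original_counts.most_common():
    print(residue, count)
```

For this list the most frequently replaced residue is glycine, followed by leucine.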
SOD-2
Besides SOD-1, the level of free radicals in the cell is also controlled by Mn-dependent SOD-2. Numerous investigations suggest that mitochondrial SOD-2 plays an important role in protecting the cell against stress [276, 277]. A detailed study of the SOD-2 gene in patients with both familial and sporadic amyotrophic lateral sclerosis detected no mutations associated with development of the disease. However, some investigators recently identified the Ala9Val polymorphism in the SOD-2 gene and found that the allele encoding alanine is associated with an increased risk of motor neuron disease [277]. This polymorphism is also of interest because it most likely affects the cellular localization of the enzyme rather than its activity. Such a mutation changes the secondary structure by breaking the alpha-helix, whose integrity is important for transport of the enzyme from the cytoplasm to the mitochondrial matrix [278, 279]. The Ala16Val replacement decreases the processing efficiency and, as a consequence, raises the risk of idiopathic cardiomyopathy [280].
SOD-3
For SOD-3, a polymorphism leading to the amino acid replacement Arg213Gly has been described [281]. With this mutation, a 10-fold increase in the SOD-3 content of plasma and an increased risk of ischemic heart disease are observed.
CONCLUSION
The structural and catalytic polymorphism of human enzymes has a significant effect on the metabolism of the human organism and on the risk of many diseases. Single base replacements in enzyme genes may lead to amino acid replacements in both low-entropy and high-entropy positions of the biocatalyst molecule; at the same time, replacements are more frequently observed in high-entropy positions of proteins. Frequently, polymorphic amino acid replacements in both high- and low-entropy positions have no effect on the catalytic functions of the enzyme but are tightly associated with the development of some diseases. Glycine is subject to polymorphic replacement more frequently than other amino acids. In the set of enzymes considered, asparagine is not subject to polymorphic replacement at all. The approaches to detecting the amino acid residues involved in forming the catalytic sites of an enzyme, and the theoretical approaches to analyzing changes in the coordinates of these residues upon particular single replacements, both developed on the basis of bioinformatics methods, give hope for the creation of highly efficient methods for predicting variations in the main functions of enzymes upon changes in the structure of the genes encoding them.
ACKNOWLEDGMENT
The authors are thankful to V.V. Mal'gin, Head of the Laboratory of Pharmacology, IPAS RAS, and G.F. Makhaeva, Leading Scientist, IPAS RAS, for presenting material describing the properties and physiological role of esterases (ACHE, BCHE, CE, paraoxonases). The authors sincerely thank G.V. Dubacheva, M.S. Osipova, M.V. Porus and L.G. Sokolovskaya for help in collecting information from the literature on existing polymorphisms of enzyme genes and for cooperation in the technical preparation of the manuscript.
REFERENCES
[1] Varfolomeev SD, Chemical Enzymology, 2005, M.: Publ. Centre "Akademia", 472.
[2] Soloviev YI, Kurennoi VI, Yakob Bercelius. Life and Activity. 1980, M.: Nauka, 319.
[3] Shamin AN, Biocatalysis and Biocatalysts (Historical sketch), 1971, M.: Nauka, 193.
[4] Webb L, Inhibitors of Enzymes and Metabolism, 1966, M.: Mir, 1066.
[5] Dixon M, Webb E, Enzymes, 1982, M.: Mir, 392.
[6] Berezin IV, Martinek K, Fundamentals of Physical Chemistry of the Enzymatic Catalysis, 1977, M.: Vysshaya Shkola, 280.
[7] Varfolomeev SD, Zaitsev SI, Kinetic Methods in Biochemical Surveys, 1982, M.: Publ. Moscow State Univ., 345.
[8] Varfolomeev SD, Gurevich KG, Biokinetics, 1999, M.: Fair-Press, 720.
[9] Varfolomeev SD, Pozhitkov AE, Vestn. Mosk. Univ., Ser. 2, Khimia, 41, (2000), 147-156.
[10] Varfolomeev SD, Gurevich KG, Poroinov VV, Sobolev BN, Fomenko AE, Dokl. RAN, (2001), 379, 548-550.
[11] Varfolomeev SD, Gurevich KG, Izv. Akad. Nauk, Ser. Khim., (2001), 10, 1629-1637.
[12] Varfolomeev SD, Mendeleev Comm. 5. 2004. P. 185-189.
[13] Varfolomeev SD, Uporov IV, Gariev IV, Uspekhi Khimii, (2005), 74, 67-83.
[14] Varfolomeev SD, Chemical and Biological Kinetics. New Horizons, (2005), M.: Khimia, 2, 175-213.
[15] Gariev I.V., Varfolomeev S.D., Bioinformatics. 2006, 22, 2574-2576.
[16] Finkelstein A, Physics of Protein, (2002), M.: Nauka.
[17] Antonov VK, Chemistry of Proteolysis, (1991), M.: Nauka, 504.
[18] MacKerell A.D., Wiorkiewicz-Kuczera J., Karplus M. // J. Am. Chem. Soc. (1995). 117. 11946-11975.
[19] MacKerell A.D., Bashford D., Bellott M., Dunbrack R.L., Evanseck J.D., Field M.J., Fischer S., Gao J., Guo H., Ha S., Joseph-McCarthy D., Kuchnir L., Kuczera K., Lau F.T.K., Smith J.C., Stote R., Straub J., Watanabe M., Wiorkiewicz-Kuczera J., Yin D., Karplus M. // J. Phys. Chem. B. 1998. 102. P. 3586-3616.
[20] Schlenkrich M, Brickmann J, MacKerell A.D, Karplus M, (1996) in Biological Membranes: A Molecular Perspective from Computation and Experiment, Merz KM, Roux B, eds, Birkhäuser, p. 31-81.
[21] Cornell W.D, Cieplak P, Bayly C.I, Gould I.R, Merz K.M, Ferguson D.M, Spellmeyer D.C, Fox T, Caldwell J.W, Kollman P.A. // J. Am. Chem. Soc. 1995. 117. 5179-5197.
[22] Varfolomeev SD, Uporov IV, Fedorov EV, Biokhimia (2002) 67, 1328-1340.
[23] Warshel A, Levitt M // J. Mol. Biol. 103, 227. (1976).
[24] Nemukhin A.V, Grigorenko B.L, Topol I.A, Burt S.K. // J. Comp. Chem. 24. 2003. P. 1410.
[25] Nemukhin A.V, Grigorenko B.L, Rogov A.V, Topol I.A, Burt S.K. // Theor. Chem. Acc. 111, 36. (2004).
[26] Koshland D.E. // Proc. Nat. Acad. Sci. USA. 44, 98. (1958).
[27] Nemukhin A.V, Gariev I.A, Rogov A.V, Varfolomeev S.D. (2006) // Mendeleev Communications, 16, 290-292.
[28] The International SNP Map Working Group (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. // Nature, 409, 928-933.
[29] Brookes AJ (1999) Gene. 234(2), 177-86.
[30] Kruglyak L, Nickerson DA. (2001) Nat. Genet. 27, 234-236.
[31] Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2006) Nucleic Acids Res. 34, D16-20.
[32] Wolfsberg TG, Wetterstrand KA, Guyer MS, Collins FS, Baxevanis AD. (2003) Nat. Genet., 35 (Supp 1), 4.
[33] Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. (2001) Nucleic Acids Res. 29, 308-11.
[34] International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. // Nature, 409(6822), 860-921.
[35] Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS. (2005) Nucleic Acids Res. 33, D154-9.
[36] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. (2000) Nucleic Acids Res. 28, 235-42.
[37] Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P; International HapMap Consortium. (2005) Nature. 437, 1299-320.
[38] Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. (2005) Nature. 437, 1365-9.
[39] Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN. (2003) Hum. Mutat. 21, 577-81.
[40] Brookes AJ, Lehvaslaiho H, Siegfried M, Boehm JG, Yuan YP, Sarkar CM, Bork P, Ortigao F. (2000) Nucleic Acids Res. 28, 356-60.
[41] Iida A, Saito S, Sekine A, Takahashi A, Kamatani N, Nakamura Y. (2006) Cancer Sci. 97, 16-24.
[42] McKusick, V.A.: Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. // Baltimore: Johns Hopkins University Press, 1998 (12th edition).
[43] Dantzer J, Moad C, Heiland R, Mooney S. (2005) Nucleic Acids Res. 33, W311-4.
[44] Becker KG, Barnes KC, Bright TJ, Wang SA. (2004) Nat. Genet. 36, 431-2.
[45] Ng PC, Henikoff S. (2001) Genome Res. 11, 863-74.
[46] Ng PC, Henikoff S. (2002) Genome Res. 12, 436-46.
[47] Ramensky V, Bork P, Sunyaev S. (2002) Nucleic Acids Res. 30, 3894-900.
[48] Sunyaev S, Kondrashov FA, Bork P, Ramensky V. (2003) Hum. Mol. Genet. 12, 3325-30.
[49] Brunham LR, Singaraja RR, Pape TD, Kejariwal A, Thomas PD, Hayden MR. (2005) PLoS Genet. 1, e83.
[50] Wang Z, Moult J. (2001) Hum. Mutat. 17, 263-70.
[51] Yue P, Melamud E, Moult J. (2006) BMC Bioinformatics. 7, 166.
[52] Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. (2005) Bioinformatics. 21, 2814-20.
[53] Yue P, Moult J. (2006) J. Mol. Biol. 356, 1263-74.
[54] Sunyaev S, Ramensky V, Koch I, Lathe W 3rd, Kondrashov AS, Bork P. (2001) Hum. Mol. Genet. 10, 591.
[55] Yue P, Li Z, Moult J. (2005) J. Mol. Biol. 353, 459-73.
[56] Karchin R, Kelly L, Sali A. (2005) Pac. Symp. Biocomput., 397-408.
[57] Krishnan VG, Westhead DR. (2003) Bioinformatics. 19, 2199-209.
[58] Bao L, Cui Y. (2005) Bioinformatics. 21, 2185-90.
[59] Sokolovskaya LG, Sigolaeva LV, Eremenko AV, Kurochkin IN, Makhaeva GF, Malyigina VV, Zyikova IE, Kholstjv VI, Zavialova NV, Varfolomeev SD, Chemical and Biological Safety (2004) 1-2 (13-14), 21-31.
[60] Makhaeva, G., Filonenko, I., Fomicheva, S., Malygin, V. "Esterase profiles" of O,O-dialkyl-O-dimethyl-chloroformimino phosphates in prediction of their toxic effects. Toxicol. Lett., 1996, v. 88, Suppl. 1, p. 25.
[61] Small D. H., Michaelson S., Sbema G. Neurochem. Int., 1996, v. 28, p. 453-483.
[62] Billecke S, Draganov D, Counsell R. et al. Drug Metab. Dispos., 2000, v. 28, № 11, p. 1335-1342.
[63] La Du B. N, Billecke S, Hsu C. et al. Drug Metab. Dispos., 2001, v. 4, № 11, p. 566-569.
[64] Antikainen M, Murtomaki S, Syvanne M. et al. J. Clin. Invest., 1996, v. 98, № 4, p. 883-885.
[65] Blatter Garin M.-C, James R. W, Dussoix Ph. et al. J. Clin. Invest., 1997, v. 99, № 1, p. 62-66.
[66] Nicholls D. P, Maxwell A. P, Hasselwander O. et al. Atherosclerosis, 1997, v. 134, № 1-2, p. 212. 7.
[67] Yamomoto M, Kondo I. Brain Res., 1998, v. 806, p. 271-273.
[68] Hasin Y, Avidan N, Bercovich D, Korzyn A.D, Silman I, Beckmann J.S, Sussman J.L. (2005), Current Alzheimer Research, 2, 207-218.
[69] Clarimon J, Bertranpetit J, Calafell F, Boada M, Tarraga L, Comas D. J. Neurol. (2003) 250: 956-961.
[70] Barbosa M, Rios O, Velasquez M. et al. Surg. Neurol., 2001, v. 55, p. 106-112.
[71] Zhan C. G, Zheng F, Landry D.W. J. Am. Chem. Soc., 2003, v. 125, № 9, p. 2462-2474.
[72] Guemei A. A, Cottrell J, Band R. et al. Canc. Chem. Pharmacol., 2001, v. 47, p. 283-290.
[73] Tunek A, Hjertberg E, Mogensen J.V. Biochem. Pharmacol., 1991, v. 41, № 3, p. 345-348.
[74] Makhaeva G. F, Suvorov N. N, Ginodman L. M, Antonov V. K. Bioorg. Chem., 1977, v. 3, p. 1384-1399.
[75] Richardson, R.J. (1995) J. Tox. Env. Health, 44, 135-165.
[76] La Du B.N.
et al (1990) Clin. Biochem., 23,423-431. Lockridge, O. (1990) Pharmacol. Therap., 47, 35-60. Pirmohamed, M, and Park, B.K. (2001) TIPS, 22 (6), 298-305. Simeon-Rudolf V, Reiner E, Evans R.T. et al. Chem. Biol. Inter., 1999, v. 119-120, p. 165-71. Simeon-Rudolf V, Kovarik Z, Skrinjaric-Spoljar M, Evans R.T. Chem. Biol. Inter., 1999, v. 119-120, p. 159-164. Satoh T, Hosokava M. Toxicol. Let., 1995, v. 82/83, p. 447-52. Pirmohamed M, Park K. Trends Pharmacol. ScL, 2001, v. 22, № 6, p.298-305. Sudo, K.; Maekawa, M.; Akizuki, S.; Magara, Т.; Ogasawara, H.; Tanaka, T, Biochem. Biophys. Res. Commun. 240: 372-375, 1997.
68
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[84] Barrels CF, Jensen FS, Lockridge O, van der Spek AF, Rubinstein HM, Lubrano T, La Du BN. Am. J. Hum. Genet. 1992 May; 50(5): 1086-103. [85] Barrels CF, James K, La Du BN. Am. J. Hum .Genet. 1992 May; 50(5): 1104-14. [86] Hidaka, K.; Iuchi, I.; Tomita, M.; Watanabe, Y.; Minatogawa, Y.; Iwasaki, K.; Gotoh, K.; Shimizu, C, Ann. Hum. Genet. 61: 491-496, 1997. [87] Manoharan I, Wieseler S, Layer PG, Lockridge O, Boopathy R, Pharmacogenet. Genomics. 2006 Jul. 16(7): 461-8. [88] Primo-Parmo, S.L., Sorenson, R.C., Teiber, J., La Du, B. (1996) Genomics, 33, 498507. [89] La Du B.N., Aviram, M, Billecke S. et al, (1999) Chem. Biol. Inter., 119-120, 379-88. [90] La Du, B.N., Draganov, D. (2004) First International Conference "Paraoxonases - Basic and Clinical Directions of Current Research", Ann Arbor, MI, USA, April 23-24, 2004. wvw.umich.edu/pons-conference/ files/abstract_book.pdf. [91] Sorenson, RC, Bisgaier, C.L., Aviram, M., Hsu, C, Billecke, S., La Du B.N. (1999) Arterioscler. Thromb. Vasc. Biol, 19(9), 2214-2225. [92] Deakin, S., Leviev, I., Gomaraschi, M. et al (2002) J. Biol. Chem., 277(6), 4301-4308. [93] Pellin, M.C., Moretto, A., Lotti, M., Vilanova, E. (1990) Neurotoxicol. Teratol., 12, 611-614. [94] Costa, L.G., Li, W.F., Richter, R.J., Shih, D.M., Lusis, A., Furlong, C.E. (1999) Chem.Biol. Inter., 119-120, 429-438. [95] Costa, L.G., Richter, R.J., Li, W.-F., Cole, Т., Guzzetti, M., Furlong, C.E. (2003) Biomarkers, 8 (1), 1-12. [96] Costa, L.G. (2004) First International Conference "Paraoxonases - Basic and Clinical Directions of Current Research,", Ann Arbor, MI, USA, April 23-24, 2004. www.umich.edu/pons-conference/ files/abstract_book.pdf. [97] Khersonsky, O., Tawlik, D.S,. (2005) Biochemistry, Epub. ahead of print. March 26, 2005; doi:10.1021/bi047440d. [98] La Du, B.N. (1996) Nature Medicine, 2 (11), 1186-1187. [99] Karanth, S. and Pope, C. (2000) Toxicol. Sci., 58, 282-89. 
[100] Satoh, Т., Taylor, P., Bosron, W.P., Sanghani, S.P., Hosokawa, M., LaDu, B. (2002) Drug Metab. Dispos., 30 (5), 488-493. [101] La Du„. B.N. (1990) Clin. Biochem., 23, 423-431. [102] Karanth, S. and Pope, C. (2003) Int. J. Toxicol., 22, 429-433. [103] Billecke S., Draganov D., Counsell R. et al. (2000) Drug Metab. Dispos., 28(11), 13551342. [104] La Du, B.N., Billecke, S., Hsu, C, Haley, R.W., Broomfield, C.A. (2001) Drug Metab. Dispos., 4 (11), 566-569. [105] Fuhrman, В., Aviram, M. (2002) Ann. NY Acad. Sci., 957, 321-324. [106] Mackness, M.I., Arrol, S., Abbott, C.A., Durrington, P.N. (1993) Atherosclerosis, 104, 125-135. [107] Mackness, M.I, Durrington, P.N, Ayub, A, Mackness, B. (1999) Chem. Biol. Interact., 119/120, 389-397. [108] Nguyen S.D, Sok D-E. (2003; Biochem. J. Oct 15, 375(Pt 2): 275-285. [109] Jarvic, R.P, Hatsukami, T.S, Carlson, C, Richter, R.J, Jampsa, R, Brohy V.H. et al (2003) Arterioscter. Thromb.Vasc.Biol, 23, 1465-1471.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
69
[110] Furlong, C.E, Cole, T.B, Jarvik, J.P, Costa, L.G. (2002), Pharmacogenomics, 3 (3), 341-348. [111] Humbert, R.; Adler, D. A.; Disteche, С. M.; Hassett, C; Omiecinski, C. J.; Furlong, С. E. Nature Genet. 3: 73-76, 1993. [112] Antikainen, M.; Murtomaki, S.; Syvanne, M.; Pahlman, R.; Tahvanainen, E.; Jauhiainen, M.; Frick, M. H.; Ehnholm, C, J. Clin. Invest. 98: 883-885, 1996. [113] Brophy, V. H.; Jampsa, R. L.; Clendenning, J. В.; McKinstry, L. A.; Jarvik, G. P.; Furlong, С. E, Am. J. Hum. Genet. 68: 1428-1436, 2001. [114] Garin, M.-C. В.; James, R. W.; Dussoix, P.; Blanche, H.; Passa, P.; Froguel, P.; Ruiz, J, J. Clin. Invest. 99: 62-66, 1997. [115] Kao, Y.-L.; Donaghue, K.; Chan, A.; Knight, J.; Silink, M, J. Clin. Endocr. Metab. 83: 2589-2592, 1998. [116] Deakin, S.; Leviev, I.; Nicaud, V.; Meynet, M.-C. В.; Tiret, L.; James, R.W, J. Clin. Endocr. Metab. 87: 1268-1273, 2002. [117] Barbieri, M.; Bonafe, M.; Marfella, R.; Ragno, E.; Giugliano, D.; Franceschi, C; Paolisso, G, J. Clin. Endocr. Metab. 87: 222-225, 2002. [118] Janka Z, Juhasz A, Rimanoczy A A, Boda K, Marki-Zay J, Kalman J, Mmol. Psychiatry. 2002;7(1):110-2. [119] Saton T, Hosokawa M. Toxicol. Let, 1995, v. 82/83, p. 439-445. [120] Xie M, Yang F, Liu L. et al. Drug Metab. Dispos., 2002, v. 30, Is. 5, p. 541-547. [121] Saboori A. M, Newcombe D. S. J. Biol. Chem., 1990, v. 265, № 32, p. 19792-19799. [122] McRee D. Chem. Biol, 2003, v. 10, p. 295-297. [123] Brzezinski M.R, Spink B.J, Dean R.A. et al. Drug. Metab. Dispos., 1997, v. 25, p. 1089-1096. [124] Park Y.H, Lee S.S. Biochem. Mol. Biol. Int., 1994, v.34, p. 351-360. [125] Nambu K, Miyazaki H, Nakanishi Y. et al. Biochem. Pharmacol., 1987, v. 36, p. 17151722. [126] Takai S, Matsuda A, Usami Y. et al. Biol. Pharm. Bull., 1997, v.20, p.869-873. [127] Aritoshi Iida, 1 Susumu Saito,2 Akihiro Sekine,3 Atsushi Takahashi,4 Naoyuki Kamatani4 and Yusuke Nakamura1,2,5,6, Cancer Sci. 2006 Jan; 97(1): 16-24. 
[128] Ashmarin IP, Uspekhi Biologicheskoi Khimii, (2003), 43, 3-18. [129] Shea SH, Wall TL, Carr LG, Li TK, Behav. Genet. 2001 Mar; 31(2): 231-9. [130] Osier M, Pakstis AJ, Kidd JR, Lee JF, Yin SJ, Ко НС, Edenberg HJ, Lu RJ3, Kidd KK, Am. J. Hum. Genet. 1999 Apr; 64(4): 1147-57. [131] Ogurtsov PP, Garmash IV, Miandina GI, Guschin AE, Itkes AV, Moiseev VS, Addict. Biol. 2001 Sep; 6(4): 377-383. [132] Suzuki Y, Fujisawa M, Ando F, Niino N, Ohsawa I, Shimokata H, Ohta S, Neurology. 2004 Nov 9; 63(9): 1711-3. [133] Chai YG, Oh DY, Chung EK, Kim GS, Kim L, Lee YS, Choi IG, Am. J. Psychiatry. 2005 May; 162(5):1003-5. [134] Burnell, J. C; Carr, L. G.; Dwulet, F. E.; Edenberg, H. J.; Li, T.-K.; Bosron, W. F., Biochem. Biophys. Res. Commun. 146: 1227-1233, 1987. [135] Buervenich, S.; Carmine, A.; Gaiter, D.; Shahabi, H. N.; Johnels et. al., Arch. Neurol. 62: 74-78, 2005. [136] Weiss M.J., Cole D.E., Ray K., Whyte M.P., Lafferty M.A., Mulivor R.A., Harris H. (1988), Proc. Natl. Acad. Sci. USA, 85, 7666-7669.
70
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[137] Henthorn P.S., Whyte M.P. (1992), Clin. Chem, 1992, 38, 2501-2505. [138] Henthorn P.S., Raducha M., Fedde K.N., Lafferty M.A., Whyte M.P. (1992) Proc. Natl. Acad. Sci. USA, 89, 9924-9928. [139] Greenberg C.R., Taylor C.L., Haworth J.C., Seargeant L.E., Philipps S., Triggs-Raine В., Chodirker B.N. (1993) Genomics, 17, 215-217. [140] Herasse M., Spentchian M., Taillandier A., Keppler-Noreuil K., Fliorito A.N., Bergoffen J., Wallerstein R., Muti C, Simon-Bouy В., Mornet E. (2003) J. Med. Genet, 40, 605-609. [141] Hu J.C., Plaetke R., Mornet E., Zhang C, Sun X., Thomas H.F., Simmer J.P. (2000) Eur. J. Oral Sci, 108, 189-194. [142] Goseki-Sone M., Sogabe N., Fukushi-Irie M., Mizoi L., Orimo H., Suzuki Т., Nakamura H., Orimo H., Hosoi T. (2004) J. Bone Miner Res, 20, 773-782. [143] Cool D.E., Tonks N.K., Charbonneau H., Walsh K.A., Fischer E.H., KrebsE.G. (1989) Proc. Natl. Acad. Sci. USA, 86,5257-5261. [144] Kaplan R. , Morse B. , Hubner K. , Croce C. , Howk E., Ravera M., Ricca G. , Jaye M. , Schlessinger J. (1990) Proc. Natl. Acad. Sci. USA, 87, 7000-7004. [145] Ramachandran C. , Aebersold R. , Tonks N.K. , Pot D.A. (1992) Biochemistry, 31, 4232-4238. [146] Hunter T. (1989) Cell, 58, 1013-1016. [147] Chan CP., McNall S.J., Krebs E.G. , Fischer E.H. (1988) Proc. Natl.Acad. Sci. USA, 85, 6257-6261. [148] Webb L, Inhibitors of Enzymes and Metabolism, 1966, M.: Mir, 862. [149] Thomas M.L. (1989) Аппи. Rev. Immunol, 7, 339-369. [150] Cohen P., Cohen P.T.W. (1989) J. Biol. Chem., 264, 21435-21438. [151] Monkanen R.E, Zwiller J, Daily S.L. , Khatra B.S, Dukelow M, Boynton A.L. (1991) J. Biol. Chem., 266,6614-6619. [152] Honkanen R.E, Zwiller J, Moore R.E, Daily S.L, Khatra P.S, Dukelow M, Boynton A.L. (1990) J. Biol. Chem., 265, 19410-19404. [153] Metcalfe S., Milner J. (1990) Immunol. Lett., 26. 177-182. [154] Klee C.B, Draetta OF., and Hubbard M. (1988) J. Adv. Enzymol., 61, 149-200. [155] Kakalis L. (1995) FEBS Lett., 362, 55. [156] Sikkink R. et al. 
(1995) Biochemistry, 34, 8348-8356. [157] Hubbard M.J., and Klee C.B. (1989) Biochemistry, 28, 1868-1874. [158] Hashimoto Y, Perrino B.A., Sodelring T.R. (1990) J. Biol. Chem., 265, 1924-1927. [159] Monkanen R.E, Zwiller J, Daily S.L, Khatra B.S, Dukelow M, Boynton A.L. (1991) J. Biol. Chem., 266, 6614-6619. [160] Chernoff J, Schievella A.R, Jost C.A, Erikson R.L, Neel B.G. (1990) Proc. Natl. Acad. Sci. USA., 87, 2735-2739. [161] Charbonneau H, Tonks N.K., Kumar S, Dilts CD, Harrylock M, Cool E, Krebs E.G., Fischer E.H, Walsh K.A. (1989) Proc. Natl. Acad. Sci. USA., 86, 5252-5256. [162] Tonks N.K, Diltz CD, Fischer E.H. (1988) J. Biol. Chem., 263, 6731-6737. [163] Cicirelli M.F, Tonks N.K, Diltz CD, Weiel J.E, Fischer E.H, Krebs G. (1990) Proc. Natl. Acad. Sci. USA., 87, 5514-5518. [164] Brown-Shimer S, Johnson K.A, Lawrence J.B, Jonson C, Breskin A, Green N.R, Hill D.E. (1990) Proc. Natl. Acad. Sci. USA., 87, 5148-5152. [165] Frangioni J.V, Beahm P.H, Shifrin V, Jost C.A, Neel B.G. (1992) Cell, 68, 545-560.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
71
[166] Cool D.E, Tonks N.K, Charbonneau H, Fischer E.H, Krebs E.G. (1990) Proc. Natl. Acad. Sci. USA., 87, 7280-7284. [167] Gould K.L, Moreno S, Tonks N.K, Nurse P. (1990) Science, 250, 1573-1576. [168] Ruzzene M, Donella-Deana A, Marin O, Perich J.W, Ruzza P, Borin G, Calderan A, Pinna LA. (1993) Eur. J. Biochem., 211, 289-295. [169] Sakaguehi AY, Sylvia V.L., Martinez L, Lalley PA, Shows T.B, Han E.S, Smith E.A, Chosh Choudhury G. (1991) Cytogenet. Cell. Genet., 58,2014-2015. [170] Shlessinger J, Ullrich A. (1992) Neuron, 9, 383-391. [171] Samstag Y, Badler A, Meuer S.C (1990) Immunobiology, 181, 149-150. [172] Samstag Y, Bader A, Meuer S.C. (1991) J. Immunol., 147, 788-794. [173] Samstag Y, Henning S.W, Badler A, Meuer S.C. (1992) Int. Immunol., 4, 1255-1262. [174] Thevenin С, Kim S.-J., Kehre J.H. (1991) J. Biol. Chem., 266, 9363-9366. [175] Chedid M., Ioza B.K., Brooks J.W., Mizel S.B (1991) J. Immunol, 147, 867-873. [176] Redpath N.T., Proud C.G. (1990) Biochem. J., 272, 175-180. [177] Richards F.M., Milner J., Metcalfe S. (1992) Immunology., 76, 642-647. [178] Taffs R.E., Redegeld F.A., Sitkovsky M.V. (1991) J. Immunol, 147, 722-728. [179] Kohno Т., Takakura S., Yamada Т., Okamoto A., Tanaka Т., Yokota J., Cancer Res., 59, 4170-4174. [180] Xia J., Scherer S.W., Cohen P.T., Majer M., Xi Т., Norman R.A., Knowler W.C., Bogardus C. and Prochazka M. (1998) Diabetes, 47, 1519-1524. [181] Liolitsa D., Powell J., Lovestone S. (2002) J. Neurol. Neurosurg. Psychiatry., 73, 2616. [182] Esplin E.D., Ramos P., Martinez В., Tomlinson G.E., Mumby M.C., Evans G.A. (2006) Genes Chromosomes Cancer., 45, 182-90. [183] Takagi Y., Futamura M., Yamaguchi K., Aoki S., Takahashi Т., Saji S. Gut, 41, 268-71. [184] Eliseeva YE, Zh. Bioorgan. Khim. (1998) 24, 262-270. [185] Soubrier F., Alhenc-Gelas F., Hubert C, Allegrini J., John M., Tregear G., Corvol P. (1988) Pros. Natl. Acad Sci. USA., 85, 9386-9390. [186] Antonov VK, Chemistry of proteolysis, (1991) M.: Nauka. [187] Hooper N.M. 
(1991) Int. J. Biochem., 23, 641-647. [188] Williams T.A., Soubrier F., Corvol P. (1996) Zinc Metalloproteases in Health and Disease/ Ed. N.M. Hooper. L.: Taylor & Francis, 1996. P. 83- 104. [189] Wei L., Alhenc-Gelas F., Corvol P., Clauser E. (1991) J. Biol. Chem., 266, 9002-9008. [190] Jaspard E., Wei L., Alhenc-Gelas F. (1993) J. Biol. Chem., 268, 9496-9503. [191] Rousseau A., Michaud A., Chauvet M.-T., Lenfant M., Corvol P. (1995) J. Biol. Chem., 270, 3656-3661. [192] Deddish P.A., Jackman H.L., Skidgel R.A., Erdos E.G. (1997) Biochem. Pharmacol, 53, 1459-1463. [193] Wei L., Clauser E., Alhenc-Gelas F., Corvol P. (1992) J. Biol. Chem., 267, 1339813405. [194] Rigat В., Hubert C, Alhenc-Gelas F., Cambien F. et al. (1990) J. Clin. Invest., 86, 13431346. [195] Rigat В., Hubert C, Corvol P., Soubrier F. (1992) Nucleic Acids Res., 20, 1433. [196] Danser A.H.,Schalekamp M.A., Bax W.A. et al. (1995) Circulation, 92, 1387-1388. [197] Arbustini E, Grasso M, Fasani R, Kiersy C. et al. (1995) Brit. Heart J., 74, 584-591. [198] Oike Y., Hata A, Ogata Y, Numata Y. et al. (1995) J. Clin. Invest, 96, 2975-2979.
72
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[199] Arbustini E, Grasso M, Fasani R., Kiersy C. et al. (1995) Brit Heart J., 74,584-591. [200] Prasad A, Narayanan S, Waclawiw M.A, Epstein N, Quyyumi A.A. J. Am. Coll. Cardiol, 36, 1579-1586. [201] Woods D.R, Humphries S.E, Montgomery H.E. (2000) Trends Endocrinol. Metab., 11, 416-420. [202] Tiret L., Blanc H,Ruidavets J.B, Arveiler D. et al. (1998 ) J. Hypert, 16, 37-44. [203] Cambien F, Poirier O, Lecerf L. et. al. Nature. 1992. V.359. P.641-644. [204] Lindpaintner K, Pfeffer M.A., Kreutz R. et al. (1995) N. Engl. J. Med, 332, 706-711. [205] Fujisawa T, Ikegami H, Kawaguchi Y, Hamada Y. et al. (1998) Diabetologia, 41,47-53. [206] Staessen J.A, Wang Ji G, Ginocchio G., Petrov V. et al. (1997) J. Hypertension, 15, 1579-1592. [207] Stefansson B, Ricksten A, Rymo L, Aurell M, Herlitz H. (2000) Blood Press, 9, 104109. [208] Varfolomeev SD, Mevkh AT, Prostaglandins – Molecular Bioregulators (1985) M.: Publ. Moscow State Univ., 308. [209] Sergeeva MG, Varfolomeeva AT, Cascade of Arachidonic Acid (2006) M.: Narodnoe Obrazovanie, 255. [210] Miyamoto Т, Ogino N, Yamamoto S, Hayaishi О. (1976) J. Biol. Chem., 251, 26292636. [211] Van der Ouderaa P.O., Buytenhek M, Nugteren D.H, Van Dorp D.A. (1977) Bioch. Bioph. Acta (L), 487, 315-331. [212] Pagels W.R, Sachs R.J, Marnett L.J, DeWitt D.L, Day J.S, Smith W.L. (1983)/. Biol. Chem., 258, 6517-6525. [213] Van der Ouderaa F.J.G, Buytenhek M. (1982) Methods Enzymol, 1982, 86, 60-68. [214] Smith W.L, DeWitt D.L, Allen M.L. (1983) J. Biol. Chem., 1983, 258, 4922-4926. [215] Rollins Т.Е., Smith W.L. (1980)7. Biol. Chem., 255,4872-4876. [216] Van der Ouderaa F.J, Buytenhek M, Slikkerveer F.J, Van Dorp D.A. (1979) Biochim. Biophys. Acta, 1979, 572, 29-41. [217] Mutsaers J.H.G.M, Van Halbeek H, Kamerling J.P, Vliegenthart J.F.G. (1985) Eur. J. Biochem., 147, 569-574. [218] De Witt D.L., Smith D.L. (1988) Proc. Natl. Acad. Sci. USA, 85, 1412-1416. [219] Merlie J.P., Fagan D., Mudd J., Needleman P. (1988) J. Biol. 
Chem., 263, 3550-3553. [220] Yokoyama C, Takai Т., Tanabe R. (1988) FEBSLett., 231, 347-351. [221] Lambeir A.M., Markey СМ., Dunford H.B., Marnett L.J. (1985) J. Biol. Chem., 260, 14894-14896. [222] Kulmacz R.J., Tsai A.L., Palmer G. (1987) J. Biol. Chem., 262, 10524-10531. [223] Kulmacz R.J., Ren Y., Tsai A.L., Palmer G. (1990) Biochemistry, 29, 8760-8771. [224] Shimokawa Т., Kulmacz R.J., DeWitt D.L., Smith W.L. (1990) J. Biol. Chem., 265, 20073-20076. [225] Dietz R„ Nastainczyk W., Ruf H.H. (1988) Eur. J Biochem., 171, 321-328. [226] Lepantalo A., Mikkelsson J. (2006) Thromb. Haemost., 95(2), 253-259. [227] Goodman J.E., Bowman E.D. (2004) Carcinogenesis, 25(12), 2467-2472. [228] Maree A.O., Curtin R.J. (2005) J. Thromb. Haemost., 3(10), 2340-2345. [229] S. Zienolddiny, Campa D. (2004) Carcinogenesis, 25(2), 229-235. [230] Lin H.J., Lakkides K.M. (2002) Cancer Epidemiol. Biomarkers Prev., 11(11), 13051315.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
73
[231] Konheim E.L. (2003) Hum. Genet., 113(5), 377-381. [232] Cipollone F., Toniato E. (2004) JAMA, 291(18), 2221-2228. [233] Wen J. K., Osumi Т., Hashimoto Т., Ogata M. (1990) Molec. Biol, 211, 383-393. [234] Kishimoto Y., Murakami Y., Hayashi K., Takahara S., Sugimura Т., SekiyaT. (1992) Hum. Genet., 88,487-490. [235] Mannens M., Slater R. M., Heyting C, Bliek J., Hoovers J., Bleeker-Wagemakers E. M., Voute P. A., Coad N., Frants R. R., Pearson P. L. (1987) Cytogenet. Cell Genet., 46, 655. [236] Jiang Z., Akey J. M., Shi J., Xiong M., Wang Y., Shen Y., Xu X., Chen H., Wu H., Xiao J., Lu D., Huang W., Jin L. (2001) Hum. Genet, 109, 95-98. [237] Nauseef W. M., Brigham S., Cogley M. (1994) J. Biol. Chem., 269, 1212-1216. [238] DeLeo F. R., Goedken M., McCormick S. J., Nauseef W. M. (1998) J. Clin. Invest., 101, 2900-2909. [239] Romano M., Dri P., Dadalt L., Patriarca P., Baralle F. E. (1997) Blood, 90, 4126-4134. [240] Marchetti C, Patriarca P., Solero G. P., Baralle F. E., Romano M. (2004) Hum. Mutat., 23, 496-505. [241] Reynolds W. F, Hiltunen M, Pirskanen M, Mannermaa A, Helisalmi S, Lehtovirta M, Alafuzoff I, Soininen H. (2000) Neurology, 55, 1284-1290. [242] Makela R, Dastidar P, Jokela H, Saarela M, Punnonen R, Lehtimaki T.J. Clin. Endocr. Metab., 88, 3823-3828. [243] Romano M., Patriarca P., Melo C, Baralle F. E, Dri P. (1994) Proc. Nat. Acad. Sci., 91, 12496-12500. [244] Bikker H, Baas F, De Vijlder J. J. M. (1997) J. Clin. Endocr. Metab., 82, 649-653. [245] Bikker H., Vulsma T, Baas F, de Vijlder J. J. M. (1995) Hum. Mutat., 6, 9-16. [246] Pannain S, Weiss R. E, Jackson С. E, Dian D, Beck J. C, Sheffield VC, Cox N, Refetoff S. (1999) J. Clin. Endocr. Metab., 84, 1061-1071. [247] Fugazzola L, Cerutti N, Mannavola D, Vannucchi G, Fallini C, Persani L., Beck-Peccoz P. (2003) J. Clin. Endocr. Metab., 88, 3264-3271. [248] De Belleroche J, Leigh P.N, Clifford Rose F. (1997) Familial motor neuron disease. In: Leigh P.N, Swash M. (eds). Motor neuron disease. 
// Springier-Verlag, London, 35-51. [249] Rosen D. R, Siddique T, Patterson D, Figlewicz D. A, Sapp P, Hentati A, Donaldson D, Goto J, O'Regan J. P, Deng H.-X, Rahmani Z, Krizus A, McKenna-Yasek D, Cayabyab A, Gaston S. M, Berger R, Tanzi R. E, Halperin J. J, Herzfeldt B, Van den Bergh R, Hung W.-Y, Bird T, Deng G, Mulder D. W, Smyth C, Laing N. G, Soriano E, PericakVance M. A, Haines J, Rouleau G. A, Gusella J. S, Horvitz H. R, Brown R. H, Jr. (1993) Nature, 362, 59-62. [250] Andersen P.M. (1997) Amyotrophic lateral sclerosis and CuZn-superoxide dismutase. // Umea Universitet. [251] Borchelt D. R, Lee M. K, Slunt H. S, Guarnieri M, Xu Z.-S, Wong P.C, Brown R. H, Jr., Price D. L, Sisodia S. S, Cleveland D. W. (1994) Proc. Nat. Acad. Sci., 91, 82928296. [252] Kawamata J, Hasegawa H, Shimohama S, Kimura J, Tanaka S, Ueda K. (1994) Lancet, 343, 1501. [253] Jones С. T, Brock D. J. H, Chancellor A. M, Warlow С. P, Swingler R.J. (1993) Lancet, 342, 1050-1051.
74
S. D. Varfolomeev , I. N. Kurochkin and I. A. Gariev
[254] Hayward C, Swingler R. J, Simpson S. A, Brock D. J. H. (1996) Am. J. Hum. Genet., 59, 1165-1167. [255] Kikugawa K, Nakano R, Inuzuka T, Kokubo Y, Narita Y, Kuzuhara S, Yoshida S, Tsuji S. (1997) Neurogenetics, 1,113-115. [256] Deng H.-X, Hentati A, Tainer J. A, Iqbal Z, Cayabyab A, Hung W.-Y, Getzoff E. D, Ни P, Herzfeldt B, Roos R. P, Warner C, Deng G, Soriano E, Smyth C, Parge H. E, Ahmed A, Roses A. D, Hallewell R. A, Pericak-Vance M. A, Siddique T. (1993) Science, 261, 1047-1051. [257] Aoki M, Ogasawara M., Matsubara Y., Narisawa K., Nakamura S., Itoyama Y., Abe K. (1993) Nature Genet., 5, 323-324. [258] Aoki M., Ogasawara M., Matsubara Y., Narisawa K., Nakamura S., Itoyama Y., Abe K. (1994) J. Neurol. Sci., 126, 77-83. [259] Andersen P. M., Nilsson P., Ala-Hurula V., Keranen M.-L., Tarvainen I., Haltia Т., Nilsson L., Binzer M., Forsgren L., Marklund S. L. (1995) Nature Genet., 10, 61-66. [260] Aguirre Т., Matthijs G., Robberecht W., Tilkin P., Cassiman J.-J. (1999) Europ. J. Hum. Genet., 7, 599-602. [261] Hand С. K., Mayeux-Portas V., Khoris J., Briolotti V., Clavelou P., Camu W., Rouleau G. A. (2001) Ann. Neurol, 49, 267-271. [262] Ikeda M., Abe K., Aoki M., Sahara M., Watanabe M., Shoji M., St. George-Hyslop P. H., Hirai S., Itoyama Y. (1995) Neurology, 45, 2038-2042. [263] Sapp P. C, Rosen D. R., Hosier B. A., Esteban J., Mckenna-Yasek D., O'Regan J. P., Horvitz H. R., Brown R. H., Jr. (1995) Neuromusc. Disord, 5, 353-357. [264] Morita M., Aoki M., Abe K., Hasegawa Т., Sakuma R., Onodera Y., Ichikawa N., Nishizawa M., Itoyama Y. (1996) Neurosci. Lett., 205, 79-82. [265] Kostrzewa M., Damian M. S., Muller U. (1996) Hum. Genet., 98, 48-50. [266] Jones С. Т., Swingler R. J., Brock D. J. H. (1994) Hum. Molec. Genet., 3, 649-650. [267] Watanabe M., Aoki M., Abe K., Shoji M., Iizuka Т., Ikeda Y., Hirai S, Kurokawa K., Kato Т., Sasaki H., Itoyama Y. (1997) Hum. Mutat., 9, 69-71. 
[268] Aoki M., Abe K., Houi K., Ogasawara M., Matsubara Y., Kobayashi Т., Mochio S., Narisawa K., Itoyama Y. (1995) Ann. Neurol, 37, 676-679. [269] Kawamata J., Shimohama S., Takano S., Harada K., Ueda K., Kimura J. (1997) Hum. Mutat., 9, 356-358. [270] Zu J. S., Deng H.-X., Lo T. P., Mitsumoto H., Ahmed M. S., Hung W.-Y., Cai Z.-J., Tainer J. A., Siddique T. (1997) Neurogenetics, 1, 65-71. [271] Orrel R. W., Marklund S. L., de Belleroche J. S. (1997) J. Neurol. Sci., 153, 46-49. [272] Penco S., Schenone A., Bordo D., Bolognesi M., Abbruzzese M., Bugiani O., Ajmar F., Garre C. (2001) Neurology, 53, 404-406. [273] Gellera C, Castellotti В., Riggio M. C, Silani V., Morandi L., Testa D., Casali C., Taroni F., Di Donate S., Zeviani M., Mariotti C. (2001) Neuromusc. Disord., 11,404410. [274] Elshafey A., Lanyon W. G., Connor J. M. (1994) Hum. Molec. Genet., 3, 363-364. [275] Alexander M. D., Traynor B. J., Miller N., Corr В., Frost E., McQuaid S., Brett F. M., Green A., Hardiman O. (2002) Ann. Neurol, 52, 680-683. [276] Bowler С, Alliotte T, De Loose M. et al. (1989) EMBO J, 8, 31-38. [277] Landeghem G.F., Tabatabaie P, Beckman G, Beckman L, Andersen P. (1999) Europ. J. of Neurol., 6, 639-644.
Human Enzymes – Genetic, Proteomic and Catalytic Polymorphism
75
[278] Rosenblum J.S., Gilula N.B., Lerner R.A. (1996) Proc. Natl. Acad. Sci. USA, 93, 44714473. [279] Shimoda-Matsubayashi S, Matsumine H, Kobayashi T. et al. (1996) Biochem. Biophys. Res. Commun, 226, 561-565. [280] Hiroi S, Harada H, Nishi H, Satoh M, Nagai R, Kimura A. (1999) Biochem. Biophys. Res. Commun., 261, 332-339. [281] Sandstrom J., Nilsson P., Karlsson K, Marklund S. L. (1994) J. Biol. Chem., 269, 19163-19166.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 2
POLYMORPHISM OF TUMOR-SUPPRESSOR GENES AND GENETIC CONTROL OF CARCINOGENESIS
M.M. Aslanyan*1,2, S.S. Litvinov1, E.S. Tsyrendorzhieva2 and V.A. Tarasov2
1 Biological Department, M.V. Lomonosov Moscow State University, Russia
2 N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
ABSTRACT
Cancer is one of the fundamental problems of general biology. It touches on cell growth, differentiation, cell and tissue metabolism, the immune response, mutation, and the repair of genetic damage. All these processes are under rigid genetic control, which maintains the homeostasis of cells and tissues in a normal organism. The development of malignant tumors is a multistep, genetically controlled process that can be treated as the microevolution of individual cell clones within a tumor. Tumor progression is determined by the occurrence of mutations in one or several cooperatively functioning genes and by the selection of mutant clones. The sequencing of the human genome has made it possible to determine and characterize not only the structural-functional organization of the major genes controlling carcinogenesis but also their genetic polymorphism. These genes can be arbitrarily divided into two large categories: proto-oncogenes and tumor-suppressor genes. Heterozygotes for mutant alleles of suppressor genes (carriers of so-called germline mutations) have an increased hereditary predisposition to the development of cancer. Hence, heterozygosity for mutant alleles can be used to diagnose individual susceptibility to cancer. Analysis of 21 polymorphic sites in 14 tumor-suppressor genes has led to the identification of genotypes whose associated risk of breast cancer in women differs by a factor of tens.
INTRODUCTION
The process of tumor development is called carcinogenesis or oncogenesis. A fundamental collective monograph, "Carcinogenesis", was issued in 2004 under the editorship of Professor D.G. Zaridze. The authors of the works presented in the monograph are noted Russian specialists in oncology [1]. Our task is to outline, briefly and in easily understood terms, the genetic aspects of oncogenesis.

Tumor formation is a multistep, genetically controlled process that can be treated as the microevolution of individual cell clones [2, 3]. Tumor progression is determined by the occurrence of mutations in one or several cooperatively functioning genes and by the selection of mutant cell clones. The sequencing of the human genome has made it possible to establish the structural-functional organization and the polymorphic variants of the major genes controlling carcinogenesis. These genes can be arbitrarily divided into two large categories: proto-oncogenes and tumor-suppressor genes [2], [3], [4], [5]. For oncogenes, the carcinogenic event is the enhancement of their expression. For tumor-suppressor genes, it is the loss of normal function caused by heritable recessive point mutations or deletions. More than 100 tumor-suppressor genes have been described in the human genome [6]. Heterozygotes for mutant alleles of suppressor genes (carriers of so-called germline mutations) have an increased genetic predisposition to cancer development. Hence, heterozygosity for mutant alleles can be used to diagnose individual susceptibility to cancer.

Early in the 20th century, the search for the causes of tumor formation met with success. P. Rous discovered the virus of chicken sarcoma (1914). Later on, the virus of rabbit papilloma (R. Shope, 1932) and the virus of murine mammary gland tumor (J. Bittner, 1936) were discovered. In 1946, L. Zilber formulated the virus-genetic theory of cancer [7]. According to this theory, the virus is a factor that causes the transformation of a normal cell into a tumor cell. Subsequently, it was demonstrated that tumoral transformation by a virus can occur through different pathways: 1) through enhancement of cell proliferation by viral oncogenes; 2) through a change in the structure or expression of genes caused by the integration of a virus into the cell genome. A great number of human oncogenic viruses are now known.

In parallel with the elaboration of the virus-genetic theory, the mutational theory of cancer was worked out. This theory addresses the accumulation of mutations in somatic cells. As a result of spontaneous mutational events, normal cells can acquire selective advantages in the rate of growth and mutation. The frequency of somatic mutations in normal cells is constant, but the mutation rate can increase under the action of mutagens or of mutations in the DNA repair system.

Changes in the normal cell phenotype can also occur without changes in the nucleotide sequence of the genome. Such events are epigenetic and are associated with disturbances in the regulation of gene expression. The epigenetic theory of cancer, suggested in the 1980s, postulated a direct relation between DNA methylation and changes in the expression of proto-oncogenes and suppressor genes in the course of tumor development [8]. Thus, the above theories do not exclude each other and can be realized at different stages of cancer progression.
FORMS OF CANCER
Two forms of cancer are distinguished today: familial and sporadic. The familial form is found among members of one family. The disease shows a clear-cut hereditary pattern, and the probability of its occurrence in healthy members of the family is several times higher than in the case of sporadic cancer. The hereditary factor that determines the development of familial cancer can be transmitted according to Mendel's laws. The sporadic form also has a genetic component, but its structure may be much more complicated: a tumor develops under the action of not one but several hereditary mutations. The study of polymorphisms of genes involved in carcinogenesis has revealed combinations of polymorphic variants that increase the probability of cancer development. Owing to combinational variation, such complex genotypes are broken up by segregation as early as the next generation. This explains the absence of the disease among relatives of patients with sporadic cancer.
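The segregation argument for sporadic cancer can be made concrete with a small calculation. The sketch below is only an illustration under simplifying assumptions that are not taken from the chapter: the risk genotype is assumed to require a specific allele at each of several unlinked loci, the parent is assumed to be heterozygous at every one of them, and transmission follows simple Mendelian segregation. The function name and the locus counts are hypothetical.

```python
from fractions import Fraction


def transmission_probability(n_loci: int) -> Fraction:
    """Chance that one child receives the risk allele at every locus.

    Assumptions (illustrative only): the parent is heterozygous at each
    of `n_loci` unlinked loci, so each risk allele is passed on
    independently with probability 1/2.
    """
    if n_loci < 0:
        raise ValueError("n_loci must be non-negative")
    return Fraction(1, 2) ** n_loci


if __name__ == "__main__":
    # Hypothetical locus counts chosen only to show the trend.
    for k in (1, 3, 5):
        print(f"{k} loci: {transmission_probability(k)}")
```

Under these assumptions a single-locus (familial) factor reaches half of the children, whereas a five-locus combination is passed on intact to about one child in 32, which is consistent with the absence of affected relatives described above.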
MODELS OF CANCER DEVELOPMENT
In the 1970s, Alfred Knudson proposed the "two-hit" model of retinoblastoma development [9]. He suggested that two mutations are sufficient for the development of retinoblastoma and that, in the case of bilateral retinoblastoma, carriers of the disease have one inherited mutation. According to Knudson's model, carriers of a germline mutation are one step closer to retinoblastoma than individuals without it. Thus, in the first group, retinoblastoma will develop considerably more often. Later on, the RB gene inactivated during tumor development was mapped and cloned. Based on Knudson's two-hit model, other researchers [10] assumed that, since inactivation of two copies of the RB gene leads to tumoral degeneration of cells, the introduction of a functional RB copy would restore the normal phenotype. The introduction of the RB gene decreased tumor malignancy but in no way affected the rate of cell growth. From this, it followed that RB gene mutations alone were insufficient for retinoblastoma development. It was established later that 60% of retinoblastoma patients had a 6p isochromosome. These two factors necessitated the revision of the "two-hit" model. The "multihit" model of retinoblastoma formation was thus proposed [11]. It suggested three steps in the process of tumor development. The first two steps (or "hits") correspond to the "two-hit" model of Knudson. Inactivation of both alleles of the RB gene normally induces cell apoptosis. Hence, a mutation in the apoptotic pathway is required for further existence of the tumor. Most likely, it is isochromosomes 6p and 1q that permit cells to avoid apoptosis and gain an advantage in growth [10]. The multihit model explains the origin of monoclonal tumors (i.e., tumors originating from the same clone). The study of polymorphic sites in the X chromosome showed that by no means all tumors are monoclonal in nature (for example, plexiform neurofibroma).
To explain this phenomenon, the "recruitment" model was elaborated. This model implies that tumor cells (represented in small quantities) stimulate the growth of normal cells and promote their rapid
M. M. Aslanyan, S. S. Litvinov, E. S. Tsyrendorzhieva et al.
transformation. Indeed, tumor cells are able to produce growth factors that make them and the cells surrounding the tumor independent of mitogenic signals. Despite differences between the current models of cancer development, all of them present carcinogenesis as a multistep, genetically determined process.
MULTISTEP PATTERN OF TUMOR GROWTH
Different tumors develop in accordance with general principles. In the course of development, the tumor focus undergoes changes that are revealed by histological and cytogenetic methods. The whole process can be divided into several steps: initiation, promotion, angiogenesis, invasion and metastasis. The first step, initiation, is characterized by cell changes at the genetic or epigenetic level. It is difficult to fix the time when this step begins because genomic changes do not lead to serious shifts in the cell phenotype. In the course of ontogenesis, normal tissue homeostasis is maintained in the organism. The cells are subjected to stabilizing selection, which permits them to reproduce only at a medium rate (specific to each tissue type). The mechanism controlling the rate of proliferation involves extracellular and intracellular growth signals. The main strategy of tumor cells is to escape this control. An increase in the rate of division induces programmed death (apoptosis) of cells. Therefore, mutations permitting cells to avoid apoptosis are found at the very early stages of carcinogenesis. Theoretically, tumor degeneration of cells can be provoked by 1-2 mutations, but the mutation process does not end there. Analysis of the genome of intestinal tumor cells showed that there are about 11,000 mutations per cell [12]. This suggests that mutagenesis in them proceeds at a very high rate and gives rise to a pool of genetically heterogeneous cells. Such heterogeneity of cancer cells is revealed at both the molecular and cytological levels by the FISH method (figure 2) [13]. In the absence of stabilizing selection factors, clonal selection occurs: as a result of spontaneous mutagenesis, cells acquire an advantage in growth rate and displace normal cells and cells similar to themselves. Every new advantage permits the cell to displace its ancestors.
Such a mutator phenotype is expressed in the case of disturbances in the systems of repair and replication. Thus, the two main characteristics of tumor cells are uncontrolled growth and accelerated mutagenesis. The next step of tumor progression is promotion. At this step, the population of tumor cells rapidly increases. Tumor cells are able to divide over an indefinitely long period, whereas normal cells divide only 50-60 times, after which replicative aging and death follow [14]. But these processes are far from encompassing all cells. Stem and cancer cells are capable of restoring the length of the terminal regions of chromosomes (telomeres) due to the activity of the telomerase enzyme. Telomerase is a complex of proteins and ribonucleic acids. The catalytic subunit of telomerase is TERT (reverse transcriptase). The activity of telomerase, which protects cells from replicative aging, was revealed in different types of tumors. However, telomerase was not found in 10% of tumors, and nevertheless no shortening of telomeres was observed in them. This phenomenon has been termed "alternative lengthening of telomeres." In such cells, the maintenance of the length of telomeres is associated with instability of minisatellite repeats and with recombination between telomeres.
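The contrast between replicative aging and telomere maintenance can be sketched as a simple counting model. This is a minimal illustration, not a description of measured biology: the telomere lengths, the loss per division and the function name are hypothetical, chosen only so that the unprotected cell stops within the 50-60 division range mentioned above.

```python
def divisions_until_arrest(telomere_bp, loss_per_division_bp, critical_bp,
                           telomerase_gain_bp=0, max_divisions=1000):
    """Count divisions until the telomere reaches the critical length.

    All lengths are hypothetical illustration values in base pairs.
    max_divisions caps the loop for cells that never reach the limit.
    """
    divisions = 0
    while telomere_bp > critical_bp and divisions < max_divisions:
        # Each division shortens the telomere; telomerase adds some back
        telomere_bp += telomerase_gain_bp - loss_per_division_bp
        divisions += 1
    return divisions


# Somatic cell without telomerase activity: division stops at the limit
somatic = divisions_until_arrest(10_000, 100, 5_000)

# Cell whose telomerase fully compensates the loss: only the cap stops it
maintained = divisions_until_arrest(10_000, 100, 5_000, telomerase_gain_bp=100)

print(somatic, maintained)
```

In this toy model, the cell with compensating telomerase activity never reaches the critical length, so only the artificial cap ends the loop.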
TERT gene expression is regulated by the c-Myc protein. In somatic cells, telomerase is present in residual amounts. TERT gene expression is partially inhibited by the BRCA1 protein, which can bind to c-Myc and inhibit the activity of c-Myc-dependent promoters. In cells with an inactive BRCA1 gene, the expression of TERT is increased by 30%. Such an increase in TERT gene expression, and even overexpression of this gene, is not sufficient for the transformation of normal cells into tumor ones. But tumoral transformation of cells becomes possible when TERT is combined with an overexpressed H-RAS gene or the T antigen of the SV40 virus (expressed separately, they do not cause transformation). It can be suggested on this basis that the process of telomere maintenance by itself cannot induce oncogenesis but is necessary for unlimited division of cancer cells [16]. Uncontrolled growth of cells leads to tissue hardening. Such growth continues until the tumor reaches just several millimeters in diameter. At this moment, necrotic processes begin in the center of the growth focus because of the deficiency of oxygen (hypoxia) and glucose. Thus, at this stage the tumor size is limited by hypoxia. On the one hand, such conditions may be a signal for a cell to enter apoptosis, but there is an alternative pathway for further tumor progression: angiogenesis. Slight hypoxia activates genes that increase glycolysis and adapt cells to the altered conditions. In tumor cells, one of the key genes activated in response to hypoxia is HIF-1 (hypoxia-inducible factor 1). This gene encodes a transcription factor. In complex with other transcription factors and coactivators, the HIF-1 protein induces expression of such genes as lactate dehydrogenase A, erythropoietin and vascular endothelial growth factor (VEGF).
Expression of the latter and of other similar genes induced by hypoxia is needed for the formation of a vascular system (angiogenesis) around the tumor. In this way, necrotization of tumor cells is prevented and their growth is accelerated [17]. Studies of the nucleotide sequence of the HIF-1 gene revealed its polymorphic regions. Two single-nucleotide substitutions (A-2500T in the 5' noncoding region, and an amino acid substitution in codon 582, proline by serine) showed an association with an adaptive response to oxygen deficiency: the basal level of oxygen in tissues changed in response to physical training [18]. Two polymorphic sites were found to be associated with prostate cancer (substitution of proline-582 by serine) [19] and with renal cell carcinoma (substitutions of proline-582 by serine and alanine-588 by threonine) [20]. But the activation of HIF-1 expression alone is not sufficient for angiogenesis to be induced. In an adult organism, angiogenesis is actively suppressed by the PTEN gene, and therefore its inactivation is a necessary event for further development of a tumor. In different types of tumors, the loss of a chromosome 10 region with PTEN was observed in the range from 23% (in breast tumors) to 54% (in glioblastomas) of cases [21]. On the other hand, angiogenesis is influenced by growth factors. A dramatic example of such influence is the stimulation of the HIF-1 gene by the insulin-like growth factor IGF1. This way of activating angiogenesis is also under negative control by PTEN. Another way to escape this control is mutational activation of genes suppressed by PTEN (PDK, AKT) [21]. Activated AKT kinase can affect such processes as metabolism, cell cycle, repair and regulation of the size of cells and organs. The loss of negative control over AKT may be a pathway for survival of tumor cells, since this protein can directly activate the expression of the above-mentioned HIF-1 gene, i.e., this signal pathway can induce angiogenesis [22].
Apart from the fact that tumor cells grow more rapidly than normal cells, they use glycolysis as the main source of energy. When reaching a large size, the tumor becomes a "trap" for nutrients. The whole organism begins to work to maintain the tumor. In most cases of cancer, tumor cells with defective mitochondria are found. Such cells derive energy from glycolysis rather than from oxidative phosphorylation. Oxygen deficiency enhances glycolysis in normal cells, but a total absence of oxygen, and consequently of oxidative phosphorylation, causes apoptosis. Oxidative phosphorylation is possible only given a fully functional electron-transport chain. A full chain consists of 87 proteins, of which 13 are encoded by the mitochondrial genome. Even with a normal respiratory chain, a spontaneous leakage of electrons (mainly from complexes I and III) and generation of superoxide radicals can occur [23]. Mutations affecting the integrity of the chain lead to a sharp increase in the quantity of reactive oxygen species. The peculiarity of mitochondrial mutations is that they are not subject to selection. Because two types of mitochondria (normal and mutant) are simultaneously present in cells, a defect in the respiratory chain manifests itself only after their segregation (figure 1).
Figure 1. A scheme of carcinogenesis induction as a result of a mitochondrial mutation.
A constant oxidative stress imparts to cells a mutator phenotype characteristic for tumors. A normal cell cannot exist in such conditions for a long time. Its further existence is possible only when the pathways of apoptosis are blocked. Thus, mutations of genes encoding mitochondrial proteins can initiate carcinogenesis or participate in its progression. The occurrence of secondary foci of carcinogenesis as a result of metastasis becomes the final step in tumor development. Cancer cells spread in the body through the circulatory or
lymphatic system. Multiple tumors occurring in different organs disturb their functions and lead to a lethal outcome. The step of invasion, as well as the subsequent step of metastasis, is under complex genetic control. A chain of consecutive changes in the structure of tumor tissue is required for its realization. First, to acquire the capacity for migration, cancer cells must break the connection with the tissue that is maintained through cellular contacts. Genes controlling the process of cell adhesion represent the first group of metastatic suppressors, including genes encoding calcium-dependent cadherins such as E-cadherin. This glycoprotein is involved not only in the formation of intercellular contacts (in the form of cadherin bridges) but also in the transmission of antigrowth signals. After disruption of intercellular connections, tumor cells must penetrate into the circulatory or lymphatic system. But normal tissues surrounding the tumor and the vascular walls still remain a barrier for tumor cells. Matrix metalloproteases (MMP) help them to overcome this barrier. These proteases belong to the families of transmembrane and secreted proteins. All these enzymes secreted by tumor cells degrade the components of the intercellular matrix such as fibrin, type 4 collagen (the basic matrix component), etc. The next step is metastasis, the integration of tumor cells into another tissue. To grow in a foreign tissue, they have to adapt to the new microenvironment. The process involves integrins, which belong to the family of heterodimeric transmembrane glycoproteins (more than 200). Although integrins can normally participate in different processes, from proliferation to the regulation of apoptosis, their function in the process of adaptation is to anchor tumor cells in the new tissue [24]. After that, the integration of cancer cells ends and the formation of a secondary focus of tumor development begins.
This stage presents a major concern because numerous metastases make the surgical methods of treatment insufficient.
GENES INVOLVED IN CARCINOGENESIS
In the course of cancer development, genes involved in this process change their functional status. By the character of the changes, the genes are divided into proto-oncogenes and tumor-suppressor genes. The protein products of proto-oncogenes are components of a signal system that controls the processes of cell proliferation and differentiation (the family of G-proteins, integrins). The carcinogenic event for proto-oncogenes is their activation, which occurs in three ways. The first way is a mutation in the coding or regulatory region of a proto-oncogene; the second way is proto-oncogene amplification (the family of MYC genes; the N-MYC oncogene reaches 200 copies per cell in the case of retinoblastomas and neuroblastomas); the third way involves chromosome rearrangements (the Philadelphia chromosome detected in chronic myelocytic leukemia arises in 95% of cases from a translocation of a fragment of chromosome 9 onto chromosome 22 with the formation of a chimeric gene, BCR-ABL) (figure 2 A). After activation, the proto-oncogene becomes an oncogene. The result of proto-oncogene activation is usually the enhancement of proliferation. The oncogene dominates over the remaining normal copy of the proto-oncogene.
Suppressors are genes involved in the negative control of the cell cycle (genes RB, P53, P21) and genes maintaining genomic stability (RAD51, BRCA1, BRCA2, ATM, MLH1, MSH2, PMS2). Inactivation of both alleles of a suppressor gene is required for induction of carcinogenesis, i.e., in this case the normal allele of the gene dominates over the mutant one.
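The difference in dominance between the two gene classes can be written down as a small rule. The sketch below is a deliberately simplified illustration; the function name and the class labels are hypothetical, and real carcinogenesis depends on many genes at once.

```python
def contributes_to_transformation(gene_class: str, altered_alleles: int) -> bool:
    """Apply the dominance rule to a single diploid locus.

    gene_class: "proto-oncogene" or "suppressor" (labels used in this sketch only)
    altered_alleles: number of activated (proto-oncogene) or inactivated
    (suppressor) alleles, from 0 to 2
    """
    if altered_alleles not in (0, 1, 2):
        raise ValueError("a diploid locus has 0, 1 or 2 altered alleles")
    if gene_class == "proto-oncogene":
        # An activated oncogene dominates over the remaining normal copy
        return altered_alleles >= 1
    if gene_class == "suppressor":
        # The normal allele dominates, so both copies must be inactivated
        return altered_alleles == 2
    raise ValueError(f"unknown gene class: {gene_class}")


print(contributes_to_transformation("proto-oncogene", 1))
print(contributes_to_transformation("suppressor", 1))
print(contributes_to_transformation("suppressor", 2))
```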
Figure 2. Karyotypic analysis of human malignant cells. Left (A) – the karyotype of a patient with chronic myelocytic leukemia (translocation of a fragment of chromosome 22 to chromosome 9). Right (B) – the karyotype of a patient with bladder carcinoma demonstrating a great number of chromosomal fragments and chromosomal rearrangements.
TUMOR-SUPPRESSOR GENES
Inactivation of a suppressor gene is easily revealed by the loss of expression of one of the alleles of a heterozygous locus (loss of heterozygosity). Such losses are often associated with the loss of a chromosomal fragment. One of the regions that are most often lost very early in tumor development is locus p13.1 on chromosome 17 [25]. This locus carries one of the most important tumor-suppressor genes, P53. Its contribution to the prevention of transformation of normal cells into cancer ones is difficult to overestimate. The rate of P53 gene inactivation in tumors of different types is higher than 50%. The P53 gene controls such vitally important processes as repair, recombination, cell cycle and apoptosis. It can participate in these processes both directly (in repair and recombination) and through activation of other genes involved in them (regulation of cell cycle and apoptosis). The P53 gene is exceptional in that 80% of all its mutations are amino acid substitutions (the frequency of missense mutations in other suppressor genes is about 10%). The P53 protein is small in size (394 amino acids) and highly conserved. Therefore, almost all missense mutations lead to disturbances in its functions. Inherited P53 gene mutations result
in the early development of the Li-Fraumeni syndrome, characterized by an elevated frequency of occurrence of all tumor types in the organism. In 1992, Vogelstein and Kinzler [26] postulated the necessity of the presence of a tetrameric form of P53 protein for normal functioning of cells. They proposed five possible mechanisms of P53 gene inactivation:
1. the loss of one or both gene copies as a result of a deletion, leading to reduced expression of cell growth inhibitors;
2. a truncated protein (due to nonsense mutations, frame-shift mutations or mutations disturbing splicing) preventing P53 oligomerization, since the oligomerization domain is located at the C-end of the protein (figure 4);
3. missense mutations producing a dominant negative effect on the protein functions;
4. degradation of P53 caused by the interaction with the human papilloma virus E6 oncoprotein;
5. degradation of P53 resulting from overexpression of its antagonist, the MDM2 gene.
Numerous DNA lesions induce an increase in P53 gene expression. The P53 protein activates transcription of the P21WAF1 gene, which encodes an inhibitor of cyclin-dependent kinases. This protein inhibits the activity of cyclin-kinase complexes, thus arresting the cell in the G1 phase (figure 3). A turning point in passing the checkpoint between the G1 and S phases of the cell cycle is RB protein phosphorylation. RB was the first tumor-suppressor gene to be discovered. Its function is negative control of the cell cycle. In normal cells, the gene is expressed during the whole cell cycle. Complete phosphorylation of the protein occurs when the cell enters the S phase. In the hypophosphorylated state, the protein binds to the free E2F transcription factor (figure 3). In complex with RB, this factor is unable to activate transcription of genes responsible for proliferation. This results in the arrest of the cell in G1 (figure 3) [27]. The RB-E2F complex binds to deacetylases of the HDAC family (HDAC1, HDAC2).
Such a triple complex sits on nucleosomes and preserves the "closed" form of chromatin in the region of promoters of cell proliferation genes and thus impedes their transcription. With RB protein phosphorylation, the complex disintegrates and the structure of chromatin becomes "open" under the action of acetylases [28]. RB phosphorylation is mediated by cyclin-dependent kinases. In early G1, the CDK4 and CDK6 kinases bound to cyclin D phosphorylate the C-terminal region of the protein, and the deacetylase is thus released. At the next stage, the CDK2 kinase bound to cyclin E continues phosphorylation and the RB-E2F complex disintegrates (figure 3). As a result of a mutation in the regulatory region of the cyclin D1 gene, its expression may increase. An excess of cyclin D competitively binds to RB instead of E2F, leading to tumor formation, since E2F acquires the possibility to constantly stimulate cell proliferation.
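The order of events at the G1/S checkpoint described above can be summarized as a Boolean sketch. It is a strong simplification made for illustration: each complex is treated as simply on or off, the function names are hypothetical, and the competitive binding of excess cyclin D to RB is not modeled.

```python
def e2f_is_released(cyclin_d_cdk46_active: bool,
                    cyclin_e_cdk2_active: bool,
                    p21_present: bool) -> bool:
    """Return True if RB is fully phosphorylated and E2F is set free."""
    if p21_present:
        # P21WAF1 inhibits the cyclin-kinase complexes
        cyclin_d_cdk46_active = False
        cyclin_e_cdk2_active = False
    # Step 1: cyclin D with CDK4/CDK6 phosphorylates RB, releasing the deacetylase
    deacetylase_released = cyclin_d_cdk46_active
    # Step 2: cyclin E with CDK2 completes phosphorylation, RB-E2F disintegrates
    return deacetylase_released and cyclin_e_cdk2_active


def checkpoint_outcome(cyclin_d_cdk46_active: bool,
                       cyclin_e_cdk2_active: bool,
                       p21_present: bool) -> str:
    """Translate the E2F state into the cell-cycle outcome used in the text."""
    if e2f_is_released(cyclin_d_cdk46_active, cyclin_e_cdk2_active, p21_present):
        return "entry into S phase"
    return "arrest in G1"


# Both kinase complexes active, no P53-induced P21
print(checkpoint_outcome(True, True, p21_present=False))
# Same kinases, but DNA damage has induced P21 through P53
print(checkpoint_outcome(True, True, p21_present=True))
```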
Figure 3. The mechanism of cell cycle blocking by P53 protein in response to stress conditions.
An alternative pathway of tumor transformation of cells is associated with the loss of P53 gene expression. This event may be caused by overexpression of the MDM2 gene. The protein encoded by this gene binds to the P53 protein, inducing its ubiquitination and subsequent degradation by proteasomes. A polymorphism in the MDM2 gene promoter (substitution of T by G in nucleotide –309) enhances expression of the gene. Cells with the genotypes 309 G/G and G/T enter apoptosis less frequently under the action of radiation. Hence, the probability of their malignization is higher than that of cells with the 309 T/T genotype [20]. Carriers of this genotype have a high risk of pulmonary cancer [30]. Another important function of the P53 protein is negative control of homologous recombination. P53 binds to RAD51 (homologous recombination protein) and inhibits recombination if the length of a completely homologous region does not exceed 200 bp. In this way, the stability of the genome enriched with different types of repeats is maintained. A very small amount of the functional P53 protein is sufficient for recombination inhibition. Conversely, a small amount of the dominant negative mutant P53 protein is sufficient to remove the 200 bp barrier of homologous recombination. It is of interest that some hot spots of mutagenesis in the P53 gene are located in regions responsible for the interaction of P53 with RAD51 (figure 4). Indeed, a lot of chromosome aberrations are observed in cancer cells. Thus, tumor initiation or progression can occur through mutations increasing the frequency of recombination [31].
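The 200 bp barrier to homologous recombination can be expressed as a threshold rule. The following is only a sketch: the function and parameter names are hypothetical, and the behavior in the complete absence of P53 (no barrier) is an assumption of the example rather than a statement from the text.

```python
HOMOLOGY_BARRIER_BP = 200  # length threshold given in the text


def recombination_permitted(homology_bp: int,
                            functional_p53: bool = True,
                            dominant_negative_p53: bool = False) -> bool:
    """Decide whether a homologous region may recombine.

    With functional P53 and no dominant negative mutant, regions of
    200 bp or less are blocked. A dominant negative mutant removes the
    barrier. Treating a total lack of P53 as "no barrier" is an
    assumption made for this sketch.
    """
    barrier_active = functional_p53 and not dominant_negative_p53
    if barrier_active:
        return homology_bp > HOMOLOGY_BARRIER_BP
    return True


# A short repeat is protected in a normal cell ...
print(recombination_permitted(150))
# ... but not when a dominant negative P53 mutant is present
print(recombination_permitted(150, dominant_negative_p53=True))
```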
Figure 4. Frequencies of somatic mutations in the coding part of P53 gene in all tumor types in projection on the protein domain structure.
In the case of DNA damage, P53 activation can induce either a delay of the cell cycle, during which the damage is repaired, or apoptosis. An essential influence on this choice is exerted by proto-oncogenic proteins of the MYC family [32]. MYC binds to the promoter of the P21/WAF1 gene and blocks the induction of its transcription by P53. Thus, the P53-dependent response is switched from a cell cycle delay to apoptosis. In a critical situation, P53 can trigger the program of apoptosis through two pathways. The first is the activation of proapoptotic genes (BAX, APAF-1) directly involved in apoptosis. The second mechanism implies that the P53 protein is able to bind and inhibit antiapoptotic proteins (Bcl-2, Bcl-xL) [33]. The central part in the realization of the program of apoptosis is played by the caspase pathway (a chain of proteolytic reactions), which is activated by cytochrome C. In normal cells, cytochrome C is present only in mitochondria. Its release into the cytoplasm is controlled by the family of Bcl-2 proteins, which includes activators (Bax, Bad, Bid) and inhibitors (Bcl-2, Bcl-xL) of apoptosis. The proteins of this family are present in the outer membranes of mitochondria, and the ratio of activators to inhibitors determines the fate of the cell [34]. The P53 protein binds to the protein inhibitors of apoptosis through the DNA-binding domain. Thus, mutations in this domain can simultaneously inactivate two pathways of apoptosis. The coding part of the P53 gene has two polymorphic regions with substitutions in the protein amino acid sequence (substitutions of proline-47 by serine and arginine-72 by proline). Both polymorphisms are associated with a low apoptotic response to DNA damage [35], [36], [37]. The serine-47 polymorphic variant is rare in the human population in contrast
to the codon 72 polymorphism. The frequency of the proline-72 allele was found to increase in the population as the equator is approached. Although this allele is associated with a low apoptotic response its wide distribution in the equatorial regions of the Earth is explained by a higher resistance to UV-radiation of its carriers [38]. P53 protein activation is achieved through the sensor system which signals about DNA damage. This system includes such proteins as ATM, CHEK2 and ATR. ATM gene mutations result in the development of ataxia-telangiectasia. This disease is characterized by defects in the immune system, hypersensitivity to radioactivity, genomic instability and a high risk of leukemia. With occurrence of DNA damage ATM phosphorylates P53 and CHEK2 (figure 5) [27]. Activated CHEK2 also phosphorylates P53, and the interaction of P53 with MDM2 and consequently P53 degradation are inhibited [27]. Phosphorylation of MDM2 by ATM leads to the same result.
Figure 5. A scheme of interaction of the products of tumor-suppressor genes in the case of DNA damage.
DNA damage occurring in the S phase can be repaired due to a temporary delay in the beginning of the G2 phase. An active Cdc25C protein is needed for proceeding to G2, but DNA damage provokes its phosphorylation by CHEK2. Inactivated Cdc25C moves from the nucleus into the cytoplasm, arresting the cell cycle in the S phase. In turn, dephosphorylated Cdc25C protein activates the cyclin-dependent kinase CDK2, promoting the progression of G2 and subsequent mitosis (figure 2) [27]. The second sensor of DNA damage is the ATR gene. In contrast to ATM, ATR responds not only to double-stranded DNA breaks but also to damage caused by UV-radiation and replication errors. Activated ATR can control both G1 and G2 [27]. In parallel with blocking
the cell cycle, ATR and ATM activate DNA repair proteins. One such activated protein is BRCA1 (figure 5). BRCA1 is the central component of the BASC protein complex. This complex includes more than ten proteins [39]. BRCA1 and BRCA2 form the core of the complex, on the surface of which all the other proteins (constant or temporary complex components) are located (figure 6) [40]. When double-stranded DNA breaks occur, the BRCA1 protein is phosphorylated by the ATM kinase and the whole BASC complex moves to the region of damage. The double-stranded breaks are repaired via homologous recombination. This process directly involves proteins of the complex: MRE11, NBS1, RAD50 and RAD51 (homologous recombination protein). In the absence of DNA damage, BRCA2 binds to RAD51 and inhibits its activity [41]. Germline mutations in the BRCA1 and BRCA2 genes are associated with the hereditary form of breast and ovarian cancers. In addition to the above-mentioned proteins, the BASC complex includes the components of the system of repair of unpaired bases: MSH2, MSH6 and MLH1 [39]. Proteins of this system correct replication errors and inhibit spontaneous recombination between low-homology DNA regions. As in the case of the P53 protein, MSH2, MSH6 and MLH1 protein mutations can result in the occurrence of microsatellite instability [42]. Thus, proteins of the BASC complex play an important role in maintaining genomic stability.
PROTO-ONCOGENES
One of the first oncogenes to be discovered was v-Myc of the chicken sarcoma virus. Later on, its homologs were found in the human genome: c-Myc, N-MYC and L-MYC. The most important in the process of carcinogenesis is c-Myc. Overexpression of this gene was observed in 90% of cases of reproductive system cancer, in 80% of breast cancer cases, in 70% of intestinal cancer cases and in 50% of liver carcinoma cases. The c-Myc gene encodes a protein that is a transcription factor. In normal cells, the protein regulates the expression of genes responsible for cell proliferation and is also involved in such processes as apoptosis, metabolism, differentiation and adhesion [43]. In normal cells, c-Myc is weakly expressed, but its expression sharply increases owing to growth factors that appear at a certain time of the cell cycle. The functions of the c-Myc and N-MYC genes are negatively regulated by the RB protein, but when RB binds to some transforming proteins of DNA-containing viruses (T-antigen of SV40 virus, adenovirus E1A, E7 of human papilloma virus of types 16 and 18), the MYC genes are activated and cell proliferation increases [27]. In turn, c-Myc gene overexpression induces apoptosis via the P53 protein [44]. Despite this, the c-Myc oncogene is overexpressed in cancer cells, suggesting changes in genes controlling programmed cell death. The c-Myc protein induces the expression of genes that are required for a cell to enter the S phase. These are genes encoding D type cyclins, cyclin E and cyclin-dependent kinase CDC4. Cyclin-dependent kinases activated by the interaction with cyclins promote cell progression from G1 into S.
Expression of MYC genes is regulated by proto-oncogenes of the RAS family. Oncogenic mutations in genes of this family are found in different human tumors with a frequency from 10% to 95% [45]. The N-RAS gene is most susceptible to mutations. The RAS family is composed of genes encoding GTP-binding proteins that are anchored in the plasma membrane of the cell. Their normal function is the transmission of an extracellular growth signal from the tyrosine kinase receptor to effector proteins (Ral-GEF, Rafs, PI3-K and MEKK) [45]. It was established long ago that RAS and MYC cooperate in the induction of carcinogenesis [46]. The activity of MYC is modulated by the RAS protein through two independent mechanisms. The first one is triggered by the Raf-1 protein, after which a cascade of activations of signal kinases (MEK and ERK/MAPK) follows, leading to MYC phosphorylation at serine 62 (figure 7). This modification stabilizes the protein and thus prolongs its half-life. The second mechanism of regulation is realized through the phosphoinositide-3-kinase pathway (PI3K-AKT). This mechanism leads to the inhibition of glycogen synthase kinase-3 (GSK-3) activity and prevents MYC phosphorylation at threonine 58 (figure 7). MYC phosphorylation at this amino acid residue is a necessary event for rapid proteolytic degradation of the protein. Thus, both regulatory pathways prolong MYC protein activity. Mutant proteins of the RAS and MYC families induce uncontrolled proliferation of cells and the development of resistance to apoptotic signals [47].
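The two RAS-dependent mechanisms acting on MYC can be summarized as a Boolean sketch. It is illustrative only: each pathway is reduced to an on/off switch, the function names are hypothetical, and GSK-3 is assumed to be active whenever the PI3K-AKT pathway is off.

```python
def myc_phosphorylation_state(raf_mek_erk_active: bool,
                              pi3k_akt_active: bool) -> dict:
    """Return which of the two MYC residues are phosphorylated."""
    # Mechanism 1: the Raf-1 / MEK / ERK cascade phosphorylates serine 62
    ser62 = raf_mek_erk_active
    # Mechanism 2: PI3K-AKT inhibits GSK-3, which otherwise phosphorylates threonine 58
    gsk3_active = not pi3k_akt_active
    thr58 = gsk3_active
    return {"ser62": ser62, "thr58": thr58}


def ras_effect_on_myc(ras_active: bool) -> dict:
    """Assume, for this sketch, that active RAS switches on both pathways."""
    return myc_phosphorylation_state(raf_mek_erk_active=ras_active,
                                     pi3k_akt_active=ras_active)


# With active RAS: stabilizing mark present, degradation mark absent
print(ras_effect_on_myc(True))
# Without RAS signalling: the mark required for rapid degradation is present
print(ras_effect_on_myc(False))
```

In this simplified picture, active RAS adds the stabilizing serine 62 mark and at the same time removes the threonine 58 mark required for rapid degradation, so both pathways act in the same direction.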
Figure 6. The domain structure of BRCA1, BRCA2 and sites of protein-protein interaction.
Figure 7. A scheme of MYC activity regulation by RAS protein under the action of extracellular growth signals.
The cooperative interaction of the Myc and Ras proto-oncogenes was studied in transgenic mice. Both genes, fused with the promoter of the murine mammary tumor virus (MMTV), were transferred into the mouse genome. The result of overexpression of the Ras and Myc proto-oncogenes was an increased frequency of occurrence of mammary and salivary gland tumors as well as lymphoid tissue tumors. It was also found that mice with both transgenes were distinguished by a higher rate of tumor formation than mice with one of the transgenes (figure 8A) [46]. A similar result of synergistic interaction was obtained for the oncogenes Wnt and HER-2 (figure 8B). The character of the tumors was similar to that of tumors formed in mice with the oncogenes Myc and Ras. This result is explained by the fact that the Wnt and Myc genes are components of one signal pathway and HER-2 and Ras of another one. When transgenic mice had only one oncogene, either Wnt or c-Myc, the frequency of somatic mutations in the Ras gene in tumors was elevated [48]. Thus, germline mutations predetermine which mutational events will subsequently be advantageous for tumor progression.
Figure 8. Kinetics of tumor development in female mice carrying two oncogenic transgenes, myc and rasD, individually and in combination (A), and in a similar experiment with the oncogenic transgenes Wnt1 and Her-2 (B). The results demonstrate a cooperative effect of tumor induction in the case of two mutations.
M. M. Aslanyan, S. S. Litvinov, E. S. Tsyrendorzhieva et al.
GENES OF BIOTRANSFORMATION SYSTEM
Detoxication genes are those responsible for the metabolism, degradation, detoxication and excretion of xenobiotics and other chemical compounds. It is the polymorphic variants of these genes that determine individual reactions of the organism to different chemicals and foodstuffs [49]. Numerous epidemiological studies indicate that many common diseases, including different forms of cancer, are associated to a variable degree with unfavorable environmental factors, among which smoking and low-quality foodstuffs are the most serious. On entering the organism, most foreign compounds (xenobiotics) do not produce a direct biological effect. To be cleared from the body, xenobiotics undergo enzymatic transformation (biotransformation). Detoxication of toxic cell metabolites and xenobiotics proceeds in liver cells in two stages. Reactions of the first stage are catalyzed by the monooxygenase system, the components of which are incorporated in endoplasmic reticulum membranes. The reactions of oxidation, reduction or hydrolysis are the first step in the removal of hydrophobic molecules from the body. They transform substances into polar water-soluble metabolites. The main enzyme at the first stage of detoxication is cytochrome P-450. Many isoforms of this enzyme have been identified and assigned to several families depending on their properties and functions. Thirteen subfamilies of cytochrome P-450 are known in mammals [50]. It is considered that enzymes of families I-IV are involved in the biotransformation of xenobiotics and the rest metabolize endogenous compounds (steroid hormones, prostaglandins, fatty acids, etc.) [51]. At the first stage of biotransformation hydroxyl, carboxyl, thiol and amino groups are formed, so that the molecule can undergo further transformation and be removed from the body. Besides cytochrome P-450, biotransformation at the first stage involves cytochrome b5 and cytochrome reductase.
At the first stage of transformation many drugs are converted into active forms and produce the desired therapeutic effect. However, some xenobiotics are not detoxicated by the monooxygenase system but instead become more reactive. The metabolic products of foreign compounds formed at the first stage of biotransformation undergo further detoxication through a series of second-stage reactions. The resulting compounds are more polar and therefore easily removed from cells. The predominating process is conjugation catalyzed by glutathione-S-transferase, sulfotransferase and UDP-glucuronyltransferase. Conjugation with glutathione, which gives rise to mercapturic acids, is commonly regarded as the main mechanism of detoxication [52]. The most widespread enzymes of the second phase of biotransformation belong to the superfamily of glutathione-S-transferases (GST). The enzymes of this family catalyze conjugation of reduced glutathione (GSH) with many electrophilic substances, participate in the metabolism of prostaglandins and leukotrienes and in the transport of steroid hormones, and play an important part in the protection of cells from carcinogenic compounds. Genes encoding detoxication enzymes are highly polymorphic, and the frequencies of their polymorphic variants show significant, historically evolved populational, ethnic and racial differences [53, 54]. Polymorphism of these genes is associated with changes in, or a complete loss of, the activity of the enzymes they encode. The risk of onset of different
cancers increases in the case of unfavorable combinations of functionally defective variants of several genes involved in different phases of detoxication. A high activity of phase 1 cytochromes in combination with a low activity of phase 2 enzymes is the most unfavorable forerunner of cancer and other multifactorial diseases. For instance, the combination of homozygosity for full deletions of the GSTM1 and GSTT1 genes with homozygosity for the MspI polymorphism in the CYP1A1 gene is associated with a high risk of breast cancer development. The genes of the biotransformation system respond most sensitively to many environmental factors (foodstuffs, drugs, alcohol, tobacco and narcotics), thus determining the predisposition of specific genotypes to different diseases.
GENETIC SUBDIVISION OF THE HUMAN POPULATION ASSOCIATED WITH THE RISK OF TUMOR DEVELOPMENT
Since 2001, our team, in cooperation with the Laboratory of Clinical Genetics of the Russian Research Oncological Center of the Russian Academy of Medical Sciences, has been studying the relationship between polymorphism of genes controlling cell division, repair and apoptosis and the development of sporadic forms of breast cancer in women. Table 1 presents the characteristics of the twenty-one polymorphic sites studied in the work. In most cases nucleotide substitutions at these sites lead to amino acid substitutions. Blood DNA samples from a group of women with breast cancer (151 patients) and from a control group (191 individuals) have been analyzed. Polymorphism was studied with two methods: PCR-RFLP and PCR with modifying primers (dCAPS). The genes under study control the synthesis of proteins constituting the multiprotein complex BASC [39]. The key role in this complex is played by the proteins BRCA1, BRCA2, P53 and others. There are good grounds to believe that this complex is essential in recognizing structural changes in DNA and in regulating postreplicative repair [55]. It has been established that the loss of function of a set of genes encoding the proteins of the complex has a profound impact on the development of malignant tumors in different organs, including the breast, ovaries, prostate, large intestine, pancreas and stomach [56]. Amino acid substitutions in certain domains of the interacting proteins of the BASC complex can reduce its functional activity and destabilize the genome. We have previously found that the genetic structure of the cohort of women with breast cancer differs from that of the control group. In addition, genotypes with an increased risk of breast cancer development and genotypes with a low risk of tumor formation have been identified [57-59].
Analysis of 34 polymorphic sites in 18 tumor-suppressor genes has revealed genotypes that differ by several tens of times in the risk of breast cancer in women (table 2). The established associations of polymorphic alleles with a high risk of tumor formation may underlie 35-40% of cases of sporadic breast cancer in women.
Table 1. Genes and polymorphic sites studied in the work

Gene | Exon | Codon | Nucleotide position in mRNA | Nucleotide substitution | Amino acid substitution | Symbol

DNA repair genes
BRCA1 | 11, 13, 16 | 356 | 1186 | A>G | Gln>Arg | A
 | | 693 | 2196 | G>A | Asp>Asn | -
 | | 694 | 2201 | T>C | Ser>Ser | S1
 | | 771 | 2430 | T>C | Leu>Leu | S2
 | | 1038 | 3232 | A>G | Glu>Gly | S3
 | | 1040 | 3238 | G>A | Ser>Asn | -
 | | 1183 | 3667 | A>G | Lys>Arg | S4
 | | 1431 | 4410 | T>C | Ser>Pro | S5
 | | 1436 | 4427 | T>C | Ser>Ser | S6
 | | 1605 | 4932 | T>C | Leu>Leu | -
 | | 1613 | 4956 | A>G | Ser>Gly | S7
 | | 1628 | 5002 | T>C | Met>Thr | -
BRCA2 | 10, 11, 27 | 372 | 1342 | A>C | Asn>His | M
 | | 991 | 3199 | A>G | Asn>Asp | -
 | | 1420 | 4486 | G>T | Tyr>Asp | -
 | | 1915 | 5972 | C>T | Thr>Met | -
 | | 3412 | 10462 | A>G | Ile>Val | -
RAD51 | 5' untranslated region | | 135 | C>G | - | R
NBS1 | 5 | 185 | 663 | C>G | Gln>Glu | T
MSH6 | 1 | 39 | 172 | G>A | Gly>Glu | W
MSH3 | 23 | 1036 | 3386 | A>G | Thr>Ala | E
XRCC1 | 10 | 399 | 1744 | G>A | Arg>Gln | X
OGG1 | 7 | 326 | 1245 | C>G | Ser>Cys | U
APEX1 | 6 | 148 | 776 | T>G | Asp>Glu | V
BRIP1 | 19 | 919 | 2896 | C>T | Pro>Ser | P

Genes of biotransformation
CYP11B1 | 8 | 386 | 4536 | T>C | Ala>Val | -
GSTT1 | | | | Gene deletion | | -
GSTM1 | | | | Gene deletion | | -

Genes of cell cycle control and apoptosis
P53 | 4 | 72 | 12139 | G>C | Arg>Pro | J
 | 6 | 213 | 890 | A>G | Arg>Arg | -
BARD1 | 7 | 557 | 1743 | G>C | Cys>Ser | -
P21 | 2 | 31 | 187 | C>A | Ser>Arg | Q

Gene of regulation of a cell response to hypoxia
HIF1A | 12 | 582 | 1772 | C>T | Pro>Ser | -

Housekeeping gene
MTHFR | 5 | 222 | 849 | C>T | Ala>Val | F
Table 2. Associations of polymorphic sites for which the risk of tumor development differs statistically significantly from the population average

№ | Combination of genes | Genotype* | Proportion (%), Case | Proportion (%), Control | χ2 | P | OR

Low risk
1 | RAD51 NBS1 P21 | Rr tt QQ | 0.7 | 6.3 | 7.29 | 0.007 | 0.14
2 | BRCA1 P53 | SS jj | 0.7 | 6.8 | 8.11 | 0.004 | 0.13
3 | APEX P53 | Vv jj | 0 | 5.2 | 8.14 | 0.004 | 0.06

High risk
1 | BRCA1 MSH6 | ss Ww | 7.3 | 1.0 | 8.97 | 0.003 | 6.2
2 | OGG1 NBS1 MSH6 | UU TT Ww | 10.6 | 2.6 | 9.31 | 0.002 | 4.1
3 | BRCA2 NBS1 MSH6 | Mm TT Ww | 6.6 | 1.0 | 7.74 | 0.005 | 5.6
4 | BRCA1 OGG1 NBS1 MSH6 | AA UU TT Ww | 9.3 | 1.0 | 12.19 | 0.0005 | 9.7
5 | APEX BRCA2 MSH6 | VV Mm Ww | 6.0 | 0.5 | 8.78 | 0.003 | 8.5
6 | NBS1 APEX XRCC1 | TT VV XX | 5.3 | 0.5 | 7.50 | 0.006 | 7.5
7 | RAD51 XRCC1 BRCA2 | Rr XX Mm | 8.6 | 2.1 | 7.58 | 0.006 | 4.1
* - Frequent alleles in genotypes are designated by capital letters and rare alleles by small letters.
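The OR and χ2 columns of table 2 can be approximated from the group sizes given in the text (151 patients, 191 controls) and the tabulated proportions. The Python sketch below is illustrative only and rests on assumptions that are not stated in the chapter: carrier counts are reconstructed by rounding the percentages, 0.5 is added to every cell of the 2x2 table before the odds ratio is taken, and χ2 is the Pearson statistic with one degree of freedom and no Yates correction. The function names are hypothetical.

```python
import math


def odds_ratio(case_pos, case_total, ctrl_pos, ctrl_total, correction=0.5):
    """Odds ratio of a 2x2 table; `correction` is added to every cell (assumed)."""
    a = case_pos + correction
    b = case_total - case_pos + correction
    c = ctrl_pos + correction
    d = ctrl_total - ctrl_pos + correction
    return (a * d) / (b * c)


def pearson_chi2(case_pos, case_total, ctrl_pos, ctrl_total):
    """Pearson chi-square (1 d.f., no Yates correction) and its P value."""
    a, b = case_pos, case_total - case_pos
    c, d = ctrl_pos, ctrl_total - ctrl_pos
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# High-risk genotype VV Mm Ww (APEX BRCA2 MSH6): 6.0% of patients, 0.5% of controls.
# The counts below are reconstructed by rounding, which is an assumption.
n_cases, n_controls = 151, 191
carriers_cases = round(0.060 * n_cases)
carriers_controls = round(0.005 * n_controls)

or_value = odds_ratio(carriers_cases, n_cases, carriers_controls, n_controls)
chi2, p_value = pearson_chi2(carriers_cases, n_cases, carriers_controls, n_controls)
print(f"OR = {or_value:.1f}, chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

Under these assumptions the sketch gives an OR of about 8.5 and a χ2 of about 8.78 for this genotype, in line with the tabulated row. Adding 0.5 to each cell keeps the odds ratio finite when a genotype is absent from one group, as in the Vv jj row of the low-risk block.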
Table 3. Association of polymorphic sites with an elevated risk of breast cancer in enlarged examined groups (456 samples from patients and 299 control samples)

Combination of genes | Genotype | Proportion (%), Case | Proportion (%), Control | OR | Confidence interval (95%) | χ2 | P

Low risk
NBS1 OGG1 | tt UU | 5.92 | 10.70 | 0.53 | 0.40-0.69 | 5.28 | 0.022
APEX1 MTHFR | vv FF | 8.99 | 14.05 | 0.60 | 0.48-0.76 | 4.20 | 0.040
APEX1 OGG1 | Vv UU | 12.94 | 18.73 | 0.64 | 0.53-0.79 | 3.98 | 0.046
NBS1 P53 | tt Jj | 3.73 | 7.02 | 0.51 | 0.37-0.72 | 3.90 | 0.048
NBS1 APEX1 | tt Vv | 3.73 | 7.02 | 0.51 | 0.37-0.72 | 3.90 | 0.048

High risk
MTHFR OGG1 | ff Uu | 5.04 | 1.00 | 5.24 | 2.82-9.73 | 8.56 | 0.003
APEX1 P21 | VV QQ | 24.56 | 16.39 | 1.66 | 1.37-2.01 | 5.66 | 0.017
APEX1 BRCA2 | VV Yy | 12.72 | 7.02 | 1.93 | 1.48-2.52 | 5.60 | 0.018
APEX1 BRCA1 | VV Ss | 14.47 | 8.70 | 1.78 | 1.39-2.27 | 4.95 | 0.026
APEX1 XRCC1 | VV XX | 12.50 | 7.69 | 1.71 | 1.32-2.22 | 3.94 | 0.047
NBS1 MSH6 | TT Ww | 15.57 | 8.70 | 1.94 | 1.52-2.47 | 6.64 | 0.010
NBS1 RAD51 | TT Rr | 14.25 | 8.70 | 1.75 | 1.37-2.23 | 4.63 | 0.031
NBS1 P21 | TT QQ | 37.72 | 28.43 | 1.52 | 1.30-1.79 | 4.58 | 0.032
NBS1 OGG1 | TT UU | 29.82 | 21.74 | 1.53 | 1.29-1.82 | 4.43 | 0.035
MSH3 OGG1 | ee UU | 6.36 | 3.01 | 2.19 | 1.48-3.23 | 4.03 | 0.045
MSH6 BRCA2 | Ww Yy | 13.38 | 8.03 | 1.77 | 1.37-2.28 | 4.59 | 0.032
RAD51 P53 | Rr Jj | 12.72 | 7.36 | 1.83 | 1.41-2.39 | 4.90 | 0.027
To improve the reliability of the results, the number of samples was later increased to 456 samples from women with cancer and 299 control samples. Genes analyzed in the larger groups of examined individuals are presented in bold type in table 1. Analysis of the distribution of genotype frequencies has revealed statistically significant differences for the MTHFR gene. The frequency of rare 222 Val/Val homozygotes in the group of patients and in the control group was 9.4% and 4.7%, respectively (OR=2.12, P=0.02). A large number of genotypes associated with a high or low risk of breast cancer have been established in the analysis of combinations of two genes. The results are presented in table 3. Analysis of polymorphic variants of tumor-suppressor genes and proto-oncogenes permits molecular markers associated with cancer development to be identified and diagnostic tests for early detection of the disease to be developed.
CONCLUSION
It is now known that combinations of polymorphic gene variants can underlie many polygenic diseases, such as atherosclerosis, osteoporosis, ischemia, diabetes, etc. [49]. Cancer is a classical example of a polygenic multifactorial disease. Its development is associated with the successive occurrence of mutations in a number of genes, both proto-oncogenes and tumor-suppressor genes [4]. An essential achievement in understanding the molecular mechanisms of carcinogenesis was the discovery of genes that suppress the formation of tumors and whose loss abolishes the negative regulation of cell proliferation. For instance, the development of breast cancer is controlled by many genes (BRCA1, BRCA2, P53, P21, RAD50, RAD51, Rb, MSH2, etc.). The key genes are BRCA1 and BRCA2. They encode multifunctional proteins whose mutant phenotypes determine predisposition to breast and ovarian cancer. Oncogenesis in individuals with a germline mutation in BRCA occurs upon inactivation of the corresponding wild-type allele in a somatic cell. Hundreds of mutations have been revealed in tumor-suppressor genes; most of them are missense mutations, represented by tens of polymorphic variants evolutionarily fixed in human populations. Knowledge of the nucleotide sequences of the main tumor-suppressor genes, and consequently of the amino acid sequences and configuration of their protein products, makes it possible to establish the sites that determine the tertiary structure of the product. Knowledge of the sites through which the gene products interact in complexes can facilitate the identification of haplotypes of polymorphic loci that correlate well with the development of certain types of tumors. Two forms of cancer are distinguished at present: familial and sporadic. The familial form is found among the members of one family, but it constitutes only 10% of the total incidence of cancer.
The familial form demonstrates a clear-cut hereditary pattern, and the probability of its occurrence in healthy members of the family is several times higher than in the case of sporadic cancer. The sporadic form of cancer also has a genetic component, but its structure may be much more complicated. In this case, a tumor develops under the action of not one hereditary mutation but several. The study of polymorphism of genes involved in carcinogenesis has revealed combinations of their
polymorphic states that increase the probability of cancer development. Owing to combinative variation, such genotypes segregate already in the next generation. This explains the absence of the disease among relatives of patients with sporadic cancer. In the analysis of twenty-one polymorphic sites in nine tumor-suppressor genes, we have identified associations of specific genotypes with a high risk of sporadic breast cancer that can determine the development of a malignant tumor in 30-35% of cases [58, 59].
REFERENCES
[1] Cancerogenesis [Russian], D.G. Zaridze ed., 2004, Meditsina, Moscow, 576 p.
[2] Baranova A.V., Yankovsky N.K. Geny-supressory opuholevogo rosta. Molekulyarnaya biologiya [Russian], 1998, t. 32, № 2, pp. 206-218.
[3] Gar'kavceva R.F., Gar'kavcev I.V. Molekulyarno-geneticheskie aspekty zlokachestvennyh novoobrazovanii. Vestnik Rossiiskoi Akademii medicinskih nauk [Russian], 1998, t. 2, pp. 38-44.
[4] Kiselev F.L. Geny stabilizacii DNK i kancerogenez. Molekulyarnaya biologiya [Russian], 1998, t. 32, № 2, pp. 197-205.
[5] Karp G. Cell and molecular biology: Concepts and experiments, p. 705-720.
[6] Devilee P., Cleton-Jansen A.M., Cornelisse C.J. Ever since Knudson. Trends in Genetics, v. 17, № 10, p. 569-573.
[7] Zil'ber L.A. Virusnaya teoriya proishozhdeniya zlokachestvennyh opuholei. M., Medgiz, 1946, 72 p.
[8] Holliday R. Epigenetics: A Historical Overview. Epigenetics, 2006, v. 1, № 2, p. 76-80.
[9] Knudson A.G. Mutation and Cancer: Statistical Study of Retinoblastoma. Proceedings of the National Academy of Sciences, 1971, v. 68, № 4, p. 820-823.
[10] Tucker T., Friedman J.M. Pathogenesis of hereditary tumors: beyond the "two-hit" hypothesis. Clinical Genetics, 2002, v. 62, p. 345-357.
[11] Gallie B.L., Campbell C., Devlin H., Duckett A., Squire J.A. Developmental basis of retinal-specific induction of cancer by RB mutation. Cancer Research, 1999, v. 59, p. 1731-1735.
[12] Boland C.R., Ricciardiello L. How many mutations does it take to make a tumor? Proceedings of the National Academy of Sciences, 1999, v. 96, № 26, p. 14675-14677.
[13] Aplan P.D. Causes of oncogenic chromosomal translocation. Trends in Genetics, v. 22, № 1, p. 46-55.
[14] Sharpless N.E., DePinho R.A. Telomeres, stem cells, senescence, and cancer. The Journal of Clinical Investigation, 2004, v. 113, № 2, p. 160-168.
[15] Muntoni A., Reddel R.R. The first molecular details of ALT in human tumor cells. Human Molecular Genetics, 2005, v. 14, i. 2, p. R191-R196.
[16] Xiong J., Fan S., Meng Q., Schramm L., Wang C., Bouzahza B., Zhou J., Zafonte B., Goldberg I.D., Haddad B.R., Pestell R.G., Rosen E.M. BRCA1 Inhibition of Telomerase Activity in Cultured Cells. Molecular and Cellular Biology, 2003, v. 23, № 23, p. 8668-8690.
[17] Kunz M., Ibrahim S.M. Molecular responses to hypoxia in tumor cells. Molecular Cancer, 2003, v. 2, p. 23-28.
[18] Prior S.J., Hagberg J.M., Phares D.A., Brown M.D., Fairfull L., Ferrell R.E., Roth S.M. Sequence variation in hypoxia-inducible factor 1alpha (HIF1A): association with maximal oxygen consumption. Physiological Genomics, 2003, v. 15, p. 20-26.
[19] Chau C.H., Permenter M.G., Steinberg S.M., Retter A.S., Dahut W.L., Price D.K., Figg W.D. Polymorphism in the hypoxia-inducible factor 1alpha gene may confer susceptibility to androgen-independent prostate cancer. Cancer Biology and Therapy, 2005, v. 4, i. 11, p. 1222-1225.
[20] Ollerenshaw M., Page T., Hammonds J., Demaine A. Polymorphisms in the hypoxia inducible factor-1alpha gene (HIF1A) are associated with the renal cell carcinoma phenotype. Cancer Genetics and Cytogenetics, 2004, v. 153, i. 2, p. 122-126.
[21] Engelman J.A., Luo J., Cantley L.C. The evolution of phosphatidylinositol 3-kinases as regulators of growth and metabolism. Nature Reviews Genetics, 2006, v. 7, p. 606-619.
[22] Su J.D., Mayo L.D., Donner D.B., Durden D.L. PTEN and phosphatidylinositol 3-kinase inhibitors up-regulate P53 and block tumor-induced angiogenesis: Evidence for an effect on the tumor and endothelial compartment. Cancer Research, 2003, v. 63, p. 3585-3592.
[23] Carew J.S., Huang P. Mitochondrial defects in cancer. Molecular Cancer, 2002, v. 1:9.
[24] Keleg S., Büchler P., Ludwig R., Büchler M.W., Friess H. Invasion and metastasis in pancreatic cancer. Molecular Cancer, 2003, v. 2:14.
[25] Garnis C., Buys T.P., Lam W.L. Genetic alteration and gene expression modulation during cancer progression. Molecular Cancer, 2004, v. 3:9.
[26] Vogelstein B., Kinzler K.W. P53 function and dysfunction. Cell, 1992, v. 70, p. 523-526.
[27] Hakem R., Mak T.W. Animal models of tumor-suppressor genes. Annual Review of Genetics, 2001, v. 35, p. 209-241.
[28] Luo R.X., Dean D.C. Chromatin remodeling and transcriptional regulation. Journal of the National Cancer Institute, 1999, v. 91, № 15, p. 1288-1294.
[29] Harris S.L., Gil G., Robins H., Hu W., Hirshfield K., Bond E., Bond G., Levine A.J. Detection of functional single-nucleotide polymorphisms that affect apoptosis. Proceedings of the National Academy of Sciences, 2005, v. 102, № 45, p. 16297-16302.
[30] Lind H., Zienolddiny S., Ekstrom P.O., Skaug V., Haugen A. Association of a functional polymorphism in the promoter of the MDM2 gene with risk of nonsmall cell lung cancer. International Journal of Cancer, 2006, v. 119, № 3, p. 718-721.
[31] Bertrand P., Saintigny Y., Lopez B.S. P53's double life: transactivation-independent repression of homologous recombination. Trends in Genetics, 2004, v. 20, i. 6, p. 235-243.
[32] Seoane J., Le H.V., Massague J. Myc suppression of the p21(Cip1) Cdk inhibitor influences the outcome of the P53 response to DNA damage. Nature, 2002, v. 419, p. 729-734.
[33] Mihara M., Erster S., Zaika A., Petrenko O., Chittenden T., Pancoska P., Moll U.M. P53 has a direct apoptogenic role at the mitochondria. Molecular Cell, 2003, v. 11, p. 577-590.
[34] Li B., Dou Q.P. Bax degradation by the ubiquitin/proteasome-dependent pathway: Involvement in tumor survival and progression. Proceedings of the National Academy of Sciences, 2000, v. 97, № 8, p. 3850-3855.
[35] Bergamaschi D., Samuels Y., Sullivan A., Zvelebil M., Breyssens H., Bisso A., Del Sal G., Syed N., Smith P., Gasco M., Crook T., Lu X. iASPP preferentially binds P53 proline-rich region and modulates apoptotic function of codon 72-polymorphic P53. Nature Genetics, 2006, v. 38, № 10, p. 1133-1141.
[36] Dumont P., Leu J.I., Della Pietra A.C. 3rd, George D.L., Murphy M. The codon 72 polymorphic variants of the P53 tumor suppressor protein demonstrate marked differences in apoptotic potential. Nature Genetics, 2003, v. 33, № 3, p. 357-365.
[37] Li X., Dumont P., Della Pietra A., Shetler C., Murphy M.E. The codon 47 polymorphism in P53 is functionally significant. Journal of Biological Chemistry, 2005, v. 280, № 25, p. 24245-24251.
[38] McGregor J.M., Harwood C.A., Brooks L., Fisher S.A., Kelly D.A., O'Nions J., Young A.R., Surentheran T., Breuer J., Millard T.P., Lewis C.M., Leigh I.M., Storey A., Crook T. Relationship between P53 codon 72 polymorphism and susceptibility to sunburn and skin cancer. Journal of Investigative Dermatology, 2002, v. 119, № 1, p. 84-90.
[39] Wang Y., Cortez D., Yazdi P., Neff N., Elledge S.J., Qin J. BASC, a super complex of BRCA1-associated proteins involved in the recognition and repair of aberrant DNA structures. Genes & Development, 2000, v. 14, p. 927-939.
[40] Hedenfalk I.A., Ringner M., Trent M.J., Borg A. Gene Expression in Inherited Breast Cancer. Advances in Cancer Research, 2002, v. 84, p. 1-34.
[41] Liu Y., West S.C. Distinct functions of BRCA1 and BRCA2 in double-strand break repair. Breast Cancer Research, 2002, v. 4, p. 9-13.
[42] Banerjea A., Ahmed S., Hands R.E., Huang F., Han X., Shaw P.M., Feakins R., Bustin S.A., Dorudi S. Colorectal cancers with microsatellite instability display mRNA expression signatures characteristic of increased immunogenicity. Molecular Cancer, 2004, v. 3:21.
[43] Dang C.V. c-Myc target genes involved in cell growth, apoptosis, and metabolism. Molecular and Cellular Biology, 1999, v. 19, p. 1-11.
[44] Prendergast G.C. Mechanisms of apoptosis by c-Myc. Oncogene, 1999, v. 18, p. 2967-2987.
[45] Reuter C.W., Morgan M.A., Bergmann L. Targeting the Ras signaling pathway: a rational, mechanism-based treatment for hematologic malignancies? Blood, 2000, v. 96, № 5, p. 1655-1669.
[46] Sinn E., Muller W., Pattengale P., Tepler I., Wallace R., Leder P. Coexpression of MMTV/v-Ha-ras and MMTV/c-myc genes in transgenic mice: synergistic action of oncogenes in vivo. Cell, 1987, v. 49, № 4, p. 465-475.
[47] Bachireddy P., Bendapudi P.K., Felsher D.W. Getting at MYC through RAS. Clinical Cancer Research, 2005, v. 11, № 12, p. 4278-4281.
[48] Podsypanina K., Li Y., Varmus H.E. Evolution of somatic mutations in mammary tumors in transgenic mice is influenced by the inherited genotype. BMC Medicine, 2004, v. 2:24.
[49] Baranov V.S. Geneticheskie osnovy predraspolozhennosti k nekotorym chastym mul'tifaktorial'nym zabolevaniyam. Medicinskaya genetika [Russian], 2004, t. 3, № 03.
[50] Grosso L.M., Triche E.W., Belanger K., Benowitz N.L., Holford T.R., Bracken M.B. Caffeine Metabolites in Umbilical Cord Blood, Cytochrome P-450 1A2 Activity, and Intrauterine Growth Restriction. American Journal of Epidemiology, 2006, v. 163, № 11, p. 1035-1041.
[51] Hedenmalm K., Guzey C., Dahl M.L., Yue Q.Y., Spigset O. Risk factors for extrapyramidal symptoms during treatment with selective serotonin reuptake inhibitors, including cytochrome P-450 enzyme, and serotonin and dopamine transporter and receptor polymorphisms. Journal of Clinical Psychopharmacology, v. 26, № 2, p. 192-197.
[52] Satianegara G., Rogers P.L., Rosche B. Comparative studies on enzyme preparations and role of cell components for (R)-phenylacetylcarbinol production in a two-phase biotransformation. Biotechnology and Bioengineering, 2006, v. 94, i. 6, p. 1189-1195.
[53] Nebert D.W., Carvan M.J. Ecogenetics: from biology to health. Toxicology and Industrial Health, 1997, v. 13, p. 163-192.
[54] Nebert D.W. Polymorphisms in drug-metabolizing enzymes: what is their clinical significance and why do they exist? The American Journal of Human Genetics, 1997, v. 60, p. 265-271.
[55] Welcsh P.L., Owens K.N., King M.C. Insights into the functions of BRCA1 and BRCA2. Trends in Genetics, 2000, v. 16, № 2, p. 69-74.
[56] Arason A., Barkardottir R., Egilsson V. Linkage analysis of chromosome 17q markers and breast-ovarian cancer in Icelandic families, and possible relationship to prostate cancer. American Journal of Human Genetics, 1993, v. 52, p. 711-717.
[57] Tarasov V.A., Aslanyan M.M., Tsyrendorzhiyeva E.S., Gar'kavtseva R.F., Lyubchenko L.N., Altukhov Yu.P. The dependence of the risk of breast cancer in women on their genotype. Doklady Biological Sciences, 2004, t. 398, pp. 391-394.
[58] Tarasov V.A., Aslanyan M.M., Tsyrendorzhiyeva E.S., Garkavtseva R.F., Lyubchenko L.N., Altukhov Yu.P., Mel'nik V.A. Population Genetic Analysis of the Association Between the BRCA1 and P53 Gene Polymorphisms and the Risk of Sporadic Breast Cancer. Russian Journal of Genetics, 2005, v. 41, № 8.
[59] Tarasov V.A., Aslanyan M.M., Tsyrendorzhiyeva E.S., Litvinov S.S., Gar'kavtseva R.F., Altukhov Yu.P. Genetically determined subdivision of human populations with respect to the risk of breast cancer in women. Doklady Biological Sciences, 2006, v. 406, № 1-6, pp. 66-69.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 3
ASSOCIATION OF CANDIDATE GENES POLYMORPHISM WITH ASTHMA IN BASHKORTOSTAN REPUBLIC OF RUSSIA
E. K. Khusnutdinova*, A. S. Karunas, U. U. Fedorova and I. R. Gilyazova
Institute of Biochemistry and Genetics, Ufa Science Center, Russian Academy of Sciences, Russia
ABSTRACT
Asthma is a chronic inflammatory disease of the respiratory tract, most probably caused by an interaction of genetic and environmental factors. It is one of the most widespread and severe chronic disorders among both children and adults. To reveal genetic risk factors for asthma development in the Bashkortostan Republic of Russia, we examined associations between asthma and polymorphisms of cytokine genes, the β2-adrenergic receptor gene (ADRB2), the monocyte differentiation antigen gene (CD14), and the a disintegrin and metalloproteinase domain 33 gene (ADAM33). The study sample included 156 asthma patients and 169 nonasthmatic subjects of Russian, Tatar and Bashkir ethnic origin, all residents of the Bashkortostan Republic. As a result, genetic markers of increased and decreased risk of asthma development in the Bashkortostan Republic were revealed. Significant genetic variation within ethnic groups of asthma patients has been demonstrated.
ABBREVIATIONS
ISAAC – International Study of Asthma and Allergy in Childhood
PCR – polymerase chain reaction
RFLP-analysis – restriction fragment length polymorphism analysis
OR – odds ratio
IL4 – interleukin 4
IgE – immunoglobulin E
IL4Rα – α-chain of receptor IL-4R
IL9 – interleukin 9
IL10 – interleukin 10
TNFA – tumor necrosis factor alpha
ADRB2 – β2-adrenergic receptor
CD14 – monocyte differentiation antigen (receptor for lipopolysaccharide)
ADAM33 – a disintegrin and metalloprotease domain 33

* [email protected]
INTRODUCTION
Asthma is one of the most common chronic diseases, in which many cells and cellular elements play a role [1]. Inflammation in asthma contributes to airway hyperresponsiveness, airflow limitation and respiratory symptoms: coughing (especially at night or in the early morning), wheezing, shortness of breath or rapid breathing, and chest tightness. Asthma is one of the most widespread and socially significant diseases; its prevalence has not stabilized and has increased dramatically over the past two decades. According to World Health Organization data, 100-150 million people worldwide suffer from asthma [2]. Asthma prevalence varies strongly between countries, both among children (from 6.6% in Denmark to 34.0% in New Zealand and 42.2% in Austria) and among adults (from 2.7-4.0% in Germany, Spain and France to 12.0% in England and Australia) [1]. Recent studies under the ISAAC program (International Study of Asthma and Allergy in Childhood), performed in some cities of Russia in 1993-1998, have demonstrated a high asthma prevalence among children 7-8 years old: 16.9% in Moscow and 10.6-11.1% in Irkutsk. A lower asthma frequency has been reported among children 13-14 years old: from 8% in Moscow to 12.1% in Irkutsk. Asthma prevalence among adults in Russia varies from 5.6% in Irkutsk to 7.3% in St. Petersburg. The data obtained with ISAAC questionnaires differ from the official statistics, according to which the prevalence of asthma in Russia is 0.66% (Ministry of Public Health of the Russian Federation, 2002) [3]. A recent statistical report (2005) estimated the prevalence of asthma in the Bashkortostan Republic at 0.776%, which is comparable to the official data on asthma prevalence in Russia. Pathogenesis of asthma is complex.
The majority of researchers consider that predisposition to asthma is caused by a combination of allelic gene variants in the genotype of an individual, resulting in an adverse hereditary background that is realized in interaction with environmental factors (allergens, pollutants, occupational sensitizers, respiratory infections, smoking, etc.). Genome-wide linkage studies of asthma predisposition are conducted all over the world. Considerable effort has been expended on whole-genome screens aimed at detecting genetic loci that contribute to susceptibility to complex human diseases. More than a hundred genes coding for proteins whose function is closely connected with asthma pathogenesis - cytokine and chemokine genes and their receptors, human leukocyte antigen (HLA) genes, genes of inflammatory mediators and their receptors, the β2-adrenergic receptor gene,
xenobiotic biotransformation genes, etc. - have shown significant evidence of linkage to asthma [4-8]. More than 20 genome-wide studies have identified genomic regions involved in the disease, such as 5q31.1-33, 6p12-21.2, 11q12-13, 12q14-24.1, 16p12.1-11.2 and Xq28/Yq12, and DNA loci associated with asthma development have been revealed. Recently, positional cloning made it possible to identify five genes whose functions are still unknown: ADAM33, DPP10, PHF11, GPRA and SPINK5 [6, 9]. Thus, numerous molecular-genetic studies of asthma combining positional cloning and fine mapping have provided new data on the genetic basis and pathology of asthma, but many questions about the etiology and pathogenesis of the disease remain unclear. The high prevalence and steady growth of asthma all over the world define the social, economic and medical importance of the problem and the need to study the mechanisms of asthma development in order to create effective methods of diagnostics, prevention and pathogenetic therapy that take into account the ethnic origin and genetic features of each patient. Molecular-genetic studies of asthma in Russia are not numerous and are devoted mainly to polymorphisms of xenobiotic detoxication genes and cytokine genes in asthma development [10-13]. In the laboratory of human molecular genetics at the Institute of Biochemistry and Genetics (Ufa Science Centre, Russian Academy of Sciences), studies of the GSTM1 gene, the macrophage chemokine receptor 5 (CCR5) gene and the angiotensin-converting enzyme gene have been performed in children with atopic asthma from the Bashkortostan Republic and in their relatives. Polymorphic variants of the glutathione-S-transferase M1 gene and the angiotensin-converting enzyme gene that are genetic markers of increased and decreased risk of atopic asthma development have been detected [14]. The study included DNA samples of asthma patients and healthy donors from the Bashkortostan Republic.
The purpose of the present research was to analyze polymorphic variants of cytokine genes (IL-4, the α-chain of the IL-4 receptor, IL-9, IL-10, TNF-α), the ADRB2 receptor gene, CD14, and ADAM33 in asthma patients and nonasthmatic subjects. Taking into consideration the variety of populations living in Bashkortostan, their heterogeneity, and the significant differences in genetic risk factors for disease development between ethnic groups, the study samples were drawn from three widespread populations of Bashkortostan: Russians, Tatars and Bashkirs. The DNA collection included 156 patients (60 Russians, 33 Tatars, 16 Bashkirs and 47 patients born of ethnically mixed marriages). The mean age of the asthma patients was 46,5 years. Asthma was diagnosed according to the principles proposed by GINA (Global Strategy for Asthma Management and Prevention) and the guideline "Asthma. A manual for doctors in Russia" on the basis of clinical and laboratory examination, spirometry, radiography, and skin and allergy tests. The control group consisted of healthy individuals of Russian (58 individuals), Tatar (61 individuals) and Bashkir (50 individuals) ethnic origin from Bashkortostan Republic. Informed consent was obtained from each participant.

Genomic DNA was isolated from peripheral blood leukocytes by standard phenol-chloroform extraction [15]. Fragments of the examined loci were amplified by polymerase chain reaction (PCR) in a "Tercik" PCR machine ("DNA-technology" company, Moscow). PCR was carried out in a 25 µL reaction mixture containing 2,5 µL of 10× Taq buffer (67 mM Tris-HCl (pH 8,8), 16,6 mM (NH4)2SO4, 1,5 mM MgCl2, 0,01 % Tween-20), 0,1 µg of genomic DNA, 1.0 mM of each dNTP, 1 unit of Taq DNA polymerase ("Silex" company, Moscow) and 5-10 pM of each locus-specific primer.
Table 1. Polymorphisms, primer sequences and nomenclature of the analyzed DNA loci
Gene, chromosomal localization | Polymorphism (restriction enzyme), gene region, reference | Primers | Alleles (fragment lengths, b.p.)
ADRB2, 5q31-32 | Arg16Gly (NcoI-RFLP), exon 1; Holloway J. et al., 2000 [41] | 5'-CCT TCT TGC TGG CAC CCC AT-3'; 5'-GGA AGT CCA AAA CTC GCA CCA-3' | *Arg16 (308), *Gly16 (292, 16)
ADRB2, 5q31-32 | Gln27Glu (BbvI-RFLP), exon 1; Holloway J. et al., 2000 [41] | | *Gln27 (259, 49), *Glu27 (308)
IL4, 5q31-33 | -590C>T (AvaII-RFLP), promoter region; Noguchi E. et al., 1998 [22] | 5'-TAA ACT TGG GAG AAC ATG GT-3'; 5'-TGG GGA AAG ATA GAG TAA TA-3' | IL4*C (177, 18), IL4*T (195)
IL4Rα, 16p12 | Ile50Val (MslI-RFLP), exon 3; Kauppi P. et al., 2001 [56] | 5'-CTGTTGCTATGACCCCACCT-3'; 5'-AGGTGACCAGCCTAACCCAG-3' | *Ile50 (308), *Val50 (254, 54)
IL4Rα, 16p12 | Gln576Arg (MspI-RFLP), exon 12; Kauppi P. et al., 2001 [56] | 5'-CCCCCACCACCAGTGGCTACC-3'; 5'-CCAGGAATGAGGTCTTGGAA-3' | *Gln576 (221), *Arg576 (204, 17)
IL9, 5q31-33 | Thr113Met (NcoI-RFLP), exon 5; Walley A. et al., 2001 [19] | 5'-GGC TGC TTG GCT CTA CAT C-3'; 5'-ATT TAG AGT AGC TTA CTT G-3' | Thr113 (269), Met113 (172, 97)
IL10, 1q31-32 | -627A/C (RsaI-RFLP), promoter region; Hang L. et al., 2003 [33] | 5'-CCTAGGTCACAGTGACGTGG-3'; 5'-GGTGAGCACTACCTGACTAGC-3' | IL10*A (236, 176), IL10*C (412)
TNFA, 6p21.3 | -308G>A (NcoI-RFLP), promoter region; Karplus T.M. et al., 2002 [57] | 5'-AGG CAA TAG GTT TTG AGG GCC AT-3'; 5'-TCC TCC CTG CTC CGA TTC GG-3' | TNFA*G (87, 20), TNFA*A (107)
CD14, 5q31.1 | -159C/T (AvaII-RFLP), promoter region; Baldini M. et al., 1999 [45] | 5'-GTGCCAACAGATGAGGTTCAC-3'; 5'-GCCTCTGACAGTTTATGTAATC-3' | CD14*C (497), CD14*T (353, 144)
ADAM33, 20q13 | 7575 G/A, intron 6, F+1 (MspI-RFLP); Cheng L. et al., 2004 [58] | 5'-GGGGAGCCCTCCAAATCAGAAGAGCC-3'; 5'-AGTGGAAGCTGCTGGGCTT-3' | ADAM33*G, ADAM33*A
The investigated loci, primer sequences and sizes of the amplified fragments are listed in Table 1. Nucleotide changes were identified by restriction fragment length polymorphism (RFLP) analysis (Table 1). The RFLP products were resolved on a 7% polyacrylamide gel, stained with ethidium bromide and visualized in UV light. Statistical analysis was performed on an IBM PC (Pentium 4) using the Statistica 6.0 program [16] and Microsoft Excel.
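The RFLP read-out described above can be illustrated with a small sketch. It takes the allele-specific fragment lengths of three loci from Table 1 and maps a set of observed bands to a genotype. The names `FRAGMENTS` and `call_genotype` are hypothetical, and the sketch assumes that every fragment, including the short 18-20 b.p. ones, is actually resolved on the gel, which may not hold in practice.

```python
from itertools import combinations_with_replacement

# Allele-specific fragment lengths (b.p.) copied from Table 1.
FRAGMENTS = {
    "IL4 -590C>T": {"C": {177, 18}, "T": {195}},
    "TNFA -308G>A": {"G": {87, 20}, "A": {107}},
    "CD14 -159C>T": {"C": {497}, "T": {353, 144}},
}


def call_genotype(locus, bands):
    """Return a genotype such as 'C/T' for the observed band sizes, or None.

    A homozygote shows only the fragments of one allele; a heterozygote
    shows the union of the fragments of both alleles (assumption: all
    fragments are visible).
    """
    patterns = FRAGMENTS[locus]
    observed = set(bands)
    for a1, a2 in combinations_with_replacement(sorted(patterns), 2):
        if observed == patterns[a1] | patterns[a2]:
            return f"{a1}/{a2}"
    return None  # pattern not explained by the two known alleles


il4_call = call_genotype("IL4 -590C>T", [195, 177, 18])
cd14_call = call_genotype("CD14 -159C>T", [497])
print(il4_call, cd14_call)
```

Alleles are tried in alphabetical order, so a TNFA heterozygote is reported as `A/G` rather than `G/A`; an unexpected band pattern yields `None` instead of a guess.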
THE ASSOCIATION ANALYSIS OF POLYMORPHIC VARIANTS OF THE INTERLEUKIN-4 GENE AND THE α-CHAIN OF THE IL-4 RECEPTOR (IL-4Rα) WITH ASTHMA

IL-4 is a pleiotropic cytokine that plays a critical role in the induction and maintenance of allergy and respiratory tract inflammation. A large cluster of cytokine genes (IL-3, IL-4, IL-5, IL-9, IL-13, and granulocyte-macrophage colony-stimulating factor) has been mapped to chromosome 5q31-34, a region to which asthma and associated clinical signs have also been linked [17-19]. The β2-adrenergic receptor gene and other genes connected with asthma development (CD14, glucocorticoid receptor 1, fibroblast growth factor) are also located in this chromosomal region. IL-4 plays a major role in the development of allergic inflammation and in IgE production. It exerts its biological effects by binding to the IL-4 receptor complex, which leads to high serum IgE levels. IL-4 activates cell adhesion molecules in the vascular endothelium, which results in the migration of T cells, monocytes, basophils and eosinophils to the site of inflammation [20]. In 1995, Rosenwasser L.J. et al. showed that a polymorphism in the promoter of the IL4 gene, a cytosine-to-thymine transition at position -590 (-590C>T), was associated with asthma and an increased total IgE level in an American population [21]. Subsequently, several studies reported associations of IL4 gene polymorphic variants with asthma in Japanese and European populations [22, 23]. In 2003, Kabesch M. et al. performed a complete screening of the IL4 gene in children with asthma of German ethnic origin and revealed 16 polymorphisms, fourteen of which had not been reported previously; ten closely linked polymorphisms of the IL4 gene were associated with asthma and elevated serum IgE levels [24]. We analyzed the IL4 gene promoter polymorphism (-590C>T) in asthma patients and healthy individuals from Bashkortostan Republic. The results are summarized in Table 2.
The analysis of heterogeneity revealed statistically significant differences within the control group between Tatars and Russians (χ2=7,7; p=0,009 and χ2=8,65; p=0,009, respectively) as well as between Tatars and Bashkirs (χ2=9,21; p=0,002 and χ2=9,33; p=0,008) when allele and genotype frequencies of the examined polymorphism were compared. The IL4*C/*C genotype frequency was significantly higher in Tatars (67,21%) than in Russians (41,38%, χ2=8,005; p=0,0046) and Bashkirs (44%, χ2=6,03; p=0,014). The IL4*T/*T genotype was found at a frequency of 14% in Bashkirs, which was higher than in Tatars (1,64%, χ2=6,28; p=0,012) and Russians (6,9%, χ2=1,48; p=0,22). The IL4*C allele was the most frequent in all examined groups, and the highest frequency was observed in Tatars (82,79%) versus 67,24% in Russians and 65% in Bashkirs. According to literature data, the IL4*C allele is prevalent in European populations (70-82%), whereas the IL4*T allele is found at a high frequency (54-70%) in Afro-American and Mongoloid populations [11, 22-25].

No statistically significant differences between patients with asthma and controls were found in allele and genotype frequency distributions (p>0,05). However, statistically significant differences were revealed between asthma patients of Tatar ethnic origin and control Tatars (χ2=4,8; p=0,05). The IL4*T allele frequency in Tatar asthma patients was significantly higher than in the Tatar control group: 30,3% versus 17,21% (χ2=4,3; p=0,038). The risk of asthma development was assessed with the two-sided χ2 test with Yates's correction. The odds ratio (OR) associated with the IL4*T allele was 2,09 (95% CI 1,03-4,23). Moreover, our results showed a trend toward association of the IL4*T/*T and IL4*C/*T genotypes with asthma: these genotypes were observed more frequently in asthma patients than in controls (9,09% versus 1,64% and 42,42% versus 31,15%, respectively). The IL4*C/*C genotype, on the contrary, was found at a lower frequency in patients (48,48%) than in the control group of Tatars (67,21%), OR=0,46 (95% CI 0,18-1,09), χ2=3,15; p=0,076. In asthma patients of Russian ethnic origin, there was a tendency toward an increased frequency of the IL4*C/*C genotype: it was found in 55% of Russian patients versus 41,38% of controls (OR=1,73 (0,84-3,59), χ2=2,19; p=0,13). No statistically significant differences were found between patients with moderate and severe forms of the disease when the allele and genotype frequencies of the examined locus were compared.

Table 2. Allele and genotype frequency distributions of the IL4 -590C>T polymorphism in asthma patients and healthy individuals
Cells give ni; pi±sp, % (95% CI, %).
Group | Allele C | Allele T | N1 | Genotype CC | Genotype CT | Genotype TT | N2
Control group
Russians | 78; 67,24±4,36 (57,91-75,67) | 38; 32,76±4,36 (24,33-42,09) | 116 | 24; 41,38±6,47 (28,6-55,07) | 30; 51,72±6,56 (38,22-65,05) | 4; 6,9±3,33 (1,91-16,73) | 58
Tatars | 101; 82,79±3,42 (74,9-89,02) | 21; 17,21±3,42 (10,98-25,1) | 122 | 41; 67,21±6,01 (54-78,69) | 19; 31,15±5,93 (19,9-44,29) | 1; 1,64±1,63 (0,04-8,8) | 61
Bashkirs | 65; 65±4,77 (54,82-74,27) | 35; 35±4,77 (25,73-45,18) | 100 | 22; 44±7,02 (29,99-58,75) | 21; 42±6,98 (28,19-56,79) | 7; 14±4,91 (5,82-26,74) | 50
Control (in whole) | 244; 72,19±2,44 (67,08-76,9) | 94; 27,81±2,44 (23,1-32,92) | 338 | 87; 51,48±3,84 (43,68-59,23) | 70; 41,42±3,79 (33,91-49,24) | 12; 7,1±1,98 (3,72-12,07) | 169
Asthma patients group
Russians | 89; 74,17±4 (65,38-81,72) | 31; 25,83±4 (18,28-34,62) | 120 | 33; 55±6,42 (41,61-67,88) | 23; 38,33±6,28 (26,07-51,79) | 4; 6,67±3,22 (1,85-16,2) | 60
Tatars | 46; 69,7±5,66 (57,15-80,41) | 20; 30,3±5,66 (19,59-42,85) | 66 | 16; 48,48±8,7 (30,8-66,46) | 14; 42,42±8,6 (25,48-60,78) | 3; 9,09±5 (1,92-24,33) | 33
Bashkirs | 24; 75±7,65 (56,6-88,54) | 8; 25±7,65 (11,46-43,4) | 32 | 8; 50±12,5 (24,65-75,35) | 8; 50±12,5 (24,65-75,35) | 0 | 16
Asthma patients (in whole) | 224; 71,79±2,55 (66,45-76,72) | 88; 28,21±2,55 (23,28-33,55) | 312 | 80; 51,28±4 (43,16-59,35) | 64; 41,03±3,94 (33,22-49,17) | 12; 7,69±2,13 (4,04-13,05) | 156
Note, here and in the following tables: N1, N2: numbers of chromosomes and individuals examined, respectively; ni: number of alleles (genotypes); pi: allele (genotype) frequency; sp: standard error of pi; CI: 95% confidence interval.
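The odds ratio and χ2 values quoted for the IL4*T allele in Tatars can be re-derived from the allele counts in Table 2 (patients: 20 T and 46 C; controls: 21 T and 101 C). The sketch below is only an illustration: it assumes a log-scale (Woolf) confidence interval for the OR and a one-degree-of-freedom χ2 test, and the function names are hypothetical rather than taken from the authors' software.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) CI for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic (df = 1) and two-sided p-value for a 2x2 table."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates's continuity correction shrinks |ad - bc| by n / 2.
        diff = max(diff - n / 2, 0.0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the upper-tail probability has a closed form via erfc.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Tatars (Table 2): T and C allele counts in patients, then in controls.
or_t, ci_low, ci_high = odds_ratio_ci(20, 46, 21, 101)
chi2_plain, p_plain = chi_square_2x2(20, 46, 21, 101)
chi2_yates, p_yates = chi_square_2x2(20, 46, 21, 101, yates=True)
print(f"OR={or_t:.2f} ({ci_low:.2f}-{ci_high:.2f})")
print(f"chi2={chi2_plain:.2f}, p={p_plain:.3f}; with Yates: chi2={chi2_yates:.2f}")
```

With these counts the uncorrected statistic comes out close to the printed χ2=4,3, while the Yates-corrected value is smaller. Which variant was applied to which comparison in the chapter cannot be settled from this sketch.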
Taking into consideration the previously reported association of the -590C>T polymorphism of the IL4 gene with asthma severity and, in particular, the reduction of FEV1 associated with the -590*T allele and the -590*T/*T genotype [25, 26], we analyzed the genotype and allele frequencies of this locus in relation to FEV1 (forced expiratory volume in 1 s), one of the basic spirometric parameters, and to the level of lung function according to spirometry data (Table 3). A tendency toward association of the IL4*T allele with lower FEV1 and a lower level of lung function was revealed in asthmatic subjects. In patients with FEV1 of 20-39% of the normal value, the IL4*C/*T genotype frequency was 52,94%, considerably higher than the 21,05% found in patients with FEV1 above 80% (χ2=3,88; p=0,049). The heterozygous genotype was significantly more frequent (50%) in patients with lung function values of 40-59% than in patients with higher lung function values (80% and higher; 25,45%) (χ2=6,63; p=0,01). The IL4*C/*C genotype, on the contrary, was found more frequently in patients with better lung function, i.e., with higher values of FEV1 and lung capacity. Our results confirm the data of other researchers on the association of the IL4 -590*T allele with asthma severity [25, 26]. Thus, the analysis of the -590C>T polymorphism of the IL4 gene showed significant differences both between healthy individuals of Tatar and Russian ethnic origin and between those of Tatar and Bashkir ethnic origin. The IL4*T allele was found to be a marker of increased risk of asthma in Tatars, and the IL4*C/*T genotype was shown to be associated with a low level of lung function.

IL-4 acts through the IL-4 receptor (IL-4R), which consists of two subunits, the α chain (IL4RA) and the γ chain. IL4RA is a functionally significant component that plays an essential role in IgE production.
Table 3. Allele and genotype frequency distributions of the -590C>T polymorphism of the IL4 gene in asthma patients with different spirometry indices
Cells give ni; pi±sp, %.
Index, % of normal | Allele C | Allele T | N1 | Genotype CC | Genotype CT | Genotype TT | N2
FEV1 (forced expiratory volume)
20-39 % | 46; 67,65±5,67 | 22; 32,35±5,67 | 68 | 14; 41,18±8,44 | 18; 52,94±8,56 | 2; 5,88±4,03 | 34
40-59 % | 82; 73,21±4,18 | 30; 26,79±4,18 | 112 | 31; 55,36±6,64 | 20; 35,71±6,4 | 5; 8,93±3,81 | 56
60-79 % | 67; 71,28±4,67 | 27; 28,72±4,67 | 94 | 23; 48,94±7,29 | 21; 44,68±7,25 | 3; 6,38±3,56 | 47
80 % and higher | 30; 78,95±6,61 | 8; 21,05±6,61 | 38 | 13; 68,42±10,66 | 4; 21,05±9,35 | 2; 10,53±7,04 | 19
FVC (forced vital capacity)
40-59 % | 66; 68,75±4,73 | 30; 31,25±4,73 | 96 | 21; 43,75±7,16 | 24; 50±7,22 | 3; 6,25±3,49 | 48
60-79 % | 74; 69,81±4,46 | 32; 30,19±4,46 | 106 | 24; 45,28±6,84 | 26; 49,06±6,87 | 3; 5,66±3,17 | 53
80 % and higher | 84; 76,36±4,05 | 26; 23,64±4,05 | 110 | 35; 63,64±6,49 | 14; 25,45±5,87 | 6; 10,91±4,2 | 55

The IL4RA gene is located on chromosome 16p (16p12.1), a region reported to be linked with asthma [18]. More than thirty single-nucleotide polymorphisms (SNPs) have been identified in the coding region of the IL4RA gene. The majority of SNPs (about thirteen) are located in exon 12 and result in amino acid substitutions. Approximately 14 SNPs of the IL4RA gene are considered to be polymorphisms [27, 28]. An association of the Ile50Val and Gln576Arg polymorphisms of the IL4RA gene with atopy and severe asthma has been revealed in some recent works [23, 26]. K. Ober et al. investigated nonsynonymous substitutions in the IL4RA gene in asthma families of various ethnic origins (Hutterites, outbred whites, blacks from Chicago and Baltimore, Hispanics); all population samples showed evidence of association with atopy and asthma, but the alleles and haplotypes showing the strongest evidence differed between the groups [27]. We performed an association analysis of two polymorphisms of the IL4RA gene (Ile50Val and Gln576Arg) with asthma in Bashkortostan Republic.

The analysis of the Ile50Val polymorphism of the IL4RA gene did not reveal statistically significant differences between asthma patients and controls (Table 4). Comparison of allele and genotype frequencies among Russian, Tatar and Bashkir nonasthmatic subjects showed significant differences between Russians and Tatars (χ2=2,7; p=0,05 and χ2=4,64; p=0,09, respectively). The frequency of the homozygous IL4RA*Ile50/*Ile50 genotype was significantly lower in Russians (24,14%) than in Tatars (42,62%, χ2=4,55; p=0,033) and Bashkirs (38%, χ2=2,43; p=0,11). The heterozygous genotype was found at a frequency of 58,62% in Russians, 42,62% in Tatars (χ2=3,04; p=0,08) and 40% in Bashkirs (χ2=3,72; p=0,05). The IL4RA*Ile50 allele was the most prevalent in all examined groups, but it was found in Russians at a lower frequency (53,45% of chromosomes) than in Tatars (63,93%) and Bashkirs (58%).
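The allele frequencies quoted above follow directly from the genotype counts. As a minimal sketch, assuming the standard error is the simple binomial one, sqrt(p(1-p)/N) over N chromosomes, the Table 4 entry for Russian controls (14 Ile50/Ile50, 34 Ile50/Val50, 10 Val50/Val50) can be reproduced as follows. The function name is hypothetical, and the 95% confidence intervals printed in the tables are not reproduced here.

```python
import math


def allele_summary(n_hom1, n_het, n_hom2):
    """Count, chromosome total, frequency (%) and binomial SE (%) of allele 1."""
    n_individuals = n_hom1 + n_het + n_hom2
    n_chromosomes = 2 * n_individuals          # N1 = 2 * N2 for an autosomal locus
    count1 = 2 * n_hom1 + n_het                # each homozygote carries two copies
    p = count1 / n_chromosomes
    se = math.sqrt(p * (1 - p) / n_chromosomes)
    return count1, n_chromosomes, 100 * p, 100 * se


# Russian controls, IL4RA Ile50Val (Table 4).
ile_count, n1, ile_freq, ile_se = allele_summary(14, 34, 10)
print(f"Ile50: {ile_count} of {n1} chromosomes, {ile_freq:.2f} +/- {ile_se:.2f} %")
```

The same function applied with the homozygote arguments swapped gives the figures for the second allele.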
The analysis of genotype and allele frequencies of the IL4RA gene in patients and controls with regard to their ethnic origin showed that the IL4RA*Ile50 allele and the IL4RA*Ile50/*Ile50 genotype were more frequent in Russian asthma patients (64,17% and 40%) than in the control group of Russians (53,45% and 24,14%). The odds ratio for carriers of the homozygous IL4RA*Ile50/*Ile50 genotype was 2,1 (95% CI 0,98-4,5), χ2=3,4; p=0,05. The frequency of the IL4RA*Ile50/*Ile50 genotype was higher in patients with the severe form of asthma (42,19%) than in those with the moderate form (30,43%, χ2=2,28; p=0,131). In patients of Russian ethnic origin, the frequency of the IL4RA*Ile50/*Ile50 genotype was 46,15% in severe asthma versus 35,29% in moderate asthma and 24,14% in controls of the same ethnic origin. The odds ratio for the development of severe asthma in Russians was 2,69 (1,01-7,16), χ2=2,69; p=0,043. The analysis of allele and genotype frequencies together with spirometry data showed a relation between the IL4RA*Ile50/*Ile50 genotype frequency and both FEV1 and the level of lung function. The IL4RA*Ile50/*Ile50 genotype frequency was almost twice as high (52,94%) in asthma patients with a low FEV1 value (20-39%) as in patients with a high FEV1 value (80% and more; 28,57%) (χ2=3,87; p=0,048). The frequency of this genotype in patients with 40-59% of lung capacity was also higher than in patients with a high lung capacity value (43,75% versus 27,27%) (χ2=3,06; p=0,08). The IL4RA*Ile50 allele was found at a frequency of 75% in patients with low FEV1 (20-39%) and at a frequency of 57,14% in patients with high FEV1 values (80% and higher) (χ2=3,82; p=0,05). Our data are consistent with the results of K. Mitsuyasu et al., who revealed an association of the IL4RA*Ile50 allele with allergic asthma and found that the IL4RA*Ile50 variant gives a threefold increase in the IL4 response compared with IL4RA*Val50 owing to increased activity of the receptor subunit [29].
Table 4. Allele and genotype frequency distributions of the Ile50Val polymorphism of the IL4Rα gene in asthma patients and healthy individuals
Cells give ni; pi±sp, % (95% CI, %).
Group | Allele Ile50 | Allele Val50 | N1 | Genotype Ile50/Ile50 | Genotype Ile50/Val50 | Genotype Val50/Val50 | N2
Control group
Russians | 62; 53,45±4,63 (43,95-62,76) | 54; 46,55±4,63 (37,24-56,05) | 116 | 14; 24,14±5,62 (13,87-37,17) | 34; 58,62±6,47 (44,93-71,4) | 10; 17,24±4,96 (8,59-29,43) | 58
Tatars | 78; 63,93±4,35 (54,75-72,43) | 44; 36,07±4,35 (27,57-45,25) | 122 | 26; 42,62±6,33 (30,04-55,94) | 26; 42,62±6,33 (30,04-55,94) | 9; 14,75±4,54 (6,98-26,17) | 61
Bashkirs | 58; 58±4,94 (47,71-67,8) | 42; 42,00±4,94 (32,2-52,29) | 100 | 19; 38±6,86 (24,65-52,83) | 20; 40±6,93 (26,41-54,82) | 11; 22±5,86 (11,53-35,96) | 50
Controls (in whole) | 198; 58,58±2,68 (53,12-63,88) | 140; 41,42±2,68 (36,12-46,88) | 338 | 59; 34,91±3,67 (27,75-42,61) | 80; 47,34±3,84 (39,62-55,15) | 30; 17,75±2,94 (12,31-24,36) | 169
Asthma patients
Russians | 77; 64,17±4,38 (54,9-72,71) | 43; 35,83±4,38 (27,29-45,1) | 120 | 24; 40±6,32 (27,56-53,46) | 29; 48,33±6,45 (35,23-61,61) | 7; 11,67±4,14 (4,82-22,57) | 60
Tatars | 40; 60,61±6,01 (47,81-72,42) | 26; 39,39±6,01 (27,58-52,19) | 66 | 11; 33,33±8,21 (17,96-51,83) | 18; 54,55±8,67 (36,35-71,89) | 4; 12,12±5,68 (3,4-28,2) | 33
Bashkirs | 17; 53,12±8,82 (34,74-70,91) | 15; 46,88±8,82 (29,09-65,26) | 32 | 4; 25±10,83 (7,27-52,38) | 9; 56,25±12,4 (29,88-80,25) | 3; 18,75±9,76 (4,05-45,65) | 16
Asthma patients (in whole) | 193; 61,86±2,75 (56,22-67,27) | 119; 38,14±2,75 (32,73-43,78) | 312 | 55; 35,26±3,83 (27,79-43,3) | 83; 53,21±3,99 (45,06-61,23) | 18; 11,54±2,56 (6,98-17,62) | 156
Table 5. Allele and genotype frequency distributions of the Ile50Val polymorphism of the IL4RA gene in asthma patients with different spirometry indices
Cells give ni; pi±sp, %.
Index, % of normal | Allele Ile50 | Allele Val50 | N1 | Genotype Ile50/Ile50 | Genotype Ile50/Val50 | Genotype Val50/Val50 | N2
FEV1 (forced expiratory volume)
20-39 % | 51; 75±5,25 | 17; 25±5,25 | 68 | 18; 52,94±8,56 | 15; 44,12±8,52 | 1; 2,94±2,9 | 34
40-59 % | 67; 59,82±4,63 | 45; 40,18±4,63 | 112 | 18; 32,14±6,24 | 31; 55,36±6,64 | 7; 12,5±4,42 | 56
60-79 % | 51; 56,67±5,22 | 39; 43,33±5,22 | 90 | 13; 28,89±6,76 | 25; 55,56±7,41 | 7; 15,56±5,4 | 45
80 % and higher | 24; 57,14±7,64 | 18; 42,86±7,64 | 42 | 6; 28,57±9,86 | 12; 57,14±10,8 | 3; 14,29±7,64 | 21
FVC (forced vital capacity)
40-59 % | 67; 69,79±4,69 | 29; 30,21±4,69 | 96 | 21; 43,75±7,16 | 25; 52,08±7,21 | 2; 4,17±2,89 | 48
60-79 % | 63; 59,43±4,77 | 43; 40,57±4,77 | 106 | 19; 35,85±6,59 | 25; 47,17±6,86 | 9; 16,98±5,16 | 53
80 % and higher | 63; 57,27±4,72 | 47; 42,73±4,72 | 110 | 15; 27,27±6,01 | 33; 60±6,61 | 7; 12,73±4,49 | 55
E. K. Khusnutdinova, A. S. Karunas, U. U. Fedorova et al.
Thus, the analysis of the Ile50Val polymorphism of the IL4RA gene showed statistically significant differences between healthy individuals of Russian and Tatar ethnic origin. The results also demonstrated an association of the IL4RA*Ile50/*Ile50 genotype with asthma and its severity in Russian patients, and an association of this genotype and the IL4RA*Ile50 allele with lung function abnormalities. The analysis of allele and genotype frequencies of the Gln576Arg polymorphism of the IL4RA gene revealed no significant differences between asthma patients and controls (p>0,05). The IL4RA*Gln576 allele was prevalent both in asthma patients (79,81% of chromosomes) and in healthy individuals (81,31% of chromosomes). The most frequent genotype, IL4RA*Gln576/*Gln576, was found in 62,18% of patients and 63,91% of controls. The IL4RA*Arg576/*Arg576 genotype was rare both in patients (2,56%) and in controls (1,78%). No significant differences in genotype and allele frequencies were observed between patients of different ethnic origins, with different disease forms and severity, and the corresponding controls. Thus, the present work found no association of the Gln576Arg polymorphism of the IL4RA gene with asthma in Bashkortostan Republic. The analysis of genotype combinations of the two IL4RA polymorphisms (Ile50Val and Gln576Arg) revealed eight of the nine possible combinations; the combination *Val50/*Val50-*Arg576/*Arg576 was not found. The most prevalent combinations were *Ile50/*Val50-*Gln576/*Gln576 (32,05% in patients versus 27,22% in controls), *Ile50/*Ile50-*Gln576/*Gln576 (21,15% versus 23,08%, respectively) and *Ile50/*Val50-*Gln576/*Arg576 (19,87% versus 19,53%). No statistically significant differences between the group of patients with asthma and the controls were found when the frequency distributions of genotype combinations were compared (p>0,05).
Furthermore, patients were subdivided into groups according to ethnic origin, form and severity of the disease, but again no significant differences were observed when the frequencies of genotype combinations were analyzed (p>0,05). Taking into consideration the fact that the IL4 and IL4RA proteins interact, we investigated whether combinations of IL4 -590C>T and IL4RA Ile50Val genotypes were associated with asthma. We did not detect differences in the overall distribution of genotype combinations of these polymorphic loci between the group of patients and the group of controls (p>0,05). There were eight types of genotype combinations, three of which were the most frequent: IL4*C/*C-IL4RA*Ile50/*Val50, IL4*C/*T-IL4RA*Ile50/*Val50 and IL4*C/*C-IL4RA*Ile50/*Ile50. Analysis of the frequency of the heterozygous genotype combination in patients and in nonasthmatic subjects of the same ethnic origin revealed opposite patterns in Russians and Tatars. This combination was found in 31,03% of nonasthmatic subjects of Russian ethnic origin, whereas only 15% of Russian patients carried it (OR=0,39 (0,16-0,96), χ2=4,29; p=0,038). On the contrary, this combination was more frequent in asthma patients of Tatar ethnic origin (27,27%) than in controls (14,75%) (OR=2,17 (0,76-6,15), χ2=2,17; p=0,14). The combination IL4*C/*C-IL4RA*Ile50/*Ile50 was more frequent in patients with severe asthma (23,44%) than in the moderate asthma group (10,87%, χ2=4,43; p=0,035). Thus, the case-control study of polymorphic variants of the IL4 and IL4RA genes in Bashkortostan Republic showed that the IL4*T allele of the IL4 gene is a risk marker for asthma in Tatars; the IL4RA*Ile50/*Ile50 genotype is a risk factor for asthma in Russians and is associated with the severe form of the disease; the heterozygous genotype combination IL4*C/*T-IL4RA*Ile50/*Val50 is a protective factor against asthma development in Russians; and the genotypes IL4*C/*T and IL4RA*Ile50/*Ile50 are associated with lung function abnormalities.
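The statement that eight of nine possible two-locus combinations were observed rests on a simple count: two biallelic loci with three genotypes each give 3 × 3 = 9 combinations. A minimal sketch of that bookkeeping is shown below; the helper names are hypothetical and the list of observed combinations is constructed for illustration rather than taken from individual-level data, which the chapter does not provide.

```python
from itertools import product

ILE50VAL = ["*Ile50/*Ile50", "*Ile50/*Val50", "*Val50/*Val50"]
GLN576ARG = ["*Gln576/*Gln576", "*Gln576/*Arg576", "*Arg576/*Arg576"]


def all_combinations(locus_a, locus_b):
    """Every two-locus genotype combination, written as 'genotypeA-genotypeB'."""
    return [f"{a}-{b}" for a, b in product(locus_a, locus_b)]


def missing_combinations(observed, locus_a, locus_b):
    """Combinations that are possible in principle but absent from the sample."""
    return sorted(set(all_combinations(locus_a, locus_b)) - set(observed))


possible = all_combinations(ILE50VAL, GLN576ARG)
# Illustration only: mark every combination as observed except the one the
# text reports as not found.
observed = [c for c in possible if c != "*Val50/*Val50-*Arg576/*Arg576"]
absent = missing_combinations(observed, ILE50VAL, GLN576ARG)
print(len(possible), len(observed), absent)
```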
ASSOCIATION ANALYSIS OF THR113MET POLYMORPHISM OF THE INTERLEUKIN 9 GENE WITH ASTHMA

The interleukin-9 (IL9) gene, one of the cytokine genes located on chromosome 5q31-34, plays an important role in the development of the allergic inflammatory process. Taking into account the location of the IL9 gene, the development of an asthmatic phenotype in transgenic mice overexpressing IL9, and the significantly higher IL-9 mRNA expression and immunoreactivity in bronchial biopsies of asthmatics [30], we investigated the Thr113Met polymorphism of the IL9 gene in asthma patients and nonasthmatic individuals from Bashkortostan Republic. No statistically significant differences were found in allele and genotype frequency distributions of the Thr113Met polymorphism either between the group of asthma patients and controls or between ethnically subdivided groups of patients and controls. The frequency of the IL9*Met113 allele was low in all examined groups: 14,66% in Russian, 13,11% in Tatar and 9% in Bashkir nonasthmatic individuals. Its frequency was 16,67% in Russian, 16,42% in Tatar and 12,5% in Bashkir asthma patients. The IL9*Met113/*Met113 genotype was observed only once in the control group (in a Russian subject), twice in asthma patients of Russian ethnic origin and once in a patient of Tatar ethnic origin. The distribution of allele frequencies in the examined ethnic groups was similar to that in the Finnish population, where the IL9*Met113 allele was found at a frequency of 15% in the whole sample and 13% in patients with asthma [31], and to the results obtained in Tomsk, where the allele frequency was 18,06% in healthy individuals and 18,63% in asthma patients [10]. The analysis of allele and genotype frequencies of this polymorphism in patients with different disease forms and severity also revealed no significant differences.
Thus, our results do not indicate an association between the Thr113Met polymorphism of the IL9 gene and asthma in Bashkortostan Republic. They support the findings of two previous studies, performed by Laitinen T. in a Finnish population and by Freidin M. in a Russian population from Tomsk [10, 31].
ASSOCIATION ANALYSIS OF -627C>A PROMOTER POLYMORPHISM OF THE INTERLEUKIN-10 GENE WITH ASTHMA

IL-10 is one of the cytokines that may play a role in the process of inflammation and is therefore considered to be involved in the pathogenesis of asthma. It participates in both immunoproliferative and inflammatory responses. IL-10 exerts its anti-inflammatory effect by inhibiting the synthesis of proinflammatory cytokines, chemokines and inflammatory enzymes by macrophages and polymorphonuclear leukocytes. Low production of IL-10 was found in the alveolar macrophages and peripheral blood mononuclear cells of asthma patients. Taking into consideration literature data on altered IL-10 synthesis in asthma patients, the influence of promoter region polymorphisms on IL-10 gene expression, and the association of this polymorphism with asthma [32, 33], we investigated the potential relationship between asthma and the -627C>A promoter polymorphism of the interleukin 10 gene (1q31-32) in Bashkortostan Republic.

The analysis of allele and genotype frequencies revealed no statistically significant differences between the group of patients and the controls (p>0,05). The most frequent genotype in both groups was IL10*C/*C, found in 55,13% of asthma patients and 49,11% of healthy individuals. The IL10*C and IL10*A alleles were found at frequencies of 74,68% and 25,32% in patients with asthma and 71,6% and 28,4% in controls, respectively. Statistically significant differences were observed between nonasthmatic subjects of Tatar and Bashkir ethnic origin (χ2=5,92; p=0,015; χ2=7,18; p=0,02). The frequency of the IL10*C allele was higher in Tatars (77,87%) than in Bashkirs (63%). The IL10*C/*C genotype was also found at a higher frequency in Tatars (57,38%) than in Bashkirs (36%) (χ2=6,24; p=0,013). The IL10*C/*A and IL10*A/*A genotypes were more frequent in Bashkirs than in Tatars (54% versus 40,98%, χ2=1,87; p=0,17, and 10% versus 1,64%, χ2=2,3; p=0,13, respectively). None of the examined ethnic groups of patients showed statistically significant differences in allele and genotype frequency distributions compared with nonasthmatic subjects of the same ethnic origin. The distribution of alleles and genotypes in patients with different asthma severity revealed an increased frequency of the IL10*A allele (30,47% versus 21,74%, χ2=3,04; p=0,081) and of the IL10*C/*A genotype (45,31% versus 34,78%, χ2=1,76; p=0,18) in patients with severe asthma. S. Lim and co-workers have previously demonstrated that the -627A allele/ATA haplotype (4000, -1200, and -627 polymorphisms) of the IL-10 gene is associated with low IL-10 expression in severe asthmatics [34]. A low level of IL-10 gene expression favors inflammatory, immune-mediated and profibrotic mechanisms of the bronchial cell reaction.
Thus, the analysis of the -627C>A promoter polymorphism of the IL-10 gene revealed statistically significant differences in allele and genotype frequencies between healthy individuals of Tatar and Bashkir ethnic origin. Our data indicate that this IL-10 gene polymorphism is not associated with asthma in Bashkortostan Republic.
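The heterogeneity comparison between Tatar and Bashkir controls can be illustrated with a Pearson χ2 test on a contingency table of counts. The sketch below uses the genotype counts (CC, CA, AA) and allele counts (C, A) of the two control groups from Table 6. It is an illustration under stated assumptions, not the authors' procedure: no continuity correction is applied, all column totals are assumed to be non-zero, and the p-value helper (a hypothetical name) covers only one and two degrees of freedom, for which closed forms exist.

```python
import math


def chi_square_table(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value(chi2, df):
    """Upper-tail probability of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch handles only df = 1 or df = 2")


# IL10 -627C>A in controls (Table 6): Tatars versus Bashkirs.
chi2_geno, df_geno = chi_square_table([[35, 25, 1], [18, 27, 5]])  # CC, CA, AA
chi2_allele, df_allele = chi_square_table([[95, 27], [63, 37]])    # C, A
print(f"genotypes: chi2={chi2_geno:.2f}, df={df_geno}")
print(f"alleles:   chi2={chi2_allele:.2f}, df={df_allele}, "
      f"p={p_value(chi2_allele, df_allele):.3f}")
```

For small expected counts, as in the AA column here, an exact test may be preferable; the sketch does not attempt one.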
ASSOCIATION ANALYSIS OF POLYMORPHIC LOCUS -308G>A OF THE TUMOR NECROSIS FACTOR α (TNFA) GENE WITH ASTHMA

The association between polymorphisms of the tumor necrosis factor α gene (TNFA) (6p21.1-21.3) and asthma has been intensively studied in different countries. Tumor necrosis factor α (TNFA) is particularly interesting because of its involvement in the inflammatory reaction and its elevated concentration in the airways, blood, sputum, bronchoalveolar lavage and alveolar macrophages of symptomatic subjects [35, 36]. In the studies reported so far, there are still controversies over the effects of TNFA polymorphisms on asthma. The association between the guanine (G)-to-adenine (A) polymorphism at position -308 in the TNF promoter and asthma has been tested in 19 published studies that included different age groups of individuals with asthma from a range of ethnic backgrounds. An association between the TNFA*A allele and asthma was reported in seven of these studies. Two studies showed an association between the wild-type TNFA*G allele and asthma. In nine additional studies, the authors reported no association between asthma and this single-nucleotide polymorphism [37].
Table 6. Allele and genotype frequency distributions of the -627C>A polymorphism of the IL10 gene in asthma patients and healthy individuals
Cells give ni; pi±sp, % (95% CI, %).
Group | Allele C | Allele A | N1 | Genotype CC | Genotype CA | Genotype AA | N2
Control group
Russians | 84; 72,41±4,15 (63,34-80,3) | 32; 27,59±4,15 (19,7-36,66) | 116 | 30; 51,72±6,56 (38,22-65,05) | 24; 41,38±6,47 (28,6-55,07) | 4; 6,9±3,33 (1,91-16,73) | 58
Tatars | 95; 77,87±3,76 (69,46-84,88) | 27; 22,13±3,76 (15,12-30,54) | 122 | 35; 57,38±6,33 (44,06-69,96) | 25; 40,98±6,3 (28,55-54,32) | 1; 1,64±1,63 (0,04-8,8) | 61
Bashkirs | 63; 63±4,83 (52,76-72,44) | 37; 37±4,83 (27,56-47,24) | 100 | 18; 36±6,79 (22,92-50,81) | 27; 54±7,05 (39,32-68,19) | 5; 10±4,24 (3,33-21,81) | 50
Control group (in whole) | 242; 71,6±2,45 (66,47-76,35) | 96; 28,4±2,45 (23,65-33,53) | 338 | 83; 49,11±3,85 (41,35-56,9) | 76; 44,97±3,83 (37,32-52,8) | 10; 5,92±1,82 (2,87-10,61) | 169
Asthma patients
Russians | 93; 77,5±3,81 (68,98-84,62) | 27; 22,5±3,81 (15,38-31,02) | 120 | 35; 58,33±6,36 (44,88-70,93) | 23; 38,33±6,28 (26,07-51,79) | 2; 3,33±2,32 (0,41-11,53) | 60
Tatars | 46; 69,7±5,66 (57,15-80,41) | 20; 30,3±5,66 (19,59-42,85) | 66 | 16; 48,48±8,7 (30,8-66,46) | 14; 42,42±8,6 (25,48-60,78) | 3; 9,09±5 (1,92-24,33) | 33
Bashkirs | 25; 78,12±7,31 (60,03-90,72) | 7; 21,88±7,31 (9,28-39,97) | 32 | 10; 62,5±12,1 (35,43-84,8) | 5; 31,25±11,59 (11,02-58,66) | 1; 6,25±6,05 (0,16-30,23) | 16
Asthma patients (in whole) | 233; 74,68±2,46 (69,47-79,41) | 79; 25,32±2,46 (20,59-30,53) | 312 | 86; 55,13±3,98 (46,97-63,09) | 61; 39,1±3,91 (31,4-47,23) | 9; 5,77±1,87 (2,67-10,67) | 156
The -308G>A polymorphism of the TNFA gene showed no significant association with asthma when asthmatic subjects were compared with the nonasthmatic group. The TNFA*G allele was prevalent in both groups (87,18% and 89,05%, respectively). Analysis of heterogeneity revealed no significant differences in allele and genotype frequency distributions between subgroups separated by ethnic origin (p>0,05). The frequency of the prevalent genotype TNFA*G/*G varied from 77,59% in Russians to 80,33% in Tatars; the heterozygous genotype TNFA*G/*A was also found at a high frequency, whereas the genotype TNFA*A/*A was found only in the control group of Tatars (3,28%). There was also no statistically significant increase in the prevalence of the -308G>A polymorphism either in the different ethnic groups of the Bashkortostan Republic compared with the corresponding controls or in relation to disease form and severity. In summary, the results of this study suggest that the -308G>A polymorphism of the TNFA gene is not a risk factor for the development of asthma in the populations of the Bashkortostan Republic, which is consistent with the data of many researchers [37].
ASSOCIATION ANALYSIS OF β2-ADRENERGIC RECEPTOR GENE POLYMORPHISMS (ARG16GLY AND GLN27GLU) WITH ASTHMA
Many asthma studies focus on the β2-adrenergic receptor gene (ADRB2) because its product interacts directly with β2-agonists and plays a central role in the β2-agonist pathway. The β2-adrenergic agonists are the most potent bronchodilators for the treatment of asthma. Altered functional activity of the β2-adrenergic receptor has been found in experimental animal models and in patients with severe asthma [38]. A total of 13 polymorphisms have been identified in the intronless ADRB2 gene, located on chromosome 5q31-32, four of which result in changes of amino-acid residues 16, 27, 34 and 164 [39, 40]. Two closely linked polymorphisms, Arg16Gly and Gln27Glu, are the most widespread in European populations [40-42]. According to preliminary studies, these polymorphisms are associated with increased bronchial responsiveness [42], total serum IgE levels, nocturnal and childhood asthma, and severe asthma [40, 41]. Current research in asthma pharmacogenetics has highlighted associations between SNPs in the β-adrenergic receptors and a modified response to regular inhaled β-agonist treatment (e.g., albuterol) [43]. We analyzed the Arg16Gly and Gln27Glu polymorphisms of the ADRB2 gene in asthma patients and nonasthmatic individuals from the Bashkortostan Republic to investigate the possible influence of ADRB2 polymorphisms on the development of asthma. The analysis of the Arg16Gly polymorphism showed no significant differences between asthma patients and controls (p>0,05) (Table 7). When the groups were subdivided by ethnic origin, allele and genotype frequencies differed between healthy individuals of Tatar and Bashkir ethnic origin, significantly so for alleles (χ2=4,29, p=0,038) but not for genotypes (χ2=4,57, p=0,1).
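The Tatar versus Bashkir control comparison quoted above (χ2=4,29 and χ2=4,57) can be re-derived from the counts in Table 7. The following is a minimal sketch, assuming an uncorrected Pearson chi-square test (no Yates correction), which is what reproduces the printed values; the function names are hypothetical, and the p-value helper covers only 1 and 2 degrees of freedom, where closed forms exist.

```python
import math

def pearson_chi2(table):
    """Uncorrected Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

def chi2_p_value(chi2, df):
    """Upper-tail p-value; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("only df = 1 or df = 2 is handled in this sketch")

# Controls, Table 7. Rows: Tatars, Bashkirs.
alleles = [[57, 65], [33, 67]]            # Arg16, Gly16 chromosome counts
genotypes = [[12, 33, 16], [5, 23, 22]]   # Arg/Arg, Arg/Gly, Gly/Gly counts

for table in (alleles, genotypes):
    chi2, df = pearson_chi2(table)
    print(round(chi2, 2), df, round(chi2_p_value(chi2, df), 3))
# 4.29 1 0.038
# 4.57 2 0.102
```

For larger tables a general chi-square survival function would be needed in place of the two closed forms.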
A significantly lower frequency of the homozygous genotype ADRB2*Gly16/*Gly16 was observed in Tatars (26,23%) compared with Bashkirs (44%, χ2=3,85, p=0,049) and Russians (43,1%, χ2=3,75, p=0,05). The frequency of the ADRB2*Gly16 allele in Tatars (53,28%) was similar to the average worldwide frequency (54,8%); in Russians (62,93%) and Bashkirs (67%) this allele was more frequent than the worldwide average and also more frequent than in populations of Caucasian (60,7%) and Asian
Table 7. Allele and genotype frequency distributions of the Arg16Gly polymorphism of the ADRB2 gene in asthma patients and healthy individuals
| Group | Allele Arg16 | Allele Gly16 | N1 | Genotype Arg16/Arg16 | Genotype Arg16/Gly16 | Genotype Gly16/Gly16 | N2 |
|---|---|---|---|---|---|---|---|
| Control group: Russians | 43 37,07±4,48 (28,29-46,53) | 73 62,93±4,48 (53,47-71,71) | 116 | 10 17,24±4,96 (8,59-29,43) | 23 39,66±6,42 (27,05-53,36) | 25 43,1±6,5 (30,16-56,77) | 58 |
| Control group: Tatars | 57 46,72±4,52 (37,64-55,97) | 65 53,28±4,52 (44,03-62,36) | 122 | 12 19,67±5,09 (10,6-31,84) | 33 54,1±6,38 (40,85-66,94) | 16 26,23±5,63 (15,8-39,07) | 61 |
| Control group: Bashkirs | 33 33±4,7 (23,92-43,12) | 67 67±4,7 (56,88-76,08) | 100 | 5 10±4,24 (3,33-21,81) | 23 46±7,05 (31,81-60,68) | 22 44±7,02 (29,99-58,75) | 50 |
| Control group (in whole) | 133 39,35±2,66 (34,11-44,78) | 205 60,65±2,66 (55,22-65,89) | 338 | 27 15,98±2,82 (10,8-22,39) | 79 46,75±3,84 (39,04-54,56) | 63 37,28±3,72 (29,97-45,04) | 169 |
| Asthma patients: Russians | 47 39,17±4,46 (30,39-48,5) | 73 60,83±4,46 (51,5-69,61) | 120 | 10 16,67±4,81 (8,29-28,52) | 27 45±6,42 (32,12-58,39) | 23 38,33±6,28 (26,07-51,79) | 60 |
| Asthma patients: Tatars | 31 46,97±6,14 (34,56-59,66) | 35 53,03±6,14 (40,34-65,44) | 66 | 8 24,24±7,46 (11,09-42,26) | 15 45,45±8,67 (28,11-63,65) | 10 30,3±8 (15,59-48,71) | 33 |
| Asthma patients: Bashkirs | 15 46,88±8,82 (29,09-65,26) | 17 53,12±8,82 (34,74-70,91) | 32 | 4 25±10,83 (7,27-52,38) | 7 43,75±12,4 (19,75-70,12) | 5 31,25±11,59 (11,02-58,66) | 16 |
| Asthma patients (in whole) | 125 40,06±2,77 (34,58-45,73) | 187 59,94±2,77 (54,27-65,42) | 312 | 29 18,59±3,11 (12,82-25,59) | 67 42,95±3,96 (35,06-51,11) | 60 38,46±3,9 (30,79-46,58) | 156 |
(46%) descent [40]. No statistically significant differences were observed between asthmatic subjects of Russian, Tatar and Bashkir ethnic origin and nonasthmatic individuals of the same ethnic origin (p>0,05). A tendency towards increased frequencies of the ADRB2*Gly16 allele and of ADRB2*Gly16 homozygotes was found in patients with severe asthma compared with the moderate asthma group: ADRB2*Gly16/*Gly16 homozygotes were more frequent in the severe asthma group (45,31%) than in the moderate asthma group (33,7%, χ2=2,15, p=0,11). Pharmacogenetic investigations have demonstrated that carriers of the homozygous genotype ADRB2*Gly16/*Gly16 show a less effective bronchodilator response to exogenously administered β2-agonist therapy than carriers of the ADRB2*Arg16/*Arg16 genotype [43, 44], which may contribute to severe asthma. In conclusion, we found statistically significant differences between nonasthmatic individuals of Tatar and Bashkir ethnic origin and an increased frequency of the ADRB2*Gly16/*Gly16 genotype in the group of patients with severe asthma. This finding is consistent with other studies that found an association of the ADRB2*Gly16/*Gly16 genotype with asthma severity. The analysis of the Gln27Glu polymorphism of the ADRB2 gene revealed no significant differences between the total groups of asthma patients and controls from the Bashkortostan Republic. The overall frequency of the most widespread allele, ADRB2*Gln27, was similar in subjects without asthma and in asthmatic individuals (58,58% and 58,01%, respectively; Table 8), which is highly consistent with that of European populations [40-42]. Statistically significant differences were, however, revealed between nonasthmatic subjects of Russian and Bashkir ethnic origin (χ2=8,23, p=0,019) and between controls of Tatar and Bashkir ethnic origin (χ2=7,18, p=0,027).
The frequency of the heterozygous genotype ADRB2*Gln27/*Glu27 in Bashkirs (62%) was higher than in Russians (36,21%, χ2=7,16, p=0,0074) and Tatars (39,34%, χ2=5,64, p=0,017). The ADRB2*Glu27 allele was also found at a higher frequency in Bashkirs (49% of chromosomes) than in Russians (38,79%, χ2=2,28, p=0,13) and Tatars (37,7%, χ2=2,86, p=0,09). The analysis of allele and genotype frequencies revealed differences between Russian asthma patients and Russian controls: the frequency of the ADRB2*Gln27/*Glu27 genotype was higher in asthma patients (53,33%) than in healthy donors (36,21%; OR=2,01 (0,96-4,2), χ2=3,5, p=0,054). In patients of Bashkir ethnic origin the genotype ADRB2*Gln27/*Gln27 was significantly more frequent (50%) than in nonasthmatic subjects of the same ethnic origin (20%; OR=4,0 (1,2-13,28), χ2=5,5, p=0,019). When the examined groups were subdivided by disease form and severity, no statistically significant difference was observed. Thus, our investigation of the Gln27Glu polymorphism of the ADRB2 gene showed significant differences in allele and genotype frequencies between nonasthmatic subjects of Russian and Bashkir ethnic origin as well as between those of Tatar and Bashkir ethnic origin. The ADRB2*Gln27/*Glu27 genotype was associated with asthma in Russians, whereas the ADRB2*Gln27/*Gln27 genotype was associated with the disease in Bashkirs. The analysis of genotype combinations of the two ADRB2 polymorphisms showed differences between control subjects of Russian and Bashkir ethnic origin (χ2=12,68, p=0,068). Combinations of heterozygous genotypes were more frequent in Bashkirs (36%) than in Russians (18,97%, χ2=3,97, p=0,046), as was the combination of *Gly16/*Gly16 and *Gln27/*Glu27 (26% versus 13,79%, χ2=2,55, p=0,11).
The analysis of genotype combination frequencies found no differences between patients and controls, either in the overall samples or in the ethnically subdivided groups.
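The odds ratios quoted for the Gln27Glu polymorphism can be re-derived from the genotype counts in Table 8. The sketch below assumes a Woolf (log-based) 95% confidence interval, which is not stated in the text but reproduces the printed interval for the Bashkir comparison (OR=4,0; 1,2-13,28); the function name `odds_ratio_woolf` is hypothetical.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval.

    a, b: patients with / without the genotype
    c, d: controls with / without the genotype
    z:    normal quantile (1.96 for a 95% interval)
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Bashkirs, Gln27/Gln27: 8 of 16 patients versus 10 of 50 controls (Table 8)
odds_ratio, lower, upper = odds_ratio_woolf(8, 8, 10, 40)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))  # 4.0 1.2 13.28
```

An interval whose lower bound falls below 1, as in the Russian Gln27/Glu27 comparison (0,96-4,2), does not exclude the absence of an effect at the 5% level.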
Table 8. Allele and genotype frequency distributions of the Gln27Glu polymorphism of the ADRB2 gene in asthma patients and healthy donors
| Group | Allele Gln27 | Allele Glu27 | N1 | Genotype Gln27/Gln27 | Genotype Gln27/Glu27 | Genotype Glu27/Glu27 | N2 |
|---|---|---|---|---|---|---|---|
| Control group: Russians | 71 61,21±4,52 (51,72-70,11) | 45 38,79±4,52 (29,89-48,28) | 116 | 25 43,1±6,5 (30,16-56,77) | 21 36,21±6,31 (23,99-49,88) | 12 20,69±5,32 (11,17-33,35) | 58 |
| Control group: Tatars | 76 62,3±4,39 (53,07-70,91) | 46 37,7±4,39 (29,09-46,93) | 122 | 26 42,62±6,33 (30,04-55,94) | 24 39,34±6,25 (27,07-52,69) | 11 18,03±4,92 (9,36-29,98) | 61 |
| Control group: Bashkirs | 51 51±5 (40,8-61,14) | 49 49±5 (38,86-59,2) | 100 | 10 20±5,66 (10,03-33,72) | 31 62±6,86 (47,17-75,35) | 9 18±5,43 (8,58-31,44) | 50 |
| Control group (in whole) | 198 58,58±2,68 (53,12-63,88) | 140 41,42±2,68 (36,12-46,88) | 338 | 61 36,09±3,69 (28,86-43,83) | 76 44,97±3,83 (37,32-52,8) | 32 18,93±3,01 (13,33-25,67) | 169 |
| Asthma patients: Russians | 68 56,67±4,52 (47,31-65,68) | 52 43,33±4,52 (34,32-52,69) | 120 | 18 30±5,92 (18,85-43,21) | 32 53,33±6,44 (40-66,33) | 10 16,67±4,81 (8,29-28,52) | 60 |
| Asthma patients: Tatars | 39 59,09±6,05 (46,29-71,05) | 27 40,91±6,05 (28,95-53,71) | 66 | 14 42,42±8,6 (25,48-60,78) | 11 33,33±8,21 (17,96-51,83) | 8 24,24±7,46 (11,09-42,26) | 33 |
| Asthma patients: Bashkirs | 23 71,88±7,95 (53,25-86,25) | 9 28,12±7,95 (13,75-46,75) | 32 | 8 50±12,5 (24,65-75,35) | 7 43,75±12,4 (19,75-70,12) | 1 6,25±6,05 (0,16-30,23) | 16 |
| Asthma patients (in whole) | 181 58,01±2,79 (52,32-63,55) | 131 41,99±2,79 (36,45-47,68) | 312 | 55 35,26±3,83 (27,79-43,3) | 71 45,51±3,99 (37,53-53,67) | 30 19,23±3,16 (13,37-26,3) | 156 |
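The intervals given in parentheses in Table 8 are asymmetric around the percentages for small counts, which is typical of exact binomial intervals. The sketch below assumes the Clopper-Pearson method; this section does not state how the intervals were obtained, but the assumption reproduces the printed interval for a single Glu27/Glu27 carrier among 16 Bashkir patients (6,25%; 0,16-30,23). The function names are hypothetical, and the bounds are found by plain bisection rather than with a statistics library.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iterations=100):
    """Root of f on [lo, hi], assuming f > 0 near lo and f < 0 near hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: half - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - half)
    return lower, upper

# One Glu27/Glu27 carrier among 16 Bashkir asthma patients (Table 8)
lower, upper = clopper_pearson(1, 16)
print(round(100 * lower, 2), round(100 * upper, 2))  # 0.16 30.23
```

Bisection is used only to keep the example free of dependencies; a beta-quantile function gives the same bounds directly.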
The investigation of β2-adrenergic receptor gene polymorphisms revealed significant interethnic differences between healthy individuals of Tatar and Bashkir ethnic origin in the allele and genotype frequency distributions of the Arg16Gly polymorphic variant. The investigation of the Gln27Glu polymorphism of the ADRB2 gene showed statistically significant differences between nonasthmatic individuals of Russian and Bashkir origin and between those of Tatar and Bashkir origin. The ADRB2*Gln27/*Glu27 genotype was shown to be associated with asthma in Russians (OR=2,01; 0,96-4,2) and the ADRB2*Gln27/*Gln27 genotype in Bashkirs (OR=4,0; 1,2-13,28). A trend towards an increased frequency of the ADRB2*Gly16/*Gly16 genotype was found in patients with severe asthma.
ASSOCIATION ANALYSIS OF THE -159C>T POLYMORPHISM OF THE CD14 GENE WITH ASTHMA
CD14 is the receptor for lipopolysaccharide and other bacterial wall-derived components and is expressed mainly on the surface of macrophages, dendritic cells and neutrophils. In 1999 Baldini et al. described a C-to-T single-nucleotide polymorphism at position -159 in the promoter of CD14. They found that children with the TT genotype had significantly higher soluble CD14 (sCD14) and lower total IgE concentrations in serum than individuals with the CT and CC genotypes [45]. A high level of CD14 in serum may stimulate the Th1 immune response and lower the IgE level. This polymorphic marker has also been associated with skin test positivity to aeroallergens [46, 47], higher serum IgE [48, 49] and altered CD14 expression in asthmatic subjects in some populations [50]. To test whether this promoter variant of the CD14 gene is related to asthma in the Bashkortostan Republic, we conducted a genetic association study, the results of which are shown in Table 9. Nonasthmatic subjects of Bashkir and Tatar ethnic origin differed significantly in the allele frequencies of the -159C>T polymorphism of the CD14 gene (χ2=4,5, p=0,034), with a weaker difference in genotype frequencies (χ2=5,13, p=0,07). The frequency of the CD14*C allele was 61% in Bashkirs versus 46,72% in Tatars and 56,9% in Russians, and the frequency of the CD14*C/*C genotype was 34% in Bashkirs versus 18,03% in Tatars (χ2=3,71, p=0,053) and 29,3% in Russians (χ2=0,27, p=0,6). No significant differences between asthma patients and healthy controls were found when allele and genotype frequencies were analyzed. We also found no association between the CD14 gene polymorphism and asthma across the ethnic groups examined as a whole. However, when the patients and controls were subdivided according to their ethnic origin, we found differences between Russian patients and controls and between Tatar patients and controls.
The CD14*T/*T genotype was nearly twofold more frequent in asthmatics of Russian ethnic origin than in controls (30% versus 15,52%; OR=2,33 (0,94-5,73), χ2=3,5, p=0,051). In summary, the present study found significant interethnic differences in the allele and genotype frequencies of the -159C>T polymorphism of the CD14 gene between Tatar and Bashkir nonasthmatic individuals. Moreover, the CD14*T/*T genotype of this polymorphism was shown to be a risk marker for asthma development in Russians.
Table 9. Allele and genotype frequency distributions of the -159C>T polymorphism of the CD14 gene in asthma patients and healthy donors
| Group | Allele C | Allele T | N1 | Genotype CC | Genotype CT | Genotype TT | N2 |
|---|---|---|---|---|---|---|---|
| Control group: Russians | 66 56,9±4,6 (47,38-66,06) | 50 43,1±4,6 (33,94-52,62) | 116 | 17 29,31±5,98 (18,09-42,73) | 32 55,17±6,53 (41,54-68,26) | 9 15,52±4,75 (7,35-27,42) | 58 |
| Control group: Tatars | 57 46,72±4,52 (37,64-55,97) | 65 53,28±4,52 (44,03-62,36) | 122 | 11 18,03±4,92 (9,36-29,98) | 35 57,38±6,33 (44,06-69,96) | 15 24,59±5,51 (14,46-37,29) | 61 |
| Control group: Bashkirs | 61 61±4,88 (50,73-70,6) | 39 39±4,88 (29,4-49,27) | 100 | 17 34±6,7 (21,21-48,77) | 27 54±7,05 (39,32-68,19) | 6 12±4,6 (4,53-24,31) | 50 |
| Control group (in whole) | 184 54,44±2,71 (48,96-59,84) | 154 45,56±2,71 (40,16-51,04) | 338 | 45 26,63±3,4 (20,13-33,96) | 94 55,62±3,82 (47,79-63,25) | 30 17,75±2,94 (12,31-24,36) | 169 |
| Asthma patients: Russians | 60 50±4,56 (40,74-59,26) | 60 50±4,56 (40,74-59,26) | 120 | 18 30±5,92 (18,85-43,21) | 24 40±6,32 (27,56-53,46) | 18 30±5,92 (18,85-43,21) | 60 |
| Asthma patients: Tatars | 34 51,52±6,15 (38,88-64,01) | 32 48,48±6,15 (35,99-61,12) | 66 | 10 30,3±8 (15,59-48,71) | 14 42,42±8,6 (25,48-60,78) | 9 27,27±7,75 (13,3-45,52) | 33 |
| Asthma patients: Bashkirs | 21 65,62±8,4 (46,81-81,43) | 11 34,38±8,4 (18,57-53,19) | 32 | 6 37,5±12,1 (15,2-64,57) | 9 56,25±12,4 (29,88-80,25) | 1 6,25±6,05 (0,16-30,23) | 16 |
| Asthma patients (in whole) | 171 54,81±2,82 (49,1-60,42) | 141 45,19±2,82 (39,58-50,9) | 312 | 50 32,05±3,74 (24,81-39,9) | 71 45,51±3,99 (37,53-53,67) | 35 22,44±3,34 (16,15-29,8) | 156 |
Table 10. Allele and genotype frequency distributions of the 7575G>A polymorphism of the ADAM33 gene in asthma patients and healthy donors
| Group | Allele G | Allele A | N1 | Genotype GG | Genotype GA | Genotype AA | N2 |
|---|---|---|---|---|---|---|---|
| Control group: Russians | 63 63±4,83 (52,76-72,44) | 37 37±4,83 (27,56-47,24) | 100 | 17 34±6,7 (21,21-48,77) | 29 58±6,98 (43,21-71,81) | 4 8±3,84 (2,22-19,23) | 50 |
| Control group: Tatars | 68 57,63±4,55 (48,19-66,67) | 50 42,37±4,55 (33,33-51,81) | 118 | 16 27,12±5,79 (16,36-40,27) | 36 61,02±6,35 (47,44-73,45) | 7 11,86±4,21 (4,91-22,93) | 59 |
| Control group: Bashkirs | 83 66,94±4,22 (57,92-75,12) | 41 33,06±4,22 (24,88-42,08) | 124 | 29 46,77±6,34 (33,98-59,88) | 25 40,32±6,23 (28,05-53,55) | 8 12,9±4,26 (5,74-23,85) | 62 |
| Control group (in whole) | 214 62,57±2,62 (57,21-67,72) | 128 37,43±2,62 (32,28-42,79) | 342 | 62 36,26±3,68 (29,06-43,94) | 90 52,63±3,82 (44,87-60,3) | 19 11,11±2,4 (6,82-16,81) | 171 |
| Asthma patients: Russians | 63 54,31±4,63 (44,81-63,59) | 53 45,69±4,63 (36,41-55,19) | 116 | 15 25,86±5,75 (15,26-39,04) | 33 56,9±6,5 (43,23-69,84) | 10 17,24±4,96 (8,59-29,43) | 58 |
| Asthma patients: Tatars | 39 65±6,16 (51,6-76,87) | 21 35±6,16 (23,13-48,4) | 60 | 12 40±8,94 (22,66-59,4) | 15 50±9,13 (31,3-68,7) | 3 10±5,48 (2,11-26,53) | 30 |
| Asthma patients: Bashkirs | 17 53,12±8,82 (34,74-70,91) | 15 46,88±8,82 (29,09-65,26) | 32 | 4 25±10,83 (7,27-52,38) | 9 56,25±12,4 (29,88-80,25) | 3 18,75±9,76 (4,05-45,65) | 16 |
| Asthma patients (in whole) | 176 59,06±2,85 (53,24-64,7) | 122 40,94±2,85 (35,3-46,76) | 298 | 49 32,89±3,85 (25,42-41,05) | 78 52,35±4,09 (44,02-60,59) | 22 14,77±2,91 (9,49-21,5) | 149 |
Association of Candidate Genes Polymorphism with Asthma…
ASSOCIATION ANALYSIS OF THE 7575G>A POLYMORPHISM OF THE ADAM33 GENE WITH ASTHMA
The gene encoding a disintegrin and metalloprotease domain 33 (ADAM33), located on chromosome 20p13, was identified as a susceptibility gene for asthma using a positional cloning strategy [51]. Gene expression studies indicate that ADAM33 is expressed in bronchial smooth muscle and other muscle tissues and that its biological role involves myogenesis and bronchial hyperresponsiveness [51, 52]. Positive associations between SNPs of ADAM33 and asthma have been demonstrated in African American, Hispanic, German, Korean and white populations [53]. Polymorphisms of the ADAM33 gene were subsequently associated with an excess decline in lung function in asthma patients [54, 55]. We found no significant evidence of association between asthma and the 7575G>A polymorphism in intron 6 of the ADAM33 gene in the overall samples of asthmatics and controls (Table 10). The heterozygous genotype ADAM33*G/*A was prevalent in both examined groups: 52,63% in controls and 52,35% in asthma patients. The ADAM33*G allele was found on 62,57% of chromosomes in control subjects and 59,06% in asthma patients. The distributions of genotypes and alleles for this polymorphism showed differences between Tatar asthma patients and controls of the same ethnic origin: the ADAM33*G/*G genotype was more frequent in asthmatics (40%) than in healthy controls (27,12%; OR=1,79 (95% CI 0,7-4,5), χ2=1,53, p=0,1). A tendency towards association between the ADAM33 polymorphism and asthma severity was also observed: the frequency of the ADAM33*G/*G genotype was higher in severe asthmatics (40,98%) than in patients with moderate asthma (27,59%; χ2=2,9, p=0,08). The results of the investigation thus point to an increased frequency of the ADAM33*G/*G genotype in asthma patients of Tatar ethnic origin.
CONCLUSION
The analysis of asthma susceptibility genes in the Bashkortostan Republic has demonstrated significant genetic variation among ethnic groups in the allele and genotype frequencies of the IL4, IL4RA, IL10, ADRB2 and CD14 genes and in the genetic risk factors for disease development. The genotypes IL4RA*Ile50/*Ile50, ADRB2*Gln27/*Glu27 and CD14*T/*T were shown to increase the risk of asthma in Russians. The IL4*T allele of the IL4 gene showed significant evidence of association with the disease in the Tatar ethnic group. The genotype ADRB2*Gln27/*Gln27 is a risk factor for asthma development in Bashkirs. Moreover, the genotypes IL4*C/*T and IL4RA*Ile50/*Ile50 are associated with a low level of lung function. The significant ethnic differences demonstrated in the risk factors for asthma across the genes involved in the disease emphasize the importance of a population genetics approach in such investigations. The high prevalence and steady growth of asthma all over the world is a global medical and social problem that still requires intensive investigation and deeper insight into asthma pathophysiology. An important goal of asthma research is to understand the genetic and environmental triggers and to identify genetic markers of increased and decreased risk of asthma development. Genetic studies may improve the understanding of asthma and lead to new methods to prevent, diagnose and treat this disease.
REFERENCES
[1] Global Strategy for Asthma Management and Prevention (GINA). Moscow, Atmosphere, 2002, 160 p.
[2] The WHO global programme. Report of a consultation to review progress and develop future activities. Geneva, World Health Organization, 2000.
[3] Chuchalin A.G. Manual on diagnostics, treatment and preventive maintenance of asthma. Moscow, OOO «NTC KVAN», 2005, 37 p.
[4] Ober C., Hoffjan S. Asthma genetics 2006: the long and winding road to gene discovery. Genes and Immunity, 2006, v. 7(2), pp. 95-100.
[5] Cookson W., Moffatt M. Making sense of asthma genes. N. Engl. J. Med., 2004, v. 351(17), pp. 1794-1796.
[6] Gao P.S., Huang S.K. Genetic aspects of asthma. Panminerva Med., 2004, v. 46, pp. 121-134.
[7] Palmer L.J., Cookson W.O. Genomic approaches to understanding asthma. Genome Res., 2000, v. 10(9), pp. 1280-1287.
[8] Malerba G., Pignatti P.F. A review of asthma genetics: gene expression studies and recent candidates. J. Appl. Genet., 2005, v. 46(1), pp. 93-104.
[9] Kere J., Laitinen T. Positionally cloned susceptibility genes in allergy and asthma. Curr. Opin. Immunol., 2004, v. 16(6), pp. 689-694.
[10] Freidin M.B., Puzyrev V.P., Ogorodova L.M., Salyukova O.A., Kamaltynova E.M., Kulmanakova I.M., Petrovskaya Yu.A. Analysis of the association between the T113M polymorphism of the human interleukin 9 gene and bronchial asthma. Russian Journal of Genetics, 2000, v. 36(4), p. 453.
[11] Freidin M.B., Puzyrev V.P., Ogorodova L.M., Kobyakova O.S., Kulmanakova I.M. Polymorphism of the interleukin- and interleukin receptor genes: population distribution and association with atopic asthma. Russian Journal of Genetics, 2002, v. 38(12), p. 1452.
[12] Ivaschenko T.E., Sideleva O.G., Petrova M.A., Gembitskaya T.E., Orlov A.V., Baranov V.S. Genetic determinants of predisposition to bronchial asthma. Russian Journal of Genetics, 2001, v. 37(1), p. 94.
[13] Ivaschenko T.E., Sideleva O.G., Baranov V.S. Glutathione-S-transferase micro and theta gene polymorphisms as new risk factors of atopic bronchial asthma. J. Mol. Med., 2002, v. 80(1), pp. 39-43.
[14] Etkina I.A. Clinical and genetic associations in asthma children. Abstract of a thesis: 03.00.15, Ufa, 2000, 250 p.
[15] Mathew C.C. The isolation of high molecular weight eukaryotic DNA. In: Methods in Molecular Biology, Ed. Walker J.M. N.Y., Humana Press, 1984, v. 2, pp. 31-34.
[16] StatSoft, Inc. (2001). STATISTICA (data analysis software system), version 6. www.statsoft.com.
[17] Marsh D.G., Neely J.D., Breazeale D.R. et al. Linkage analysis of IL4 and other chromosome 5q31.1 markers and total serum immunoglobulin E concentration. Science, 1994, v. 264, pp. 1152-1156.
[18] Daniels S.E., Bhattacharrya S., James A. et al. A genome-wide search for quantitative trait loci underlying asthma. Nature, 1996, v. 383, pp. 247-250.
[19] Walley A.J., Wiltshire S., Ellis C.M., Cookson W.O. Linkage and allelic association of chromosome 5 cytokine cluster genetics markers with atopy and asthma associated traits. Genomics, 2001, v. 72, pp. 15-20.
[20] Steinke J.W., Borish L. Th2 cytokines and asthma. Interleukin-4: its role in the pathogenesis of asthma, and targeting it for asthma treatment with interleukin-4 receptor antagonists. Respir. Res., 2001, v. 2, pp. 66-70.
[21] Rosenwasser L.J., Klemm D.J., Dresback J.K. et al. Promoter polymorphisms in the chromosome 5 gene cluster in asthma and atopy. Clin. Exp. Allergy, 1995, v. 25(2), pp. 74-78.
[22] Noguchi E., Shibasaki M., Arinami T. et al. Association of asthma and the interleukin-4 promoter gene in Japanese. Clin. Exp. Allergy, 1998, v. 28, pp. 449-453.
[23] Beghe B., Barton S., Rorke S. et al. Polymorphisms in the interleukin-4 and interleukin-4 receptor α chain genes confer susceptibility to asthma and atopy in a Caucasian population. Clin. Exp. Allergy, 2003, v. 33, pp. 1111-1117.
[24] Kabesch M., Tzotcheva I., Carr D., Hofler C., Weiland S.K., Fritzsch C., von Mutius E., Martinez F.D. A complete screening of the IL4 gene: novel polymorphisms and their association with asthma and IgE in childhood. J. Allergy Clin. Immunol., 2003, v. 112(5), pp. 893-898.
[25] Burchard E.G., Silverman E.K., Rosenwasser L.J. et al. Association between a sequence variant in the IL-4 gene promoter and FEV(1) in asthma. Am. J. Respir. Crit. Care Med., 1999, v. 160, pp. 919-922.
[26] Sandford A.J., Chagani T., Zhu S. et al. Polymorphisms in the IL4, IL4RA, and FCERIB genes and asthma severity. J. Allergy Clin. Immunol., 2000, v. 106(1 Pt 1), pp. 135-140.
[27] Ober C., Leavitt S., Tsalenko A. et al. Variation in the interleukin 4-receptor alpha gene confers susceptibility to asthma and atopy in ethnically diverse populations. Am. J. Hum. Genet., 2000, v. 66, pp. 517-526.
[28] Hytonen A.M., Lowhagen O., Arvidsson M. et al. Haplotypes of the interleukin-4 receptor alpha chain gene associate with susceptibility to and severity of atopic asthma. Clin. Exp. Allergy, 2004, v. 34(10), pp. 1570-1575.
[29] Mitsuyasu H., Yanagihara Y., Mao X. et al. Cutting edge: dominant effect of Ile50Val variant of the human IL-4 receptor alpha-chain in IgE synthesis. J. Immunol., 1999, v. 162(3), pp. 1227-1231.
[30] Shimbara A., Christodoulopoulos P., Soussi-Gounni A. et al. IL-9 and its receptor in allergic and nonallergic lung disease: increased expression in asthma. J. Allergy Clin. Immunol., 2002, v. 105, pp. 108-115.
[31] Laitinen T., Kauppi P., Ignatius J. et al. Genetic control of serum IgE levels and asthma: linkage and linkage disequilibrium studies in an isolated population. Hum. Mol. Genet., 1997, v. 6(12), pp. 2069-2076.
[32] Borish L., Aarons A., Rumbyrt J. et al. Interleukin-10 regulation in normal subjects and patients with asthma. J. Allergy Clin. Immunol., 1996, v. 97(6), pp. 1288-1296.
[33] Hang L., Hsia T., Chen W. et al. Interleukin-10 gene -627 allele variants, not interleukin-I beta gene and receptor antagonist gene polymorphisms, are associated with atopic bronchial asthma. J. Clin. Lab. Anal., 2003, v. 17, pp. 168-173.
[34] Lim S., Crawley E., Woo P., Barnes P.J. Haplotype associated with low interleukin-10 production in patients with severe asthma. Lancet, 1998, v. 352(9122), pp. 113-117.
[35] Gosset P., Tsicopoulos A., Wallaert B. et al. Increased secretion of TNF and IL-6 by alveolar macrophages consecutive to the development of the late asthmatic reaction. J. Allergy Clin. Immunol., 1991, v. 88, pp. 561-571.
[36] Cembrzynska-Nowak M., Szklarz E., Inglot A.D. et al. Elevated release of TNF and IFN by bronchoalveolar leukocytes from patients with bronchial asthma. Am. Rev. Respir. Dis., 1993, v. 147, pp. 291-295.
[37] Gao J., Shan G., Sun B. et al. Association between polymorphism of tumour necrosis factor α-308 gene promoter and asthma: a meta-analysis. Thorax, 2006, v. 61(6), pp. 466-471.
[38] Bai T.R. Abnormalities in airway smooth muscle in fatal asthma. A comparison between trachea and bronchus. Am. Rev. Respir. Dis., 1991, v. 143(2), pp. 441-443.
[39] Reihsaus E., Innis M., MacIntyre N., Liggett S.B. Mutations in the gene encoding for the beta 2-adrenergic receptor in normal and asthmatic subjects. Am. J. Respir. Cell Mol. Biol., 1993, v. 8, pp. 334-339.
[40] Contopoulos-Ioannidis D.G., Manoli E.N., Ioannidis J.P.A. Meta-analysis of the association of β2-adrenergic receptor polymorphisms with asthma phenotypes. J. Allergy Clin. Immunol., 2005, v. 115(5), pp. 963-972.
[41] Holloway J.W., Dunbar P.R., Riley G.A. et al. Association of beta2-adrenergic receptor polymorphisms with severe asthma. Clin. Exp. Allergy, 2000, v. 30, pp. 1097-1103.
[42] Ulbrecht M., Hergeth M.T., Wjst M. et al. Association of beta(2)-adrenoreceptor variants with bronchial hyperresponsiveness. Am. J. Respir. Crit. Care Med., 2000, v. 161, pp. 469-474.
[43] Palmer L.J., Silverman E.S., Weiss S.T., Drazen J.M. Pharmacogenetics of asthma. Am. J. Respir. Crit. Care Med., 2002, v. 165(7), pp. 861-866.
[44] Cho S.H., Oh S.Y., Bahn J.W. et al. Association between bronchodilating response to short-acting beta-agonist and non-synonymous single-nucleotide polymorphisms of beta-adrenoceptor gene. Clin. Exp. Allergy, 2005, v. 35(9), pp. 1162-1167.
[45] Baldini M., Lohman I.C., Halonen M. et al. A polymorphism in the 5' flanking region of the CD14 gene is associated with circulating soluble CD14 levels and with total serum immunoglobulin E. Am. J. Respir. Cell Mol. Biol., 1999, v. 20, pp. 976-983.
[46] Koppelman G.H., Reijmerink N.E., Colin Stine O. et al. Association of a promoter polymorphism of the CD14 gene and atopy. Am. J. Respir. Crit. Care Med., 2001, v. 163, pp. 965-969.
[47] Buckova D., Holla L.I., Schuller M. CD14 promoter polymorphisms and atopic phenotypes in Czech patients with IgE-mediated allergy. Allergy, 2003, v. 58(10), pp. 1023-1026.
[48] Gao P.S., Mao X.Q., Baldini M. et al. Serum total IgE levels and CD14 on chromosome 5q31. Clin. Genet., 1999, v. 56, pp. 164-165.
[49] Sharma M., Batra J., Mabalirajan U. et al. Suggestive evidence of association of C-159T functional polymorphism of the CD14 gene with atopic asthma in northern and northwestern Indian populations. Immunogenetics, 2004, v. 56(7), pp. 544-547.
[50] Zdolsek H.A., Jenmalm M.C. Reduced levels of soluble CD14 in atopic children. Clin. Exp. Allergy, 2004, v. 34(4), pp. 532-539.
[51] Van Eerdewegh P., Little R.D., Dupuis J. et al. Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature, 2002, v. 418, pp. 426-430.
[52] Shapiro S.D., Owen C.A. ADAM-33 surfaces as an asthma gene. N. Engl. J. Med., 2002, v. 347(12), pp. 936-938.
[53] Lee J.H., Park H.S., Park S.W. et al. ADAM33 polymorphism: association with bronchial hyper-responsiveness in Korean asthmatics. Clin. Exp. Allergy, 2004, v. 34, pp. 860-865.
[54] Jongepier H., Boezen H.M., Dijkstra A. et al. Polymorphisms of the ADAM33 gene are associated with accelerated lung function decline in asthma. Clin. Exp. Allergy, 2004, v. 34(5), pp. 757-760.
[55] Van Diemen C.C., Postma D.S. et al. A disintegrin and metalloprotease 33 polymorphisms and lung function decline in the general population. Am. J. Respir. Crit. Care Med., 2005, v. 172(3), pp. 329-333.
[56] Kauppi P., Lindblad-Toh K., Sevon P. et al. A second-generation association study of the 5q31 cytokine gene cluster and the interleukin-4 receptor in asthma. Genomics, 2001, v. 77(1-2), pp. 35-42.
[57] Karplus T.M., Jeronimo S.M., Chang H. et al. Association between the tumor necrosis factor locus and the clinical outcome of Leishmania chagasi infection. Infection and Immunity, 2002, v. 70(12), pp. 6919-6925.
[58] Cheng L., Enomoto T., Hirotaw T. et al. Polymorphisms in ADAM33 are associated with allergic rhinitis due to Japanese cedar pollen. Clin. Exp. Allergy, 2004, v. 34, pp. 1192-1201.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 4
GENES AND LANGUAGES: ARE THERE CORRELATIONS BETWEEN MTDNA DATA AND GEOGRAPHY OF ALTAY AND URAL LANGUAGES
E. Khusnutdinova and I. Kutuev
Institute of Biochemistry and Genetics of Ufa Science Centre of Russian Academy of Sciences, Russia
INTRODUCTION
The correlation between social and biological features in humans is one of the most interesting questions in anthropology. Biological features are inherited, whereas social features can change even within a lifetime. Darwin himself was interested in possible correlations between inherited features and languages, saying that if the human genealogy were reconstructed and the human races grouped accordingly, this would yield the best classification of languages [1]. Cavalli-Sforza was one of the first investigators to dedicate his work to the analysis of correlations between genes and languages, and he inspired many investigators to follow this path [2-6]. He noticed that linguistic and genetic evolution are quite similar in nature: both proceed by consecutive divergence. After two populations separate, their languages and genes begin to differentiate. The rates of these two kinds of evolution differ, of course, but they should be correlated [6-9]. Many researchers have not accepted a correlation between genes and languages, pointing as an example to Turkic speakers, among whom linguistic relationships do not correlate with race. Among Turkic speakers there are Caucasoid (Turks, Gagauzes, Azeris) and Mongoloid (Yakuts, Dolgans, Tuvinians, Tofalars) populations, as well as mixed populations (Turkmens, Uzbeks, Kyrgyz, Kazakhs, Karakalpaks). In this case, however, it should be noted that early in human evolution the correlation between language and race was much stronger, and later migrations leading to admixture erased these signals [10, 11]. Altaic speakers have undergone a strong racial transformation. Most Altaic-speaking peoples were originally Caucasoid, like other peoples of the Nostratic language group, but many of those who moved from west to east were assimilated by local Mongoloid peoples. South-western
Turkic speakers are still Caucasoid. The ancestors of Mongol and Tungus speakers, as well as the ancestors of Koreans and Japanese, lost their Caucasoid features as they moved eastward. The same applies to the Eskimo and Aleut [10, 11]. Some peoples of the Uralic language family acquired Mongoloid features to varying extents. Most Uralic speakers are Caucasoid; only Khants, Mansis and Yukagirs are Mongoloid [10, 11].
PHYLOGENETIC ANALYSIS OF MTDNA LINEAGES
Over many centuries, population admixture, large-scale migrations, assimilations and even extinctions of many tribes and peoples took place in the Eurasian steppe belt. The ethnogenesis of the peoples inhabiting this region is a result of admixture between peoples from Europe and Asia: two waves of migration met here. For a long time the steppe belt was a place where many modern peoples formed and where populations of different origin and culture interacted with each other (Ugric peoples of Siberia, Finns of Eastern Europe, peoples of the Near East, Turkic speakers of South Siberia and Altay, Slavic peoples of Eastern and Western Europe, and nomadic Mongols) [11]. The traces of all these demographic processes are imprinted in the genes of the modern peoples inhabiting the Eurasian steppe belt. Modern populations of the Eurasian steppe belt are very diverse both in language and in physical anthropological type. The region is inhabited by Uralic-, Altaic- and Indo-European-speaking peoples.
Modern population genetic research is based on the analysis of polymorphic markers, which now offer very high resolution and are a powerful tool for analysing demographic processes in populations and for reconstructing past admixture and migrations. The most powerful tools are mtDNA and Y chromosome analysis. Unlike nuclear DNA, which is inherited from both parents and in which genes are rearranged by recombination, mtDNA usually passes from parent to offspring unchanged. Although mtDNA also recombines, it does so with copies of itself within the same mitochondrion. Because of this, and because the mutation rate of animal mtDNA is higher than that of nuclear DNA, mtDNA is a powerful tool for tracking ancestry through females (matrilineage) and has been used in this role to track the ancestry of many species back hundreds of generations. Human mtDNA can also be used to identify individuals.
mtDNA contains 37 genes, all of which are essential for normal mitochondrial function. Thirteen of these genes provide instructions for making enzymes involved in oxidative phosphorylation. The remaining genes provide instructions for making molecules called transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), which are chemical cousins of DNA. These types of RNA help assemble protein building blocks (amino acids) into functioning proteins [12]. In sexually reproducing organisms, mitochondria are normally inherited exclusively from the mother. The fact that mitochondrial DNA is maternally inherited enables researchers to trace maternal lineage far back in time. (Y chromosomal DNA, paternally inherited, is used in an analogous way to trace the agnate lineage.) This is accomplished in humans by sequencing one or more of the hypervariable control regions (HVR1 or HVR2) of the mitochondrial DNA. HVR1 consists of about 440 base pairs. These 440 base pairs are then compared to the control regions of other individuals (either specific people or subjects in a database) to determine
maternal lineage. The concept of the Mitochondrial Eve is based on the same type of analysis, attempting to discover the origin of humanity by tracking the lineage back in time [13]. Because the base sequence of animal mtDNA changes rapidly, it is useful for assessing genetic relationships of individuals or groups within a species and also for identifying and quantifying the phylogeny (evolutionary relationships) among different species, provided they are not too distantly related. To do this, biologists determine and then compare the mtDNA sequences from different individuals or species. Data from the comparisons are used to construct a network of relationships among the sequences, which provides an estimate of the relationships among the individuals or species from which the mtDNAs were taken. This approach has limits that are imposed by the rate of mtDNA sequence change. In animals, the rapid rate of change makes mtDNA most useful for comparisons of individuals within species and for comparisons of species that are closely or moderately closely related, among which the number of sequence differences can be easily counted. As the species become more distantly related, the number of sequence differences becomes very large; changes begin to accumulate on changes until an accurate count becomes impossible. Because mtDNA is not highly conserved and has a rapid mutation rate, it is useful for studying the evolutionary relationships (phylogeny) of organisms. Biologists can determine and then compare mtDNA sequences among different species and use the comparisons to build an evolutionary tree for the species examined [14-25].
At present, there is a large body of data on mtDNA variability in many human populations [20, 26-49]. At the same time, many populations of Russia, with its complex genetic structure and very high linguistic diversity, have not been included in these studies. To date, some data on Russian and some Siberian populations have been acquired.
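The comparison step described in this section, counting the sites at which sequences differ, can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned and of equal length. The sample names and the short fragments are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_differences(samples):
    """Return the number of differing sites for every pair of samples."""
    return {
        (name_a, name_b): count_differences(samples[name_a], samples[name_b])
        for name_a, name_b in combinations(sorted(samples), 2)
    }


# Hypothetical aligned fragments (not real HVS-I data).
samples = {
    "sample_1": "ACCTAGTCAA",
    "sample_2": "ACCTAGTCAG",
    "sample_3": "ATCTAGCCAG",
}
distances = pairwise_differences(samples)
for pair, n_diff in distances.items():
    print(pair, n_diff)
```

A table of such pairwise counts is the raw material for a network or tree of sequences. As the text notes, a plain count underestimates the true number of changes once repeated substitutions at the same site become common.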
Populations of the Volga-Ural region, Central Asia and the Caucasus remain a blank spot in gene geography studies. Analysis of mtDNA HVS-I in combination with coding region analysis is a very effective and reliable approach for investigating these regions and for understanding and reconstructing the historical events that led to their present-day genetic landscape. Owing to their complex history and location at the border of Europe and Asia, the North Caucasus, the Volga-Ural region and Central Asia are among the most interesting regions for population analysis. Two waves of migration meet here: from Europe and from Asia. Located between Europe and Asia, the Volga-Ural region has throughout historical time been a place of interaction of many peoples and tribes [50, 51]. The North Caucasus is situated between the Caspian and Black Seas. With more than 50 distinct peoples and dozens of distinct languages, the Caucasus is one of the most complex linguistic and ethnic regions in the world. The Caucasus is also one of the most important migration corridors from Africa to Eurasia [52]. Central Asia is likewise a complex region at the border between Europe and Asia. Numerous anthropological and ethnographical studies of Central Asian populations have demonstrated their close relationships with populations of the Volga-Ural region, especially with Turkic-speaking Bashkirs [51].
MATERIALS AND METHODS
Blood samples were collected in the Volga-Ural region, the Caucasus and Central Asia from healthy unrelated individuals after obtaining informed consent in 1993-2004 (table 1). DNA was extracted using the phenol-chloroform method [53].
HVS-I was sequenced between nucleotide positions (nps) 16024 and 16400 of the revised Cambridge Reference Sequence in all the DNA samples [12, 54]. RFLP analysis of diagnostic mtDNA positions was performed, and mtDNA haplogroups were assigned to each sample using published criteria [27, 28, 32, 33, 42, 55-58]. Factor analysis was performed in Statistica v.7 [59].
RESULTS AND DISCUSSION
Most of the lineages found in the investigated populations (80%) belong to western Eurasian haplogroups H, I, J, K, T, U, V, W and X [27, 28, 42, 55, 57, 60-64] (figure 1). The rest of the samples belong to eastern Eurasian lineages [33, 44, 65-68].
Figure 1. Phylogenetic tree of mtDNA clusters. Region labels in the figure: Africa, Eastern Eurasia, Western Eurasia.
Geographic analysis of haplogroup frequencies in Altaic-speaking populations revealed a gradient of increasing frequency of eastern Eurasian lineages from west to east. The frequency of eastern Eurasian mtDNA lineages varies from 1% in Gagauzes to 99% in Yakuts and Dolgans. We did not find any correlation between mtDNA haplogroup frequencies and the geographical distribution of Turkic languages. Moreover, it is quite obvious that
the linguistic affiliation of populations (for example, their language subgroup) plays a much smaller role than geography (figure 2).
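The quantity behind this west-east comparison is the share of eastern Eurasian lineages in each population. A minimal Python sketch of that calculation follows. It assumes top-level haplogroup labels and treats every haplogroup outside the western Eurasian set named in the text (H, I, J, K, T, U, V, W, X) as eastern Eurasian, which is a simplification. The population names and counts are made up for illustration and are not the study's data.

```python
# Western Eurasian haplogroups as listed in the text. Everything else is
# counted as eastern Eurasian here, which is a simplifying assumption.
WESTERN_EURASIAN = frozenset("HIJKTUVWX")


def eastern_share(counts):
    """Fraction of samples whose haplogroup is not western Eurasian."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("population has no samples")
    eastern = sum(n for hg, n in counts.items() if hg not in WESTERN_EURASIAN)
    return eastern / total


# Hypothetical haplogroup counts, listed from west to east.
populations = {
    "pop_west": {"H": 40, "U": 30, "T": 20, "J": 8, "C": 2},
    "pop_mid": {"H": 30, "U": 20, "C": 25, "D": 25},
    "pop_east": {"C": 45, "D": 40, "G": 10, "H": 5},
}
shares = {name: eastern_share(counts) for name, counts in populations.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%} eastern Eurasian lineages")
```

Plotting such shares against longitude, grouped once by region and once by language subgroup, is one simple way to see whether geography or linguistic affiliation tracks the gradient more closely.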
Figure 2. Maternal lineages of western and eastern Eurasian origin among 18 Turkic speaking populations.
The same applies to Uralic speakers. The frequency of eastern Eurasian lineages varies from 0% in Estonians to 80% in Nganasans. The only exceptions are Khants, Mansis and Selkups, in which a relatively high frequency of common western Eurasian lineages was found (60-70%). We found a high frequency of haplogroup U4 and a low frequency of haplogroup W, which is typical for populations of the Volga-Ural region. These data most likely indicate gene flow from west to east rather than from east to west [69-71].
One of the advantages of using mtDNA is that coalescence times can be estimated. An analysis based on coalescent theory seeks to estimate the time elapsed between the introduction of a mutation and the distribution of a particular allele or gene observed in a population. This period equals the time since the most recent common ancestor existed. Under genetic drift alone, every finite set of genes or alleles has a "coalescent point" at which all lineages converge on a single ancestor (that is, they "coalesce"). This fact can be used to derive the rate of fixation of a neutral allele (one not under any form of selection) for a population of varying size (provided that it is finite and nonzero). Because the effect of natural selection is stipulated to be negligible, the probability at any given time that an allele will ultimately become fixed at its locus is simply its frequency p in the population at that time.
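The statement that a neutral allele is fixed with probability equal to its current frequency p can be checked numerically. The sketch below is not part of the original analysis. It assumes a simple haploid Wright-Fisher model with constant population size and no selection or mutation. The function names, population size, number of replicates and random seed are arbitrary illustrative choices.

```python
import random


def reaches_fixation(p0, pop_size, rng):
    """Simulate one neutral Wright-Fisher population until loss or fixation."""
    count = round(p0 * pop_size)
    while 0 < count < pop_size:
        freq = count / pop_size
        # Each gene copy in the next generation is drawn independently from
        # the current generation, so the new count is binomially distributed.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return count == pop_size


def fixation_fraction(p0, pop_size=20, replicates=2000, seed=1):
    """Fraction of replicate populations in which the allele becomes fixed."""
    rng = random.Random(seed)
    fixed = sum(reaches_fixation(p0, pop_size, rng) for _ in range(replicates))
    return fixed / replicates


estimate = fixation_fraction(0.3)
print(f"starting frequency 0.30, fraction fixed {estimate:.3f}")
```

With enough replicates the fraction of runs ending in fixation should lie close to the starting frequency, whatever population size is chosen. The population size changes how long fixation takes, not how often it happens.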
The coalescence time for haplogroup H in the Volga-Ural region is 20,036 ± 4,250 years, which corresponds to the time of population re-expansion in the Urals after the glacial maximum. Coalescence times for the J1 and T1 haplogroups in Caucasus populations were 30,000 and 20,000 years, respectively, which exceeds the boundary of the Holocene several times over. Thus, there are reasons to believe that these two "marker" haplotypes were part of the Caucasus gene pool long before the Neolithic period began. An alternative explanation is that the Neolithic influx from Upper Mesopotamia to the Caucasus was so massive that it carried with it much of the pre-existing diversity.
The principal component analysis (figure 4) based on mtDNA haplogroup frequencies in populations of the Volga-Ural region, the North Caucasus and Central Asia explains 52.8% of the variability in haplogroup frequencies with its first two components. The results agree with the east-west gradient of mtDNA haplogroup frequencies along the Eurasian steppe belt.
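The "percentage of variability explained" reported for a principal component analysis is the share of total variance carried by the leading components. The following Python sketch shows how that share can be computed for the first component from a table of haplogroup frequencies. It is only an illustration of the idea, not the procedure used in this study, which relied on Statistica. The frequency table is invented, and power iteration is used here simply to keep the example free of external libraries.

```python
def center_columns(rows):
    """Subtract each column mean so every haplogroup has mean frequency 0."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance between the columns of already-centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    # Start from an uneven vector: frequencies in a row sum to 1, so a
    # constant vector lies in the null space of the covariance matrix.
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        vec = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    image = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return sum(a * b for a, b in zip(image, vec))  # Rayleigh quotient


def pc1_explained_share(rows):
    """Share of total variance carried by the first principal component."""
    cov = covariance_matrix(center_columns(rows))
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    return leading_eigenvalue(cov) / total_variance


# Hypothetical frequencies of four haplogroups (columns) in four
# populations (rows); two "western" and two "eastern" profiles.
frequencies = [
    [0.45, 0.30, 0.15, 0.10],
    [0.40, 0.35, 0.10, 0.15],
    [0.10, 0.10, 0.45, 0.35],
    [0.15, 0.05, 0.40, 0.40],
]
share = pc1_explained_share(frequencies)
print(f"PC1 explains {share:.1%} of the variance")
```

In this toy table almost all variance lies along a single west-east contrast, so the first component dominates. Real data with many haplogroups and populations spread the variance over several components, as the percentages quoted in the text show.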
Figure 3. Maternal lineages of western and eastern Eurasian origin among 16 Uralic speaking populations.
Populations on the plot cluster according to geography rather than the linguistic affiliation of the investigated populations. The proximity of Nogays and Bashkirs on the plot can be explained by the high percentage of eastern Eurasian lineages in both populations. The high frequency of eastern Eurasian lineages in Nogays is not surprising, since they moved to and settled in the Caucasus quite recently [69]. Analysis of three principal components (64.9%) did not reveal major changes (figure 5). Because of the new dimension introduced by the third component, Udmurts are located quite far
from other populations. This can be explained by the relatively high frequency of haplogroup T in this population (0.238). The majority of mtDNA haplogroups found in populations of the Volga-Ural region are of western Eurasian origin. Among European populations, the frequency of lineages of eastern Eurasian origin is highest in Eastern Europe. The high frequency of haplogroups G, D, C, Z and F in Turkic-speaking Bashkirs, Uralic-speaking Udmurts and Permian Komis demonstrates gene flow from Siberia and Central Asia to the Volga-Ural region [72].
Figure 4. Principal component analysis (2 dimensions) of mtDNA haplogroup frequencies in populations of Volga-Ural, Central Asia and the Caucasus.
Figure 5. Principal component analysis (3 dimensions) of mtDNA haplogroup frequencies in populations of Volga-Ural, Central Asia and the Caucasus. Axes: PC1 (30.6%), PC2 (22.3%), PC3 (12.0%). Populations shown: Karachays, Kumyks, Chuvashes, Maris, Tatars, Nogays, Komi Syrian, Mordvinians, Bashkirs, Uighurs, Kazakhs, Uzbeks, Komi Permyan, Udmurts.
Bashkirs from the Perm oblast of Russia have a high percentage of haplogroups F (11.1%), D (13.9%) and G2a (6.9%) [69]. These haplogroups are typical for populations of Central Asia [65, 67]. This suggests that Bulgars, Ugric [73] and Central Asian [74] peoples played a major role in the ethnogenesis of this subpopulation. An interesting finding is the high frequency of western Eurasian lineages in Uighurs (~45%). This is notable because this population is surrounded by Mongols, Chinese, Kirgizes and Altays, in which the frequency of western Eurasian lineages is less than 15% [46, 49, 67, 75, 76]. Detailed analysis revealed a high frequency of typical Anatolian, Iranian and South Caucasus lineages, which suggests that at least part of the gene pool of modern Uighurs is shared with the Indo-Iranian nomads of Neolithic times.
It is now quite obvious that a so-called proto-Altaic or proto-Uralic genetic substrate does not exist. Even low-resolution haplogroup frequency data demonstrate great differences between the westernmost (Gagauzes) and easternmost (Dolgans) Turkic speakers. When all the populations inhabiting Eurasia (including those belonging to other language families) are analysed, the genetic landscape does not change drastically. This means that the modern genetic landscape was formed mainly by demographic processes in the populations inhabiting this vast area. Altaic and Uralic speakers inhabiting the European part of the continent are characterized by a high frequency of western Eurasian lineages; those inhabiting the Asian part are characterized by a high frequency of eastern Eurasian lineages. The only exceptions are Kalmyks and Nogays living in the North Caucasus [77, 78]. The low level of eastern Eurasian mtDNA lineages in Gagauzes, Turks, Azeris and Kumyks supports the hypothesis of a recent westward expansion of Turkic languages.
At the same time, it is possible that the gene pool of the proto-Turkic people consisted mostly of western Eurasian lineages, and that subsequent admixture with populations rich in eastern Eurasian lineages led to a drastic increase of the latter in Turkic-speaking populations living in the Asian part of Eurasia.
There are a few exceptions to the west-east gradient of mtDNA haplogroup frequencies. One of them is the Nogays, who inhabit the northern part of Dagestan and Kabardino-Balkariya. The frequency of eastern Eurasian haplogroups in Nogays is up to 40%, whereas in the neighboring populations (Kumyks, Karachays and Balkars) it is lower than 7% [79]. This can be explained by the history of the Nogays, who are remnants of the Nogay Horde, which comprised Turkic, Ugric and Mongol tribes. Nogays formed as an ethnic group quite recently (14th-15th centuries) [78]. The other exception is the Kalmyks, who came to the North Caucasus some three centuries ago [77]. Interestingly, mtDNA lineages found in the Caucasus populations do not belong to clade A, whose frequency in Nogays and Kalmyks is up to 6% [79]. This demonstrates that eastern Eurasian lineages penetrated into the European part of the continent through mass migrations of Mongols. At the same time, the admixture of nomadic Mongols with autochthonous populations of the Caucasus was insignificant.
Another exception, among Uralic speakers, are the Khantis, Mansis and Selkups, in which up to 70% of lineages are western Eurasian [80]. This observation is not the result of admixture with Russians: detailed analysis of the haplogroup spectra shows a pattern typical of Finno-Ugric populations. This points to recent mass circumpolar migrations, which is also supported by the phylogeography of Y chromosomal haplogroup N3 [81-83]. The frequency of eastern Eurasian lineages in the westernmost Baltic Finnic populations is less than 1% [71, 84, 85].
The modern genetic landscape has been shaped mainly by the geographic location of populations and the demographic processes within them, not by linguistic or cultural barriers [69]. This means that Uralic and Altaic speakers inhabiting the same region (Volga-Ural) have much more in common than Uralic speakers from distant regions of Eurasia. Similar results have been obtained earlier in other populations [71, 85-87].
CONCLUSION
The modern ethnic landscape of the Eurasian steppe belt is diverse and took shape over several thousand years. Comings and goings, assimilations, admixture and migrations of numerous tribes formed the heterogeneous picture of the present-day region. A priori, we expected that peoples speaking languages of the same language family or subgroup would share at least a slight common genetic pattern. In the current research on Altaic and Uralic speakers, we did not find any common patterns among speakers within the same language family or language group. We conclude that geography rather than linguistic affiliation plays the major role in the genetic relationships of Altaic- and Uralic-speaking populations.
REFERENCES
[1] Darwin C. The origin of species. London, John Murray, 1859.
[2] Barbujani G., Sokal RR. Zones of sharp genetic change in Europe are also linguistic boundaries. Proc. Natl. Acad. Sci. U.S.A. 1990, vol. 87, № 5, pp. 1816-9.
[3] Villems R., Adojaan M., Kivisild T., Metspalu E., Parik J., Pielberg G., Rootsi S., Tambets K., Tolk HV. Reconstruction of maternal lineages of Finno-Ugric speaking people and some remarks on their paternal inheritance. In: Wiik K., Julku K., editors. The roots of peoples and languages of Northern Eurasia I. Turku: Societas Historiae Fenno-Ugricae; 1998. pp. 180-200.
[4] Diamond J., Bellwood P. Farmers and their languages: the first expansions. Science. 2003, vol. 300, № 5619, pp. 597-603.
[5] Arnaiz Villena A., Martinez Laso J., Alonso Garcia J. The correlation between languages and genes: the Usko-Mediterranean peoples. Hum. Immunol. 2001, vol. 62, № 9, pp. 1051-61.
[6] Sokal RR. Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. U.S.A. 1988, vol. 85, № 5, pp. 1722-6.
[7] Cavalli-Sforza LL., Menozzi P., Piazza A. The history and geography of human genes. Princeton, N.J., Princeton University Press, 1994, xi, 541, 518 p.
[8] Cavalli-Sforza LL. Genes, peoples, and languages. Proc. Natl. Acad. Sci. U.S.A. 1997, vol. 94, № 15, pp. 7719-7724.
[9] Cavalli-Sforza LL., Piazza A., Menozzi P., Mountain J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl. Acad. Sci. U.S.A. 1988, vol. 85, № 16, pp. 6002-6.
[10] Puchkov P.I. Divergence of languages and the problem of correlation between languages and races. In: Peoples and religions of the world. Moscow, 1998.
[11] Tishkov V.A. Peoples and religions of the world. Moscow. The Big Russian Encyclopedia, 1998.
[12] Anderson S., Bankier AT., Barrell BG., de Bruijn MH., Coulson AR., Drouin J., Eperon IC., Nierlich DP., Roe BA., Sanger F., Schreier PH., Smith AJ., Staden R., Young IG. Sequence and organization of the human mitochondrial genome. Nature. 1981, vol. 290, № 5806, pp. 457-65.
[13] Horai S. Evolution and the origins of man: clues from complete sequences of hominoid mitochondrial DNA. Southeast Asian J. Trop. Med. Public Health. 1995, vol. 26, № Suppl 1, pp. 146-54.
[14] Giles RE., Blanc H., Cann HM., Wallace DC. Maternal inheritance of human mitochondrial DNA. Proc. Natl. Acad. Sci. U.S.A. 1980, vol. 77, № 11, pp. 6715-9.
[15] Ward RH., Frazier B., Dew Jager K., Paabo S. Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. U.S.A. 1991, vol. 88, №, pp. 8720-8724.
[16] Torroni A., Sukernik RI., Schurr TG., Starikovskaya YB., Cabell MF., Crawford MH., Comuzzie AG., Wallace DC. mtDNA variation of aboriginal Siberians reveals distinct genetic affinities with Native Americans. Am. J. Hum. Genet. 1993, vol. 53, № 3, pp. 591-608.
[17] Wallace DC. Mitotic segregation of mitochondrial DNAs in human cell hybrids and expression of chloramphenicol resistance. Somat. Cell Mol. Genet. 1986, vol. 12, № 1, pp. 41-9.
[18] Wallace DC. Structure and evolution of organelle genomes. Microbiol. Rev. 1982, vol. 46, № 2, pp. 208-40.
[19] Ashley R., Peterson E., Abbo H., Gold D., Corey L. Comparison of monoclonal antibodies for rapid detection of cytomegalovirus in spin-amplified plate cultures. J. Clin. Microbiol. 1989, vol. 27, № 12, pp. 2858-60.
[20] Forster P. Ice Ages and the mitochondrial DNA chronology of human dispersals: a review. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2004, vol. 359, № 1442, pp. 255-264.
[21] Excoffier L. Evolution of human mitochondrial DNA: evidence for departure from a pure neutral model of populations at equilibrium. J. Mol. Evol. 1990, vol. 30, № 2, pp. 125-39.
[22] Mishmar D., Ruiz Pesini E., Golik P., Macaulay V., Clark AG., Hosseini S., Brandon M., Easley K., Chen E., Brown MD., Sukernik RI., Olckers A., Wallace DC. Natural selection shaped regional mtDNA variation in humans. Proc. Natl. Acad. Sci. U.S.A. 2003, vol. 100, № 1, pp. 171-176.
[23] Kivisild T., Shen P., Wall DP., Do B., Sung R., Davis KK., Passarino G., Underhill PA., Scharfe C., Torroni A., Scozzari R., Modiano D., Coppa A., de Knijff P., Feldman MW., Cavalli-Sforza LL., Oefner PJ. The role of selection in the evolution of human mitochondrial genomes. Genetics. 2005, vol., №, pp.
[24] Wallace DC., Brown MD., Lott MT. Mitochondrial DNA variation in human evolution and disease. Gene. 1999, vol. 238, № 1, pp. 211-30.
[25] Horai S., Hayasaka K. Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA. Am. J. Hum. Genet. 1990, vol. 46, № 4, pp. 828-42.
[26] Kivisild T., Rootsi S., Metspalu M., Mastana S., Kaldma K., Parik J., Metspalu E., Adojaan M., Tolk HV., Stepanov V., Gölge M., Usanga E., Papiha SS., Cinnioglu C., King R., Cavalli-Sforza L., Underhill PA., Villems R. The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am. J. Hum. Genet. 2003, vol. 72, №, pp. 313-332.
[27] Achilli A., Rengo C., Battaglia V., Pala M., Olivieri A., Fornarino S., Magri C., Scozzari R., Babudri N., Santachiara Benerecetti AS., Bandelt HJ., Semino O., Torroni A. Saami and Berbers--an unexpected mitochondrial DNA link. Am. J. Hum. Genet. 2005, vol. 76, № 5, pp. 883-886.
[28] Achilli A., Rengo C., Magri C., Battaglia V., Olivieri A., Scozzari R., Cruciani F., Zeviani M., Briem E., Carelli V., Moral P., Dugoujon JM., Roostalu U., Loogvali EL., Kivisild T., Bandelt HJ., Richards M., Villems R., Santachiara Benerecetti AS., Semino O., Torroni A. The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool. Am. J. Hum. Genet. 2004, vol. 75, № 5, pp. 910-8.
[29] Cann RL., Stoneking M., Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987, vol. 325, № 6099, pp. 31-6.
[30] Derbeneva OA., Starikovskaia EB., Volod'ko NV., Wallace DC., Sukernik RI. [Mitochondrial DNA variation in Kets and Nganasans and the early peoples of Northern Eurasia]. Genetika. 2002, vol. 38, № 11, pp. 1554-60.
[31] Derenko MV., Grzybowski T., Malyarchuk BA., Dambueva IK., Denisova GA., Czarny J., Dorzhu CM., Kakpakov VT., Miscicka Sliwka D., Wozniak M., Zakharov IA. Diversity of mitochondrial DNA lineages in South Siberia. Ann. Hum. Genet. 2003, vol. 67, № 5, pp. 391-411.
[32] Finnila S., Lehtonen MS., Majamaa K. Phylogenetic network for European mtDNA. Am. J. Hum. Genet. 2001, vol. 68, № 6, pp. 1475-1484.
[33] Kivisild T., Helle-Viivi T., Parik J., Yiming WS., Surinder SP., Bandelt HS., Villems R. The emerging limbs and twigs of the East Asian mtDNA tree. Mol. Biol. Evol. 2002, vol. 19, № 10, pp. 1737-1751 (erratum 20:162).
[34] Kong QP., Yao YG., Sun C., Bandelt HJ., Zhu CL., Zhang YP. Phylogeny of East Asian mitochondrial DNA lineages inferred from complete sequences. Am. J. Hum. Genet. 2003, vol. 73, № 3, pp. 671-676.
[35] Loogvali EL., Roostalu U., Malyarchuk BA., Derenko MV., Kivisild T., Metspalu E., Tambets K., Reidla M., Tolk HV., Parik J., Pennarun E., Laos S., Lunkina A., Golubenko M., Barac L., Pericic M., Balanovsky OP., Gusar V., Khusnutdinova EK., Stepanov V., Puzyrev V., Rudan P., Balanovska EV., Grechanina E., Richard C., Moisan JP., Chaventre A., Anagnou NP., Pappa KI., Michalodimitrakis EN., Claustres M., Golge M., Mikerezi I., Usanga E., Villems R. Disuniting uniformity: a pied cladistic canvas of mtDNA haplogroup H in Eurasia. Mol. Biol. Evol. 2004, vol. 21, № 11, pp. 2012-21.
[36] Macaulay V., Hill C., Achilli A., Rengo C., Clarke D., Meehan W., Blackburn J., Semino O., Scozzari R., Cruciani F., Taha A., Shaari NK., Raja JM., Ismail P., Zainuddin Z., Goodwin W., Bulbeck D., Bandelt HJ., Oppenheimer S., Torroni A., Richards M. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 2005, vol. 308, № 5724, pp. 1034-6.
[37] Malyarchuk BA., Derenko MV. Mitochondrial DNA variability in Russians and Ukrainians: Implications to the origin of the Eastern Slavs. Ann. Hum. Genet. 2001, vol. 65, № 1, pp. 63-78.
[38] Malyarchuk BA., Grzybowski T., Derenko MV., Czarny J., Drobnic K., Miscicka Sliwka D. Mitochondrial DNA variability in Bosnians and Slovenians. Ann. Hum. Genet. 2003, vol. 67, № Pt 5, pp. 412-25.
[39] Metspalu M., Kivisild T., Metspalu E., Parik J., Hudjashov G., Kaldma K., Serk P., Karmin M., Behar DM., Gilbert MTP., Endicott P., Mastana S., Papiha SS., Skorecki K., Torroni A., Villems R. Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet. 2004, vol. 5, № 1, pp. 26.
[40] Pakendorf B., Wiebe V., Tarskaia LA., Spitsyn VA., Soodyall H., Rodewald A., Stoneking M. Mitochondrial DNA evidence for admixed origins of central Siberian populations. Am. J. Phys. Anthropol. 2003, vol. 120, № 3, pp. 211-24.
[41] Quintana Murci L., Chaix R., Wells S., Behar D., Sayar H., Scozzari R., Rengo C., Al Zahery N., Semino O., Santachiara Benerecetti AS., Coppa A., Ayub Q., Mohyuddin A., Tyler Smith C., Mehdi Q., Torroni A., McElreavey K. Where West meets East: The complex mtDNA landscape of the Southwest and Central Asian corridor. Am. J. Hum. Genet. 2004, vol. 74, №, pp. 827-845.
[42] Richards M., Macaulay V., Hickey E., Vega E., Sykes B., Guida V., Rengo C., Sellitto D., Cruciani F., Kivisild T., Villems R., Thomas M., Rychkov S., Rychkov O., Rychkov Y., Golge M., Dimitrov D., Hill E., Bradley D., Romano V., Cali F., Vona G., Demaine A., Papiha S., Triantaphyllidis C., Stefanescu G. Tracing European founder lineages in the Near Eastern mtDNA pool. Am. J. Hum. Genet. 2000, vol. 67, № 5, pp. 1251-1276.
[43] Schurr TG., Wallace DC. Mitochondrial DNA diversity in Southeast Asian populations. Hum. Biol. 2002, vol. 74, № 3, pp. 431-52.
[44] Starikovskaya EB., Sukernik RI., Derbeneva OA., Volodko NV., Ruiz Pesini E., Torroni A., Brown MD., Lott MT., Hosseini SH., Huoponen K., Wallace DC. Mitochondrial DNA diversity in indigenous populations of the southern extent of Siberia, and the origins of Native American haplogroups. Ann. Hum. Genet. 2005, vol. 69, № Pt 1, pp. 67-89.
[45] Tambets K., Rootsi S., Kivisild T., Help H., Serk P., Loogvali EL., Tolk HV., Reidla M., Metspalu E., Pliss L., Balanovsky O., Pshenichnov A., Balanovska E., Gubina M., Zhadanov S., Osipova L., Damba L., Voevoda M., Kutuev I., Bermisheva M., Khusnutdinova E., Gusar V., Grechanina E., Parik J., Pennarun E., Richard C., Chaventre A., Moisan JP., Barac L., Pericic M., Rudan P., Terzic R., Mikerezi I., Krumina A., Baumanis V., Koziel S., Rickards O., De Stefano GF., Anagnou N., Pappa KI., Michalodimitrakis E., Ferak V., Furedi S., Komel R., Beckman L., Villems R. The Western and Eastern Roots of the Saami - the Story of Genetic "Outliers" Told by Mitochondrial DNA and Y Chromosomes. Am. J. Hum. Genet. 2004, vol. 74, № 4, pp. 661-82.
[46] Yao YG., Zhang YP. Phylogeographic analysis of mtDNA variation in four ethnic populations from Yunnan Province: new data and a reappraisal. J. Hum. Genet. 2002, vol. 47, №, pp. 311-318.
[47] Kivisild T., Reidla M., Metspalu E., Rosa A., Brehm A., Pennarun E., Parik J., Geberhiwot T., Usanga E., Villems R. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am. J. Hum. Genet. 2004, vol. 75, № 5, pp. 752-70.
[48] Kivisild T., Rootsi S., Metspalu M., Metspalu E., Parik J., Kaldma K., Usanga E., Mastana S., Papiha SS., Villems R. The genetics of language and farming spread in India. In: Bellwood P., Renfrew C., editors. Examining the farming/language dispersal hypothesis. Cambridge: The McDonald Institute for Archaeological Research; 2003. pp. 215-222.
[49] Comas D., Plaza S., Wells RS., Yuldaseva N., Lao O., Calafell F., Bertranpetit J. Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur. J. Hum. Genet. 2004, vol. 12, № 6, pp. 495-504.
[50] Alexeev V.P. Geography of human races. Moscow. 1974. 351 P.
[51] Kuzeev R.G. Peoples of Volga and Urals. Moscow. 1985, 308 P.
[52] Kosven M.O. Peoples of the Caucasus. Ed. Odr. Moscow. 1960. 612 P.
[53] Sambrook J., Fritsch EF., Maniatis T. Molecular cloning: a laboratory manual. Cold Spring Harbor, NY, Cold Spring Harbor Laboratory Press, 1989.
[54] Andrews RM., Kubacka I., Chinnery PF., Lightowlers RN., Turnbull DM., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 1999, vol. 23, № 2, pp. 147.
[55] Torroni A., Huoponen K., Francalacci P., Petrozzi M., Morelli L., Scozzari R., Obinu D., Savontaus ML., Wallace DC. Classification of European mtDNAs from an analysis of three European populations. Genetics. 1996, vol. 144, № 4, pp. 1835-50.
[56] Richards MB., Macaulay VA., Bandelt HJ., Sykes BC. Phylogeography of mitochondrial DNA in western Europe. Ann. Hum. Genet. 1998, vol. 62, № Pt 3, pp. 241-60.
[57] Macaulay VA., Richards MB., Hickey E., Vega E., Cruciani F., Guida V., Scozzari R., Bonne Tamir B., Sykes B., Torroni A. The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am. J. Hum. Genet. 1999, vol. 64, № 1, pp. 232-249.
[58] Tanaka M., Cabrera VM., Gonzalez AM., Larruga JM., Takeyasu T., Fuku N., Guo LJ., Hirose R., Fujita Y., Kurata M., Shinoda K., Umetsu K., Yamada Y., Oshida Y., Sato Y., Hattori N., Mizuno Y., Arai Y., Hirose N., Ohta S., Ogawa O., Tanaka Y., Kawamori R., Shamoto Nagai M., Maruyama W., Shimokata H., Suzuki R., Shimodaira H. Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res. 2004, vol. 14, № 10a, pp. 1832-1850.
[59] StatSoft Inc. STATISTICA (data analysis software system), version 7. 2004.
[60] Richards M., Corte Real H., Forster P., Macaulay V., Wilkinson Herbots H., Demaine A., Papiha S., Hedges R., Bandelt HJ., Sykes B. Paleolithic and neolithic lineages in the European mitochondrial gene pool. Am. J. Hum. Genet. 1996, vol. 59, № 1, pp. 185-203.
[61] Richards M., Macaulay V., Torroni A., Bandelt HJ. In search of geographical patterns in European mitochondrial DNA. Am. J. Hum. Genet. 2002, vol. 71, № 5, pp. 1168-74.
142
E. Khusnutdinova and I. Kutuev
[62] Metspalu E., Kivisild T., Kaldma K., Parik J., Reidla M., Tambets K., Villems R., The Trans-Caucasus and the Expansion of the Caucasoid-Specific Human Mitochondrial DNA. In: Papiha S., Deka R., Chakraborty R., editors. Genomic Diversity: Application in Human Population Genetics. New York: Kluwer Academic / Plenum Publishers; 1999. P., 121-134. [63] Torroni A., Bandelt HJ., D'Urbano L., Lahermo P., Moral P., Sellitto D., Rengo C., Forster P., Savontaus M., L., Bonne Tamir B., Scozzari R., mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am. J. Hum. Genet. 1998, vol. 62, № 5, pp. 1137-52. [64] Torroni A., Richards M., Macaulay V., Forster P., Villems R., Norby S., Savontaus M., L., Huoponen K., Scozzari R., Bandelt HJ., mtDNA haplogroups and frequency patterns in Europe. Am. J. Hum. Genet. 2000, vol. 66, № 3, pp. 1173-7. [65] Comas D., Calafell F., Mateu E., Perez Lezaun A., Bosch E., Martinez Arias R., Clarimon J., Facchini F., Fiori G., Luiselli D., Pettener D., Bertranpetit J., Trading genes along the silk road: mtDNA sequences and the origin of Central Asian populations. Am. J. Hum. Genet. 1998, vol. 63, № 6, pp. 1824-38. [66] Horai S., Murayama K., Hayasaka K., Matsubayashi S., Hattori Y., Fucharoen G., Harihara S., Park K., S., Omoto K., Pan IH., mtDNA polymorphism in East Asian Populations, with special reference to the peopling of Japan. Am. J. Hum. Genet. 1996, vol. 59, № 3, pp. 579-90. [67] Kolman C., Sambuughin N., Bermingham E., Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics. 1996, vol. 142, № 4, pp. 1321-34. [68] Schurr T., G., Sukernik R., I., Starikovskaya Y., B., Wallace DC., Mitochondrial DNA variation in Koryaks and Itel'men: population replacement in the Okhotsk Sea-Bering Sea region during the Neolithic. Am. J. Phys. Anthropol. 1999, vol. 108, № 1, pp. 1-39. 
[69] Bermisheva M., Viktorova T., Tambets K., Villems R., Khusnutdinova E. Diversity of mtDNA in peoples of Volga-Ural region of Russia. Molecular biology. 2002, Vol. 36, pp.905-906. [70] Tambets K., Tolk HV., Kivisild T., Metspalu E., Parik J., Reidla M., Voevoda M., Damba L., Bermisheva M., Khusnutdinova E., Golubenko M., Stepanov V., Puzyrev V., Usanga E., Rudan P., Beckmann L., Villems R., Complex signals for population expansions in Europe and beyond. In: BellwoodPRenfrewC, editors. Examining the farming/language dispersal hypothesis, McDonald Institute for Archaeological Research Monograph Series. Cambridge: Cambridge University Press; 2003. P., 449458. [71] Villems R., Rootsi S., Tambets K., Adojaan M., Orekhov V., Khusnutdinova E., Yankovsky N., Archaeogenetics of Finno-Ugric speaking populations. In: JulkuK, editors. The Roots of Peoples and Languages of Northern Eurasia IV. Oulu: Societas Historiae Fenno-Ugricae; 2002. P., 271-284. [72] Khusnutdinova E., Khidiatova I., Fatkhlislamova R., Viktorova T., Restriction polymorphism of mtDNA HVSI in populations of Volga-Ural region. Genetika. 1999, Vol. 5, P. 586-592. [73] Kuzeev R.G. The origin of Bahskirs, Moscow. 1974, 570 P. [74] Asfandiarov A., Asfandiarova K., History of Bashkir villages of Perm and Sverdlov region, Ufa, 1999. 253 P.
Genes and Languages
143
[75] Yao YG., Kong QP., Bandelt HJ., Kivisild T., Zhang YP., Phylogeographic differentiation of mitochondrial DNA in Han Chinese. Am. J. Hum. Genet. 2002, vol. 70, № 3, pp. 635-651. [76] Yao YG., Kong QP., Wang CY., Zhu CL., Zhang YP., Different matrilineal contributions to genetic structure of ethnic groups in the silk road region in china. Mol. Biol. Evol. 2004, vol. 21, № 12, pp. 2265-80. [77] Bakunin V.M. Description of Kalmyk peoples, especially Torgout, of their actions, Khans and masters. Elista. 1995. 153 P. [78] Kereitov R. Ethnic history of Nogays. Stavropol. 1999. 176 P. [79] Bermisheva M.A., Kutuev I. A., Korshunova T.Yu., Dubova N.A., Villems R., Khusnutdinova E.K. Phylogeographic Analysis of Mitochondrial DNA in the Nogays: A Strong Mixture of Maternal Lineages from Eastern and Western Eurasia Molecular Biology. 2004, Vol.38, p. 516-523. [80] Derbeneva OA., Starikovskaya EB., Wallace DC., Sukernik RI., Traces of early Eurasians in the Mansi of northwest Siberia revealed by mitochondrial DNA analysis. Am. J. Hum. Genet. 2002, vol. 70, № 4, pp. 1009-14. [81] Derenko MV., Malyarchuk BA., Denisova GA., Dorzhu ChM., Karamchakova ON., Luzina FA., Lotosh EA., Dambueva JK., Ondar UN., Zakharov JA., Polymorphism of the Y-Chromosome Diallelic Loci in Ethnic Groups of the Altai-Sayan Region. Russian Journal of Genetics. 2002, vol. 38, №, pp. 309-314. [82] Underhill PA., Inferring Human History: Clues from Y-Chromosome Haplotypes. Cold Spring Harbor Symposia on Quantitative Biology. Volume LXVIII: Cold Spring Harbor Laboratory Press; 2003. P., 487-493. [83] Underhill PA., Passarino G., Lin AA., Shen P., Lahr Mirazon M., Foley R., Oefner PJ., Cavalli Sforza LL., The phylogeography of Y., chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet., 2001, vol. 65, № 1, pp. 4362. [84] Sajantila A., Paabo S., Language replacement in Scandinavia. Nat. Genet. 1995, vol. 11, № 4, pp. 359-360. 
[85] Kittles RA., Perola M., Peltonen L., Bergen AW., Aragon R., A., Virkkunen M., Linnoila M., Goldman D., Long JC., Dual origins of Finns revealed by Y., chromosome haplotype variation. Am. J. Hum. Genet. 1998, vol. 62, № 5, pp. 1171-9. [86] Sajantila A., Lahermo P., Anttinen T., Lukka M., Sistonen P., Savontaus ML., Aula P., Beckman L., Tranebjaerg L., Gedde Dahl T., Issel Tarver L., DiRienzo A., Paabo S., Genes and languages in Europe: an analysis of mitochondrial lineages. Genome Res. 1995, vol. 5, № 1, pp. 42-52. [87] Rosser ZH., Zerjal T., Hurles ME., Adojaan M., Alavantic D., Amorim A., Amos W., Armenteros M., Arroyo E., Barbujani G., Beckman G., Beckman L., Bertranpetit J., Bosch E., Bradley DG., Brede G., Cooper G., Corte Real H., B., Knijff de P., Decorte R., Dubrova YE., Evgrafov O., Gilissen A., Glisic S., Golge M., Hill EW., Jeziorowska A., Kalaydjieva L., Kayser M., Kivisild T., Kravchenko SA., Krumina A., Kucinskas V., Lavinha J., Livshits LA., Malaspina P., Maria S., McElreavey K., Meitinger T., A., Mikelsaar A., V Mitchell RJ., Nafa K., Nicholson J., Norby S., Pandya A., Parik J., Patsalis P., C., Pereira L., Peterlin B., Pielberg G., Prata M., J., Previdere C., Roewer L., Rootsi S., Rubinsztein D., C., Saillard J.,
144
E. Khusnutdinova and I. Kutuev Santos FR., Stefanescu G., Sykes BC., Tolun A., Villems R., Tyler Smith C., Jobling MA., Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am. J. Hum. Genet. 2000, vol. 67, № 6, pp. 152643.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev, G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 5
COMMON AND SPECIAL FEATURES OF THE HUMAN RIBOSOMAL DNA

Natalia S. Kupriyanova and Alexei P. Ryskov
The Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
ABSTRACT

Ribosomes are among the most ancient and important cell organelles, and their structural features are common to all modern organisms. In all vertebrate genomes, ribosomal DNA (rDNA) exists as abundant discrete clusters. As in other vertebrates, the genes within human rDNA clusters are tandemly repeated in a head-to-tail fashion and are present at multiple chromosomal locations. Tandemly arranged rDNA repeats comprise the so-called nucleolus organizer regions (NORs), specific chromosomal regions where nucleoli form during mitotic telophase. Each ribosomal RNA (rRNA) gene consists of the coding regions for 18S, 5.8S and 28S rRNA and a ribosomal intergenic spacer (rIGS). The transcribed region, formed by the 5' external transcribed spacer (5'-ETS), 18S rDNA, internal transcribed spacer 1 (ITS1), 5.8S rDNA, internal transcribed spacer 2 (ITS2), 28S rDNA and the 3'-ETS, is transcribed as a long precursor (pre-rRNA). The variable rDNA regions differ in size and sequence between species, within species, and within individual organisms. Length-variable regions exist upstream of the promoter, downstream of the terminator and, at least in higher primates, in the central part of the rIGS. The rIGS harbors the gene promoter and terminator, the spacer promoter and terminator and, in humans and apes, many Alu retroposons, together with many sequence motifs that can adopt alternative structures. The subtelomeric DNA regions adjacent to the 5' ends of the rDNA clusters on all human NOR-bearing (NOR+) chromosomes are surprisingly conserved, which suggests that they take part in maintaining rDNA conservation. The multicopy, multicluster arrangement of the ribosomal genes makes the evolution of this gene system very complicated and involves different mechanisms of concerted evolution. Connections have been detected between ectopic nucleolus location, rDNA copy number, methylation status and transcriptional activity on the one hand and some neurological and hereditary diseases, ageing and cancer on the other. The vast extent of sequence heterogeneity, coupled with the central role that rRNA plays in protein translation, makes rDNA an interesting model system. The next few years should yield the results of human rDNA sequence analysis in connection with other experimental data and lead to an understanding of the role of various sequence motifs within the gene.
INTRODUCTION

Cells grow and divide. Some cells, such as neurons and oocytes, grow without dividing. Others, such as developing zygotes, divide without growing. For most cells, however, growth and division are coupled, which keeps cell size within narrow limits. Cell growth requires the synthesis of proteins, and the synthesis of proteins requires ribosomes. Thus, the control of cell growth must ultimately involve the control of ribosome synthesis [1]. The genes that code for the nucleic acids and proteins of ribosomes, together with the genes that support ribosome function, transcript maturation and activation of the mature products, form a vast polygenic complex whose coordinated operation is vital for the viability of individual cells and whole organisms. The mechanisms of human rDNA replication, transcription and variability are far from clear. They are being studied, together with related problems, in a number of laboratories throughout the world [2-10]. In Russia, rDNA structure and function in vertebrates were reviewed in 1982 [11] and in 2001 [12]. Here we discuss the intrachromosomal, interchromosomal and evolutionary variability of rDNA clusters and their 5'-adjacent regions, some mechanisms that regulate rDNA expression, and the status of rDNA in different physiological states of the human organism: in ageing and in various diseases, including Werner's syndrome, schizophrenia, rheumatoid arthritis, Hodgkin's disease and other types of blood cancer.
RIBOSOMAL DNA ORGANIZATION AND QUANTITATIVE EVALUATION OF ITS CONTENT IN THE HUMAN GENOME

The tandem structure of the rRNA genes has been demonstrated by various methods: isolation of rDNA as a distinct band of characteristic density upon centrifugation in a CsCl gradient; direct observation of rDNA transcription (loops of lampbrush chromosomes) during gametogenesis in several organisms [13, 14]; a restriction pattern specific for tandem repeats; and resolution of large DNA fragments by pulsed-field gel electrophoresis [15]. Evidence against the tandem structure of rDNA clusters has occasionally been reported, but it has all been subsequently refuted. It is therefore now safe to say that the great bulk of rDNA repeats form arrays in all known genomes. Human rDNA consists of tandemly arranged clusters on the p-arms of the five acrocentric chromosomes (13, 14, 15, 21 and 22). These clusters comprise the so-called nucleolus organizer regions (NORs), specific chromosomal regions where nucleoli form during mitotic telophase. Nucleoli disappear during mitosis, and the NORs become secondary subtelomeric constrictions in the condensed chromosomes. In interphase, the number of NORs often decreases as a result of their fusion.
The rRNA genes are repeated many times in the genomes of higher organisms. The number of rDNA repeat units differs markedly between classes of eukaryotes. In insects, birds and mammals, the rDNA copy number is relatively low and varies within a narrow range (200-500 copies). The situation is different in fish, amphibians and plants: in these groups, even related species may differ in ploidy and rDNA copy number, and the copy number is not necessarily associated with genome size. Possibly, polyploidy contributes to genome variation and promotes evolution [16]. The accuracy of an rDNA copy number estimate depends on the precision of the method used. In the earliest experiments, the number of rDNA copies in the human diploid genome was estimated at ~400 by saturation hybridization of nuclear DNA immobilized on nitrocellulose filters with labeled rRNA. With the method of Veiko et al. [17], the repeat number varies from 390 to 580 per diploid genome in humans. The most modern method for determining DNA copy number is based on real-time PCR.

Chromosome maps of the acrocentric short arms are studied infrequently, largely because of their paucity of transcribed genes and their high concentration of repetitive elements of unclear function. According to FISH and confocal laser scanning microscopy, rDNA clusters account for about 10% of the short arm of the acrocentrics and are separated from other genes by long (about 10 Mb) satellite sequences [18]. In addition to rDNA, the nucleolus includes large heterochromatin blocks containing nonribosomal repeats. Dispersed repeats and microsatellites, which predominate in heterochromatin, tend to form unusual three-dimensional structures that possibly play an important role in nucleolar organization and genome evolution [19]. The above methods have also provided data on the spatial arrangement of individual components in the nucleolus. Thus, the centromere and the rDNA cluster lie close together at the periphery of the nucleus, which explains recombination between rDNA and the pericentromeric alphoid satellites of acrocentric chromosomes [18].

An rDNA repeat unit consists of the transcribed region (ribosomal operon) and the ribosomal intergenic spacer (rIGS) (figure 1). In eukaryotic cells, the rRNA genes are transcribed by RNA polymerase I (Pol I) in the nucleolus to produce a large (40S-47S in various organisms) rRNA precursor (pre-rRNA). A pre-rRNA molecule contains the 5' external transcribed spacer (5'-ETS), 18S rRNA, the left internal transcribed spacer (ITS1), 5.8S rRNA, the right internal transcribed spacer (ITS2), 28S rRNA and the 3' external transcribed spacer (3'-ETS). The mature products (28S, 18S and 5.8S rRNAs) result from specific nuclease cleavage of the pre-rRNA. Evolution is associated with elongation of the 18S and 28S rRNA genes and of the transcribed spacers, but the general structure of the transcribed region remains constant. In mammals, 28S rRNA is about 1.5 kb longer than in yeast and consists of alternating conserved and variable regions. Insertions and nucleotide substitutions in the variable regions are phenotypically neutral. Intraspecific variation of the transcribed spacers is comparable with that of 28S rDNA. This parameter has been characterized for the human transcribed spacers [4] and for ITS1 and ITS2 of higher primates [20].
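The real-time PCR approach to copy-number estimation mentioned above is commonly evaluated by comparing the threshold cycle (Ct) of the target amplicon with that of a reference locus of known copy number. The Python sketch below illustrates only this arithmetic. The function name, the single-copy reference gene (2 copies per diploid genome), the assumption of equal amplification efficiency for both amplicons and the Ct values are all illustrative assumptions, not data or a protocol from the studies cited here.

```python
def relative_copy_number(ct_target, ct_reference, reference_copies=2.0, efficiency=2.0):
    """Estimate target copies per genome from real-time PCR threshold cycles.

    Assumption (hypothetical): both amplicons are amplified with the same
    efficiency, i.e. each cycle multiplies the product by `efficiency`
    (2.0 means perfect doubling). A target that reaches the threshold
    n cycles earlier than the reference is efficiency**n times more abundant.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    delta_ct = ct_reference - ct_target
    return reference_copies * efficiency ** delta_ct


# Made-up Ct values: the rDNA amplicon appears 7.5 cycles before a single-copy gene.
copies = relative_copy_number(ct_target=17.5, ct_reference=25.0)
print(round(copies))  # about 362 copies per diploid genome for these invented values
```

In practice the amplification efficiency is measured from a dilution series rather than assumed, because small deviations from perfect doubling change the estimate noticeably over several cycles.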
Figure 1. The arrangement of the human ribosomal DNA repeats. The 18S, 5.8S and 28S rRNA genes are shown as open rectangles. The curved arrows marked 't' denote the transcription start points. The transcribed and nontranscribed regions are denoted by thin and thick lines, respectively.
As mentioned above, the transcribed regions of rDNA alternate with rIGSs. The rIGS size increases noticeably from primitive to higher eukaryotes: the rIGS makes up 10% of the rDNA repeat unit in yeast and up to 70% in mammals. The rIGS size varies both between and within individuals (inter- and intragenomic polymorphism) [3, 5, 21]. The variation mostly concerns the number of repeats in arrays whose units range from 2-6 nt (microsatellite clusters) to several thousand nucleotides (blocks of large repeats). The spectra of microsatellite repeats in mammalian rIGSs partly overlap, although some microsatellite variants are species specific [22]. As part of the "Human Genome" program, we searched for highly polymorphic microsatellite markers in the cosmid library of human chromosome 13 by probing its ordered filters with labeled oligonucleotides composed of different microsatellite motifs [23, 24]. It turned out that three of the seven motifs used (TCC, GACA, GA) were detected in the majority of cases (70-80%) in the rIGS and were often represented repeatedly in the same cosmid inserts. Clusters formed by GAC and GACT repeats were detected with the same frequency in the rIGS and in the rest of chromosome 13, whereas clusters formed by TCG and GATG motifs were practically absent from the rIGS. These results suggest that NOR nucleotide sequences evolve at least partly independently of the bulk of the genome. We later detected a large number of microsatellite clusters formed by (TTGC)n in the rIGS of higher primates [25, 26]. This motif is absent from the rIGS of the more primitive primates studied so far. An oligonucleotide probe homologous to this microsatellite may be used to detect human rDNA and to investigate structural polymorphism in the human rIGS (figure 2). The rIGS polymorphism based on differences in the number of repeated elements was earlier thought to have no phenotypic expression. It is now clear that the repeat number in certain arrays located upstream of the promoter affects the intensity of rRNA gene expression and, consequently, the protein-synthesizing capacity of the cell and the general status of the organism, as most clearly demonstrated in plants [27].
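The microsatellite clusters discussed above were detected experimentally by oligonucleotide hybridization. Once a spacer has been sequenced, the same kind of cluster can also be located computationally. The following Python sketch is a minimal illustration, assuming that only perfect, uninterrupted tandem repeats on the given strand are of interest; the function name, the minimum repeat count and the toy sequence are hypothetical and are not taken from the cited work.

```python
import re


def find_microsatellite_clusters(sequence, motif, min_repeats=3):
    """Return (start, end, repeat_count) for each uninterrupted run of `motif`.

    Coordinates are 0-based and end-exclusive. Only perfect tandem repeats on
    the given strand are reported; imperfect repeats and the reverse strand
    are ignored in this simplified sketch.
    """
    motif = motif.upper()
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
    clusters = []
    for match in pattern.finditer(sequence.upper()):
        run_length = match.end() - match.start()
        clusters.append((match.start(), match.end(), run_length // len(motif)))
    return clusters


# Toy sequence (invented): a (TTGC)5 cluster and a (TCC)4 cluster.
toy_spacer = "ACGT" + "TTGC" * 5 + "GGA" + "TCC" * 4 + "AT"
print(find_microsatellite_clusters(toy_spacer, "TTGC"))  # [(4, 24, 5)]
print(find_microsatellite_clusters(toy_spacer, "TCC"))   # [(27, 39, 4)]
```

Comparing the repeat counts found at the same position in spacers from different individuals would be one simple way to describe the length polymorphism discussed in the text.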
Figure 2. Southern hybridization of nuclear DNA isolated from the lymphocytes of four higher primate species with the labeled (TTGC)4 probe. The DNA was digested with the EcoRI restriction endonuclease. 1, Homo sapiens; 2, Pan troglodytes; 3, Gorilla gorilla; 4, Pongo pygmaeus.
Polymorphisms of another common type are point nucleotide substitutions, which occur throughout rDNA and especially in the region upstream of the promoter. Such polymorphisms have long been known and are best studied in humans and higher primates. They were first identified as restriction fragment length polymorphisms (RFLPs) in patterns obtained with EcoRI, HindII, NotI, HinfI and HindIII [21, 28, 29]. A high concentration of nucleotide substitutions in a region associated with rDNA transcription has been assumed to affect the specificity of transcription factor binding and thereby to contribute to the molecular mechanisms of speciation [2]. Many point nucleotide substitutions have been found in the 5S rDNA and 5S rRNA of X. laevis and Misgurnus fossilis; 5S rRNA is a constitutive component of ribosomes and forms hydrogen bonds with 18S rRNA [30]. The sequences determined are so heterogeneous that no major variants can be distinguished among them. More likely, there is a certain 5S rRNA consensus from which individual molecules differ by a number of substitutions. The role of 5S rRNA in ribosomes is still unclear, and it is of interest to analyze whether the sequence variation of 5S rRNA is consistent with that of the other rRNAs and is associated with their interaction [31].

Our studies of cloned rDNA variants isolated from the cosmid library of human chromosome 13 revealed a disproportion in the representation of different rDNA regions [32]. We also showed that human rDNA is cleaved nonrandomly by Sau3A or its isoschizomer MboI under mild hydrolysis conditions. The hypersensitive cleavage sites were located in the rIGS, especially in regions about 5-5.5 kb and 11 kb upstream of the rRNA transcription start point. This finding is based on sequence mapping of the 5' and 3' ends of the rDNA inserts in randomly selected cosmid clones generated in the course of constructing the human chromosome 13 cosmid library. It suggests that some Sau3A sites in native rDNA are hypersensitive to the Sau3A restriction endonuclease (figure 3). To test this suggestion, we developed an experimental procedure that includes exhaustive EcoRI digestion of genomic DNA, followed by treatment with Sau3A (or MboI) at a low enzyme concentration for different periods of time and subsequent blot hybridization with a specific labeled rDNA fragment [32]. There are a number of reports on chromatin sites that are hypersensitive to endonucleases. For example, a detailed structural analysis of mouse pre-promoter rDNA chromatin revealed hypersensitive sites in the origin of replication, the enhancer repeats, the spacer promoters, two sites involved in replication, and so on [7, 9]. However, hypersensitive cleavage sites on naked DNA have been described only once, for micrococcal endonuclease acting on SV40 DNA [33]. Our results show that the methylation status and supercoiled state of the rIGS regions have no effect on the sensitivity of the cleavage sites. However, all primary cleavage sites are adjacent to or lie within Alu retroposons. Alu elements harbor a number of regulatory elements, including a Pol III promoter and terminator, and "hot spots" of DNA recombination [34-36]. These data suggest that neighboring sequences may influence the accessibility of Sau3A sites to the nuclease.
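The observation that the preferred Sau3A sites lie within or next to Alu elements can be checked on any annotated sequence by listing all recognition sites and testing their positions against the annotation. The Python sketch below assumes the GATC recognition sequence shared by Sau3A and MboI; the function names, the toy sequence and the feature coordinates are hypothetical and serve only to illustrate the bookkeeping, not to reproduce the mapping reported in [32].

```python
def find_sites(sequence, site="GATC"):
    """Return 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def sites_near_features(site_positions, features, flank=0):
    """Keep sites that lie inside, or within `flank` bp of, any feature.

    `features` is a list of (start, end) intervals, 0-based and end-exclusive,
    for example annotated Alu elements (hypothetical coordinates here).
    """
    return [pos for pos in site_positions
            if any(start - flank <= pos < end + flank for start, end in features)]


# Toy sequence (invented) with three GATC sites and one Alu-like interval.
toy = "AA" + "GATC" + "TTTT" + "GATC" + "C" * 9 + "GATC" + "AA"
alu_like = [(8, 16)]
sites = find_sites(toy)
print(sites)                                          # [2, 10, 23]
print(sites_near_features(sites, alu_like, flank=2))  # [10]
```

A listing of this kind shows only where sites are; which of them are cleaved preferentially still has to be determined experimentally, as in the partial digestion procedure described above.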
Figure 3. (A) A general map of the human ribosomal DNA repeat. EcoRI cleavage sites producing fragments A, B, C and D are indicated by vertical arrows. The positions of the oligonucleotide probes r1, r2, r3, r4 and r5 used to identify the EcoRI fragments are denoted by vertical bars under the main line; clusters of (ttgc)n hybridizing with r3 are denoted by one bold bar; t, transcription start point. The location of each cosmid insert on the rDNA map is shown above the main rDNA line by numbered horizontal lines. Insert ends that show no homology with human rDNA are indicated by dotted lines. (B) An expanded scheme of fragment C with all Sau3A sites denoted by vertical lines under the main horizontal line and numbered in the direction upstream of the transcription start point. The predominant Sau3A sites that are hypersensitive to Sau3A are shown by hammer symbols above the line; the Alu elements are indicated by horizontal arrows according to their direction. The location of cdc27 is shown by a horizontal bracket.
In addition to unique nucleotide sequences and micro- and minisatellite clusters, the pre-promoter region of the human rIGS contains three pairs of collinear Alu retroposons. In our experiments, PCR amplification was used to search for new structural variations in the human rIGS. Contrary to expectations, amplification of each of the two rIGS regions that contain collinear Alu repeats separated by microsatellite clusters (Alu1-Alu2 and Alu3-Alu4) yielded two PCR products: the expected one and a shortened one (figures 4, 5). All our results on cloning and sequencing of the PCR products unambiguously indicate that the shorter fragments do not exist in native genomic DNA but form during the PCR [37, 38]. The shortened fragments lack one Alu element and the sequence between the two collinear Alu elements of a pair. Specific nucleotide variations in Alu1-Alu4 make it possible to map the points at which PCR amplification stops. They lie in the most conserved ("core") part of the retroposons. It seems plausible that the Alu core, being part of the Pol III promoter, has an elevated affinity for Taq polymerase and arrests its movement along the DNA strand. This leads to premature termination of the reaction and to the formation of hybrid shortened fragments. In any case, our results indicate that great care should be exercised in interpreting comparative PCR data for complex loci such as the rIGS, which are generally used in evolutionary or comparative studies.
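The size of the shortened product follows from simple arithmetic if one assumes that a strand terminated prematurely inside one Alu element anneals to the equivalent position of the second, collinear Alu element and is then extended. This is one possible reading of the putative mechanism described above, not an established model. The Python sketch below encodes that assumption; the function name and all coordinates are hypothetical.

```python
def shortened_product_length(full_length, first_alu_start, second_alu_start):
    """Length of the hybrid product under the stated template-switch assumption.

    Everything between equivalent positions of the two collinear Alu elements
    is lost, i.e. one Alu element plus the inter-Alu sequence (assuming both
    Alu elements have the same length). Coordinates are measured in bp from
    the start of the full-length amplicon.
    """
    deleted = second_alu_start - first_alu_start
    if deleted <= 0 or deleted >= full_length:
        raise ValueError("Alu starts must be ordered and lie within the amplicon")
    return full_length - deleted


# Invented example: a 2500 bp amplicon with Alu elements starting at 300 and 1500.
print(shortened_product_length(2500, 300, 1500))  # 1300
```

A calculation of this kind gives the band size to expect on a gel if the artifact forms, which can help to distinguish it from a genuine length variant.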
Figure 4. The scheme of the human rDNA repeat. (a) A, B, C and D are EcoRI-defined segments; t, transcription start point. (b) Expanded BamHI-EcoRI section of fragment C, where Alu elements are shown by numbered arrows and the deleted part of the 1.8 kb fragment containing the 90 bp md cluster is shown by a gray line; the positions of the specific and mcs probes and of the P1 and P2 primers are denoted by triangles, asterisks and letters, respectively. (c) A putative mechanism by which the shortened PCR products arise.
Figure 5. Electrophoresis in 0.8% agarose of the PCR products obtained by amplification of the Alu3-md/mcs-Alu4 region. (a) UV; (b) Southern blot probed with the human rIGS-specific oligonucleotide marker and (c) with the microsatellite marker (ttgc)4. Lanes 1-4: genomic DNA of unrelated individuals; lane 5: rIGS-containing cosmid DNA isolated from the human chromosome 13 library (LA13NC01, Los Alamos). A 1 kb ladder was used as a size marker.
Beyond the rDNA arrays, there is experimental evidence that dispersed rRNA pseudogenes and sequences similar to the rIGS (orphans) are abundant in eukaryotic genomes. Orphans were first demonstrated in the insects D. melanogaster and D. simulans [39, 40]. Amplification of a 13.5 kb rIGS region (up to 200 copies) was also observed in the mouse BALB/c line [41]. Four clones with interrupted 18S rDNA were isolated from the genomes of higher primates [42-44]. The presence of rDNA-like sequences outside the rDNA clusters has been repeatedly recorded in the human genome [4]. In the distal part of human chromosome 22, some rDNA-like segments were detected, including segments homologous to 28S rDNA and the rIGS [5], whereas the part proximal to the rDNA cluster lacked them completely [45]. Some features of the structure and genomic distribution of the pseudogenes suggest that, in most cases, they neither enter into nor interact with genetically active rDNA clusters, that is, they lie outside the nucleolar region. It was proposed, however, that the mosaic structure of an 18S rDNA pseudogene in D. melanogaster, in which conserved and diverged regions alternate, could imply that 18S rDNA mutations are reversible and can be repaired by gene conversion with normal rDNA [46].

Blot hybridization analysis of clones harboring rIGS fragments, isolated from the cosmid library of human chromosome 13, revealed two clones with extended deletions (10 and 26 kb) in the rIGS [47] (figure 6). The deletions were mapped by comparing the rDNA sequence from GenBank (U13369) with the sequences of the recombinant inserts from cosmid clones 36G10 and 47H2, respectively. In both cases, the 5' and 3' ends of the deletions were located in microsatellite (TC)n clusters. Comparative FISH of genomic DNA with the insert from clone 47H2 and with the corresponding native rIGS DNA segment was performed under mild or stringent conditions. The 47H2 probe hybridized intensively with all NOR+ chromosomes under mild conditions, whereas after stringent washing it hybridized only with chromosomes 13 and 21. These results demonstrate that rDNA units harboring deletions in their rIGSs may be present in the human genome (figure 7). Restriction and hybridization analysis of the 47H2 insert showed its complex nature, namely, alternation of highly conserved rDNA regions with foreign segments. A database search detected an ETS fragment at the 5' end of the insert and an unidentified nucleotide sequence at its 3' end. The unsuccessful search for homologs of this nucleotide sequence in the human genome is possibly connected with the exclusion of the short arms of the NOR+ chromosomes from the "Human Genome" program.
Figure 6. The positions of the extended deletions in the 36G10 (a) and 47H2 (b) cosmid clones isolated from the human chromosome 13 library (LA13NC01, Los Alamos). EcoRI cleavage sites producing fragments A, B, C and D are indicated by vertical arrows; t, transcription start point. The fragments that remain intact after the deletions are underlined.
Figure 7. FISH localization of the cosmid clone 47H2 (with an extended deletion) on human chromosomes: (a) after mild washing; (b) after hard washing.
SUBTELOMERIC AND SUBCENTROMERIC DNA AREAS NEIGHBOURING rDNA CLUSTERS IN ACROCENTRIC CHROMOSOMES

It has been shown that sequences distal to the rDNA on all acrocentric chromosomes in humans and higher primates, which might have been expected to evolve independently without correcting each other, are nonetheless highly uniform [4, 5]. These results suggest that the 5'-flanking regions play an important role in maintaining the conservation and/or regulating the variability of NOR nucleotide sequences. The regions of the short arms of acrocentric chromosomes adjacent to the rDNA clusters are often involved in recombination between rDNA repeats. Robertsonian fusion is an extreme example of such an event: the rDNA clusters are deleted and the q-arms fuse, with one or two centromeres retained. Such fusions can occur between homologous and nonhomologous acrocentric chromosomes in humans. Some time ago, during cloning of the site of an X;21 translocation responsible for Duchenne muscular dystrophy, clones harboring the junction between the rDNA and the adjacent nonribosomal region (DJ) were isolated [48]. Using these clones, it was established that rRNA transcription on the p-arm of the acrocentric chromosome proceeds in the direction of the telomere. The primary structure of 8.3 kb of the nonribosomal region adjacent to the 5' end of the rDNA cluster was determined for human chromosome 21 [5]. Analysis of the nucleotide sequence showed that the distal segment differs from the rIGS in the absence of long tracts of simple repeats and bent DNA structures. It contains some fragments homologous to 28S rDNA and the rIGS, as well as two possible pseudogenes, 11 Alu elements, one LINE element and two MER4 fragments [5]. The nucleotide sequence of the junction between the 3' end of the rDNA cluster and nonribosomal DNA on human chromosome 22 was also determined. The junction is located in ITS1 and is a unique sequence 68 bp long, followed by a cluster of repeated 147 bp elements. This cluster is detected on all human acrocentric chromosomes and takes part in forming higher-order repeat units about 6.4-6.8 kb in size [4].
COMPLEX STRUCTURE AND FAST EVOLUTION OF THE HUMAN SUBTELOMERS A tendency for revision of main hypothesis of nuclear DNA evolution can be observed in recent years. According to universally accepted models, appreciable genome reorganizations were rarely occurring in evolution, no more often than once for 10 MYR [49, 50]. However as the result of the ―‗Human Genome Program‖‘ realization, it becamesbecomes clear that a wide range of prolonged segments‘ duplications took place for the last 35 MYR, namely, during primates‘ evolution [51]. By segments‘ duplications are meant duplications of DNA regions between 1 and about 500 kb in size revealing extent of similarity about 95.5%. Duplicated segments (duplicons) can occur in tandems or, more often, be distributed through genomes. Duplications can be inter-, or intra-chromosomal ones, with inter-chromosomal duplications most often located in subtelomeric and peri-centromeric regions. The structure and evolution of the most considerable in lengths (>1kb) and extent of homology (>90%) segment duplications of the human chromosome 22 was precisely studied [52]. One of the
most interesting results of this work is the detection of a region on human chromosome 22 that has no homology to the chromosomal DNA of other higher primates. This region is precisely the end of the nucleotide sequence determined so far in the pericentromeric region of the human genome. This result suggests that the evolutionarily "youngest" sequences lie adjacent to the chromosomal centromeres. It is commonly held that the human genome has been completely sequenced. However, the subtelomeric and pericentromeric sequences and the sequences of the short arms of the acrocentric chromosomes have not yet been completely determined. Duplicons located in these regions hamper the construction of contigs and thus of full-sized human genome maps. Nevertheless, a number of functionally important regions have been sequenced. A comparison of short PCR-amplified paralogous segments from the distal rDNA regions (rDR) of nonhomologous human acrocentric chromosomes revealed their striking similarity, which possibly reflects their functional importance [5]. Recently, new information has appeared about the structure and organization of human subtelomeric duplicons. The extent of nucleotide sequence divergence within subtelomeric duplicon families varies considerably, as does the organization of duplicon blocks at subtelomere alleles. Subtelomeric internal (TTAGGG)n-like tracts occur at duplicon boundaries, suggesting their involvement in the generation of the complex sequence organization. Most duplicons have copies at both subtelomere and non-subtelomere locations, but a class of duplicon blocks has been identified that is subtelomere-specific.
In addition, a group of six subterminal duplicon families has been identified that, together with six single-copy telomere-adjacent segments, include all of the (TTAGGG)n-adjacent sequence identified so far in the human genome [52]. The significant levels of nucleotide sequence divergence within many duplicon families, as well as the differential organization of duplicon blocks on subtelomere alleles, may provide opportunities for allele-specific subtelomere marker development. We have sequenced an rDR segment (~10 kb) subcloned from a cosmid library of human chromosome 13 (GenBank accession no. AF478540). Sequence analysis showed that it is almost completely homologous to the paralogous segment from human chromosome 21. The primary structure of an additional region (~2 kb) found towards the telomere showed 84% homology to the nucleotide sequence of a BAC clone from human chromosome 19 (GenBank accession no. AC006504). This region of homology is the closest to the chromosome 19 centromere of all the regions sequenced so far [BLASTN 2.2.18 (Mar-02-2008)]. This result provides further information about segmental duplications in the human genome (figure 8a, b). Both nucleotide sequences contain truncated Alu repeats 213 bp long [53], which argues that the Alu fragments were present in this area before the duplication. The promoter of the CD30 gene, which is associated with Hodgkin's lymphoma [54], is found in the chromosome 19 pericentromeric region but is absent from the homologous region of chromosome 13. This suggests that a segment harboring the promoter region was inserted into chromosome 19 after the segmental duplication.
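The homology values quoted above (for example, 84% between the chromosome 13 and chromosome 19 segments) are percent identities over an alignment. As a minimal sketch of what such a number means, assuming two already aligned sequences of equal length with '-' marking gaps, the Python function below counts matching columns. The function name and the toy sequences are hypothetical. The values in the text come from BLASTN, which scores local alignments and may normalise differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Columns where either sequence has a gap ('-') count as mismatches.
    Normalising by the full alignment length is an assumption of this sketch;
    other tools may exclude gapped columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical data, not the chromosome 13/19 sequences).
seq_a = "ACGTTGCA-TTTCTTGC"
seq_b = "ACGTTGCAGTTTCATGC"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity over {len(seq_a)} columns")
```

In this toy case 15 of 17 columns match (one gap and one substitution), so a single gap lowers the score just as a substitution does under the normalisation assumed here.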
Figure 8. (a) A schematic comparison of the human chromosome 13 subtelomeric fragment with the region of the BAC clone harboring the human chromosome 19 pericentromeric segment. (b) The localization of the homologous regions on chromosomes 13 and 19 is shown by a black rectangle.
SPECIFICITY OF THE HUMAN RDNA EVOLUTION
Human rDNA is a tandemly arranged multicopy gene family present on the p-arms of the five acrocentric chromosomes. Tandem genes on all five chromosomes are subject to concerted evolution, a process that promotes homogeneity among rDNA copies through unequal homologous exchange and gene conversion. These mechanisms can correct and eliminate new rDNA variants, and they can also promote the spread of new gene variants. This spreading can occur throughout individual clusters and among homologous and nonhomologous chromosomes. A number of researchers have tried to establish which mechanism is more important (gene conversion or unequal crossover) and to estimate the relative frequencies of exchanges within chromatids, between sister chromatids, between homologues, and between nonhomologous chromosomes. Different experiments have yielded different answers. On the one hand, there is evidence for interchromosomal exchanges (by unequal crossing over) that produce rIGS length variability on nonhomologous chromosomes [21, 55, 56], which becomes more pronounced over generations [29, 57]. On the other hand, evidence favoring intrachromosomal exchanges includes data implying linkage disequilibrium in human rDNA [58], the Mendelian inheritance of spacer variants in families [59], and the occurrence of human rIGS variants in syntenic fashion [21]. A number of constant and variable areas are found in the human rIGS. The variable regions are adjacent to the start and termination points of transcription. Human genomic DNA contains four major BamHI fragment variants of 3.9 kb, 4.6 kb, 5.4 kb, and 6.2 kb in the region located just downstream of the primary transcription termination site, between nucleotides 13473 and 15523 [3, 60]. The human rIGS region preceding the promoter (up to -7 kb) contains three pairs of Alu retroposons alternating with homogeneous and hierarchically organized tandem repeats (figure 9).
Although the complete nucleotide sequence of the human rDNA has been known for more than
ten years [3], the features of its functional organization are poorly studied compared with the other vertebrate models: mouse, rat, and Xenopus laevis [61-64]. The rIGSs of the mouse, rat, and frog harbor a variable number of spacer repeats, which have been shown to act as transcription enhancer elements in conjunction with the spacer promoter. In the mouse, for example, enhancer elements (135-140 bp) that include long poly-T tracts abut the promoter and span a total of about 2 kb. The Xenopus laevis and rat rIGS pre-promoter regions are organized similarly. The human rIGS lacks analogous repeats upstream of the promoter. Instead, two collinear Alu retroposons, separated by about 800 bp of so-called 90 bp repeats [3] formed by hierarchically organized microsatellite motifs, are positioned about 2 kb upstream of the rRNA transcription start point. The 90 bp repeats in the human rIGS have been proposed to function as enhancers, although this has not yet been shown experimentally [5]. At the same time, the 3'-end of the Alu-like B1 retroposon nearest to the promoter in the mouse is located only 3398 bp upstream of the transcription start point [64]. The drastic difference between the organization of the rIGS pre-promoter region in humans and in other known vertebrates made it interesting to study the organization of this region in the great apes. We PCR-amplified, cloned and sequenced rIGS fragments about 7 kb long, located upstream of the rRNA transcription start point, for Pan paniscus, Pan troglodytes, Gorilla gorilla and Pongo pygmaeus. The sequences have been registered in EMBL-Bank under GenBank accession nos. DQ133470, DQ133468, DQ133471 and DQ133469, respectively. Alignment of the orthologous primate nucleotide sequences reveals a high degree of similarity, with the exception of the highly repetitive region between the two Alu repeats nearest to the transcription start point [26] (figure 10).
Figure 9. A scheme of the human rDNA unit organization. The region of investigation is expanded. Long arrows denote positions of the numbered Alu elements and their directions. t – Start point of transcription. Positions of the primers (P1-P3) and oligonucleotides for screening (TI-TIII) are shown by short arrows and short bold lines.
Figure 10. Core repeating units, forming the region between Alu2 and Alu1 in the great apes‘ rIGS. Identical or prevailing bases in vertical columns were used for consensus sequences formation. Consensus sequences are shown in bold letters. a. Homo sapiens; b. Pan paniscus; c. Pan troglodytes; d. Gorilla gorilla; e. Pongo pygmaeus.
Because the human rIGS has been sequenced in parts since the mid-1980s, there are no universally adopted designations for the Alu elements it contains. We therefore designate them by the numbers "1" to "6" in the direction away from the promoter. The Alu1 and Alu2 retroposons are separated by a ~800 bp nucleotide sequence formed by 90 bp repeats [3]. The 90 bp monomers are formed mainly by regularly alternating microsatellite clusters (TTTC)n and (TTGC)n with rare nucleotide substitutions, deletions and insertions. Similar regions are lacking in the rIGS of the mouse, rat and Xenopus laevis. Earlier, we showed that the microsatellite (TTGC)n is a specific marker sequence for the human and chimpanzee species and is absent from the rIGS of the orangutan and some less highly organized primates [25]. In the human genome, the estimated rate of point mutations is approximately 10⁻⁹ mutations/nucleotide/year, while the slippage probability is about 10⁻³ per repeat per generation. Our results for the great apes also show a considerably higher rate of evolutionary change in simple and microsatellite clusters than in unique DNA sequences. Thus, the evolutionary dynamics of repeats, consisting of elongation and shortening of repeats combined with point mutation, can be considered a starting mechanism of evolution [65-67]. A number of neighboring repeating units can be elongated or shortened as a result of unequal crossover or replication slippage. If, in the process, two (or more) adjacent units carry analogous base substitutions, they can form an internal subcluster, which will then evolve according to its own dynamics. The model of microsatellite evolution in the great apes' rIGS is a good illustration of this scheme.
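To make the contrast between the point mutation rate and the slippage probability quoted above concrete, the following Python sketch puts both on a per-generation scale. The generation time of 25 years is an assumption introduced only for illustration; it is not given in the text. The comparison is also per nucleotide versus per repeat unit, so the ratio is an order-of-magnitude guide rather than a precise figure.

```python
# Rates quoted in the text.
POINT_MUTATION_RATE_PER_NT_PER_YEAR = 1e-9  # mutations / nucleotide / year
SLIPPAGE_RATE_PER_REPEAT_PER_GEN = 1e-3     # slippage events / repeat / generation

# Assumption (not from the text): one human generation is about 25 years.
ASSUMED_GENERATION_YEARS = 25.0


def point_rate_per_generation(rate_per_year: float, generation_years: float) -> float:
    """Convert a per-year point mutation rate to a per-generation rate."""
    return rate_per_year * generation_years


def slippage_to_point_ratio(
    slippage_per_generation: float,
    point_rate_per_year: float,
    generation_years: float,
) -> float:
    """How many times more frequent slippage is than point mutation, per generation."""
    per_generation = point_rate_per_generation(point_rate_per_year, generation_years)
    return slippage_per_generation / per_generation


ratio = slippage_to_point_ratio(
    SLIPPAGE_RATE_PER_REPEAT_PER_GEN,
    POINT_MUTATION_RATE_PER_NT_PER_YEAR,
    ASSUMED_GENERATION_YEARS,
)
print(f"slippage is roughly {ratio:.0f} times more frequent per generation")
```

Under the assumed generation time the ratio is on the order of 10⁴, which is consistent with the faster evolution of microsatellite clusters relative to unique sequences described in the text.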
Although the "90 bp" repeating units have not yet been shown experimentally to function as enhancers, their major element (CTTT)n has the potential to form triple-stranded structures, which could be involved in gene regulation. On the other hand, it is known that a fraction of MARs may colocalize with transcriptional enhancers. Classical AT-rich MARs have been proposed to anchor complexes of enhancers and transcription factors to the nuclear matrix via the cooperative binding of abundant matrix proteins to MARs [68]. The MARs/SARs distribution in the mouse rIGS, where enhancers have been mapped experimentally, suggests that transcription enhancers are adjacent to the MARs/SARs complexes but do not lie within them. We therefore decided to compare the distribution of MARs/SARs regions between primates and the mouse, assuming that enhancers should be distributed similarly in these taxa. We scanned the human and mouse rIGS pre-promoter regions (~7 kb upstream of the promoter) for MARs/SARs elements with the MAR-Wiz program (figure 11). It can be suggested that rIGS evolution in the primate ancestral lineage involved divergence and elimination of the enhancer repeats nearest to the promoter. This was accompanied by active divergence of polypyrimidine clusters, which resulted in the emergence and propagation of new microsatellite motifs, with a subsequent transfer of the enhancer functions to the polypyrimidine region.
Figure 11. Pattern of the MARs/SARs distribution in the rIGS pre-promoter regions (about 5 kb upstream of the promoter) for the primates and mouse.
Figure 12. A scheme of the human rDNA unit organization. The region of investigation is expanded. t – Start point of transcription. The core promoter, and the site of the universal control element binding are denoted by grey rectangles. The positions of (CCCT)n microsatellite clusters are shown by black rectangles.
Figure 13. Electrophoresis in 4% PAAG of the complexes obtained by incubating HeLa nuclear protein extracts with [γ-32P]-labeled oligonucleotides. The products of binding between the SP1 factor and the control oligonucleotide (k); between HeLa extracts and double-stranded (5'-CCCT-3')6 / (3'-AGGG-5')6 (1), single-stranded (5'-CCCT-3')6 (2) and single-stranded (5'-AGGG-3')6 (3).
Figure 14. A scheme of the human ribosomal DNA repeats. The 18S, 28S and 5.8S rDNA regions are shown in dark gray. The vertical arrows show EcoRI restriction sites. The curved arrows marked 't' denote the transcription start points. The expanded region corresponds to the LR1-LR2 repeats (black rectangles). The variable regions are shown in a lighter color. The regions of interest are denoted LR1var and LR2var.
In subsequent work, we used 24- to 40-mer oligonucleotides corresponding to the major microsatellite motifs from the pre-promoter region of the ribosomal DNA to search for functionally important elements by electrophoretic mobility shift assays (EMSA) with HeLa cell extract. The results showed no binding between double-stranded oligonucleotides and proteins from the extract, whereas complexes between single-stranded microsatellites and proteins were detected. The protein that binds the single-stranded oligonucleotide (AGGG)10 was identified by our colleagues from the Emanuel Institute of Biochemical Physics, using mass spectrometry, as the DNA-dependent protein kinase catalytic subunit (DNA-PKcs). DNA-dependent protein kinase (DNA-PK) comprises a catalytic subunit (DNA-PKcs) and the DNA-binding protein Ku, which interacts with double- and single-stranded DNA and RNA. The DNA-PK catalytic subunit can phosphorylate many transcription factors and, among other effects, strongly represses transcription by RNA polymerase I (Pol I) [69]. The presence of 5'-(CCCT)9-3'/3'-(GGGA)9-5' clusters in the pre-promoter region of the higher primates' rDNA suggests that DNA-PKcs can bind them independently of the Ku subunit (figure 15).
Figure 15. Frequency of the (G)n and (AG)m components of the central compound microsatellite cluster with different numbers of monomer units. a - (G)n clusters; b - (AG)m clusters.
The central part of the human rIGS also contains a variable region formed by 2 kb LR repeats. The number of LR repeats is usually two, but can sometimes vary from two to three [5]. Changes in the LR number lead to variability in the total rDNA length. LR1 and LR2, against a background of 88.8% similarity, harbor four short orthologous hypervariable segments enriched in microsatellite clusters (figure 16). Previous work showed that the LR1 segment between nucleotide positions 20,916 and 21,000 begins with (G)n and then contains several (AG)n/(CT)n clusters [5]. Comparison of individual clone sequences shows that this region has class-specific patterns [5].
Figure 16. A summary of the allele variants (A-H) detected in the LR2var region among 547 sequences taken from ten human genomes. The nucleotide sequences are shown without taking into account the 'n' and 'm' numbers in the (G)n(AG)m clusters. Variable nucleotides and microsatellite arrays are shown in bold. All nucleotides are numbered in both directions from the central (G)n(AG)m cluster. The number of occurrences of each variant is shown at the end of the corresponding row. Asterisks in the H1-H5 variants indicate substitutions in the poly-AG clusters, the positions and nature of which are shown to the right of the corresponding rows.
In recent work, 36 copies of the rIGS hypervariable segment LR2var, located at positions 22763-23523 from the transcription start point, were cloned from an individual human genome and sequenced [70]. Comparative analysis showed that no two of the 36 inserts had identical primary structures. More recently, we studied the large-scale heterogeneity of 547 LR2var DNA segments (coordinates 22763-23523 from the transcription start point) obtained from 10 unrelated individuals. The variability of the central (G)n(AG)m microsatellite cluster consists of random changes in the numbers "n" and "m". The number "n" varies between 4 and 17, and "m" between 13 and 30, with random combinations of the (G)n and (AG)m variants. The monomer unit numbers (G)8-11 and (AG)18-20 are the most abundant overall. The 31 groups of the 547 LR2var sequences are shown in figure 17. The nucleotide sequences flanking the central (G)n(AG)m cluster, disregarding the numbers "n" and "m", fall into two major groups (A and B) with minor variants. The nucleotide sequences of the most abundant group A (82% of all LR2var) are practically identical to the GenBank sequence (GenBank accession no. U13369).
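The "n" and "m" numbers of the central (G)n(AG)m cluster described above can be read directly from a clone sequence. The Python sketch below shows one way this might be done with a regular expression. It is an illustration only: the function names, the minimum run lengths in the pattern, and the synthetic clone sequences are assumptions, not the procedure or data used in the study.

```python
import re
from collections import Counter

# A run of G's followed directly by a run of AG dinucleotides.
# The minimum run lengths (4 and 2) are assumptions chosen to skip
# short chance matches; they are not taken from the text.
CLUSTER_RE = re.compile(r"(G{4,})((?:AG){2,})")


def central_cluster(sequence):
    """Return (n, m) for the longest (G)n(AG)m match, or None if there is none."""
    best = None
    for match in CLUSTER_RE.finditer(sequence.upper()):
        n = len(match.group(1))       # number of G monomers
        m = len(match.group(2)) // 2  # number of AG dinucleotide units
        if best is None or n + 2 * m > best[0] + 2 * best[1]:
            best = (n, m)
    return best


def tally_clusters(sequences):
    """Count how often each (n, m) combination occurs in a set of clones."""
    counts = Counter()
    for seq in sequences:
        nm = central_cluster(seq)
        if nm is not None:
            counts[nm] += 1
    return counts


# Synthetic clones with hypothetical flanks (not real LR2var data).
clones = [
    "CTCTT" + "G" * 9 + "AG" * 19 + "TTCCA",
    "CTCTT" + "G" * 9 + "AG" * 19 + "TTCCA",
    "CTCTT" + "G" * 11 + "AG" * 18 + "TTCCA",
]
counts = tally_clusters(clones)
for (n, m), count in sorted(counts.items()):
    print(f"(G){n}(AG){m}: {count} clone(s)")
```

A tally of this kind gives the frequency distribution of (n, m) combinations across clones. Alleles that lack the central cluster return None and would need to be handled separately.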
Figure 17. An alignment of the nucleotide sequences of the H1-H5 alleles obtained from LR2var with their counterpart from LR1var. Homologies are shown by asterisks and deletions by hyphens. Homology of the H4 and H5 3' ends with the LR2 nucleotide sequence is shown in bold.
The members of groups B (13%) and C (3%) exhibit heterogeneity upstream and downstream of the central (G)n(AG)m cluster. The sequences of the B1-B14 and C1-C4 alleles show specific features that distinguish them from the A variants. The five uncommon alleles (H1-H5) lack the central (G)n(AG)m cluster, whereas the upstream (AG)6-10 cluster characteristic of the B alleles is extended to (AG)18-32 and often contains G->C substitutions. The 3'-part of the H1-H5 sequences harbors base substitutions, deletions, and insertions. A comparison of the H1-H5 sequences with their counterparts from LR1var reveals identical segments (figure 18). A possible reason is that the two repeats, LR1 and LR2, can exchange DNA segments. Different mechanisms possibly generate variability in discrete LR2var segments. Slipped-strand mispairing of microsatellite DNA during replication is most consistent with the type of variability seen in the (G)n(AG)m clusters. Studies of minisatellite variability have accumulated a convincing body of evidence that, along with equal exchange at crossover points, there is unequal conversion of one allele by the other. In this model, staggered DNA nicks produce protruding single-stranded ends that can invade the allelic partner or sister chromatid [71, 72]. Most strand-invasion events are aborted after limited extension of the broken single strands, perhaps through the action of mismatch repair systems. This mechanism could readily account for highly complex, patchwork interallelic transfers [73-74]. We infer that differences between the nucleotide sequences flanking (G)n(AG)m in the A-H groups could also arise from crossovers and patchwork interallelic transfers.
An alignment of the H1-H5 sequence variants with their counterparts from LR1var and LR2var suggests that they arose by gene conversion between the two LR repeats (figure 19). rDNA status may be inherited and linked to different physiological states of the human organism. The major variable region in the human rDNA was mapped downstream of the initial transcription termination site by Southern blot analysis with a probe for the 3'-end of the 28S rDNA [59]. Analysis of this region in 51 individual genomes revealed eight structural variants, two of which were present in all the genomes studied, while six were detected only in some genomes, in different combinations. Some structural variants were inherited as a single locus in a Mendelian fashion, which possibly reflects their clustering on individual chromosomes. These types of variability were proposed to arise from unequal crossover between homologous repeats during meiosis [59]. On the other hand, a similar genomic analysis of 100 persons belonging to different generations of one family did not reveal any recombination distinguishable from usual meiotic segregation
[21]. According to other data, children can sometimes have more rDNA copies than their parents, suggesting that unequal crossover does occur [29, 57]. The methylation status of rDNA influences rRNA transcription activity. For example, transcription does not increase in cells with amplified rDNA that binds antibodies to 5-Me-C, whereas demethylation of this rDNA by 5-aza-C leads to an increase in the rRNA level [75, 76]. It was shown that the methylation status of the CpG 145 bp upstream of the transcription start point in the rat rIGS can serve as an indicator of gene activity, while methylation of the CpG 133 bp upstream of the transcription start point in the mouse rIGS prevents binding of the upstream binding factor UBF, which is crucial for formation of the Pol I transcription complex [77, 78]. Interestingly, CpG is also present at positions -145 and -135 in the human rIGS [3]. Differences in the number of rDNA monomers and in methylation status with ageing have been shown for human brain and heart and for a number of mouse tissues [79, 80]. An individual pattern of decrease in the rRNA synthesis rate with donor age was detected in human fibroblasts by counting Ag-binding NORs [81]. Werner's syndrome (WRN), which manifests as premature ageing, is caused by mutations in a specific helicase locus. Werner's protein (WRNp) was detected in the nucleolus of replicating mammalian cells, where its appearance was connected with the transcriptional activity of the rRNA genes [82]. The increase in methylation level with ageing was considerably greater in cell cultures from patients with WRN than in control cells [83]. Treatment of a fibroblast cell culture from a patient with rheumatoid arthritis (RA) with an oxidative agent did not activate rRNA synthesis, whereas in control cells rRNA transcription activity increased by 50-80%.
The rDNA content of blood serum DNA and of DNA from leukocyte nuclei was compared, in both healthy donors and patients with rheumatoid arthritis, by dot hybridization [84]. The transcribed region of rDNA (13.3 kb) contains more than 200 CpG motifs capable of interacting with TLR9 receptors, which mediate the cellular immune response to CpG-rich DNA fragments. The data suggest that DNA from dead cells circulating in the peripheral blood is enriched in sequences with potent immunostimulatory properties. Early apoptosis is also characteristic of cells from patients with RA [84]. A comparison of the rDNA copy number in individuals with schizophrenia (42) and healthy individuals (33) revealed a level about 20% higher in schizophrenia, whereas the content of satellite III DNA and histone genes was practically equal in the genomes of all persons. It was shown cytogenetically (Ag-NOR staining) that the content of active rRNA genes in the genomes of people with schizophrenia was higher than in the genomes of healthy people [85]. The extent of association of acrocentric (NOR)+ chromosomes is sometimes used for prognostic and diagnostic purposes in acute diseases, such as immunodeficiency and tumor development, on the assumption that this parameter, along with a silver staining test, reflects their functional state [86, 87]. Both human and animal malignant cells with structurally abnormal chromosomes often show variation in both the number and location of NORs. Rearrangement and possible amplification of the rRNA gene sites were detected in the human chronic myelogenous leukemia cell line K562 [88]. Karyotypic dissection of Hodgkin's disease cell lines (HDLM-1/2/3) revealed ectopic subtelomeres and ribosomal DNA at sites of multiple jumping translocations and genomic amplification [89]. Nascent pre-rRNA overexpression correlates with an adverse prognosis in alveolar rhabdomyosarcoma [90]. These data and a number of
others imply that the question of the connection between the variability and transcriptional activity of the rDNA and the state of the human organism is far from resolved, and further studies are needed.
CONCLUSION
Despite the crucial importance of a properly functioning protein-synthesizing system for cells and for the organism as a whole, investigations in this area are far from complete. This applies to information about the links between the structural and functional organization and polymorphism of the rRNA gene regulatory region and the trans-factors that participate in modulating gene activity. In actively growing cells, RNA polymerase I-driven rRNA synthesis accounts for 50% of total cellular RNA production, while the dynamics of rRNA synthesis activity and the mechanisms of polysome reprogramming in human ontogenesis remain practically unknown. At the same time, data are accumulating on the multifunctional role of ribosomal proteins and the important role of nucleolin in pre-rRNA maturation and trafficking. Recent studies have suggested that the nucleolus is involved (possibly in cooperation with the rDNA) in other important functions: growth and cell cycle control, tumorigenesis, aging and so on. The data discussed here suggest that the study of rDNA and ribosome biogenesis calls for further investigation and can bring unexpected and important results.
REFERENCES
[1] Dipayan R., Warner J.R., What better measure than ribosome synthesis? Genes & Development, 2004, v. 18, pp. 2431-2436.
[2] Jacob S.T., Regulation of ribosomal gene transcription. Biochem. J., 1995, v. 306, pp. 617-626.
[3] Gonzalez I.L., Sylvester J.E., Complete sequence analysis of the 43-kb human ribosomal DNA repeat: analysis of the intergenic spacer. Genomics, 1995, v. 27, pp. 431-437.
[4] Gonzalez I.L., Sylvester J.E., Beyond ribosomal DNA: on towards the telomere. Chromosoma, 1997, v. 105, pp. 431-437.
[5] Gonzalez I.L., Sylvester J.E., Human rDNA: evolutionary patterns within the genes and tandem arrays derived from multiple chromosomes. Genomics, 2001, v. 27, pp. 255-263.
[6] Gonzalez I.L., Petersen R., Sylvester J.E., Independent insertion of Alu elements in the human ribosomal spacer and their concerted evolution. Mol. Biol. Evol., 1989, v. 6, pp. 413-423.
[7] Grummt I., Life on a planet of its own: regulation of RNA polymerase transcription in the nucleolus. Genes & Development, 2003, v. 17, pp. 27-35.
[8] Grummt I., Regulation of mammalian ribosomal gene transcription by RNA polymerase I. Progr. Nucleic Acid Res., 1999, v. 62, pp. 109-154.
[9] Langst G., Schatz T., Langowsky J., Grummt I., Structural analysis of mouse rDNA: Coincidence between nuclease hypersensitive sites, DNA curvature and regulatory elements in the intergenic spacer. Nucleic Acids Res., 1997, v. 25, pp. 511-517.
[10] Mayer C., Bierhoff H., Grummt I., The nucleolus as a stress sensor: JNK2 inactivates the transcription factor TIF-IA and down-regulates rRNA synthesis. Genes & Development, 2005, v. 19, pp. 933-941.
[11] Nosikov V.V., Braga E.A., Structural organization of eukaryotic ribosomal genes. Itogi Nauki Tekhn: Mol. Biol., Moscow, VINITI, 1982, pp. 110-125.
[12] Kupriyanova N.S., Conservation and variation of ribosomal DNA in eukaryotes. Mol. Biol. (Moscow), 2000, v. 34, pp. 753-765.
[13] Scheer U., Trendelenburg M.F., Krohne G., Franke W.W., Lengths and patterns of transcriptional units in the amplified nucleoli of oocytes of Xenopus laevis. Chromosoma, 1977, v. 60, pp. 147-167.
[14] Kupriyanova N., Popenko V., Eisner G., Vengerov Y., Timofeeva M., Tikhonenko A., Skryabin K., Bayev A., Organization of loach ribosomal genes (Misgurnus fossilis L.). Mol. Biol. Rep., 1982, v. 8, pp. 143-148.
[15] Srivastava A.K., Harino Y., Schlessinger D., Ribosomal DNA clusters in pulsed-field gel electrophoresis analysis of human acrocentric chromosomes. Mammal Genome, 1993, v. 4, pp. 445-450.
[16] Long E.O., Dawid I.B., Repeated genes in eukaryotes. Annu. Rev. Biochem., 1980, v. 49, pp. 727-764.
[17] Veiko N.N., Lyapunova N.A., Bogush A.V., Tsvetkova T.G., Gromova E.V., Detection of the rRNA genes number in individual human genomes. Mol. Biol. (Moscow), 1996, v. 30, pp. 1076-1086.
[18] Kaplan F.S., Murray J., Sylvester J.E., et al., The topographic organization of repetitive DNA in the human nucleolus. Genomics, 1993, v. 15, pp. 123-132.
[19] Moyzis R.K., Torney D.C., Meyne J. et al., The distribution of interspersed repetitive DNA sequences in the human genome. Genomics, 1989, v. 4, pp. 273-289.
[20] Gonzalez I.L., Sylvester J.E., Smith T.F., Stambolian D., Schmickel R.D., Ribosomal rRNA gene sequences and hominoid phylogeny. Mol. Biol. Evol., 1990, v. 7, pp. 203-219.
[21] Ranzani G.N., Bernini L.F., Crippa M., Inheritance of rDNA spacer lengths variants in men. Mol. Gen. Genet., 1984, v. 196, pp. 141-145.
[22] Nanda I., Zischler H., Epplen C., Gutlenbach M., Schmid M., Chromosomal organization of simple repeated DNA sequences used for DNA fingerprinting. Electrophoresis, 1991, v. 12, pp. 193-203.
[23] Braga E.A., Kapanadze B.I., Kupriyanova N.S., Brodyansky V.M., Netchvolodov K.K., Shkutov G.A., Ryskov A.P., Nosikov V.V., Yankovsky N.K., Analysis of the distribution of microsatellites of seven motifs within a cosmid of an ordered human chromosome 13 library. Mol. Biol. (Moscow), 1995, v. 29, pp. 1001-1010.
[24] Ryskov A.P., Kupriyanova N.S., Kapanadze B.I., Netchvolodov K.K., Pozmogova G.E., Prosnyak M.I., Yankovsky N.K., Frequency of various mini- and micro-satellite sequences in DNA of human chromosome 13. Genetika (Moscow), 1993, v. 29, pp. 1750-1754.
[25] Kupriyanova N.S., Netchvolodov K.K., Ryskov A.P., Microsatellite (ttgc)n, specific for the intergenic spacer of human and chimpanzee rDNA: use for studying the structural variations of the prepromoter region of rDNA. Mol. Biol. (Moscow), 1999, v. 33, pp. 314-318.
[26] Netchvolodov K.K., Boiko A.V., Ryskov A.P., Kupriyanova N.S., Evolutionary divergence of the pre-promotor region of ribosomal DNA in the great apes. DNA Seq., 2006, v. 17, pp. 378-91.
[27] Akhunov E.D., Chemeris A.V., Kulikov A.M., Vakhitov V.A., Functional analysis of diploid wheat promoter by transient expression. Biochim. Biophys. Acta, 2001, v. 1522, pp. 226-229.
[28] Arnheim N., Krystal M., Wilson G., Ryder O., Zimmer E., Molecular evidences for exchanges among ribosomal genes on non homologous chromosomes in man and apes. Proc. Natl. Acad. Sci., 1980, v. 77, pp. 7323-7327.
[29] Kuick R., Asakawa J.-i., Neel J.V., Kodaira M., Saton C., Thoraval D., Gonzalez I.L., Hanash S.M., Studies of the inheritance of human ribosomal DNA variants detected in two-dimensional separations of genomic restriction fragments. Genetics, 1990, v. 144, pp. 307-316.
[30] Sedman Y.E., Shostak N.G., Kupriyanova N.S., Serenkova T.I., Fengelghauer P.E., Gimalov F., Lind A.Y., Timofeeva M.Y., Intragenomic polymorphism of the 5S rRNA primary structure in a loach (Misgurnus fossilis L.). A transcriptional activity determination. Mol. Biol. (Moscow), 1989, v. 23, pp. 1295-1307.
[31] Kuo B.A., Gonzalez I.L., Gillespie D.A., Sylvester J.A., Human ribosomal RNA variants from a single individual and their expression in different tissues. Nucleic Acids Res., 1996, v. 24, pp. 4817-4824.
[32] Kupriyanova N.S., Kirilenko P.M., Netchvolodov K.K., Ryskov A.P., Preferential cleavage sites for Sau3A restriction endonuclease in human ribosomal DNA. Biochem. Biophys. Res. Comm., 2001, v. 272, pp. 11-15.
[33] Nedospasov S.A., Georgiev G.P., Non-random cleavage of SV40 DNA in the compact minichromosome and free in solution by micrococcal nuclease. Biochem. Biophys. Res. Comm., 1980, v. 92, pp. 532-539.
[34] Lehrman M.A., Russel D.W., Goldstein J.L., Brown M.S., Alu-Alu recombination deletes splice acceptor sites and produces secreted low density lipoprotein receptor in a subject with familial hypercholesterolemia. J. Biol. Chem., 1987, v. 262, pp. 3354-3361.
[35] Saikawa Y., Kaneda H., Yue L., Shimura S., Toma T., Kasahara Y., Yachie A., Koizumi S., Structural evidence of genomic exon-deletion mediated by Alu-Alu recombination in a human case with heme oxygenase-1 deficiency. Hum. Mutat., 2000, v. 16, pp. 178-179.
[36] Helisalmi S., Hiltunen M., Vepsalainen S., Iivonen S., Mannermaa A., Lehtovirta M., Koivisto A.M., Alafuzoff I., Soininen H., Polymorphisms in neprilysin gene affects the risk of Alzheimer's disease in Finnish patients. J. Neurol. Neurosurg. Psychiatry, 2004, v. 75, pp. 1746-1748.
[37] Shibalev D.V., Voronov A.S., Bashkirov V.N., Kupriyanova N.S., Ryskov A.P., Recombinant products detection on PCR amplification of DNA containing Alu repeats. Docl. Acad. Nauk (Moscow), 2003, v. 388, pp. 689-693.
[38] Kupriyanova N.S., Shibalev D.V., Voronov A.S., Ryskov A.P., PCR-generated artificial ribosomal DNAs from premature termination at Alu sequences. Biomol. Engineering, 2004, v. 21, pp. 21-25.
[39] Childs G., Maxon R., Cohn R.H., Kedes L.M., Orphons: dispersed genetic elements derived from tandem repetitive genes of eukaryotes. Cell, 1981, v. 23(3), pp. 651-663.
[40] Lohe A.R., Roberts P.A., An unusual Y chromosome of Drosophila simulans carrying amplified rDNA spacer without rRNA genes. Genetics, 1990, v. 125, pp. 399-406.
[41] Kominami R., Muramatsu M., Amplified ribosomal spacer sequence: structure and evolutionary origin. J. Mol. Biol., 1987, v. 193, pp. 217-222.
[42] Brownell E., Krystal M., Arnheim N., Structure and evolution of human and African ape rDNA pseudogenes. Mol. Biol. Evol., 1983, v. 1, pp. 29-37.
[43] Mashkova T.D., Tyumeneva I.G., Zinovieva O.L., Romanova L.Y., Jabbs E., Alexandrov I.A., Pericentromeric alpha-satellite DNA in human chromosome 21 bordering with euchromatin DNA. Mol. Biol. (Moscow), 1996, v. 30, pp. 1044-1054.
[44] Burdon M.R., Leader D.P., Characterization of a human orphon 28S ribosomal DNA. Gene, 1986, v. 48, pp. 65-70.
[45] Sakai K., Ohta T., Minoshima S., Kudof J., Wang Y., Jong P.J., Shimizu N., Human ribosomal RNA gene cluster: identification of the proximal end containing a novel tandem repeat sequence. Genomics, 1995, v. 25, pp. 521-526.
[46] Benevolenskaya E.V., Kogan G.L., Tulin A.V., Phillipp D., Gvozdev V.A., Segmented gene conversion as a mechanism of correction of 18S rRNA pseudogene located outside of rDNA cluster in D. melanogaster. J. Mol. Evol., 1997, v. 44, pp. 646-651.
[47] Kirilenko P.M., Kupriyanova N.S., Ryskov A.P., Detection and characteristics of prolonged deletions in the ribosomal DNA cosmid clones from the human chromosome 13. Docl. Acad. Nauk (Moscow), 2000, v. 371, pp. 60-62.
[48] Worton R.G., Sutherland J., Sylvester J.E., Willard H.F., Bodrug S., Dube I., Duff C., Kean V., Ray P., Schmickel R.D., Human ribosomal RNA genes: Orientation of the tandem array and conservation of the 5' end. Science, 1988, v. 239, pp. 64-68.
[49] Bailey J.A., Yavor A.M., Viggiano L., Musceo D., Horvath J.E., Archidiacono N., Schwartz S., Rocchi M., Eichler E.E., Human-specific duplication and mosaic transcripts: the recent paralogous structure of chromosome 22. Am. J. Hum. Genet., 2002, v.70, pp. 83-100. [50] Melford H.C., Trask B.J., The complex structure and dynamic evolution of human subtelomers. Nat. Rev. Genet., 2002, v. 3, pp. 91-102. [51] Horvath J.E., Bailey J.A., Locke D.P., Eischler E.E., Lessons from the human genome: transitions between euchromatin and heterohromatin. Hum. Mol. Genet., 2001, v.10, pp. 2215-2223. [52] Ambrosini A., Paul S., Hu S., Riethman H., Human subtelomeric duplicon structure and organization. Genome biology, 2007, v. 8:R151 (doi: 10.1186/gb-2007-8-7-r151). [53] Kupriyanova N.S., Shibalev D.V., Voronov A.S., Muravenko O.V., Zelenin A.V., Ryskov A.P., Segment duplications in subtelomeric regions of human chromosome 13. Mol. Biol. (Moscow)., 2003, v.17, pp. 221-227. [54] Durkop H., Oberbarnscheidt M., Latza V., Bulfone-Paus S., Krause H., Pohl T., Stein H., Structure of the Hodgkin‘s lymphoma-associated human CD30 gene and the influence of a microsatellite region on its expression in CD30(+) cell lines. Biochem. Biophys. Res. Com., 2001, v.1519, pp. 185-191.
Common and Special Features of the Human Ribosomal DNA
171
[55] Krystal M., D‘Eustachio, Ruddle F.H., Arnheim N., Human nucleolus organizers on homologous chromosomes can share the same ribosomal gene variants. Proc. Natl. Acad. Sci., 1981, v. 78, pp. 5744-5748. [56] Naylor S.L., Sakaguchi A.Y., Schmickel R.D., Woodworth-Gutal M., Shows T.B., Organization of rDNA spacer fragment variants among human acrocentric chromosomes in somatic cell hybrids. J. Mol. Appl. Genet., 1983, v. 2, pp. 137-146. [57] Schmickel R.D., Gonzalez I.L., Erickson J.M., Nucleolus organizing genes on chromosome 21: recombination and nondisjunction. Ann. N. Y. Acad. Sci., 1985, v. 450, pp. 121-131. [58] Seperack P., Slatkin M., Arnheim N., Linkage disequilibrium in human ribosomal genes: Implications of multigene family evolution. Genetics, 1988, v.119, pp. 943-949. [59] Garkavtsev I.V., Tsvetkova T.G., Yegolina N.A., Gudkov A.V., Variability of human rRNA genes inheritance and nonrandom chromosomal distribution of structural variants of nontranscribed spacer sequences, Hum. Genet.,1988, v. 81, pp.31-37. [60] Sylvester J.E., Gonzalez I.L., Mougey E.B., Structure and organization of vertebrate ribosomal DNA. The nucleolus, Mark Olson ed., 2003 Eurkah.com. [61] Cassidy B.G., Yang-Yen H.F., Rothblum L.I., Transcriptional role for the nontranscribed spacer of rat ribosomal DNA. Mol. Cell. Biol., 1986, v. 6, pp. 27662773. [62] Grummt I., Kuhn A., Bartsch I., Rosenbauer H. A., transcription terminator located upstream of the mouse rDNA initiation site affects rRNA synthesis. Cell, 1986, v.47, pp. 901-911. [63] Moss T., Boseley P.G., Birnstiel M.L., More ribosomal spacer sequences from Xenopus laevis. Nucleic Acids Res., 1980, v. 8, pp. 467-485. [64] Grozdanov P.N., Georgiev O.I., Karagyozov L.K., Complete sequence of the 45-kb mouse ribosomal DNA repeat unit: Analysis of the intergenic spacer. Genomics, 2003, v. 82, pp. 637-643. [65] Dover G., How genomic and developmental dynamics affect evolutionary processes. Bioessays, 2000, v.22, pp. 1153-1159. 
[66] Cox R., Mirkin S.M., Characteristic enrichment of DNA repeats in different genomes. Proc. Nat.l Acad. Sci., 1997, v. 94, pp. 5237-5242. [67] Borstnik B., Pumpernik D., Mutational dynamics of short tandem repeats in human genome. Europhys. Let., 2004, v. 65, pp. 290-296. [68] Boulikas T., Homeodomain protein binding sites, inverted repeats, and nuclear matrix attachment regions along the human beta-globin gene complex. Cell Biochem., 1993, v. 52, pp. 23-36. [69] Kuhn A., Gottlieb T.M., Jackson S.P., Grummt I., DNA-dependent protein kinase: a potent inhibitor of transcription by RNA polymerase I. Genes & Dev., 1995, v. 9, pp. 193-203. [70] Shibalev D.V., Voronov A.S., Firsov S.Y., Ryskov A.P., Kupriyanova N.S., Detection of intragenomic polymorphism in the LR2 region of human intergenic ribosomal spacer. Mol. Biol. (Moscow), 2004, v. 38, pp. 980-984. [71] Tamaki K., May C.A., Dubrova J.E., Jeffreys A.J., Extremely complex repeat shuffling during germline mutation at human minisatellite B6.7. Hum. Mol. Genet., 1999, v 8, pp. 879-88.
172
Natalia S. Kupriyanova and Alexei P. Ryskov
[72] Paques F., Haber J.E., Multiple pathways of double strand break-induced recombination in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev., 1999, v. 63, pp. 349-404. [73] Buard J., Vergnaud G., Complex recombination events at the hypermutable minisatellite CEB1 (D2S90). EMBO J., 1994, v. 13, pp. 3203-3210. [74] Lam K.W., Jeffreys A.J., Processes of copy-number change in human DNA: the dynamics of {alpha}-globin gene deletion. Proc. Natl. Acad. Sci., 2006, v.103, pp. 8921-8927. [75] Ferraro M., Lavia P., Activation of human ribosomal RNA genes by 5-azacytidine. Exptl. Cell Res., 1983, v. 145, pp. 452-457. [76] Giancotti P., Grappelli C., Pogges I., Persistence of increased levels of ribosomal gene activity in CHO-K1 cells treated in vitro with demethylating agents. Mutat. Res., 1995, v. 348, pp. 187-192. [77] Stancheva I., Lucchini R., Coller T., Chromatin structure and methylation of rat rRNA genes studied by formaldehyde fixation and psoralen cross-linking. Nucleic Acids Res., 1997, v. 25, pp. 1727-1735. [78] Santoro R., Grummt I., Molecular mechanisms mediating methylation-dependent silencing of ribosomal gene transcription. Mol. Cell, 2001, v. 8, pp. 719-725. [79] Johnson R.V., Strehler B.L., Loss of genes coding for ribosomal RNA in ageing brain cells. Nature, 1972, v. 240(5381), pp. 412-414. [80] Johnson L.K., Johnson R.V., Strehler B.L., Cardiac hypertrophy, aging and changes in cardial ribosomal gene dosage in man. J. Mol. Cell Cardiol., 1975, v. 7(2), 125-133. [81] Thomas S., Mukherjee A.B., A longitudinal study of human age-related ribosomal RNA gene activity as detected by silver-stained NORs. Mech. Ageing Dev., 1996, v.92, pp. 101-109. [82] Indig F.E., Partridge J.J., von Kobbe C., Aladjem M.I., Latterich M., Bohr V.A., Werner syndrome protein directly binds to the AAA ATPase p97\VCP in an ATPdependent fashion. J. Struct. Biol., 2004, v.146, pp. 251-259. 
[83] Machwe A., Orren D.K., Bohr V.A., Accelerated methylation of ribosomal RNA genes during the cellular senescence of Werner syndrome fibroblasts. FASEB J., 2000, v.14, pp. 1715-1724. [84] Veiko N.N., Shubaeva N.O., Ivanova S.M., Lyapunova N.A., Spitkovsky D.M., Blood serum DNA in patients with rheumatoid arthritis is considerably enriched with fragments of ribosomal repeats containing immunostimulatory CpG-motifs. Bull. Exp. Biol. Med., 2006, v.142, pp. 313-316. [85] Veiko N.N., Yegolina N.A., Radzwill G.G., Nurbaev S.D., Kosyakova N.V., Shubaeva N.O., Lyapunova N.A., Quantitative analysis of repetitive sequences in human genomic DNA and detection of an elevated ribosomal repeat copy number in patients with schizophrenia (the results of molecular and cytogenetic analysis. Mol. Biol. (Moscow), 2003, v. 379, pp. 409-419. [86] Grabovskaya I.L., Glukhova L.A., Tsvetkova T.G., Kravetz I.A., Mamaeva S.E., Kutsch A.A., The use of DNA hybridization in situ for identifying chromosomal rearrangements in the karyotyping of cell lines. Mol. Biol. (Moscow), 1992. v. 34, pp. 41-46. [87] Pedrazzini E., Slavutsky I.R., Ag-NOR staining and satellite association in bone marrow cells from patients with mycosis fungoides. Hereditas, 1991. v. 123, pp. 9-15.
Common and Special Features of the Human Ribosomal DNA
173
[88] Crossen P., Godwin J., Rearrangement and possible amplification of the ribosomal RNA gene sites in the human chronic myelogenous leukemia cell line K562. Cancer Genet. Cytogenet., 1985, v.18, pp. 27-30. [89] MacLeod R.A., Spitzer D., Sylvester J.E., Kaufman M., Wernich A., Drexler H.G., Karyotypic dissection of Hodgkin's disease cell lines reveals ectopic subtelomeres and ribosomal DNA at sites of multiple jumping translocations and genomic amplification. Leukemia, 2000, v.14, pp. 1803-1814. [90] Williamson D., Lu Y-J., Fang C., Pritchard-Jones K., Shipley J., Nascent pre-rRNA overexpressionover expression correlates with an adverse prognosis in alveolar rhabdomyosarcoma. Genes, Chromosomes, and Cancer, 2006, v. 45, pp. 839-845.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev and G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 6
ETHNIC GENOMICS OF THE EAST EUROPEAN HUMAN POPULATIONS
S. A. Limborska*, D. A. Verbenko, A. V. Khrunin and P. A. Slominsky
Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia
ABSTRACT
We present the results of studies on ethnic genetics conducted at the Department of Molecular Basis of Human Genetics, Institute of Molecular Genetics, in the Russian Academy of Sciences. Many East European populations were studied for a number of DNA polymorphic markers. Detailed population characteristics of markers for the genes encoding chemokine (C-C motif) receptor type 5 (CCR5), myotonic dystrophy (DM), apolipoprotein B (APOB), tumor suppressor p53 (p53), as well as mitochondrial DNA and Y-chromosome polymorphisms, are discussed. Particular distinctions and general trends of variability in the gene pool of the populations studied are shown, providing new data on the complicated nature of the interactions and mutual influences of the wide variety of ethnic groups inhabiting this territory.
* Tel: +74991961858; Fax: +74991960221. E-mail: [email protected]
INTRODUCTION
The years around the turn of this century were marked by rapid progress in the field of human molecular genetics, initially arising from studies of human genome sequencing conducted within the framework of international and national "Human Genome" programs. These studies resulted not only in the accumulation of tremendous amounts of information on human DNA structure but also in the development of new efficient DNA typing technologies, the construction and storage of information databases and the development of methods for processing large volumes of results. Based on these advanced studies, a new area of research, genomics, has emerged, which has revolutionized modern biology. This discipline now allows the disclosure of many features of genome organization, comparisons of different organisms' genomes, the discovery of new genes and genetic elements and the detection of mutations associated with numerous inherited diseases, including some previously unknown types. Work on so many problems has considerably broadened the areas of interest in molecular genetics and extended the application of its methods and approaches to both adjacent and rather remote directions of scientific research. These include medical genetics, pharmacology, comparative biology, forensic medicine and biotechnology, as well as anthropology, archaeology and human history.
In this connection, specialized branches started developing within the framework of genomics: functional genomics, comparative genomics, medical genomics, computational genomics and, finally, ethnic genomics (which we refer to here as "ethnogenomics"), whose problems are the subject of this article. The main goal of ethnogenomics is to investigate genomic diversity in the gene pool of separate populations, ethnic communities and ethnic territorial entities [1].
The investigation of extant human population genomes makes it possible to acquire evidence about the most remote historic events, even as far back as the origin of our species, and this is one of the most intriguing areas of ethnogenomics. To "read" this evidence, genomic markers of numerous human groups must be analyzed and the degree of their genetic relationships evaluated. Various studies have revealed a fundamental feature of the human genome: its variability, or polymorphism. This feature can be revealed only when the genomes of different individuals are compared and the differences between them brought to light.
Even the first studies showed that all humans are very similar as far as the main principles of their genomic organization are concerned, yet many loci have been found that allow us to distinguish one person's genome from another's with ease. This finding provides the basis for the identification of a person by DNA testing and for establishing familial relationships (for example, paternity or maternity).
Genome polymorphism generally denotes neutral genomic variations in different people. Neutral, or "silent", variations are those that do not reveal themselves phenotypically and do not affect the individual's health. The large numbers of polymorphic markers discovered during human genome sequencing provide a powerful tool for the analysis of the gene pool in terms of its dynamics, history and geography. This allows us to generate new evidence about the gene pools of the various regions studied and to establish new approaches to the study of basic microevolution trends and the formation of the modern human gene pool.
Several types of polymorphism have been distinguished in the human genome, including single nucleotide substitutions, insertion-deletion polymorphisms and polymorphic mini- and microsatellites. However, detailed characterization of individual markers and groups of markers is needed, so that they will be able to serve specific purposes in the future and so that problems involving various temporal and spatial parameters can be solved.
The first type of polymorphism, single nucleotide substitutions (single nucleotide polymorphisms, SNPs), is the most frequent in the genome. Typically, there is one substitution per 300-1000 nucleotides and these are "neutral" substitutions not affecting health [2, 3]. Approximately six million SNPs have already been identified in the genome and, even though this number appears enormous, it covers only about 0.2% of the whole genome of three billion nucleotides. It should be noted that this type of polymorphism has a very low mutation rate (about 2 × 10⁻⁸), indicating that a base substitution occurs very rarely, for instance, once in the course of several thousand years. The lower the mutation rate of any polymorphism, the more distant the historic events it can be used to mark. As shown below, this type of polymorphism finds application in cases that require elucidation of events that took place at very remote times.
There are other types of polymorphism in the genome, for instance, the "hypervariable" regions. Numerous regions of the genome contain tandem repeats, in which one small nucleotide sequence is repeated several times in an end-to-end fashion. For example, the gene for myoglobin bears a 33-nucleotide sequence that is repeated four times [4]. The same genomic position might contain 10 such tandem repeats in one person or 15 repeats in another. With such large individual diversity, the informativeness of these markers is very high. It should be noted that differences in such hypervariable regions arise much more frequently (several thousand times more often) than do SNPs. As will be shown later, investigation of this type of polymorphism allows one to test for comparatively recent events.
From the point of view of population studies, all DNA markers can be subdivided into three groups: mitochondrial DNA (mtDNA) markers, autosomal markers and Y-chromosome markers. Polymorphisms among these markers arise from microevolutionary factors (migration, selection, genetic drift and mutations). However, their modes of variability reflect the actions and results of these processes differently. Mitochondrial DNA polymorphism has long been used in population studies because mtDNA is relatively simple to isolate. The major features of these polymorphisms are the absence of recombination, a high level of variability and strict maternal inheritance.
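The SNP figures quoted in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the genome size, SNP count and substitution density come from the text, whereas the interpretation of the mutation rate as "per nucleotide per generation" is an assumption made here, since the text does not state its units.

```python
# Figures quoted in the text.
GENOME_SIZE = 3_000_000_000   # nucleotides in the whole genome
KNOWN_SNPS = 6_000_000        # SNPs identified so far

# Share of genome positions covered by the known SNPs.
snp_fraction = KNOWN_SNPS / GENOME_SIZE
print(f"Known SNPs cover {snp_fraction:.1%} of the genome")

# One substitution per 300-1000 nucleotides implies this range of SNPs.
expected_snps_low = GENOME_SIZE // 1000
expected_snps_high = GENOME_SIZE // 300
print(f"Expected SNPs: {expected_snps_low:,} to {expected_snps_high:,}")

# ASSUMPTION: the rate of about 2e-8 is taken as per nucleotide per generation.
ASSUMED_RATE = 2e-8
new_substitutions = ASSUMED_RATE * GENOME_SIZE
print(f"About {new_substitutions:.0f} new substitutions per genome copy per generation")
```

Under these assumptions the six million known SNPs fall inside the range implied by the quoted density of one substitution per 300-1000 nucleotides.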
Y-chromosome polymorphisms are complementary to mtDNA polymorphisms as they show paternal inheritance and a typical absence of recombination (with the exception of the pseudoautosomal region). In practical terms, these two types of polymorphism supplement each other by supplying separate evidence about the paternal and maternal contributions to the evolution of populations. This offers hitherto unknown opportunities for population studies: namely, the possibility of tracing and comparing the histories of the paternal and maternal lineages of populations and of evaluating their relative contributions to each population's gene pool. Passed from generation to generation through only one parental line and taking no part in recombination, they allow genetic events to be reconstructed, theoretically, starting from the hypothetical ancestors of modern humans (the "Y-chromosomal Adam" and "mitochondrial Eve") and proceeding to contemporary populations.
Nuclear autosomal DNA markers characterize the whole of the human genome and do not focus on the particular genetic contribution of either sex. As many researchers believe, the study of distinct types of nuclear polymorphisms makes it possible to assess many temporal events that happened in the history of a population. At present, DNA polymorphisms are being explored among many human populations of the world. Such studies reveal considerable intra- and intergroup differences in the frequencies of polymorphic DNA fragments across many geographic regions, and these differences have become one of the most important characteristics of the genetic structure of human populations.
MITOCHONDRIAL DNA POLYMORPHISMS
Mitochondrial DNA polymorphisms were among the first used for studying human populations. It should be mentioned here that almost every cell of our organism contains two genomes: the nuclear genome, which encodes our essential characteristics, and another genome located outside the nucleus in the mitochondria, whose principal role is to provide the cell with energy. Every cell bears between several dozen and several thousand mitochondria, and the genomes of all mitochondria originating from each organism are similar. The mitochondrial genome is very small (16,569 nucleotides) and carries only 37 genes encoding the proteins and RNAs needed for the functioning of the organelle. It displays a very high level of polymorphism, as mutations accumulate in it substantially faster than in the nuclear genome. Inheritance of the human mitochondrial genome is maternal, and its analysis therefore supplies evidence about the genetic history of the maternal lineage. Linkage disequilibrium between polymorphisms in mtDNA makes it possible to regard mtDNA as a single locus represented by a multitude of alleles: haplotypes whose definite groups correspond to the linkage groups of definite mutations [5]. This particular feature of mtDNA molecules is very useful for molecular studies of evolution, as the mitochondrial gene pool includes numerous combinations that allow the temporal variability of mtDNA molecules to be traced and molecular changes imparted by the evolution of populations to be classified.
The geographic region of our interest, Northern Eurasia including the East European Plain, is presently inhabited by peoples listed as being of European and Asian origins as well as by those who combine both components. The mtDNA haplotype sets ("mitotypes") of European and Asian groups differ considerably. Moreover, Asian groups are heterogeneous, with several haplotype variants, whereas the European groups are less heterogeneous.
We studied the mtDNA of three East European populations of Eastern Slavs [6]: one Byelorussian population and two Russian populations. Our results showed that these populations bear quite a number of different mitotype variations, the most frequent being the so-called haplogroup H, typical of most European peoples. The frequency of this haplogroup in mixed populations has helped us to evaluate the European contribution within each maternal lineage.
Y-CHROMOSOME DNA POLYMORPHISMS
The human genome also carries a system of markers that allow the evaluation of the male lineage's genetic contribution to ethnic history. The Y chromosome is found only in the male genome; it passes from father to son and retains the same genetic material and the same combinations of polymorphic markers. Thus, its structure is very stable over time, although it undergoes changes caused by spontaneous mutations.
Investigation of the polymorphism of Y-chromosome markers in Europeans has pointed to their ancient origin. The study by Semino et al. [7], "The Genetic Legacy of Paleolithic Homo sapiens in Extant Europeans: a Y Chromosome Perspective", was conducted by a large international team of researchers from two American and several European laboratories, including ours. More than 1,000 men originating from 25 different regions of Europe and the Near East were examined. Analysis of 22 binary markers in the Y chromosome showed that more than 95% of the samples studied could be assigned to 10 haplotypes, or historic lineages, with two of them, at that time denoted Eu18 and Eu19, emerging in Europe during the Paleolithic. More than 50% of the European males studied belong to these ancient haplotypes. The two are related, differing by a single point substitution (mutation M17). However, their geographic distributions show opposite trends. Eu18, most common among the Basques, diminishes in frequency from west to east. The age of this haplotype is estimated to be 30,000 years; thus, it is likely to be the most ancient lineage in Europe, originating in the Upper Paleolithic among a population that inhabited the region of the Iberian Peninsula.
The related Y-chromosomal haplotype Eu19 is distributed differently in European populations. It is not found in Western Europe, and its frequency grows eastward to reach its maximum in Poland, Hungary and Ukraine, where Eu18 is practically absent. Moreover, Ukraine shows the largest diversity of microsatellite markers within haplotype Eu19. These combined data allow the assumption that the expansion of this historic lineage started from this very region.
The distribution data for the two main European haplotypes suggest the following scenario. During the Last Glacial Maximum, people who occupied the northeastern and central parts of Europe were forced to migrate westward and southward. Some of them settled in the Franco-Cantabrian refuge area while the others found refuge in the Balkans. Consequently, people survived in these two widely separated, isolated regions. Some other data, including those for other DNA markers, support this reconstructed pattern. After the glacial retreat, a second settlement of Europe took place, with the Franco-Cantabrian and Balkan refuges being the main sources. Most of the other Y-chromosomal haplotypes show geographic distributions that indicate their origin in the Near East.
However, two of them, Eu7 and Eu8, also emerged in Europe during the Paleolithic, and they probably mark historic events connected with the spread of Near Eastern populations into Europe in a period before the Last Glacial Maximum. All other Y-chromosomal haplotypes emerged in Europe later. During the Neolithic, there was an expansion of a number of haplotypes from the Near Eastern region, possibly associated with the expansion of agriculture. Interestingly, a new variant of the Y chromosome was discovered in the course of this study: mutation M178, found only in the northeastern parts of Europe. This haplotype was estimated to be no more than 4,000 years old, and its distribution might reflect a comparatively recent migration of populations from the Urals.
Thus, this study showed that only a little more than 20% of European males belong to those historic lineages that appeared in Europe comparatively recently, in the Neolithic, following the Last Glacial Maximum. About 80% of European males belong to ancient lineages that can be traced back to the Upper Paleolithic. In other words, 80% of the current European male gene pool has Paleolithic and 20% has Neolithic ancestry. Subsequent studies conducted by other authors have confirmed particular details of these results [8, 9].
The tandemly organized hypervariable regions of the Y chromosome are appropriate in cases where comparatively recent events (1,000 to 2,000 years ago) are of interest. For example, we studied three groups of Eastern Slavs (the Kiev, Novgorod and Pinsk populations). The study was performed in cooperation with colleagues from Ukraine and Belarus [10, 11]. Because the divergence of Eastern Slavs is relatively recent, hypervariable regions of the Y chromosome were selected for investigation. Five polymorphic markers (DYS393, DYS392, DYS391, DYS390 and DYS19) were analyzed and their combinations (haplotypes) determined. Fourteen haplotypes were discovered, with one, 13/11/10/25/16 (denoted No. 1), being the most frequent and another, 13/11/11/24/16 (No. 2), the second most frequent (for loci DYS393/DYS392/DYS391/DYS390/DYS19, respectively). Haplotypes No. 1 and No. 2 differ at only two loci. Both are found in all three populations. Interestingly, haplotype No. 1 was most frequent in Russians, No. 2 in Ukrainians, and both haplotypes were represented almost equally in Byelorussians. Analysis of the median network showed that, for Byelorussians, the set of haplotypes surrounding haplotype No. 1 was similar to that in Russians and the set surrounding haplotype No. 2 was similar to that in Ukrainians. According to these characteristics, the Byelorussian population appears to be the closest to the ancestral Eastern Slavic population, with the two remaining populations being its derivatives. The same opinion is shared by some researchers engaged in neighboring fields of research, and the cited results support it at the current stage of the studies.
Using the same kind of markers, some problems of local significance can be solved. For instance, we carried out a survey of allelic polymorphisms and haplotypes for the same five microsatellites of the Y chromosome in samples from Russian men living in geographically distinct regions (Archangelsk and Kursk) of the European part of Russia [11]. With regard to differences in the culture and the mode of life of these peoples, the first sample can be referred to as Northern Russians and the second one as Southern Russians. Comparative analysis of the allelic frequencies over all loci revealed statistically significant differences between the two populations (p = 0.001). The main contributions to the differentiation were made by the DYS392 (p = 0.005) and DYS393 (p = 0.003) markers.
Allelic diversity indices calculated for these loci in the Archangelsk population were more than 1.5 times higher, and they were close to the maximum values observed in some European populations. On the other hand, in the Kursk population, the values of Y-chromosomal allelic diversity indices in most cases were close to those for populations of the Novgorod region, Ukraine and Belarus [10, 11]. The interpopulation differences in the allelic diversity indices for the DYS392 and DYS393 loci resulted from the high frequency of alleles with 14 repeats in the Archangelsk population. Major alleles with 14 repeats at the DYS392 and DYS393 loci are typical of some Northern European populations [13, 14].
Based on data on allele frequency distributions for the loci of interest, genetic distances were estimated between the populations from the Archangelsk and Kursk regions and some other European populations, including Eastern Slavic ones. Irrespective of the chosen measure of genetic distance (GST, DA, DC, (δμ)² or DSW), the population from the Archangelsk region was closer to the populations of the Finno-Ugric linguistic group (Saami and Estonians) and to the Latvians, who are geographic neighbors of the Estonians, whereas the Kursk population was always a member of a cluster formed by Eastern Slavic populations (Russians of the Novgorod region, Ukrainians and Byelorussians). A comparative pairwise analysis of haplotype frequencies (using FST values as a measure of genetic similarity) confirmed the absence of notable differences between the Russian population of the Archangelsk region and the populations of Saami, Estonians and Latvians. It also showed the genetic similarity of Russians from the Kursk region with Russians from the Novgorod region and with Ukrainians and Byelorussians.
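The text does not state which formula was used for the allelic diversity indices. A minimal sketch follows, assuming the commonly used unbiased gene diversity estimator of Nei, h = n/(n - 1) × (1 - Σp²), where p are allele frequencies and n is the sample size. The function name and the sample data are hypothetical and serve only to show how such an index responds to the allele frequency distribution at one locus.

```python
from collections import Counter


def gene_diversity(alleles):
    """Unbiased gene diversity (assumed estimator): n/(n-1) * (1 - sum p^2).

    `alleles` is a list of repeat counts observed at one locus,
    one entry per sampled chromosome.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    frequencies = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1 - sum(p * p for p in frequencies))


# Hypothetical samples, NOT data from the study.
mostly_one_allele = [13, 13, 13, 14]   # one allele dominates
evenly_split = [13, 14]                # two alleles, equal frequency

print(gene_diversity(mostly_one_allele))
print(gene_diversity(evenly_split))
```

The index is zero when all sampled chromosomes carry the same allele and approaches one as more alleles occur at similar frequencies.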
Phylogenetic analysis of the most frequent Y-chromosomal haplotypes (occurring more than once), based on the step-wise mutation model (where the neighboring haplotypes differ only by one repeated unit), demonstrated substantial differences in haplotype distributions in
median networks of the Kursk and Archangelsk populations. The median network of the Archangelsk population consisted of two haplotype groups that showed equal frequency and were separated by six single-step mutation events. In contrast, the median network of the Kursk sample displayed structural unipolarity (23 haplotypes in one part of the network versus five in the other one). In addition, if haplotypes of one of the median network poles of the Archangelsk population are integrated into the net of major haplotypes of the Kursk population, the remaining ones are neither common nor neighboring for both populations. To determine the possible sources of such dissimilarity in sets of haplotypes among populations, the major haplotypes were included into more extensive median networks along with Byelorussians, Ukrainians, Novgorod regional Russians, Saami and Estonians. This analysis allowed us to show that the differences between the Kursk and the Archangelsk populations were associated with a high prevalence of major haplotypes in the latter, typical mainly for Finno–Ugric populations. The specific genetic nature of people from the Archangelsk region compared with other Slavic populations was also noted in our study of mtDNA polymorphisms when the Russians from Oshevensk were compared with the Russians from the town of Ufa and the Byelorussian population [6]. In mtDNA samples collected in Oshevensk, subcluster U5b1, which is not typical for European populations and is described in the literature as specific to the Saami population, was found at a frequency of 0.07.
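Under the step-wise mutation model used for the median networks, the distance between two haplotypes can be counted in single-repeat steps. The following minimal sketch applies this to haplotypes No. 1 and No. 2 quoted earlier; the helper names are hypothetical, and summing absolute repeat differences is a simplifying assumption rather than the authors' documented procedure.

```python
# Locus order as given in the text for the haplotype notation.
LOCI = ("DYS393", "DYS392", "DYS391", "DYS390", "DYS19")


def parse_haplotype(text):
    """Turn a string such as '13/11/10/25/16' into a tuple of repeat counts."""
    return tuple(int(part) for part in text.split("/"))


def stepwise_distance(h1, h2):
    """Total number of single-repeat steps separating two haplotypes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def differing_loci(h1, h2):
    """Names of the loci at which the two haplotypes carry different alleles."""
    return [locus for locus, a, b in zip(LOCI, h1, h2) if a != b]


hap1 = parse_haplotype("13/11/10/25/16")  # haplotype No. 1
hap2 = parse_haplotype("13/11/11/24/16")  # haplotype No. 2

print(stepwise_distance(hap1, hap2))  # 2
print(differing_loci(hap1, hap2))     # ['DYS391', 'DYS390']
```

Two haplotypes are neighbors in a median network when this distance equals one; a separation of six steps, as reported for the two haplotype groups of the Archangelsk population, would correspond to a distance of six.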
AUTOSOMAL DNA MARKERS
Markers of the major part of the genome, which is inherited in a sex-independent manner, allow the characteristics of entire populations to be studied. These are single-locus autosomal DNA markers, comprising two distinct groups: diallelic and multiallelic markers. Diallelic markers are represented by single nucleotide substitutions and insertion–deletion polymorphisms. Multiallelic markers include tandemly organized repeated sequences of mini- and microsatellites.
Insertion–Deletion Polymorphisms of the Chemokine (C-C Motif) Receptor Type 5 (CCR5) Gene
Insertion–deletion polymorphisms in the gene for CCR5 can exemplify diallelic DNA polymorphism. CCR5 is a co-receptor for macrophage-tropic strains of the human immunodeficiency virus HIV-1 that is used by this virus for penetration into cells. The gene is localized in the p21.3 region of chromosome 3. In 1996, a 32 bp deletion was revealed in the gene's segment coding for the second extracellular loop of the CCR5 protein. This deletion, denoted CCR5Δ32, is likely to prevent the interaction of the receptor with the virus, and individuals who are homozygous for CCR5Δ32 are resistant to HIV-1 infection. The mutant allele is found in European populations and in white Americans at frequencies of 2%–15% (mean 9%). It is either rare or absent in populations from Black Africa and the Far East [15, 16]. Thus, this polymorphism has a pronounced ethnic-specific character, because the frequency of the marker's allelic variants differs significantly between
human populations. The high frequency of CCR5Δ32 in some Caucasian populations raises the question of whether it is the result of random genetic drift or a consequence of selective pressure, possibly driven by an increased resistance to some infectious agents or by other factors. The frequency of the CCR5Δ32 allele varied widely in the studied populations, ranging from 3%–8.5% in five Asian populations (Tuvians, Uygours, Azerbaijanians, Kazakhs and Uzbeks) to 12%–14% in populations of Tatars, Russians and Byelorussians [17, 18]. A genotype homozygous for the mutant allele was found in only one individual, from the Udmurtian population. In the Volga–Urals region, the lowest frequencies of the CCR5Δ32 allele were found in the northeastern and southeastern ethnogeographic groups of Bashkirs (2.17% and 2.50%, respectively). The frequency was also low for the total population of Bashkirs (3.66%). The highest frequency of the allele was observed among Tatars (13.44%) who, like the Bashkirian population, are considered to belong to the Turkic branch of the Altaic linguistic family. The mean frequency of the deletion was 7.02% in the Volga–Urals region [19]. The centers of maximal frequency are located in the northern and eastern parts of Europe. The emergence in these regions of centers with a large accumulation of such mutant genes seems unexpected, as these populations had not encountered HIV-1-related acquired immune deficiency syndrome (AIDS) before its emergence. Presumably, other infectious agents also make use of the CCR5 receptor for penetration into cells. In this case, selection could result in the accumulation of this mutation in the focal points of infection. Regardless, the presence or absence of this particular deletion is a type of polymorphism that can be used effectively in population analysis.
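To give a feel for how rare homozygotes are at the allele frequencies quoted above, the sketch below converts a deletion-allele frequency into expected genotype proportions. It assumes Hardy–Weinberg proportions, which the text does not itself invoke; the function name and genotype labels are hypothetical.

```python
def hardy_weinberg_genotypes(deletion_freq):
    """Expected genotype proportions for a diallelic locus under Hardy-Weinberg.

    `deletion_freq` is the frequency q of the deletion allele (0..1).
    Hardy-Weinberg equilibrium is an assumption made for illustration only.
    """
    if not 0.0 <= deletion_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    wild = 1.0 - deletion_freq
    return {
        "wt/wt": wild ** 2,
        "wt/del32": 2.0 * wild * deletion_freq,
        "del32/del32": deletion_freq ** 2,
    }


# Allele frequencies quoted in the text: northeastern Bashkirs,
# the Volga-Urals mean, and Tatars.
for q in (0.0217, 0.0702, 0.1344):
    expected = hardy_weinberg_genotypes(q)
    print(f"q = {q:.4f}: expected homozygote share = {expected['del32/del32']:.4%}")
```

Because the homozygote share scales with the square of the allele frequency, even populations with the highest deletion frequencies are expected to contain few homozygous individuals, which is compatible with a single homozygote being observed.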
The cartographic simulation of the distribution of the CCR5Δ32 mutation has been based on our own data combined with data from the literature, using 77 populations from different regions of the Old World [20]. The frequency of the CCR5 deletion allele was the highest in the populations of northeastern Europe and gradually decreased from the Baltic region in all directions. This mutant allele was either rare or absent in populations from Black Africa and the Far East. Climatic and geographic data (annual radiation balance, average January temperature, average July temperature, total amount of insolation, altitude and the annual precipitation rate) were obtained from an atlas [21]. Spearman's rank-order correlation coefficients were computed between CCR5Δ32 frequencies and climatic and geographic variables. Table 1 presents the correlations between the CCR5Δ32 allele frequencies and each climatic or geographic factor addressed. We found a strong positive correlation with latitude (r = 0.72) and a somewhat weaker negative correlation with longitude (r = –0.34). Our data also suggest that the annual radiation balance, total amount of insolation and the average July temperature affect the CCR5Δ32 allele frequency and its expansion throughout the world (r = –0.66, r = –0.66 and r = –0.64, respectively). The average January temperature and altitude show weaker negative correlations (r = –0.50 and r = –0.26, respectively). The annual precipitation rate showed no correlation with the distribution of CCR5Δ32 allele frequencies in the Old World.
Table 1. Coefficients of correlation between CCR5Δ32 allele frequencies and climatic-geographic parameters
No. | Climatic parameter | Spearman rank-order correlation | Parameters held constant (a) | Partial correlation, temperature parameters held constant | Partial correlation, latitude held constant
Temperature parameters
1 | Annual radiation balance (kcal/cm²/year) | –0.66*** | 2, 3, 4 | –0.66 | –0.42
2 | Average January temperature (°C) | –0.50*** | 1, 3, 4 | +0.50 | +0.25
3 | Average July temperature (°C) | –0.64*** | 1, 2, 4 | –0.22 | –0.09
4 | Total amount of insolation (kcal/cm²/year) | –0.66*** | 1, 2, 3 | –0.06 | –0.22
Geographical coordinates
5 | Longitude | –0.34** | | | –
6 | Latitude | 0.72*** | | | –
Common parameters
7 | Altitude (m) | –0.26* | | | –0.34
8 | Annual precipitation rate (mm/year) | –0.07 | | | –0.07
Significance levels: *** p < 0.001, ** p < 0.01, * p < 0.05.
(a) 1 designates annual radiation balance; 2 designates average January temperature; 3 designates average July temperature; 4 designates total amount of insolation.
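The rank-order coefficients in table 1 can be illustrated with a small, dependency-free sketch of Spearman's correlation (Pearson's correlation applied to average ranks). The latitudes and allele frequencies below are invented for demonstration and are not the 77-population data set; all function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: latitude (degrees north) and deletion-allele frequency.
latitude = [40, 45, 50, 55, 60]
allele_freq = [0.03, 0.05, 0.04, 0.10, 0.13]
print(spearman(latitude, allele_freq))
```

Working on ranks makes the coefficient insensitive to the units of the climatic variables and to monotone transformations of either variable.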
Coefficients of partial correlation were also calculated (table 1). The first four factors are closely interrelated, so it was necessary to clarify which of these was the most important after the three other factors were excluded. We found that only one factor, the annual radiation balance, still correlated strongly with CCR5Δ32 allele frequencies (r = –0.62). Interestingly, compared with pair correlations, partial correlations between allele frequencies and the average July temperature, and between allele frequencies and the total amount of insolation, both decreased markedly, whereas the partial correlation with average January temperature increased. The temperature parameters (parameters 1–4, table 1) are apparently tied to latitude. Therefore, another range of partial correlations was calculated: partial correlations without the influence of latitude (table 1, last column). Lower coefficients were found for almost all parameters, but the correlation with radiation balance remained higher than for the others. Although this latter correlation had decreased (r = –0.42), it was still statistically significant. Thus, the analyses of paired and partial correlations revealed a negative correlation between CCR5Δ32 allele frequencies and the annual radiation balance. This association can also be explained by correlation with other climatic factors not analyzed in this study, but tied to the annual radiation balance. Hence, we can assume that the particular features of the geographic distribution of the CCR5Δ32 allele are dependent, most likely indirectly, on the influence of climatic and geographic factors. The data presented show that the frequency of the CCR5Δ32 allele is highest in the populations of northeastern Europe. Its frequency gradually decreases from the Baltic region in all directions.
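The idea of removing the influence of a third variable can be sketched with the standard first-order partial correlation formula. This is a simplified illustration: the analysis described above holds up to three parameters constant at once, and the correlation between radiation balance and latitude used below (–0.70) is a purely hypothetical value, since the text does not report it.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# x = allele frequency, y = annual radiation balance, z = latitude.
r_freq_radiation = -0.66      # pair correlation from table 1
r_freq_latitude = 0.72        # pair correlation from table 1
r_radiation_latitude = -0.70  # HYPOTHETICAL: not reported in the text

print(partial_correlation(r_freq_radiation, r_freq_latitude, r_radiation_latitude))
```

When the controlled variable is strongly tied to both of the others, the partial coefficient is smaller in magnitude than the pair coefficient, which is the pattern described for the last column of table 1.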
There are two alternative explanations for such clines: possible effects of geographically variable selective factors that we cannot yet specify, or some episode of gene flow in the course of the history of the Old World populations. A cline or gradient may reflect adaptation to variable environments, a population expansion at a moment in time, or continuous gene flow between groups that initially differed in allele frequencies. Selection tends to affect single genes, whereas demographic changes produce similar patterns across the genome. In Europe, many genetic markers show broad gradients spanning from the Levant to northern and western Europe [22]. These clines are generally attributed to the effects of a demographic expansion from the Levant in Neolithic times [23, 24]. However, the likely age of CCR5Δ32 is about 2,000 years [25] or less [26]; consequently, its distribution cannot be due to a Neolithic process. Therefore, there are two possibilities: geographically variable selection, or a more recent migration process. The CCR5Δ32 mutation may have arisen in populations in the north and then spread to neighboring groups in different proportions, depending largely on geographic distance. Alternatively, considering the function of this gene and the age of the allele, it might be that CCR5Δ32 protects carriers from infections other than HIV/AIDS, a premise that is difficult to prove at present. The results presented here are consistent with the idea that climatic factors can play a certain selective role, related either directly to the expression of the CCR5Δ32 allele or to the action of a pathogen against which that allele confers some degree of protection. The correlations with climatic and geographic parameters point to a key role of climatic factors in the complex process of genetic microevolution, which involves mutations, genetic drift, migrations, resistance to infections and adaptation to the environment.
Multiallelic Markers
Multiallelic markers, unlike diallelic ones, are a special structural type represented by tandemly organized repeated sequences, mainly mini- and microsatellites. These regions of the genome belong to the so-called hypervariable regions. As stated above, they contain short DNA sequences (the oligonucleotide core) that are repeated many times. Minisatellites have a longer elementary core of 10 or more nucleotides. For instance, the core in the well-known minisatellite locus D1S80 is 16 bp long [27]; an intron of the myoglobin gene bears a minisatellite whose 33 bp core is repeated several times [4]. The length of each microsatellite's core is less than 10 bp, and researchers generally deal with microsatellites with a core of 2–6 bp. These are termed simple repeats, and they are extremely numerous in the genome. CA repeats are the most common in the human genome, and several thousand of these loci have been revealed, with most being variable. The variability of mini- and microsatellites arises because different individuals possess different numbers of tandem repeats at the same locus on the same chromosome. For instance, the number of tandem repeats in the APOB locus may vary from 21 to 55, yielding altogether 10 to 14 allelic variants per population [28]. There are loci with higher levels of multiallelism; for example, D14S1 bears more than 80 allelic variants [29]. Thus, the polymorphism of multiallelic loci is high enough to ensure considerable informative capacity for population genetics. Studies have shown that the mutation rate in autosome-located microsatellite areas is 1 × 10⁻³ [30, 31]. A very similar value (2 × 10⁻³) was also estimated for the microsatellite loci found in the nonrecombining region of the Y chromosome [32, 33].
These almost identical mutation rates for autosomal and Y-chromosomal microsatellites provide an indirect indication that recombination in itself is not the cause of the mutations of these sequences (the formation of new allelic variants differing in length, that is, in the number of repeat units). At the same time, the structure of microsatellites allows one to suspect that these regions would tend to undergo unequal crossing-over. However, as this is not observed, some inhibiting mechanisms are likely to be involved. Detailed analysis of mutation events at microsatellite loci has shown that the addition or loss of a single repeat unit (a one-step change) is the most frequent event. Moreover, elongation of a given microsatellite occurs two and a half times more frequently than shortening. It follows that, given enough time, a tendency to form longer and longer allele variants is inevitable. Thus, shorter alleles can be treated as more ancient than longer ones. In fact, if we compare humans with other primates, the allelic variants at each given locus appear to be shorter in the latter. However, the tendency for allele elongation is not observed at all loci, as some have mutations that both shorten and elongate their allelic variants. In addition, the mutation rate is higher in longer microsatellites. As for minisatellite loci, several of them, particularly those most actively employed in population and identification studies (e.g., APOB, D1S80 and D17S30), have been found to follow mutation regularities similar to those observed for microsatellites. The average mutation rate for hypervariable minisatellite loci is 5 × 10⁻⁴, which is half the value of 1 × 10⁻³ for tetranucleotide microsatellites [31]. In minisatellites too, the most frequent mutations change the locus length by a small number of repeat units (from one to seven), independently of the size of the repeat unit at the locus.
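The claim that a 2.5-fold excess of elongations over shortenings makes longer alleles inevitable over time can be checked with simple arithmetic. The sketch below assumes a strictly one-step mutation model with a constant mutation rate, which is a simplification (the text notes that the bias is not seen at all loci and that the rate rises with allele length); the function name is hypothetical.

```python
from fractions import Fraction


def expected_drift_per_generation(mutation_rate, expansion_to_contraction_ratio):
    """Expected net change in repeat number per generation.

    Assumes every mutation adds or removes exactly one repeat unit and that
    expansions are `expansion_to_contraction_ratio` times as frequent as
    contractions. Both arguments are given as strings or Fractions so that
    the arithmetic stays exact.
    """
    ratio = Fraction(expansion_to_contraction_ratio)
    p_expand = ratio / (ratio + 1)
    p_contract = 1 - p_expand
    mean_step = p_expand - p_contract  # expected repeats gained per mutation
    return Fraction(mutation_rate) * mean_step


# Values from the text: mutation rate of 1e-3, elongation 2.5 times as
# frequent as shortening.
drift = expected_drift_per_generation("0.001", "2.5")
print(drift)                 # 3/7000 repeats per generation
print(float(drift * 10000))  # net gain expected over 10,000 generations
```

Under these assumptions the net gain is about 0.43 repeats per mutation event, so the expected allele length increases slowly but steadily across generations.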
The population genetic properties of multiallelic microsatellite loci may be illustrated using the example of a triplet repeat in the myotonin protein kinase gene (chromosome 19). This repeat is found in the gene's noncoding region and is well known because major elongation causes the inherited neurological disease myotonic dystrophy [34]. The number of triplet repeats at this locus normally varies from five to 30 or more. In patients with myotonic dystrophy, there are hundreds or even thousands of repeats, forming gigantic microsatellite areas on the chromosome. Based on the name of this disease, this region is frequently named the DM locus. However, notwithstanding this close association with inherited pathology, the "normal" microsatellite locus behaves like a neutral marker and its characteristics make it applicable to population studies [35, 36]. There are more than 20 allele variants in different populations. Figure 1 shows the distribution of DM alleles in three populations: Russians of the Kursk region, Yakuts and Bashkirs. It is evident that the allele with five core repeats is the most frequent in the Russians (the so-called major allele), with a frequency in Slav populations reaching 40% and at times 50%. A set of alleles with 11–14 repeats is also significant because this set shows frequency ratios that vary between populations. In some instances, the frequency of the 13-repeat allele attains a value of up to 20%. Interestingly, a considerable lowering of frequencies is observed in the region between the 5-repeat and the 11-repeat alleles, with alleles in this region (from 6 to 10 repeats) being extremely rare or totally absent in some populations. It remains obscure whether this phenomenon results from some steric peculiarities of this DNA region or from some effect on the functioning of the gene. Nevertheless, this region appears to be subject to strong negative selection pressure. Beyond the 11–14 repeat set, a sparse region of longer allelic variants, at times exceeding 30 repeats, can be detected.
Figure 1. Allele frequency distributions for microsatellite DM locus in populations of different ethnic groups: Russians, Bashkirs, Yakuts.
For instance, the frequency of the 5-repeat allele is lower in the Bashkirs and many other Uralic peoples, with the 12-, 13- or sometimes 14-repeat alleles being the second most frequent. For comparison, figure 1 shows the distribution of allelic frequencies in the Yakuts, who have been classified as being of Asian origin. The spectrum of frequencies is essentially different here, manifesting thereby the specificity of this locus with respect to the populations of European and Asian origin. Similar distinctions can be found for some other microsatellite markers [37-39] and in every case, they are regular and not random. The same regularity is observed if we examine the Byelorussian population instead of the Kursk population, or Buryat groups in place of the Yakuts. This can be explained by differences in the evolutionary trajectories of European and Asian groups, as reflected in numerous characteristic features of their gene pools. In addition, their adaptation to environmental conditions could also contribute to the distribution peculiarities of genetic markers in the populations under study. It should be emphasized that all DNA markers employed so far have shown specificity with respect to the populations of European and Asian origin [35-42]. Whether this is their common property is not yet clear, although it is already worth considering. In addition to microsatellite markers, population studies make active use of minisatellites. The most frequently employed marker of this type is the hypervariable locus near the 3′ end of the apolipoprotein B gene (APOB). The gene is situated on the short arm of chromosome 2 in region 2p23–p24. It is 43 kb long and contains 29 exons and 28 introns [43]. The major protein encoded by this gene is APOB-100, one of the largest known proteins at 550 kDa.
It forms low-density lipoproteins and plays an important role in cholesterol metabolism by ensuring its transport into cells of different tissues. In 1986, a tandem repeat minisatellite of 11 related AT-rich repeated units of about 15 bp was discovered. Subsequent comprehensive analysis of the minisatellite showed considerable interindividual polymorphism [43, 44]. This marker has been used actively in forensic and population studies. The repeat unit appears to be a tandem of two similar sequences of 14 and 16 bp; neighboring alleles therefore differ from one another by 30 bp, or two repeats. Different alleles consist of 25 to 55 repeats, and their notation corresponds to the number of repeat units. Polymorphisms of the 3′-end APOB gene minisatellite region (APOB minisatellite) were studied in a number of populations of the Eastern European region [45-48]. Minor alleles with an odd number of repeats (25, 27 and 29) were found in studies of West European populations as well as in other studies [28, 43, 49], with different populations accounting for 10 to 17 different allelic variants. Although allelic frequencies vary considerably between populations around the world, some common features of their distribution have been traced. As a rule, the two allele variants APOB34 (34 repeats) and APOB36 (36 repeats) are the most frequent, accounting for 60%–80%. Other allele variants are rarer (generally APOB46, APOB30 or APOB48 are seen at a frequency of 10%–20%), and the remaining alleles have a frequency of less than 5%. APOB34 is prevalent in populations of Asian origin [28, 43, 49], whereas APOB36 prevailed in all the studied populations of European origin (from Northern America and Europe including Russia) [28, 43–52]. Thus, the APOB minisatellite region shows a distinct ethnic-specific property. Interestingly, populations of African origin exhibited especially high variability of the APOB minisatellite locus, with the number of allelic variants ranging from 11 to more than 20 [53].
These groups showed considerably more unusual alleles than populations of European and Asian origin. Although the frequency ratio of APOB34 and APOB36 is similar to the ratio in populations of European origin (APOB36 being more frequent than APOB34),
the frequencies of these alleles in Africans are considerably lower than those in Europeans (0.15–0.25 versus 0.25–0.42, respectively). Modality is one more feature of the distribution of APOB allele frequencies that can be treated as an ethnic-specific trait. In addition to the frequency peak at APOB34 and APOB36, populations of European origin display an additional peak at APOB48, with a frequency of up to 10%. Thus, the distribution of allelic frequencies in populations of European origin is generally bimodal. In contrast, populations of Asian and African origin show a unimodal frequency distribution [46, 54]. The internal structure of the Eastern European gene pool is characterized by distinctive subdivision into ethnic groups arising from the diverse evolutionary origin of the various populations [55]. Most anthropological studies presume that the ethnic groups inhabiting the Ural Mountains region, which forms the border between Europe and Asia, are very heterogeneous and are characterized by different levels of admixture between European and Asian components [56-59] (see table 2). From this point of view, the populations of the Eastern European (Russian) plain, the Eastern Slavs, are more homogeneous and most similar to the populations of Central Europe [60, 61]. Our study of the APOB locus concerned ethnic groups of both European origin (the Russians, Byelorussians, Kuban Cossacks, Adygeis, Shapsugs, Cherkess and Abkhazs) and Asian origin (the Yakuts and Kalmyks). The studies included ethnic groups belonging to the following four linguistic families: Indo-European (Eastern Slavs, i.e., the Russians, Ukrainians and Byelorussians), Northern Caucasian (the Adygeis, Shapsugs, Cherkess and Abkhazs), Uralic (the Komi-Permyats) and Altaic (the Bashkirs, Kalmyks and Yakuts). Among all studied samples, there were 32 APOB minisatellite allele variants with 24 to 55 repeats at different frequencies.
Alleles APOB34 and APOB36 were the most frequent, APOB36 being more prevalent in the populations of European origin and APOB34 more common in populations of Asian origin (the Kalmyks and Yakuts). Comparisons of the frequency distributions of the APOB minisatellite in the populations under study with data from the literature showed the largest similarity of the Indo-European, Uralic and North Caucasian linguistic families to other populations of European origin. Populations of the Altaic linguistic family (with the Bashkirs excluded) were more similar to populations of Asian origin as far as the frequency of the major alleles and the mode of frequency distribution were concerned. When using a multiallelic marker in population genetics, analysis of the distribution of its frequency using multivariate statistical methods substantially facilitates the interpretation of results. One of these methods, namely multidimensional scaling (MDS), allows the visualization of discrete data and evaluation of the multiplicity of interconnections between populations. Figure 2 shows a plot obtained by MDS analysis of APOB variability in the studied populations. The populations of European origin cluster densely, whereas populations of Asian origin (the Kalmyks and the Yakuts) are considerably remote from those of European origin. The plot shows with particular distinction the resemblance and proximity of East European peoples to the population of the Black Sea Caucasian coast (the Adygeis-Shapsugs and the Abkhazs) and to the Komi-Permyats as well. The peoples of the Northern Caucasus and the Volga–Ural Region are placed on the plot halfway between the populations of European and Asian origin. These peoples show a greater diversity than Eastern Slavs. It is worth mentioning that the peoples belonging to the North Caucasian and Uralic linguistic families are isolated from Eastern Slavs on the plot.
Among Eastern Slavs, the core of the cluster is constituted by Russian and Ukrainian populations, and Byelorussian populations are
located along the border of the oval enveloping the cluster of the Indo-European linguistic family.
Table 2. Classification of the populations studied
Linguistic family | Linguistic group | Ethnic group | Population | Geographic region
Indo-European | Eastern Slavonic | Russians | Velij Puchej, Belaia Sluda, Kholmogory, Novgorod, Kursk, Smolensk, Kostroma, Oschevensk | Russian Plain
Indo-European | Eastern Slavonic | Kuban Cossacks | | North Caucasus
Indo-European | Eastern Slavonic | Byelorussians | Grodno, Nesvij, Khoiniki, Pinsk, Bobruisk, Mjadel' | Russian Plain
Indo-European | Eastern Slavonic | Ukrainians | Alchevsk, Kiev, L'vov | Russian Plain, Carpaty Mountains
Northern Caucasian | Abkhaz-Adygei | Adygeis | Shapsugs (Coastal), Eastern Adygeis (Coastal), Western Adygeis (Highlanders) | North Caucasus
Northern Caucasian | Abkhaz-Adygei | Circassians | Circassians | North Caucasus
Northern Caucasian | Abkhaz-Adygei | Abkhazians | Abkhazians | North Caucasus
Uralic | Finno-Ugric | Komis | Permyats | South Urals
Altaic | Mongolian | Kalmyks | Elista | Northwestern Caspian
Altaic | Turkic | Bashkirs | Beloretsky | South Urals
Altaic | Turkic | Yakuts | Tiungiuliu | Siberia
Figure 2. The multidimensional scaling plot (two dimensions) of Nei's genetic distances between 28 East European populations based on 3′APOB VNTR variability. Abbreviations are: s, Shapsugs; a, western Adygeis; Adygeis, eastern Adygeis; Coss, Kuban Cossacks; Kost, Kostroma; Osch, Oschevensk; K, Kholmogory; BS, Belaia Sluda. Ethnic communities of the northern Caucasus, East Slavonic populations and coastal Adygeis are circled to make comparisons easier. The bold circle designates East Slavonic populations: Byelorussian populations are arranged closest to the circle; dark points are Ukrainians, and the others (inside the ellipse with long dashes) are Russian populations. Additional data were analyzed for Ukrainians (populations from Kiev, L'vov and Alchevsk), for Russians (populations from Kursk, Novgorod, Kostroma, Smolensk and Oschevensk), for Byelorussians (the Mjadels and the Bobruisks), the Kalmyks and the Yakuts, and for Kuban Cossacks, Abkhazians, Circassians, Shapsugs, eastern Adygeis (highlanders) and western (coastal) Adygeis.
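The distances that feed a plot such as figure 2 can be illustrated with a single-locus sketch. The caption names Nei's genetic distances without specifying the variant, so the code below assumes Nei's standard distance, D = –ln(Jxy / sqrt(Jx·Jy)); the function name and both sets of allele frequencies are hypothetical and do not come from the study.

```python
import math


def nei_standard_distance(freq_x, freq_y):
    """Nei's standard genetic distance for one locus.

    `freq_x` and `freq_y` map allele labels to frequencies in two populations.
    Using the standard (1972) distance is an assumption; the text does not say
    which of Nei's distances was computed.
    """
    alleles = set(freq_x) | set(freq_y)
    j_x = sum(freq_x.get(a, 0.0) ** 2 for a in alleles)
    j_y = sum(freq_y.get(a, 0.0) ** 2 for a in alleles)
    j_xy = sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)
    if j_xy == 0.0:
        return math.inf  # no shared alleles at this locus
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# HYPOTHETICAL allele frequencies at a three-allele locus (not real data).
pop_a = {"APOB34": 0.3, "APOB36": 0.5, "APOB48": 0.2}
pop_b = {"APOB34": 0.5, "APOB36": 0.3, "APOB48": 0.2}

print(nei_standard_distance(pop_a, pop_b))
print(nei_standard_distance(pop_a, pop_a))
```

A matrix of such pairwise distances between all populations is the usual input to multidimensional scaling, which then places populations with small distances close together on the plot.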
This variability of the APOB minisatellite marker in East European populations is consistent with the historic and ethnographic data. The analysis of biochemical polymorphism and anthropologic traits of these groups and the data on mtDNA and Y-chromosomal haplotype variability in the ethnic groups yielded very similar results [1-3, 6, 8, 10-12, 22, 56-62]. Figure 3 presents our data in comparison with other published data, and it brings together the results for each particular ethnic group. It is apparent that the studied populations of Asian origin (Yakuts and Kalmyks) are situated in the same cluster as the Chinese and the Japanese. In the opposite part of the plot, European populations are all found in a united cluster, with Germans, French, Swedes, Russians, Byelorussians and Ukrainians in the core of the cluster. These results, showing the proximity of Eastern Slavic populations to the majority of West European populations, confirm the opinion of archaeologists and anthropologists in favor of the Central European origin of Eastern Slavs. Slavs as a separate group were formed around the middle of the first millennium BC on the basis of the Lusatian (Lausitz) culture, which belonged to the Central European community of Urnfield cultures, in the region of the Middle and Upper Vistula and the right bank of the Oder [62]. Thus, these minisatellite markers show highly differentiating properties in terms of the ability to discriminate between
closely related groups and to retain the distinctions that allow the relationships among groups that are close in origin to be established. This analysis of the hypervariable APOB locus polymorphism demonstrates the high information capacity of this locus for population genetics studies at the level of ethnic groups thanks to the wide allele spectrum and high heterozygosity of this locus between populations.
Figure 3. The multidimensional scaling analysis plot (dimension I/II) of 23 Caucasoid and Mongoloid ethnic groups of Eurasia. Abbreviations are: RUS (Russians), BEL (Byelorussians), U (Ukrainians), A (Austrians), G (Germans), FRA (the French), P (Portuguese), GR (Greeks), H (Hungarians).
Analyses for other tandem-repeated loci, D1S80, DRPLA, SCA1—not just those mentioned earlier—have confirmed that the gene pool of populations inhabiting the territory of Eastern Europe is diverse and characterized by a high level of polymorphism [63-67]. The patterns obtained with MDS of variability in the tandem-repeated loci studied are consistent with ethnohistoric data and reflect the complicated features of population genetic interrelations, consistent with studies of other DNA markers. Thus, the variability of the tandem repeat loci studied is in good agreement with the evolutionary lineages of these populations. Our results confirm that these loci are highly informative for general applications in population genetics and can resolve population affinities in both major continental populations and local ones, even when they belong to the same ethnic group. Using a set of DNA markers with similar properties should allow the easy resolution of small differences in population structures. The data reviewed here will be further used for assessment of the gene pool variability dynamics of Eastern Europe. Genetic diversity for the minisatellite hypervariable region of 3´APOB was evaluated for two populations from distant regions of the Komi Republic: the Izhemski and Priluzski Komi [67]. The allele spectra in Komi populations were bimodal, with the main peak in alleles
APOB34–36 and a secondary mode around APOB45. Most frequent in the Priluzski Komi sample was APOB36 (43.4%), followed by APOB34 (26.2%), revealing a high degree of similarity with Eastern Slavic populations (Russians, Ukrainians and Byelorussians). As the proximity of the 3′APOB variable number tandem repeat (VNTR) allele frequency distributions of Eastern Slavonic populations to European ones has been demonstrated (Verbenko et al., 2003b), it follows that the Priluzski Komi group shows similarity with Europeans. In contrast to Eastern Slavonic, European and Priluzski Komi populations, the Izhemski Komi showed the highest frequency of alleles APOB36 (54.9%) and APOB45 (11.1%), whereas the frequency of APOB34 was only 9.4%, which discriminates this population from others. Pairwise chi-squared tests for heterogeneity between samples, both those involved in the study and those from Scandinavian populations available as reference data (Finns, Swedes and Lapps), revealed statistically significant differences between the Izhemski Komi and other populations. Although the Komi live at the same latitude as the Finnish and Lapp populations and belong to the same language family, the Priluzski and Izhemski Komi cluster most closely with each other on MDS plots and show no similarity with Scandinavian populations. This makes it possible to assume different origins of Finnish and Komi populations. Alternatively, it could indicate the differential selection pressure of environmental factors, which are not the same in Scandinavia and the northern Urals. This peculiarity of the Izhemski Komi might have its origin in ethnogenesis, as they formed later than other subethnic groups of Komi and used to live a unique lifestyle (nomadic and reindeer breeding) that was assimilated from the Nentsy culture during their formation as a distinct group.
At the same time, the unusual DNA diversity of the Izhemski Komi might have been caused by random genetic drift under the selection pressure of environmental factors during the formation of this population, such as special nutrition or adaptation to extreme climate conditions.
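The pairwise heterogeneity tests mentioned above can be illustrated with a short sketch. The Python code below computes Pearson's chi-squared statistic for allele counts in two samples. It is a minimal illustration under stated assumptions, not the procedure actually used in [67]: the function name `chi_squared_heterogeneity` is hypothetical, and the allele counts are invented placeholders rather than data from the study.

```python
from typing import Sequence, Tuple


def chi_squared_heterogeneity(
    counts_a: Sequence[int], counts_b: Sequence[int]
) -> Tuple[float, int]:
    """Return Pearson's chi-squared statistic and the degrees of freedom
    for a 2 x k table of allele counts (two samples, k allele classes)."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list the same allele classes")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample must contain at least one allele")
    grand_total = total_a + total_b

    chi2 = 0.0
    classes_used = 0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # allele class absent from both samples: skip it
        classes_used += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            # Expected count if both samples share one allele distribution.
            expected = row_total * column_total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, classes_used - 1


# Hypothetical allele counts (NOT the study's data) for four allele classes.
sample_1 = [30, 40, 10, 20]
sample_2 = [10, 55, 10, 25]
statistic, dof = chi_squared_heterogeneity(sample_1, sample_2)
print(f"chi-squared = {statistic:.2f} with {dof} degrees of freedom")
```

The statistic is then compared with the chi-squared distribution for the given degrees of freedom to obtain a p-value. For highly polymorphic minisatellite loci, rare alleles are commonly pooled before testing so that expected counts do not become too small.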
Haplotype Diversity of the p53 Gene in Peoples of Russia and Byelorussia
Studies on the haplotype diversity of particular genes or DNA regions form a rapidly growing area of modern population genetics. Haplotypes are currently considered one of the most effective tools for assessing genomic variability, as well as for medical and ecological genetic studies, because, in contrast with individual markers, they allow extended regions of different chromosomes to be included in the analysis. A haplotype is a combination of alleles of neighboring genetic markers on a single chromosome that are inherited together across a number of generations. Haplotype analysis is based on the idea of linkage disequilibrium (LD) between neighboring polymorphic loci on one chromosome; in most studies these loci are SNPs. LD is the nonrandom association of alleles at adjacent loci, in which a particular allele at one locus is found on the same chromosome together with a specific allele at a second locus. When such an allele combination occurs more often than expected if the loci were segregating independently in a population, the loci are in disequilibrium. Mathematically, for a system of two loci with two alleles at each locus, and using one of the earliest and most widely used measures of disequilibrium, D, disequilibrium may be quantified as follows:
$D = p_{AB} - p_A p_B$,
where $p_{AB}$ is the observed frequency of the haplotype consisting of alleles A and B, and $p_A p_B$ is the frequency of haplotype AB expected if the alleles were segregating at random [68]. Groups of alleles of neighboring SNPs that are in disequilibrium with each other represent regions of limited genetic diversity, the so-called “haploblocks”. The average size of these haploblocks is 5–20 kb, but they can reach several hundred thousand bp [68-71].
Linkage disequilibrium studies play a central role in human genetics. Most prominently, patterns of LD and haplotype variation serve as the backdrop for the design of association mapping studies [72]. At the same time, the variation of LD patterns and of the underlying haplotype frequencies is a function of population history and population structure [73, 74].
We have studied the haplotype diversity of a number of genome regions, including the gene for the p53 tumor protein, which plays a central role in cell function [75]. Like many other human genes, this gene is polymorphic. Three of its polymorphisms, a 16-bp duplication in intron 3 and single nucleotide substitutions in exon 4 (the Bsh1236I RFLP) and in intron 6 (the MspI RFLP), have been reported to be involved in cancer. Furthermore, one of their combinations, haplotype 1-2-2 (absence of the 16-bp duplication in intron 3/Bsh1236I «+»/MspI «+»), shows a continuous south-to-north increase in frequency across populations, which may be a consequence of their adaptation to the level of insolation [76].
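The disequilibrium measure D defined earlier in this section can be sketched in a few lines of Python. The example assumes that phased haplotype counts are available for a two-locus system with alleles A/a and B/b. The function name `ld_coefficient`, the dictionary keys and the counts are hypothetical and serve only as an illustration.

```python
from typing import Dict


def ld_coefficient(haplotype_counts: Dict[str, int]) -> float:
    """Compute D = p_AB - p_A * p_B from phased two-locus haplotype counts.

    The dictionary is expected to hold counts for the four haplotypes
    "AB", "Ab", "aB" and "ab" (first letter: locus 1, second: locus 2).
    """
    total = sum(haplotype_counts[h] for h in ("AB", "Ab", "aB", "ab"))
    if total == 0:
        raise ValueError("no haplotypes were counted")
    p_ab = haplotype_counts["AB"] / total                           # observed AB
    p_a = (haplotype_counts["AB"] + haplotype_counts["Ab"]) / total  # allele A
    p_b = (haplotype_counts["AB"] + haplotype_counts["aB"]) / total  # allele B
    return p_ab - p_a * p_b


# Hypothetical counts: AB is over-represented relative to independence,
# because p_A * p_B = 0.6 * 0.6 = 0.36 while p_AB = 0.50.
associated = {"AB": 50, "Ab": 10, "aB": 10, "ab": 30}
print(round(ld_coefficient(associated), 4))   # 0.14

# Hypothetical counts in which alleles segregate independently, so D is ~0.
independent = {"AB": 36, "Ab": 24, "aB": 24, "ab": 16}
print(round(ld_coefficient(independent), 4))  # 0.0
```

A positive D means that haplotype AB is observed more often than expected under random segregation, and a value near zero indicates equilibrium. With unphased genotype data, haplotype frequencies must first be estimated before D can be calculated.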
We have analyzed the distribution of these polymorphisms in nine populations belonging to seven ethnic groups: the Byelorussians (Pinsk town, Western Belarus); two Russian populations from the central (Smolensk) and northern (Archangelsk, Oshevensk) regions of the European part of Russia; the Mordvinians (Volgo-Vyatsky region); the Kalmyks (the steppe region to the northwest of the Caspian Sea); the Khants (Western Siberia, the Middle Ob' River); the Buryats (Ust'-Ordynski Buryatski Autonomous District, to the west of Lake Baikal); and two Yakut populations from the Megino-Khangalassky (the Middle Lena River) and Verkhoyansky districts of Yakutia. Four polymorphic sites were included in our study: the three polymorphisms mentioned above and the BamHI RFLP in the 3´ flanking region of the gene. The A1 alleles of the Bsh1236I RFLP at codon 72 and the MspI RFLP in intron 6 are defined as the absence of the restriction site, and the A1 allele of the 16-bp duplication in intron 3 is defined as the absence of the duplication [76]. By analogy, we defined the BamHI A1 allele as the absence of the restriction site in the 3´ flanking region of the p53 gene. The second allele in each case was designated the A2 allele.
The Yakuts differed from all other populations in having a significantly higher (p < 0.02) frequency of the BamHI A1 allele. The frequencies of the Bsh1236I alleles were similar in most populations and showed none of the marked latitudinal variation reported for this polymorphism in earlier studies [77]. The lack of significant differences between our populations for the Bsh1236I polymorphism is probably explained by their location within a small range of latitudes (about 20°). The frequencies of the 16-bp duplication A1 allele and the MspI A2 allele were significantly higher in the Yakut and Khant populations. In general, the 16-bp A1 and MspI A2 alleles showed a cline increasing from west to east.
Alongside this analysis of allele distribution in the populations studied, the frequency of p53 haplotypes was also evaluated. As mentioned earlier, a haplotype is an inheritable combination of alleles of neighboring genetic markers on a single chromosome. To evaluate the existence of stable allelic combinations (haplotypes) in the p53 gene, LD between the studied polymorphisms was tested. Linkage disequilibrium values (D) between BamHI and the other polymorphic sites were not significant in many cases, so we used only the 16-bp–Bsh1236I–MspI haplotype frequencies. Of eight possible haplotypes, five were observed in the populations investigated (haplotype frequencies were estimated with Arlequin software v.2.000), and only three of them were common to all the populations. The frequency of haplotype 1-2-2 was the highest in all populations (66.5%–75.5%). The next most common haplotype, 1-1-2, had very similar frequencies in the Byelorussians (12.9%) and the Russians from the Smolensk district (12.3%), but its frequency was far higher in the other populations (20.0%–32.2%). Interestingly, the frequency of haplotype 1-1-2 in African, South Indian and South Chinese populations can be even higher (35%–45%), whereas haplotype 1-2-2, in contrast, occurs less frequently (35%–57%), so that the two come close to a 1:1 ratio [76, 78]. These two haplotypes presumably have an ancient origin and were likely to be present as major haplotypes in the African population at the origin of humankind. Therefore, they are found at high frequencies in all contemporary populations.
The frequency of the 2-1-1 haplotype (16-bp duplication/Bsh1236I «−»/MspI «−») showed a nearly continuous west-to-east decrease across the region studied (from 17.9% in the Byelorussians to 0.7% in the Yakuts from Verkhoyansk; figure 4) and correlated reasonably well with the longitude of each population's location (Spearman's r = −0.8667, p = 0.0025). There are two possible explanations.
On the one hand, the correlation may reflect the interaction of historical lineages of Asian and European origin that has occurred in this territory. On the other hand, it might be interpreted in terms of the effect of environmental factors, probably yearly average temperatures, which decrease from west to east in Russia. Further studies are needed to check this hypothesis. Thus, we have confirmed a connection between the distribution of p53 haplotype frequencies and the ethnicity of populations. At the same time, we found a pronounced correlation between the frequency of one of the most common haplotypes, 2-1-1, and geographic longitude across the Eurasian continent.
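The rank correlation between haplotype frequency and longitude reported above can be illustrated with a minimal sketch of Spearman's coefficient. The code uses the classical formula r = 1 − 6Σd²/(n(n² − 1)), which is valid only when there are no tied values. The function names are hypothetical, and the longitudes and frequencies are invented placeholders, not the measurements from [75].

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n (largest); ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation for paired samples without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("tied values require the tie-corrected formula")
    n = len(x)
    # Sum of squared differences between the paired ranks.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical populations ordered from west to east (placeholder values).
longitude_deg = [26.0, 32.0, 45.0, 69.0, 130.0]
haplotype_freq_percent = [18.0, 9.0, 12.0, 5.0, 1.0]
rho = spearman_rho(longitude_deg, haplotype_freq_percent)
print(f"Spearman's r = {rho:.2f}")
```

A coefficient close to −1 corresponds to a nearly monotonic decrease of frequency with increasing longitude. Assessing significance additionally requires a p-value, which this sketch does not compute.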
Figure 4. p53 2-1-1 haplotype frequency versus longitude.
CONCLUSIONS
The amount of data on environmentally influenced genetic markers is growing. There are grounds to believe that the future application of this group of markers in population analysis will elucidate various problems of environmental genetic adaptation and allow its contribution to the formation of the gene pools of other regions, including Northern Eurasia, to be assessed. The first “ethnogenomic” studies, some of which have been illustrated here, rest on the idea that DNA markers have properties that yield more information about the gene pool than was possible with earlier markers. In this connection, much hope is placed on their use to elucidate the temporal parameters of the historic development of ethnic and regional gene pools. Importantly, this approach should provide more precise data on the mutation rates in different portions of the genome, including the hypervariable DNA markers [-]. We anticipate extensive use of such ethnogenomic approaches in studies devoted to ecological genetics. In this chapter, we have reported correlations between the geographic distribution of some genes and environmental factors. These studies are the first stage in the genomic and ecological research of the region, in which the leading role will be played by such a “gene-geographic” approach.
Another prospect for this kind of study is its further development toward medical genetics, especially studies of hereditary pathology and of the genetic factors predisposing to diseases such as heart disease, diabetes and hypertension, which all depend to some extent on environmental factors. Evaluating the mutation range and characterizing, at the population level, the markers linked with susceptibility genes, combined with the analysis of environmental parameters, will allow important new biological laws that affect humanity to be discovered.
In conclusion, our results form grounds for the belief that the development and refinement of genomic studies will broaden our notions about the human gene pool and contribute substantially to our understanding of problems of historic development and human evolution. It will also contribute to important practical tasks related to medicine.
REFERENCES
[1] Limborska S.A., Khusnutdinova E.K., Balanovskaya E.V. Ethnogenomics and Gene Geography of the peoples of Eastern Europe. Moscow, Nauka, 2002, 261 p.
[2] Limborska S.A., Slominsky P.A. Human molecular genetics: medico-genetic and population studies. In: Problems and prospects of molecular genetics (ed. Sverdlov E.D.). Moscow, Nauka, 2003, pp. 307-371.
[3] Khusnutdinova E.K., Limborska S.A. Ethnogenomics. In: Genomics for medicine (Eds. Ivanov V.I., Kisilev L.L.). Moscow, Academkniga, 2005, pp. 312-349.
[4] Jeffreys AJ, Wilson V, Thein SL. Hypervariable “minisatellite” regions in human DNA. Nature. 1985;314(6006):67-73.
[5] Ward R.H., Frazier B.L., Dew K., Paabo S. Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Nat. Acad. Sci. US, 1991, Vol. 88, pp. 8720-8724.
[6] Belyaeva O., Bermisheva M., Khrunin A., Slominsky P., Bebyakova N., Khusnutdinova E., Mikulich A., Limborska S., Mitochondrial DNA variation in Russian and Belorussian populations. Human Biology, 2003, Vol. 75, N5, pp. 647-660.
[7] Semino O., Passarino G., Oefner P.J., Lin A.A., Arbuzova S., Beckman L.E., de Benedictis G., Francalacci P., Kouvatsi A., Limborska S., Marcikiae M., Mika A., Mika B., Primorac D., Santachiara-Benerecetti A.S., Cavalli-Sforza L.L., Underhill P.A., The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science, 2000, Vol. 290, N5494, pp. 1155-1159.
[8] Rootsi S, Magri C, Kivisild T, Benuzzi G, Help H, Bermisheva M, Kutuev I, Barac L, Pericic M, Balanovsky O, Pshenichnov A, Dion D, Grobei M, Zhivotovsky LA, Battaglia V, Achilli A, Al-Zahery N, Parik J, King R, Cinnioglu C, Khusnutdinova E, Rudan P, Balanovska E, Scheffrahn W, Simonescu M, Brehm A, Goncalves R, Rosa A, Moisan JP, Chaventre A, Ferak V, Furedi S, Oefner PJ, Shen P, Beckman L, Mikerezi I, Terzic R, Primorac D, Cambon-Thomsen A, Krumina A, Torroni A, Underhill PA, Santachiara-Benerecetti AS, Villems R, Semino O., Phylogeography of Y-chromosome haplogroup I reveals distinct domains of prehistoric gene flow in Europe. Am. J. Hum. Genet., 2004, Vol. 75, N1, pp. 128-137.
[9] Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia V, Maccioni L, Triantaphyllidis C, Shen P, Oefner PJ, Zhivotovsky LA, King R, Torroni A, Cavalli-Sforza LL, Underhill PA, Santachiara-Benerecetti AS., Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the neolithization of Europe and later migratory events in the Mediterranean area. Am. J. Hum. Genet. 2004, Vol. 74, N5, pp. 1023-1034.
[10] Kravchenko SA, Maliarchuk SG, Pampukha VM, Ekshian OYu, Nechiporenko MV, Livshits LA. Genetics and selection in Ukraine at the Millennium Threshold. Kiiv. Logos, 2001. Vol. 4. p. 410-422.
[11] Khrunin A.V., Bebiakova N.A., Ivanov V.P., Solodilova M.A., Limborskaia S.A. Polymorphism of Y-chromosomal microsatellites in Russian populations from the northern and southern Russia as exemplified by the populations of Kursk and Arkhangel'sk Oblast. Russ. J. Genetics. 2005, Vol. 41, N8, p. 922-927.
[12] Kravchenko SA, Slominskiĭ PA, Bets LA, Stepanova AV, Mikulich AI, Limborskaia SA, Livshits LA. Polymorphism of the STR-locus of Y chromosomes in Eastern Slavs in three populations from Belorussia, Russia and the Ukraine. Russ. J. Genetics. 2002, Vol. 38, N1, p. 80-86.
[13] Kayser M., Krawczak M., Excoffier L. et al., An extensive analysis of Y-chromosomal microsatellite haplotypes in globally dispersed human populations. Am. J. Hum. Genet. 2001, Vol. 68, №4, pp. 990-1018.
[14] Zerjal T., Beckman L., Beckman G. et al., Geographical, linguistic, and cultural influences on genetic diversity: Y-chromosomal distribution in Northern European populations. Mol. Biol. Evol., 2001, Vol. 18, №6, pp. 1077-1087.
[15] Dean M., Carrington M., Winkler C., Huttley G.A., Smith M.W., Allikmets R., Goedert J.J., Buchbinder S.P., Vittinghoff E., Gomperts E., Donfield S., Vlahov D., Kaslow R., Saah A., Rinaldo C., Detels R., O'Brien S.J., Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Science, 1996, Vol. 273, pp. 1856-1862.
[16] Martinson J.J., Chapman N.H., Rees D.C., Liu Y.T., Clegg J.B., Global distribution of the CCR5 gene 32-basepair deletion. Nature Genet., 1997, Vol. 16, pp. 100-103.
[17] Aseev M.V., Skakun V.N., Baranov V.S. Analysis of the allelic polymorphism of four short tandem repeats in a population from the northwestern region of Russia. Russ. J. Genetics. 1995, Vol. 31, N6, p. 839-845.
[18] Slominskiĭ PA, Shadrina MI, Spitsyn VA, Mikulich VA, Khusnutdinova EK, Limborskaia SA. A simple and rapid method for determining a 32-bp deletion in the gene for the chemokine receptor CCR5. Russ. J. Genetics. 1997, Vol. 33, N11, p. 1368-1370.
[19] Galeeva AR, Khusnutdinova EK, Slominskiĭ PA, Limborskaia SA. Distribution of the 32 bp deletion in the CCR5 chemokine receptor gene in populations of the Volga-Ural region. Russ. J. Genetics. 1998, Vol. 34, N8, p. 976-978.
[20] Limborska S.A., Balanovsky O.P., Balanovskaya E.V., Slominsky P.A., Schadrina M.I., Livshits L.A., Kravchenko S.A., Pampuha V.M., Khusnutdinova E.K., Spitsyn V.A., Analysis of CCR5Δ32 geographic distribution and its correlation with some climaticogeographic factors. Human Heredity, 2002, Vol. 53, pp. 49-54.
[21] Physico-geographical World Atlas. Moscow, 1965, p 295.
[22] Cavalli-Sforza LL, Menozzi P, Piazza A.: History and Geography of Human Genes. Princeton, Princeton University Press 1995, p 1069.
[23] Chikhi L, Destro-Bisol G, Bertorelle G, Pascali V, Barbujani G.: Clines of nuclear DNA markers suggest a largely Neolithic ancestry of the European gene pool. Proc. Natl. Acad. Sci. U.S.A. 1998, 95:9053–9058.
[24] Sokal RR, Oden NL, Wilson C.: Genetic evidence for the spread of agriculture in Europe by demic diffusion. Nature. 1991, 351:143.
[25] Libert F, Cochaus P, Beckman G, Samson M, Aksenova M, Cao A, Czeizel A, Claustres M, Rua C, Ferrari M, Ferrec C, Glover G, Grinde B, Guran S, Kucinskas V, Lavinha J, Mercier B, Ogur G, Peltonen L, Rosatelli C, Schwartz M, Spitsyn V, Timar L, Beckman L, Parmentier M, Vassart G.: The deltaCCR5 mutation conferring protection against HIV-1 in Caucasian populations has a single and recent origin in Northeastern Europe. Hum. Mol. Genet. 1998, 7:399–406.
[26] Stephens JC, Reich DE, Goldstein DB, Shin HD, Smith MW, Carrington M, Winkler C, Huttley G, Allikmets R, Schriml L, Gerrard B, Malasky M, Ramos MD, Morlot S, Tzetis M, Oddoux C, Giovine FS, Nasioulas G, Chandler D, Aseev M, Hanson M, Kalaydjieva L, Glavac D, Gasparini P, Kanavakis E, Claustres M, Kambouris M, Ostrer H, Duff G, Baranov V, Sibul H, Metspalu A, Goldman D, Martin N, Duffy D, Schmidtke J, Estivill X, O'Brien SJ, Dean M.: Dating the origin of the CCR5-Δ32 AIDS-resistance allele by the coalescence of haplotypes. Am. J. Hum. Genet. 1998, 62:1507–1515.
[27] Kasai K., Nakamura Y., White R., Amplification of a variable number of tandem repeat locus (pMCT118) by PCR and its application to forensic science. Journal of Forensic Science, 1990, Vol. 35, N5, pp. 1196-1200.
[28] Renges H.H., Peacock R., Dunning A.M., Talmud P., Humphries S.E., Genetic relationship between the 3' VNTR and diallelic apolipoprotein B gene polymorphism: Haplotype analysis in individuals of European and South Asian origin. Ann. Human Genet., 1992, Vol. 56, pp. 11-33.
[29] Wyman A., Wolfe L., Botstein D., Propagation of some human DNA sequences in bacteriophage lambda vectors requires mutant Escherichia coli hosts. Proc. Nat. Acad. Sci. US., 1985, Vol. 82, N9, pp. 2880-2884.
[30] Jeffreys A.J., Neumann R., Somatic mutation processes at a human minisatellite. Hum. Mol. Genet., 1997, Vol. 6, pp. 129–136.
[31] Ellegren H. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 2004, Vol. 5, 435-445.
[32] Kayser M., Roewer L., Hedman M., Henke L., Henke J., Brauer S., Kruger C., Krawczak M., Nagy M., Dobosz T., Szibor R., de Knijff P., Stoneking M., Sajantila A. Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs. Amer. J. Human Genet, 2000, Vol. 66, N5, pp. 1580-1588.
[33] Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, Kivisild T, Scozzari R, Cruciani F, Destro-Bisol G, Spedini G, Chambers GK, Herrera RJ, Yong KK, Gresham D, Tournev I, Feldman MW, Kalaydjieva L., The effective mutation rate at Y chromosome short tandem repeats, with application to human population-divergence time. Am. J. Hum. Genet., 2004, Vol. 74, N1, pp. 50-61.
[34] Tishkoff S.A., Goldman A., Calafell F., Speed W.C., Deinard A.S., Bonne-Tamir B., Kidd J.R., Pakstis A.J., Jenkins T., Kidd K.K., A global haplotype analysis of the myotonic dystrophy locus: Implication for the evolution of modern humans and for the origin of myotonic dystrophy mutations. Amer. J. Human Genet., 1998, Vol. 62, N6, pp. 1389-1402.
[35] Popova SN, Mikulich AI, Slominskiĭ PA, Shadrina MI, Pomazanova MA, Limborskaia SA. Polymorphism of the (CTG)n repeat in the myotonin protein kinase (DM) gene in Belarussian populations: analysis of interethnic heterogeneity. Russ. J. Genetics. 1999, Vol. 35, N7, p. 854-856.
[36] Slominskiĭ PA, Popova SN, Shadrina MI, Pomazanova MA, Lomova TIu, Fatkhlislamova RI, Khusnutdinova EK, Guseva IA, Erdes Sh, Mikulich AI, Spitsyn VA, Limborskaia SA. Normal polymorphism of the (CTG)n repeat in the myotonin protein kinase (DM) gene on chromosome 19q13.3 in Western European populations. Russ. J. Genetics. 2000, Vol. 36, N7, p. 809-813.
[37] Popova S.N., Slominsky P.A., Pocheshnova E.A., Balanovskaya E.V., Tarskaya L.A., Bebyakova N.A., Bets L.V., Ivanov V.P., Livshits L.A., Khusnutdinova E.K., Spitsyn V.A., Limborska S.A., Polymorphism of trinucleotide repeats in loci DM, DRPLA and SCA1 in East European populations. European Journal of Human Genetics, 2001, Vol. 9, N11, pp. 829-835.
[38] Popova SN, Slominskiĭ PA, Bebiakova NA, Limborskaia SA. Polymorphism of triplet repeats in DM, DRPLA, and SCA1 genes in populations of Russia. Ecologia cheloveka. 2000, 3, p. 21-23.
[39] Popova SN, Slominskiĭ PA, Galushkin SN, Tarskaia LA, Spitsyn VA, Guseva IA, Limborskaia SA. Analysis of the allele polymorphism of (CTG)n and (GAG)n triplet repeats in DM, DRPLA, and SCA1 genes in various populations of Russia. Russ. J. Genetics. 2002, Vol. 38, N11, p. 1312-1315.
[40] Belyaeva O.V., Balanovsky O.P., Ashworth L.K., Lebedev Yu.B., Spitsyn V.A., Guseva N.A., Erdes Sh., Mikulich A.I., Khusnutdinova E.K., Limborska S.A. Fine mapping of a polymorphic CA repeat marker on human chromosome 19 and its use in population studies. Gene, 1999, Vol. 230, pp. 259-266.
[41] Shabrova E.V., Limborska S.A., Ryskov A.P., Multilocus DNA fingerprintinggenotyping based on micro and minisatellite polymorphisms. In: Focus on DNA fingerprinting research. Eds. M.M.Read. Nova Publishers, 2006.
[42] Limborskaia SA. Human molecular genetics: medico-genetic and population studies. Mol. Biol. (Mosk) 1999, Vol. 33, N1, p. 63-73.
[43] Ludwig E.H., Friedl W., McCarthy B.J., High resolution analysis of a hypervariable region in the human apolipoprotein B gene. Amer. J. Human Genet., 1989, Vol. 45, N3, pp. 458-464.
[44] Boerwinkle E., Xiong W., Fourest E., Chan L., Rapid typing of tandemly repeated hypervariable loci by the polymerase chain reaction. Application to the apolipoprotein B 3' hypervariable region. Proc. Nat. Acad. Sci. US, 1989, Vol. 86, N1, pp. 212-216.
[45] Spitsyn V., Khorte M., Pogoda T., Slominsky P., Nurbaev S., Agapova R., Limborska S.A., Apolipoprotein B 3'-VNTR polymorphism in the Udmurt population. Human Heredity, 2000, Vol. 50, N4, pp. 224-226.
[46] Verbenko D.A., Pogoda T.V., Spitsyn V.A., Mikulich A.I., Bets L.V., Bebyakova N.A., Ivanov V.P., Abolmasov N.N., Pocheshkhova E.A., Balanovskaya E.V., Tarskaya L.A., Sorensen M.V., Limborska S.A., Apolipoprotein B 3'-VNTR polymorphism in Eastern European populations. European Journal of Human Genetics, 2003, Vol. 11, N6, pp. 444-451.
[47] Verbenko D.A., Pocheshkhova E.A., Balanovskaya E.V., Marshanija E.Z., Kvitzinija P.K., Limborska S.A., Polymorphisms of D1S80 and 3'ApoB minisatellite loci in Northern Caucasus Populations. Journal of Forensic Science, 2004, Vol. 50, N1, pp. 180-183.
[48] Verbenko D.A., Knjazev A.N., Mikulich A.I., Khusnutdinova E.K., Bebyakova N.A., Limborska S.A., Variability of the 3'APOB minisatellite locus in Eastern Slavonic populations. Human Heredity, 2005, Vol. 60, pp. 10-18.
[49] Chistiakov DA, Gavrilov DK, Ovchinnikov IV, Nosikov VV. Analysis of the distribution of alleles of four hypervariable tandem repeats among unrelated Russian individuals living in Moscow, using the polymerase chain reaction. Mol. Biol. (Mosk) 1993, Vol. 27, N6, p. 1304-1314.
[50] Kotliarova SE, Maslennikov AB, Kovalenko SP. Polymorphism of 3'-flanking region of gene for apolipoprotein B in a population of Siberian region. Russ. J. Genetics 1994, Vol. 30, N5, p. 709-712.
[51] Pogoda TV, Nikonova AL, Kolosova TV, Liudvikova EK, Perova NV, Limborskaia SA. Allelic variants of apolipoproteins B and CII genes in patients with ischemic heart disease and in healthy persons from the Moscow population. Russ. J. Genetics. 1995, Vol. 31, N7, p. 1001-1009.
[52] Deka R., Chakraborty R., De Croo S., Rothhammer F., Barton S.A., Ferrell R.E., Characteristics of polymorphism at a VNTR locus 3' to the apolipoprotein B gene in five human populations. Amer. J. Human Genet., 1992, Vol. 51, pp. 1325-1333.
[53] Calo C.M., Autuori L., Di Gaetano C., Latini V., Mameli G.E., Memmi M., Varesi L., Vona G., The polymorphism of the APOB 3' VNTR in the populations of the three largest islands of the Western Mediterranean. Anthropol. Anz., 1998, Vol. 56, N3, pp. 227-238.
[54] Destro-Bisol G, Belledi M, Capelli C, Maviglia R, Spedini G. Genetic variation at the ApoB 3' HVR minisatellite locus in the Mbenzele Pygmies from the Central African Republic. Am. J. Hum. Biol. 2000, 12(5):588-592.
[55] Ageeva RA. Whom did we originate from? Peoples of Russia: names and fates. An explanatory dictionary. Academia, Moscow, 2000. (in Russian).
[56] Okladnikov AP. 1950. To the problem of the Initial Development of Siberian Ethnic Groups: The Baikal Population in the Neolithic and Early Bronze Age. Soviet Ethnography. 2:54-157.
[57] Debetz GF. 1961. About the roots of colonization of Northern area of Russian Plain and Eastern Baltic areas. Soviet ethnography. 6.
[58] Alexeev, VP. 1974. “The Geography of Human Races.” Moscow: “Mysl'” (Book in Russian).
[59] Simchenko, YuB. 1980. Early Ethnogenesis of Ethnic Groups of the Ural Language Family from Transpolar and circumpolar Eurasia. In: Ethnogenez narodov severa (Ethnogenesis of Peoples of the North), Moscow: Nauka, pp. 11-27 (Book in Russian).
[60] Bunak, VV. 1965. Anthropological types and Several Issues of Ethnic History. In: Proiskhozhdenie i etnicheskaya istoriya russkogo naroda (The Origin and Ethnic History of Russians). Moscow: Nauka, pp. 174-196 (Book in Russian).
[61] Alekseeva, TI, editor. 2000. “Vostochnye slavyane. Antropologiya i etnicheskaya istoriya” (Eastern Slavs: Anthropology and Ethnic History). Moscow: Nauchnyi Mir. (Book in Russian).
[62] Sedov VV. Slavs in Antiquity. Archaeological Fund, Moscow, 1994.
[63] Verbenko D.A., Limborska S.A. Hypervariable human minisatellite DNA markers: D1S80 locus in population studies. Molecular Genetics, Microbiology and Virology. Vol. 23, N2, pp. 53-62.
[64] Mikulich AI, Limborska SA, Popova SN, Slominskii PA, Tsibovski IS, Shadrina MI. An application of DNA polymorphic markers for human ethnic anthropology and genetics. Dokladi NASB. 1999, Vol. 43, N2, p. 75-79.
[65] Mikulich AI, Tsibovski IS, Slominskii PA, Limborska SA, Kartel NA. DNA polymorphism and ethnic genomics of Byelorussian populations. Dokladi NASB. 2002, Vol. 46, N5, p. 78-84.
[66] Limborskaia SA. Human molecular genetics: study in the area of medical and ethnic genomics. Mol. Biol. (Mosk) 2004, Vol. 38, N1, p. 117-128.
[67] Khrunin A, Verbenko D, Nikitina K, Limborska S. Regional differences in the genetic variability of Finno-Ugric speaking Komi populations. Am. J. Hum. Biol. 2007, 19(6):741-50.
[68] Ardlie KG, Kruglyak L, Seielstad M., Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet., 2002, Vol. 3, N4, pp. 299-309.
[69] Kruglyak L., Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet., 1999, Vol. 22, N2, pp. 139-144.
[70] Wall J.D., Pritchard J.K., Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet., 2003, Vol. 4, N8, pp. 587-597.
[71] Gabriel S.B., Schaffner S.F., Nguyen H. et al., The structure of haplotype blocks in the human genome. Science, 2002, Vol. 296, pp. 2225-2229.
[72] The International HapMap Consortium. The International HapMap Project. Nature. 2003, 426(6968):789-96.
[73] Wang N., Akey J.M., Zhang K., Chakraborty R., Jin L., Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am. J. Hum. Genet., 2002, Vol. 71, N5, pp. 1227-1234.
[74] Bertranpetit J., Calafell F., Comas D., Gonzalez-Neira A., Navarro A. Structure of linkage disequilibrium in humans: genome factors and population stratification. Cold Spring Harbor Symposia on Quantitative Biology. 2003, Vol. LXVIII, 79-88.
[75] Khrunin A.V., Tarskaia L.A., Spitsyn V.A., Lylova O.I., Bebyakova N.A., Mikulich A.I., Limborska S.A., p53 polymorphisms in Russia and Belarus: correlation of 2-1-1 haplotype frequency with longitude. Mol. Genet. Genomics. 2005, Vol. 272, №6, pp. 666-672.
[76] Sjalander A., Birgander R., Saha N., Beckman L., Beckman G., p53 polymorphisms and haplotypes show distinct differences between major ethnic groups. Human Heredity. 1996, Vol. 46, pp. 41-48.
[77] Beckman G., Birgander R., Sjalander A. et al., Is p53 polymorphism maintained by natural selection? Human Heredity. 1994, Vol. 44, N5, pp. 266-270.
[78] Mitra S., Chatterjee S., Panda C.K., Chaudhuri K., Ray K., Bhattacharyya N.P., Sengupta A., Roychoudhury S., Haplotype structure of TP53 locus in Indian population and possible association with head and neck cancer. Ann. Hum. Genet. 2003, Vol. 67, pp. 26-34.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev and G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 7
RETROELEMENT INSERTION POLYMORPHISM AND MODULATION OF HUMAN GENE ACTIVITY
I. Z. Mamedov, S. V. Ustyugova, F. L. Amosova and Y. B. Lebedev
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russ. Acad. Sci., Moscow, Russia
ABSTRACT
Almost half of every sequenced mammalian genome can be recognized as molecular fossils of mobile DNA elements, mostly retroelements (REs). Comparisons of primate genomes with those of rodents and other mammals reveal a vast range of lineage-specific and species-specific retroposon subfamilies. Features of primate retroposons such as their abundance, their capacity for lasting propagation in ancestral genomes and their regulatory ability strongly suggest that retroelements were involved in the evolution of the human genome and contribute greatly to the genetic variability of modern humans. In this review, we summarize published data and our recent results concerning the impact of evolutionarily young retroelements on the functioning and mutability of the human genome. Modern strategies for the whole-genome search for human-specific or polymorphic RE insertions and new approaches to their functional analysis receive special consideration. The data presented support the hypothesis that human retroelements act as active cis-regulatory elements that modulate surrounding genes.
INTRODUCTION
The genetic variability of modern humans arises naturally from the process of primate genome evolution. Since their divergence from the chimpanzee lineage around six million years ago (Mya), ancestral hominids passed through a number of evolutionary steps before giving rise to the dominant species Homo sapiens sapiens [1, 2]. The evolutionary history of the human species is reflected in the structure of its genome: the origin of some sequences can be traced back to an ancient ancestor, whereas other genetic elements arose in rather recent periods. Such reconstructions in particular, and comparative genomics as a whole, are progressing well owing to the sequencing of several primate genomes, including hominid genomes [3, 4]. Comparative genomics provides new data supporting the hypothesis that regulatory mutations prevailed in the evolution of primates [5, 6]. Intensive research is still needed, however, to elucidate the molecular mechanisms of human genome evolution and to discover the genetic reasons for the wide variety of modern primate phenotypes.
Bioinformatic comparison of the human and chimpanzee genomes with the sequenced genomes of other mammals reveals abundant groups of retroelements (REs) that are present only in the genomes of related species. For instance, the L1P and Alu subclasses, as well as the LTRs of endogenous retroviruses (HERV LTRs), are found only in primate genomes. Over 37% of the human genome is composed of various REs, and as many as 10,000 Alu, L1 and LTR insertions are presumably species-specific or polymorphic in human populations. Primate genomes have been accumulating RE copies over the last 65 million years. It has therefore been suggested that the long-lasting propagation of REs can actively reshape the genome by causing genome rearrangements, creating new genes and modifying the regulatory machinery of existing genes [7, 8, 9, 10]. Many RE sequences, and especially the solitary LTRs of endogenous retroviruses, retain numerous regulatory elements in their structure. Newly transposed LTRs could therefore serve as alternative promoters, enhancers, locus control regions (LCRs) and other regulatory sites [7, 9].
Moreover, each RE inserted at a new locus might realize its regulatory potential in a distinct manner, depending on both the genomic neighborhood and the signal environment. In this respect, functional analysis of RE insertions that are polymorphic in human populations is of special interest. If REs are indeed evolutionary pacemakers, then identifying active recent insertions in the vicinity of human genes could reveal candidate REs capable of affecting gene regulation. Although reports on interactions between individual REs and neighboring genes are increasing in number, adequate analysis of human-specific RE insertions is currently restrained by the absence of appropriate experimental approaches. In this review, we briefly describe the peculiarities of the three main classes of human-specific REs and summarize our current results on whole-genome identification of polymorphic RE insertions and their functional analysis.
ORIGIN, STRUCTURE AND FUNCTIONAL FEATURES OF HUMAN RETROELEMENTS
SINE, Alu Repeats
The origin and initial amplification of Alu repeats date back to the earliest period of primate evolution, shortly before the prosimian divergence 65 Myr ago [11]. According to current theory, the Alu lineage, which comprises over 1.1 million individual elements in the human genome, originated from a single 7SL RNA gene duplication that occurred in an ancestral genome. Structurally, an Alu element includes a left arm, the free left arm monomer (FLAM), and
Retroelement Insertion Polymorphism and Modulation of Human Gene Activity
the free right arm monomer (FRAM), which are linked in a "tail-to-head" manner and are each approximately 130 bp long (figure 1). The Alu monomers show strong homology to the 5' and 3' ends of 7SL RNA, whereas the 155-bp central region of 7SL RNA is deleted from the Alu sequence. The Pol III internal promoter, consisting of separate A and B boxes, is located within the 5'-end region of the Alu element. Like other non-autonomous SINEs, Alu repeats do not encode their own reverse transcriptase but recruit the LINE1 retrotransposition machinery for amplification within the host genome [12]. The efficiency of ongoing Alu retroposition depends on several factors that control Alu transcription, cDNA synthesis, and integration of new copies. In turn, the transcriptional capacity of each Alu copy apparently depends on the surrounding genomic sequences, the DNA methylation profile of the locus, chromatin structure, individual mutations in the Alu sequence, and the local concentration of SRP9/14, La, and other cellular proteins involved in the regulation of bulk transcription. Since their appearance in the ancestral genome, a few Alu copies, named "master genes", have retained transposition activity over certain evolutionary periods [13]. Although some human Alu elements still retain the ability to transpose, most Alu repeats have been inactivated by mutations in the promoter region. The high mutability of recently inserted Alu copies is related to the presence of 24 CpG sites in the master-gene sequence. Degeneration of CpG sites through the very rapid transition of 5-methylcytosine to T is commonly observed in all members of the older Alu subfamilies.
Figure 1. Structures of REs, mechanisms of their expansion, and predicted effects. Black lines denote genomic DNA; grey and black rectangles, exons; blue lines, transcripts; green lines with terminal arrows, cDNA of the REs; colored rectangles, REs and their regions; blue arrows, promoters; colored circles and ovals, cellular and RE proteins.
According to the current systematics, the monophyletic Alu superfamily consists of three abundant families: old AluJ, intermediate AluS, and young AluY. Base substitutions at key nucleotide positions, which accrued sequentially during the evolution of the master genes, define 12 major subfamilies. AluJ combines the Jo and Jb subfamilies. The consensus sequence of AluJo displays the highest homology to the 7SL RNA pseudogene. Members of both subfamilies presumably represent copies of a single master gene that arose 65 Mya [14]. The intermediate AluSx, Sp, Sq, Sg, and Sc subfamilies spread in the ancestral genome 31-48 Mya [15]. Sequence differences observed between members of the S family suggest that AluS elements descend from two distinct master genes that retroposed at significantly different rates during overlapping time periods [16]. Together, the AluJ and AluS families account for over one million members in the human genome. The young AluY family is characterized by a high degree of sequence homogeneity among both family members and subfamily consensuses, abundant CpG sites, and an intact 3'-end poly(A) tract. This family is represented in the human genome by approximately 200,000 members. The lower copy number of AluY elements reflects a relative decline in the retroposition rate during the period of the family's propagation. However, members of the Y family were the only Alu elements that kept amplifying during the time of the pongid-hominid divergence. Simultaneous activity of several AluY master genes resulted in the formation of the AluYa, Yb, Yc, Yd, and Ye subfamilies [17]. Thousands of retroposition events occurred so recently that the corresponding AluY insertions are polymorphic in the modern human population. The current rate of Alu amplification in the human genome is estimated to be of the order of one AluY insertion in every 200 births. Such Alu insertions, which arise in a parental germline, are termed de novo.
The expected consequences of Alu retroposition for human genome function are discussed in quite a few reviews. The most striking effect observed so far is the role of de novo Alu insertions in the etiology of several genetic disorders. The evidence discovered to date is summarized in table 1. The listed examples could be extended with a variety of other predicted interactions between Alu elements and the genomic loci affected by retroposition events. The great abundance of Alu repeats in the genome and their high structural similarity make them a perfect target for homologous recombination. Current estimates indicate that roughly 0.5% of human genetic disorders result from recombination-mediated gene disruptions. Since retroelements are well-known methylation hotspots, new Alu insertions might alter the epigenetic regulation of neighboring genes. Alu sequences inserted into introns might provide new splice sites or polyadenylation signals and thus give rise to alternative gene products.
LINE, L1P
Autonomous L1 retroposons make up the largest of the three LINE classes recognized in the human genome. L1 ancestors apparently formed in mammalian genomes around the time of the eutherian-monotreme divergence, about 170 Mya [14]. Despite its age, L1 is the only LINE class that has kept transposing over millions of years of primate evolution and that produced the youngest L1PA subfamilies in the human genome.
Table 1. Inherited diseases associated with Alu insertions
RE | Affected gene | Disease | Mechanism
AluYa5 | NF1 | neurofibromatosis [18] | Exon skipping
AluSb1 | FAS | autoimmune lymphoproliferative syndrome [19] | Alternative splicing
AluY | BRCA2 | breast cancers [20] | Exon skipping
AluYa5 | EYA1 | branchio-oto-renal (BOR) syndrome [21] | Exon skipping
AluYa5 | Factor IX | hemophilia B [22, 23] | Frameshift
AluY | CASR | familial hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism [24, 25] | Termination of translation
AluYb8 | APC | hereditary desmoid disease [26] | Termination of translation
AluYb8 | BCHE | acholinesterasemia [27] | Termination of translation
AluYa5 | PBGD | acute intermittent porphyria [28] | Stop codon
AluYa5 | Factor VIII | hemophilia A [29] | Stop codon
AluYa5 | OAT | deficiency of ornithine delta-aminotransferase [30] | Alternative splicing
AluYa5 | COL4A3 | autosomal recessive Alport syndrome [31] | Alu exonisation
AluY, AluSx | GK | glycerolkinase deficiency [32, 33] | Alu exonisation
AluYa5 | GUSB | mucopolysaccharidosis type VII [34] | Alu exonisation
AluYa5 | FGFR2 | Apert syndrome [35] | Ectopic expression of KGFR
AluYb8 | FGFR2 | Apert syndrome [35] | Unknown
AluYa5 | IL2RG | X-linked immunodeficiency [36] | Unknown
AluYa5 | ACE | risk factor for myocardial infarction [37, 38] | Unknown
AluYa5 | PROGINS | ovarian carcinoma [39] | Unknown
AluYa5 | Mlvi-2 | leukemia [41] | Unknown
Even though there are approximately 516,000 L1 repeats in the human genome, only ~5,000 elements are full-length, whereas the rest are 5'-truncated copies. Moreover, only a small subset of 30-60 full-length elements appears to be capable of retrotransposition in humans. The full-length human L1 is about 6,000 bp long and is composed of a 5'-untranslated region (5'-UTR) bearing an internal promoter, two open reading frames (ORF1 and ORF2) separated by an intergenic region, and a 3'-UTR containing a poly(A) tail (see figure 1). LINE retrotransposition includes L1 transcription by cellular Pol II, translation of the bicistronic L1 transcript into the p40 and p150 proteins, reverse transcription of L1 RNA, and integration of the cDNA into a new locus of the host genome. ORF1 encodes the p40 protein, which has been found in a variety of tumor cells, including human breast cancer, human medulloblastoma, and some other carcinomas [42]. The C-terminal region of the protein is able to bind RNA, whereas the N-terminal α-helix is responsible for protein-protein interactions [43]. ORF2 encodes the multifunctional p150 protein, which consists of a DNA endonuclease domain (EN) [44], a reverse transcriptase domain (RT) [45], and a cysteine-rich Zn-finger [46]. The EN domain is located within the N-terminal region and has site-specific apurinic/apyrimidinic endonuclease activity. As RT has low affinity for L1 transcripts, the enzyme is recruited by various non-autonomous retroposons for
their expansion [47]. Both the EN and RT domains of non-LTR retroposons of various types are highly conserved, although the RT domain of human L1 also shows sequence similarity to the RTs of several retroviruses [44, 48]. The structural similarity of the L1 Zn-finger domain to RNA-binding CCHC-containing proteins from yeasts and retroviruses suggests that p150 is able to recognize L1 RNA [49]. The 3'-end regions of L1 have two peculiarities compared with the 3'-UTRs of cellular genes. First, L1 elements contain a long A-rich sequence following the poly(A) signal; second, there is no GT-rich site 40-60 bp downstream of the poly(A) signal [43]. These features allow Pol II to read through the end of the retroelement and to transcribe the genomic sequence adjacent to the L1 3' end. The resulting long transcript can be inserted into a new locus by the L1 retroposition machinery [50]. Apparently this scenario, named "L1-mediated transduction", was often realized in the course of primate evolution, as the L1-transduced fraction comprises up to 1% of the human genome [50]. L1 expansion in the primate genome appears to be a continuous process, although L1 master-gene activity has fluctuated in the course of primate evolution. The estimated ages of the major L1 subfamilies indicate that large retroposition waves preceded the divergences of the main primate taxa [51, 52]. Simultaneous activity of several distinct L1 master genes is also suggested for periods of enhanced expansion. The youngest subfamily, L1PA1, is named L1Hs (human-specific L1) and is characterized by the lowest intragroup sequence divergence. The L1Hs subfamily combines two distinct groups that differ in the sequence at positions 5930-5932 of their consensuses. Members of the Ta group have an ACA triplet, whereas the older pre-Ta group is distinguished by ACG at these positions [53, 54]. Current counts reveal 535 L1-Ta elements and 415 pre-Ta elements in the human genome.
Recent analysis of multiple alignments of these L1 sequences supports further subdivision of the Ta group into the Ta-0, Ta-1d, and Ta-1nd branches according to a diagnostic deletion within the 3'-end region of the consensus sequence [54]. Several members of the Ta group, tested in a special in vitro assay, display highly frequent retroposition in human cell lines [55]. There are 14 known de novo L1-Ta insertions that caused inherited diseases in the human families examined [47]. Most of the polymorphic L1 insertions discovered to date are also members of the Ta and pre-Ta groups. Taken together, this evidence points to exclusive retroposition activity of L1-Ta group members in modern human populations.
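The diagnostic-triplet rule for separating the Ta and pre-Ta groups can be illustrated with a minimal sketch. It assumes that the L1 copy has already been aligned to the L1Hs consensus, so that consensus position p sits at string index p - 1. The function name, the toy sequences and the "unclassified" label are hypothetical and are not taken from the cited studies.

```python
def classify_l1hs(aligned_seq: str, first_pos: int = 5930) -> str:
    """Classify an L1Hs copy by its diagnostic triplet.

    aligned_seq is assumed to be in consensus coordinates (1-based),
    so consensus position p is found at index p - 1.
    """
    triplet = aligned_seq[first_pos - 1:first_pos + 2].upper()
    if triplet == "ACA":
        return "Ta"
    if triplet == "ACG":
        return "pre-Ta"
    return "unclassified"


# Toy sequences: 'N' padding up to position 5929, then the triplet.
ta_like = "N" * 5929 + "ACA" + "N" * 100
pre_ta_like = "N" * 5929 + "acg" + "N" * 100
print(classify_l1hs(ta_like), classify_l1hs(pre_ta_like))
```

In practice the alignment step carries most of the difficulty, since many L1 copies are 5'-truncated; the sketch only covers the final lookup.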
LTR Elements, HERV-K(HML-2)
According to the commonly accepted hypothesis, human endogenous retroviral (HERV) elements appeared in the ancestral genome through ancient germline infections by exogenous retroviruses and subsequent fixation of the proviruses in the genome of ancestral populations. During further steps of endogenization, the retroviruses lost their ability to infect new cells, while integrated proviruses with an intact pol gene retained the capacity to transpose their cDNA copies into new loci of the host genome. When retroposition events occur in the germline, the newly integrated proviral copies are inherited in a Mendelian fashion. Endogenization of various retroviruses is characteristic of primate genome evolution. A recent analysis of available databases identified 22 HERV families independently acquired by the ancestral genome, and numerous members of these families were
identified in orthologous loci of the human, chimpanzee, gorilla, and other primate genomes [7, 10, 56]. Most HERV sequences recognized so far in the human genome resemble full-length proviruses, with two LTRs flanking internal regions similar to the gag, pol and env genes of known exogenous retroviruses (figure 1). However, the great majority of HERV elements are disrupted by numerous mutations located both in the regulatory sequences (e.g., the LTRs) and in the coding sequences. Accumulation of these damaging mutations attenuated the expansion of HERV families by the time of the pongid divergence. The single exception is the HERV-K(HML-2) family, which continued its expansion even in the hominid genome. In the human genome, the HERV-K(HML-2) family comprises 30-50 provirus-like elements and over 1,000 solitary LTRs, which presumably derive from recombination between the identical 5' and 3' LTRs of a proviral progenitor. Recently, we performed a comprehensive analysis of human HERV-K LTR sequences and compared a variety of orthologous loci in the genomes of humans and other primates [56-61]. As a result, two major subfamilies comprising 16 distinct groups of HERV-K(HML-2) LTRs were defined. The LTRs belonging to these groups presumably originated from distinct "master genes" that appeared at different periods of evolution. Initially, both subfamilies and several old groups formed during the largest HERV-K expansion, in a progenitor of Old World monkeys [57, 61]. Later in primate evolution, the LTR-I subfamily exhausted its retrotransposition potential, while members of the LTR-II subfamily evolved into active master genes that formed younger LTR groups [58, 59]. Unlike most other HERVs, which underwent "retrotranspositional explosions", HERV-K expanded through serial waves of activity of distinct groups of master genes [56, 57, 59].
The two youngest groups, named II-T and II-L, are characterized by a very low degree of intragroup divergence; the expansion of their master genes was dated to 8 and 4 Mya, respectively, which is close to the time of divergence of the hominid and pongid lineages. According to locus-specific PCR assays, several LTRs belonging to the II-L group were detected in human loci but not in the orthologous loci of apes [56, 61]. Thus, HERV-K is the only family that produced human-specific groups of LTR elements; moreover, all known polymorphic insertions of endogenous retroviruses and LTRs are formed by members of the HERV-K LTR II-L group [62-64].
APPROACHES TO IDENTIFICATION OF SPECIES-SPECIFIC AND POLYMORPHIC INSERTIONS OF RETROELEMENTS
Modern strategies for the comprehensive search for human-specific and polymorphic RE insertions can be operationally subdivided into two categories. The first uses the current release of the human genome database and relies on phylogenetic analysis of sequenced RE copies aimed at detecting "evolutionarily young" branches. Clearly, such a computer search should be accompanied by experimental verification of which of the identified REs are indeed polymorphic. In addition, many polymorphic insertions may be absent from the available human genome sequence, since it not only contains gaps but also represents only a few haplotypes that happened to be taken for sequencing. Moreover, a computer search is impossible for non-sequenced genomes, in particular those of primates other than human. Unfortunately, the known experimental techniques did not permit
genome-wide analyses, although some approaches were successfully applied to the detection of polymorphic retroelements. The alternative strategy is based on direct experimental comparison of human genomic DNA with the DNAs of other primates by cloning and analyzing the integration sites of the REs under study. Both strategies gained great momentum from the sequencing of the human and ape genomes, because it allows immediate assignment of the identified human-specific REs and their integration sites to precise genomic locations. Being the most abundant REs in the human genome, Alu elements were chosen for the first attempts at phylogenetic analysis. As a result, the AluY family was characterized as the youngest class of Alu repeats [65, 66]. Further analysis of AluY members extracted from the draft human genome made it possible to construct consensus sequences of several closely related groups, including the AluYa5, AluYb8, and AluYc1 branches, and to predict the human specificity of these branches [12]. To test the prediction, the authors selected a subset of almost 500 human loci that contained members of the AluY branches and were suitable for designing pairs of unique primers for subsequent PCR assay of the orthologous loci in the genomes of human populations and non-human primates. Indeed, the assay confirmed the human specificity of the vast majority of the tested AluYa5/b8 insertions; in addition, ~25% of the insertions were found to be polymorphic in human populations [67, 68]. A similar phylogenetic approach and the corresponding PCR assay of four human populations were used to study other evolutionarily young AluY branches, including Ya [69], Yb [68, 70], Yc [68, 71], Yd [72], Yg and Yi [73], Ye [74], and several small AluY subgroups located on ChrX [75]. Over 800 polymorphic AluY insertions were identified in these large-scale studies, and a wide range of frequencies of the AluY-containing alleles was observed in the four populations examined so far.
Phylogenetic approaches, including multiple alignment tools and bioinformatic searches of the available genome databases, were also used to study human LINE1 elements. The human specificity of the L1-Ta family [53, 54, 76] can currently be considered proven. An improved consensus L1-Ta sequence was used for whole-genome identification of polymorphic L1 insertions [55, 77]. Attempts to identify recently integrated HERV-related sequences by phylogenetic analysis were made predominantly for the most abundant HERV families, namely HERV-H, HERV-E, ERV-9, HERV-W, and some others. Each of these families and/or their solitary LTRs was shown to consist of distinct branches of various ages in its phylogenetic tree. Although the insertion of the youngest HERV-H Ia copy into the human genome was dated to approximately 4.5 Mya, the expansion of these HERV families in the ancestral genome ceased soon after the divergence of apes and Old World monkeys, that is, about 20-25 Mya [7, 9, 10]. Only representatives of the HERV-K(HML-2) family retained their transpositional activity in the recent period of primate evolution. As described in our previous articles [59, 61], two groups of the HERV-K LTR-II subfamily were active during the formation of the ape lineage and kept transposing even after the divergence of pongids and hominids. Although phylogenetic analysis is generally a powerful tool for identifying very recently formed subsets of REs, comprehensive experimental genome-wide comparisons of human and chimpanzee genomic DNAs are needed to identify all human-specific RE insertions. This is indispensable for drawing definite conclusions about their impact on hominid speciation. We have recently proposed such an approach to the experimental comparison of related genomes [78, 79]. The strategy of the developed technique (figure 2) includes the following
stages: (i) whole-genome amplification of the flanking sequences adjacent to the target interspersed repetitive elements in both genomic DNAs under comparison; (ii) subtractive hybridization of the resulting amplicons, followed by construction of an arrayed library of DNA fragments specific for one of the amplicons; (iii) differential hybridization of the resulting library, and sequencing and mapping of the differential clones; and (iv) verification of the species specificity of the identified RE insertions by locus-specific PCR assay of primate DNAs. This technique enabled us to perform the first whole-genome experimental search for differences in LTR and L1 integrations between the human and chimpanzee genomes. Further analysis of the integration sites in primate genomes and structural analysis of human REs led to the following conclusions [56, 78-81]:
• As many as ~4,000 L1 and ~150 HERV-K insertions distinguish the human genome from that of the chimpanzee;
• Human-specific insertions represent members of three LTR and two L1 phylogenetic groups, suggesting long-term parallel retrotransposition of different master genes of these REs;
• A significant part of the human-specific LTR integrations are unevenly distributed both along individual chromosomes and among them, and are often located in close vicinity to or even within human genes, suggesting their possible involvement in regulating the expression of neighboring genes.
Figure 2. Schematic of genome-wide subtractive hybridization of sequences flanking RE insertions. Double straight lines denote DNA fragments; R with vertical bars, positions of restriction sites; light or dark rectangles with the letters RE, positions of identical or species-specific RE insertions; dark circles, suppression adapters.
Figure 3 presents examples of human-specific LTRs located close to or within genes. This finding agrees with data reported by other authors on the functional involvement of various types of REs in the regulation of gene expression [10, 41, 83]. Further improvements of subtractive hybridization allowed us to develop an advanced technique aimed at identifying polymorphic RE insertions in the human genome [84]. The principal distinction of this approach from that developed for interspecies comparison is that an individual human genome is compared with a mixture of tens of other individual genomes in a single experiment. This makes it possible to detect RE insertions that are present in several genomes, or even in a single genome, of the tracer but absent from the driver genome. In our study, the technique was applied to identify AluYa5 insertion polymorphism using a tracer composed of 10 individual genomes representing unrelated ethnic groups of the Russian population.
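The logic of the tracer/driver comparison can be pictured with a small in silico analogy. This is only a sketch of the selection principle, not of the wet-lab procedure: each genome is modeled as a set of RE integration sites, and the locus names are hypothetical.

```python
from collections import Counter


def tracer_specific_insertions(tracer_genomes, driver_genome):
    """Map each site absent from the driver to the number of tracer genomes carrying it."""
    counts = Counter(site for genome in tracer_genomes for site in set(genome))
    driver_sites = set(driver_genome)
    return {site: n for site, n in counts.items() if site not in driver_sites}


# Hypothetical integration sites in three tracer genomes and one driver genome.
tracer = [
    {"locusA", "locusB"},
    {"locusA", "locusC"},
    {"locusA"},
]
driver = {"locusA"}
result = tracer_specific_insertions(tracer, driver)
print(sorted(result.items()))
```

Sites shared with the driver (here locusA) drop out, while a site carried by a single tracer genome is still retained, which mirrors the property of the technique described above.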
Figure 3. Localization of human-specific LTRs in introns of genes (examples). Genes are denoted by arrows pointing in the direction of transcription. Vertical bars indicate positions of exons. Short arrows mark the positions of LTRs and point in the U3-U5 direction.
A flow chart of the approach is shown in figure 4. The products of the two-step subtractive hybridization were cloned, and an arrayed library of AluYa5 flanks was constructed. Initial analysis of 200 randomly selected clones included (i) sequencing followed by selection of individual sequences; (ii) mapping of the individual sequences onto the human genome using the NCBI human genome databases and the UCSC Genome Browser; (iii) design and synthesis of pairs of unique oligonucleotide primers corresponding to each of the mapped AluYa5 integration sites; and (iv) testing for the presence of AluYa5 at these loci by locus-specific PCR of the individual genomic DNAs that made up the mixed tracer. Examples of individual locus-specific PCRs are shown in figure 5. To finally verify the dimorphic status of the Alu insertions in the human population, we performed the PCR assay on 260 individual DNAs sampled from eight unrelated populations worldwide. The analysis of this small subset of the arrayed library identified 40 AluYa5 and 1 AluYa8 polymorphic insertions (table 2). Evidence of AluYa5/a8 insertional polymorphism for 23 of the detected loci has been described recently by other authors [67, 69]. In contrast, the remaining 18 loci were represented in the human genome databases only by Alu-lacking alleles. To verify that the PCR length polymorphism is actually due to Alu inserts, the corresponding PCR-products
were cloned and sequenced (AY736285 - AY736302). For five of these 18 polymorphic insertions, Alu-positive alleles were present in only one genome of the complex tracer, and all of them were heterozygous [84]. The sensitivity of the technique is thus sufficient to isolate insertions present in only one genome and in the heterozygous state. We therefore concluded that the efficiency of the method in detecting polymorphic insertions is rather high. One of the major advantages of the technique is the possibility of identifying polymorphic Alu repeats that are not available in the human genome databases. The technique allows polymorphisms to be detected across a wide range of population frequencies. It can be applied to comprehensive searches for polymorphic Alu insertions of other subfamilies, as well as for polymorphisms resulting from retroposition of other retroelements. Other possible applications are the identification of polymorphic RE insertions that distinguish human subpopulations or groups and the identification of RE polymorphisms presumably associated with hereditary multigene diseases.
Table 2. Characteristics of identified AluYa5 insertions
Marker name | Accession¹ (Alu +) | Chromosome | Neighboring genes³ | Alu + frequency⁴
Ya5-MLS09 | AK056306 | 1q25.3 | 2kb downstream XPR1 | 7/20
Ya5-MLS51 | AY736296² | 1q41 | in AK096526 | 1/20
Ya5-MLS58 | AY736298² | 1p12 | No genes | 1/20
Ya5-MLS59 | AL356501 | 1p31.1 | No genes | 7/20
Ya5-MLS48 | AC073577 | 2p13.1 | in ACTG2 | 14/20
Ya5-MLS26 | AY736289² | 3p22.1 | in MYRIP | 9/20
Ya5-MLS29 | AC011325 | 3q29 | No genes | 13/20
Ya5-MLS47 | AC024248 | 4q26 | No genes | 3/20
Ya5-MLS57 | AY736297² | 4q31.22 | No genes | 2/20
Ya5-MLS05 | AC105756 | 4q34.3 | No genes | 5/20
Ya5-MLS50 | AY736295² | 4q35.2 | No genes | 4/20
Ya5-MLS04 | AC114316 | 5q14.3 | No genes | 10/20
Ya5-MLS06 | AC020921 | 5q14.3 | No genes | 8/20
Ya5-MLS44 | AY736294² | 6q22.31 | in c6orf170 | 2/18
Ya5-MLS19 | AC01094² | 6q22.33 | in LAMA2 | 10/20
Ya5-MLS10 | AY736285² | 6p12.2 | No genes | 5/20
Ya5-MLS14 | AY736286² | 7p12.3 | in HUS1, PKD1L1 | 1/20
Ya5-MLS39 | AC005377 | 7q34 | No genes | 6/20
Ya5-MLS23 | AP003357 | 8q22.1 | 4k up LAPTM4B | 5/20
Ya5-MLS41 | AP005354 | 8q23.1 | in ZFPM2 | 14/20
Ya5-MLS37 | AY736292² | 10q23.1 | 4k down PCDH21 | 6/20
Ya5-MLS28 | AY736290² | 10p13 | No genes | 2/20
Ya5-MLS07 | AC025427 | 10q21.1 | No genes | 3/20
Ya5-MLS35 | AL731574 | 10q25.1 | No genes | 18/20
Ya5-MLS36 | AC090832 | 11p11.2 | in BHC80 | 11/16
Ya5-MLS18 | AC080183 | 11p14.3 | in AK127695 | 12/20
Ya5-MLS34 | AY736291² | 12q24 | in RPC2 | 6/18
Ya5-MLS70 | AY736302² | 13q34 | in COL4A2 | 14/20
Ya5-MLS12 | AL138681 | 13q12.3 | No genes | 14/20
Ya5-MLS46 | AL390722 | 13q14.3 | No genes | 14/20
Ya5-MLS69 | AY736301² | 13q21.1 | No genes | 4/20
Ya5-MLS22 | AL161897 | 13q33.2 | No genes | 4/20
Ya5-MLS68 | AY736300² | 14q32.13 | 3k up SERPINA4 | 5/20
Ya5-MLS65 | AY736299² | 14q13.1 | No genes | 1/20
Ya5-MLS11 | AC010674 | 15q21.2 | in MYO5C | 5/20
Ya5-MLS21 | AC021958 | 15q23 | No genes | 9/20
Ya5-MLS63 | AC130456 | 16p12.3 | in TMC5 | 3/18
Ya8-MLS32 | AC139236 | 16p13.1 | No genes | 3/20
Ya5-MLS16 | AY736287² | 16q23.2 | No genes | 3/16
Ya5-MLS30 | AC016586 | 19p13.3 | 8k up EEF2 | 16/18
Ya5-MLS20 | AY736288² | 20p12.2 | in PAK7 | 1/20
Accessions¹ of Alu-lacking (Alu -) alleles, in the order given: AL162431, AL592148, AL390117, AL365175, AC073046, AC099331, AC010683, AC087107, AC115540, AC116332, AC007554, AL365508, AL121969, AC019066, AC012482, AC103853, AC022389, AL358033, AC023869, AC009721, AL159153, AL390964, AE014308, AL132990, AL445883, AC026583, AC003108, AC140504, AC108483, AL135935.
¹ NCBI accessions for Alu-containing (Alu +) and Alu-lacking (Alu -) alleles are indicated.
² Newly identified Alu repeats.
³ Neighboring genes were detected with the Human Genome Browser using RefSeq Genes; up, Alu is N kb upstream of the gene; down, Alu is N kb downstream of the gene; in, Alu is in an intron of the gene; No genes, no neighboring genes were detected within 10 kb of the Alu.
⁴ For polymorphic Alu repeats, the ratio of Alu-positive haplotypes to the total number of haplotypes tested in tracer DNA. Driver DNA for polymorphic insertions is always Alu-negative.
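The frequency column of table 2 gives the number of Alu-positive haplotypes over the number of haplotypes tested, so that 20 haplotypes correspond to the 10 diploid tracer genomes. A minimal sketch of converting such cells to allele frequencies follows; the function name is hypothetical, and both separators are accepted because the cells may be written with either a slash or a backslash.

```python
from fractions import Fraction


def parse_haplotype_fraction(cell: str) -> Fraction:
    """Parse a frequency cell written as positive/total (slash or backslash)."""
    positive, total = cell.replace("\\", "/").split("/")
    return Fraction(int(positive), int(total))


# Three cells from table 2, keyed by marker name.
cells = {"Ya5-MLS09": "7/20", "Ya5-MLS51": r"1\20", "Ya5-MLS48": "14/20"}
freqs = {marker: parse_haplotype_fraction(cell) for marker, cell in cells.items()}
for marker, freq in freqs.items():
    print(marker, float(freq))
```

A value of 1/20 corresponds to a single Alu-positive haplotype among the tracer genomes, that is, one heterozygous carrier, which is the limiting case of sensitivity discussed in the text.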
Figure 4. Scheme of the method of polymorphic RE identification.
Figure 5. Examples of locus-specific PCR. Electrophoregrams of locus-specific PCR amplification of three individual polymorphic Alu-containing loci (MLS 19, MLS 50 and MLS 65) in ten tracer (lanes t1-t10) and one driver (lane Dr) DNA samples. Lanes K+ and K- represent positive and negative controls, respectively. M, DNA fragments of a 100 bp ladder length marker (SibEnzyme). Grey and white arrows to the left of the electrophoregrams indicate the predicted positions of the Alu-containing and Alu-free PCR products, respectively. A scheme of the location of the genomic primers is shown at the bottom.
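Reading a locus-specific PCR such as the one in figure 5 amounts to checking each lane for the Alu-free and the Alu-containing product. The sketch below illustrates this scoring under stated assumptions: the product sizes (200 and 500 bp), the 20 bp tolerance, the lane contents and the function name are all hypothetical and do not describe the actual loci.

```python
def call_genotype(band_sizes, empty_bp, filled_bp, tolerance_bp=20):
    """Call an insertion genotype from the PCR band sizes (bp) seen in one lane."""
    has_empty = any(abs(b - empty_bp) <= tolerance_bp for b in band_sizes)
    has_filled = any(abs(b - filled_bp) <= tolerance_bp for b in band_sizes)
    if has_empty and has_filled:
        return "+/-"   # heterozygous: both products present
    if has_filled:
        return "+/+"   # homozygous for the Alu insertion
    if has_empty:
        return "-/-"   # homozygous for the Alu-free allele
    return "no call"


# Hypothetical locus: 200 bp Alu-free product, 500 bp Alu-containing product.
lanes = {"t1": [205], "t2": [198, 507], "t3": [495], "K-": []}
calls = {lane: call_genotype(bands, 200, 500) for lane, bands in lanes.items()}
print(calls)
```

In heterozygotes the shorter product often amplifies preferentially, so a real scoring scheme would probably also need intensity information; the sketch ignores this.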
One of the essential steps that govern the sensitivity of the technique is the formation of the amplicons by selective PCR. Several factors have a decisive influence on this stage. As shown in figure 6, RE-lacking restriction fragments of the genomic DNA cannot be amplified because of the PCR-suppressive adaptors ligated to both of their ends. The choice of the most suitable endonuclease and the proper design of the adaptors are crucial for efficient PCR suppression. At the same time, the completeness and specificity of the selective PCR depend on how precisely the T primers target the RE subset under investigation. In addition, proper design of the A/T primers has to provide maximal efficiency of the selective PCR. Taking these factors into account, we developed sets of primers and adapters for selective amplification of the integration sites of human AluYa5/a8, AluYb8/b9, and AluYc elements. Using these sets, we carried out subtractive hybridization for various tracers containing up to 50 individual genomic DNA samples. Analysis of the constructed arrayed libraries has so far allowed us to identify 49 polymorphic AluY insertions for the first time. The proportion of newly identified Alu elements among the clones sequenced so far varies between 27% and 35%, depending on the AluY branch analyzed. This result suggests that a significant part of polymorphic REs were not identified during genome sequencing and remain to be detected and characterized. This conclusion is in good accord with data recently published by Boissinot and co-workers [85]. These authors carried out an experimental search for dimorphic L1 Ta-1 insertions in a restricted set of individual genomic DNAs and discovered about 20 L1Hs
I. Z. Mamedov, S. V. Ustyugova, F. L. Amosova et al.
elements that were absent from the available human genome databases. Therefore, further application of improved experimental approaches to the identification of recently integrated REs will extend our knowledge of RE insertional polymorphism in humans.
Figure 6. PRED: examples of request input.
Retroelement Insertion Polymorphism and Modulation of Human Gene Activity
According to current estimates, the human genome contains thousands of polymorphic retroelement copies, which are regarded as promising molecular genetic markers of a new generation. However, the use of polymorphic retroelements as molecular genetic markers is limited by the lack of systematic data on their number, genomic context and distribution among human populations. We have created the first bilingual (Russian/English) internet resource devoted to the known polymorphic retroelements discovered in the human genome by our group as well as by other researchers worldwide [96]. The database contains information about the location of each retroelement copy, its position relative to known and predicted genes, the frequencies of its alleles in human populations, and other data. Figure 6 presents an example of the database search pages. At the moment, the database includes over 3,000 REs, a large part of which are mapped within introns of human genes (table 3). Our internet portal supports database searches with multiple search conditions and is available at http://labcfg.ibch.ru/home.html. The database provides an opportunity to investigate the distribution of polymorphic retroelements in the human genome and to design new genetic markers for various population and medical studies.

Table 3. Retroelements presented in the PRED database

RE type        Total number   Located in introns
SINE           1673           406
  Alu Ya       761            174
  Alu Yb       511            145
  Alu Yc       134            30
  others       267            57
LINE           407            64
  L1HS         270            43
  L1Ta         79             9
  preTa        4              1
  others       54             11
HERV-K LTR     12             0
  II-L         9              0
  others       3              0
SVA            63             10
POLYMORPHIC RE INSERTIONS AS MARKERS FOR GENOTYPING
Insertion polymorphism of retroelements is now attracting considerable attention. Several features make this type of polymorphism advantageous over others (SNPs, single nucleotide polymorphisms, or microsatellites). These features include a known ancestral state (absence of the RE) and stability of the insertion (there is no specific mechanism for removing inserted REs). Therefore, the presence of an RE represents identity by descent. Another advantage of polymorphic REs is their relatively easy detection by a standard locus-specific PCR assay. Molecular genetic markers based on polymorphic RE insertions are therefore becoming powerful new tools of considerable interest for studies of human genome evolution and of the relationships between human populations, as well as for gene mapping. For example, recent publications from M. Batzer's lab [86], A. Furano and co-workers [87], and many other groups demonstrate the wide applicability of AluY- and L1Hs-derived markers
for efficient studies of the history and genetic structure of human populations. Many factors influence the efficiency of marker application. They include the frequencies of the RE-containing allele in different human populations, the reproducibility of allele detection, and the genomic context of the polymorphic RE insertion. The last factor needs special consideration. A non-random distribution of polymorphic insertions of REs belonging to various phylogenetic groups was reported by us and by other authors [56, 82, 85]. An example of data supporting this observation is presented in figure 8. The diagram shows the numbers of known human genes (annotated at www.ncbi.nlm.nih.gov/RefSeq) and of polymorphic Alu insertions (newly identified by us and described by other authors) in human loci with different GC content. The distribution of Alu insertions differs from that of genes. The highest frequency of Alu insertions is observed in loci with 35%-45% GC content, whereas most human genes are located in regions with over 45% GC. A similar tendency for polymorphic L1-Ta elements to co-localize with genes of intermediate GC content was observed by other authors [85]. According to current estimates, about one third of polymorphic RE insertions lie in introns of known or predicted human genes. The intronic RE insertions are the best candidates for the construction of new markers, since these REs apparently act as cis-regulatory elements for the surrounding genes.
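The comparison of Alu insertions and genes by GC content rests on assigning each locus to a GC-content class. The following Python sketch illustrates one way such a binning could be done. The flanking sequences, the 5% bin width and the function names are illustrative assumptions and do not come from the study.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring ambiguous bases."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_bin(percent: float, width: int = 5) -> str:
    """Label the bin [lo, lo + width) that contains the given GC percentage."""
    # A value of exactly 100% is placed in the last bin.
    lo = min(int(percent // width) * width, 100 - width)
    return f"{lo}-{lo + width}%"


def gc_histogram(sequences, width: int = 5) -> Counter:
    """Count how many sequences fall into each GC-content bin."""
    return Counter(gc_bin(gc_content(seq), width) for seq in sequences)


# Hypothetical flanking sequences of three insertion loci (illustration only).
flanks = ["ATATGCATAT", "GGCCGGCCAT", "ATGCATGCAA"]
hist = gc_histogram(flanks)
for label, count in sorted(hist.items()):
    print(label, count)
```

Running the same binning over gene loci and over insertion loci would give two histograms that can be compared bin by bin, in the manner of figure 8.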
Figure 7. Scheme of selective PCR. Double straight lines denote DNA fragments; light rectangles, RE insertions or their fragments (ΔRE); R with vertical bars, positions of restriction sites; grey rectangles, suppression adapters; symbols A1, A2, T1 and T2 above horizontal arrows mark the positions of the PCR primers used.
Figure 8. Histogram of known gene and polymorphic AluY distributions within human loci of various GC-content.
Recently, we developed a set of informative markers based on the insertion polymorphism of human AluYa5/a8 and L1-Ta retroelements [88]. The set included 47 pairs of PCR primers corresponding to introns of human genes with dimorphic L1 and Alu insertions. Using a locus-specific PCR assay, we genotyped a panel of widely used human cell lines of various origins. A variety of eukaryotic cell lines, including human ones, are commonly used in various fields of modern biology and medicine as an indispensable, renewable resource for numerous studies. It is commonly assumed that the features of cells in a particular cell line are similar to those in the human tissue from which the line originated. On the other hand, most human cell lines are prone to various sorts of genetic rearrangements that affect the biochemical, regulatory and other phenotypic features of the cells during their cultivation. Various chromosomal abnormalities, including aneuploidy, numerous rearrangements and loss of chromosome regions, are characteristic alterations of cell genomes, especially if the cells are of tumor origin. Therefore, it would be very useful to monitor the authenticity of cell lines continuously and/or to have a comprehensive set of standard tests for confirming cellular identity. Routinely used descriptions of established cell lines include their origin and cultivation conditions, but this can be insufficient. Appropriate tests should be performed to authenticate the cell line identity. To test the identity, phenotypic (e.g., morphological analysis) or genotypic characteristics can be used. In many cases, isoenzyme analysis may be applied to identify the species of origin of the cell line. An alternative strategy would be to demonstrate the presence of unique markers, for example, by using banding cytogenetics to detect a unique marker chromosome, or DNA analysis to detect a
genomic polymorphism pattern (for example, restriction fragment length polymorphism, variable number of tandem repeats, or genomic dinucleotide repeats). Expression of a particular product may represent a complementary approach for confirmation of identity. At present, several types of tests, including those for tissue-specific marker expression and detection of the karyotype and chromosome aberrations by G-banding or by FISH [89], are in rather common use for the characterization of distinct cell lines. Modern approaches for more thorough examination of cell lines also include M-FISH (multiplex FISH), SKY (spectral karyotyping), CGH (comparative genomic hybridization), FICTION (fluorescence immunophenotyping and interphase cytogenetics as a tool for investigation of neoplasms), and PRINS (oligonucleotide-primed in situ DNA synthesis) [89-92]. However, these methods are time-consuming and require high-technology instrumentation. It is highly desirable, therefore, to have a standard set of genomic markers that would allow fast, reliable and efficient authentication of cell line identity. One type of informative marker that we suggest for this purpose is polymorphic insertions of various retroposons. The results of cell line genotyping performed with the updated set of 58 AluY/L1-Ta markers are listed in tables 4A and 4B. To develop the set of markers, we preferred to select dimorphic AluYa5/a8 and L1-Ta insertions that display an intermediate frequency (i.e., within the range from 20% to 70%) in human populations. Indeed, 38 of the 58 inspected loci were found to be represented by both alleles in the genomes of the tested human cell lines (table 4). Taking into account that a diploid cell line may have one of three genotypes at each locus (+/+, +/-, or -/-) and assuming for simplicity that all these genotypes are equally probable, one can calculate that there can be over 10^18 (3^38) unique genotypes differing in combinations of the 38 dimorphic loci.
Thus, the probability that two different cell lines have identical genotyping patterns is extremely low, and the developed set of polymorphic markers is sufficient to provide an individual fingerprint for any human cell line.
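The estimate of the number of distinguishable genotypes can be checked with a few lines of Python. This is a minimal sketch under the same simplifying assumption as in the text (three equally probable and independent genotypes at each locus). The function names are illustrative.

```python
from fractions import Fraction

GENOTYPES_PER_LOCUS = 3  # +/+, +/-, -/-
DIMORPHIC_LOCI = 38      # loci represented by both alleles in the tested cell lines


def unique_profiles(loci: int, states: int = GENOTYPES_PER_LOCUS) -> int:
    """Number of distinct multi-locus genotype combinations."""
    return states ** loci


def match_probability(loci: int, states: int = GENOTYPES_PER_LOCUS) -> Fraction:
    """Chance that two independent lines share one profile, assuming
    equally probable and independent genotypes at every locus."""
    return Fraction(1, states ** loci)


profiles = unique_profiles(DIMORPHIC_LOCI)
print(profiles > 10**18)                          # the count exceeds 10^18
print(float(match_probability(DIMORPHIC_LOCI)))   # below 1e-18
```

Real allele frequencies are unequal and some loci may be linked, so the true number of practically distinguishable profiles is smaller than this idealized figure.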
POLYMORPHIC RETROELEMENTS' IMPACT ON HUMAN GENE EXPRESSION
The presumed role of young RE insertions in normal and pathological intracellular processes is now vigorously debated. Although the number of observations supporting an influence of REs on human gene activity is constantly growing, the understanding of the impact of recent Alu, L1, and LTR insertions on genome functioning is far from comprehensive. One possible approach to studying the effect of polymorphic REs on gene expression is to compare the expression levels of two alleles that differ in that one contains an intronic RE insertion, whereas the other does not. In this approach, the expression levels are compared in the same cellular environment, which makes the comparison more accurate.
Table 4A. Genotyping of cell lines with AluY-derived markers

RE      Gene      Locus     Tera1   Tera2   NGP     HL60    OsA-CL  RMS-13  HeLa    HEK293  HT1080  HepG2   Jurkat
AluYa5  RPC4Y1    Yp11.31   +       -       +       -       +       +       -       -       +       +       +
AluYa5  UGB       11q12.3   -       -       -       -       -       -       -       -       -       -       -
AluYa5  RPN2      20q11.23  -       -       -       -       +/-     -       -       -       +/-     -       -
AluYa5  MRC2      11q23.2   +       +       +       +       +       +       +       +       +       +       +
AluYa5  CFTR      7q31.2    +       +       +       +       +       +       +       +       +       +       +
AluYa5  SEMA3A    7q21.11   -       +/-     -       -       -       -       +       -       +/-     +       -
AluYa5  AMPH      7p14.1    +       +/-     -       +       +       -       +/-     +       -       +       +
AluYb8  RAB6IP2   12p13.33  +       +       +       +       +       +       +       +       +       +       +
AluYb8  KCNJ6     21q22.13  +       +       +       +       +       +       +       +       +       +       +
AluYb8  DSCAM     21q22.2   -       -       +/-     +       +       -       -       -       +       +/-     +/-
AluYb8  SLC25A18  22q11.22  +       -       +/-     +/-     -       -       -       -       +/-     +       -
AluYb8  TSHR      14q31.1   +       +/-     +/-     +       +/-     +       +/-     +       +       +       +
AluYb8  NRP1      10p11.22  +/-     +       +       -       +       +       +       -       +       +       +
AluYb8  MAP4K3    2p22.1    +       +       +       +       +       +       +       +       +       +       +
AluYb8  TSC2      16p13.3   -       -       +/-     -       -       -       -       -       -       -       -
AluYb9  SYT17     16p12.3   -       -       -       -       -       -       -       +/-     -       -       +/-
AluYc1  KCNH2     7q36.1    +       +       +       +       +       +       +       +       +       +       +
AluYc1  MAP3K7    6q22.31   -       -       +/-     -       +/-     +/-     -       +/-     -       -       +/-
AluYc1  ARFGEF2   20q13.13  +       +       +       +       +       +       +       +       +       +       +
AluYa5  MYO5C     15q21.2   -       +/-     -       -       +/-     -       -       -       -       -       -
AluYa5  HUS1      7p12.3    +/-     -       -       +/-     -       -       -       -       -       -       -
AluYa5  LUZP2     11p14.3   +       +       -       +       -       +       +       +/-     -       +       -
AluYa5  LAMA2     6q22.33   +/-     -       -       +       +/-     +       -       -       -       +/-     -
AluYa5  PAK7      20p12.2   -       -       -       -       -       -       -       -       -       -       -
AluYa5  MYRIP     3p22.1    -       -       -       -       -       +       -       -       +/-     +/-     +/-
AluYa5  RPC2      12q24     -       -       -       +/-     -       -       -       +/-     +/-     +/-     -
AluYa5  PHF21A    11p11.2   -       -       -       +       -       -       -       +       -       -       +/-
AluYa5  ZFPM2     8q23.1    +/-     +       +/-     -       -       -       +       +/-     +/-     -       +/-
AluYa5  ACTG2     2p13.11   +       +       -       +       +/-     +       +       +/-     +/-     +       +
AluYa5  TMC5      16p12.3   -       +/-     +/-     +/-     +       -       +/-     -       -       -       +/-
AluYa5  COL4A2    13q34     +       +       +/-     +       +/-     +/-     +       +       +/-     +/-     +/-
Table 4B. Genotyping of cell lines with L1-Ta-derived markers

RE      Gene      Locus     Tera-1  Tera-2  NGP     HL-60   OsA     RMS     HeLa    HEK293  HT1080  Jurkat
L1      FLJ21423  7q11.23   -       -       -       -       -       -       -       -       -       -
ΔL1     GRIN3a    9q31.1    -       -       -       -       -       -       -       -       +/-     -
L1      AK096055  15q21.3   -       -       +/-     -       +/-     -       -       -       -       -
L1      RPIA      2p11.2    -       -       +/-     -       -       +/-     -       -       -       -
L1      SPOCK3    4q32.3    -       +/-     +/-     +/-     +/-     +/-     +/-     -       -       -
L1      SMG1      16p12.3   -       +/-     +/-     -       +/-     +/-     +/-     -       +/-     +/-
ΔL1     BPGM      7q33      -       -       -       +/-     +/-     +/-     -       +/-     +       +/-
L1      FLJ21269  6q25.2    -       -       +       +/-     -       +/-     -       -       -       -
L1      GRID2     4q22.3    -       -       -       +/-     +       +/-     +/-     -       +       +
ΔL1     NF1       17q11.2   +/-     +/-     +       +/-     +       +/-     +/-     +/-     +/-     -
ΔL1     ABLIM2    4p16.1    -       +       +       +/-     +       +/-     +       +       +/-     -
ΔL1     PCM1      8p22      +/-     +/-     +/-     +       +/-     +       +/-     +       +       +
L1      NRCAM     7q31.1    +/-     +       +/-     +       +/-     +       +       +       +       +/-
L1      HSDR8     4q22.1    +/-     +       +       +       +/-     +/-     +       +       +/-     +
L1      LRP2      2q31.1    +       +       +       +       +/-     +       +       +       +       +/-
ΔL1     AUTS2     7q11.22   +       +/-     +       +       +       +       +/-     +       +       +
L1      EPHA3     3q11.1    +/-     +       +       +       +       +       +       +       +       +
ΔL1     LRP1B     2q22.2    +       +       +/-     +       +       +       +       +       +       +
ΔL1     CBLB      3q13.11   +       +       +       +       +       +       +       +       +       +
L1      CD38      4p15.32   +       +       +       +       +       +       +       +       +       +
L1      FBXL7     5p15.1    +       +       +       +       +       +       +       +       +       +
ΔL1     CGI-130   6q22.31   +       +       +       +       +       +       +       +       +       +
L1      FLJ90754  9q31.3    +       +       +       +       +       +       +       +       +       +
L1      CNTN5     11q22.1   +       +       +       +       +       +       +       +       +       +
ΔL1     GBE1      3p12.2    +       +       +       +       +       +       +       +       +       +
ΔL1     ZNF407    18q22.3   +       +       +       +       +       +       +       +       +       +

Genotype characteristics: "+", both alleles contain the RE; "-", both alleles lack the RE; "+/-", one of the alleles contains and the other lacks the RE. The symbol "ΔL1" indicates the presence in the locus of a 5'-truncated L1-Ta insertion.
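The genotype tables can be used directly for authentication by comparing the profiles of two cell lines locus by locus. The Python sketch below shows this for five AluY loci and three cell lines. The genotype values are copied from the AluY marker table above, while the data structure and the function name are illustrative assumptions.

```python
# Genotypes at five AluY loci, copied from the AluY marker table.
TERA1 = {"RPC4Y1": "+", "SEMA3A": "-", "AMPH": "+", "TSHR": "+", "NRP1": "+/-"}
TERA2 = {"RPC4Y1": "-", "SEMA3A": "+/-", "AMPH": "+/-", "TSHR": "+/-", "NRP1": "+"}
HELA = {"RPC4Y1": "-", "SEMA3A": "+", "AMPH": "+/-", "TSHR": "+/-", "NRP1": "+"}


def differing_loci(profile_a: dict, profile_b: dict) -> list:
    """Return the sorted loci, typed in both lines, at which the genotypes differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sorted(locus for locus in shared if profile_a[locus] != profile_b[locus])


print(differing_loci(TERA1, TERA2))  # all five loci differ
print(differing_loci(TERA2, HELA))   # a single locus differs
```

With only five loci two lines can look almost identical, as Tera-2 and HeLa do here, which is why a larger marker panel is needed for a reliable fingerprint.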
The suggested approach is implemented in our current research aimed at gaining deeper insight into the impact of polymorphic AluY and L1-Ta insertions on human gene expression. Recently, we developed a technique based on RT-PCR analysis of the pre-mRNA content in cells heterozygous for the RE insertions [93-95]. Initially, we selected human tumor cell lines of various tissue origins, including epithelial cells (Tera-1, Tera-2, HEK293, HeLa, HT1080, and HepG2), lymphoblasts (HL60 and Jurkat), fibroblasts (RMS-13 and OsA-CL), and neuroblasts (NGP-127). To select polymorphic RE insertions suitable for the research, we took into account their intronic location in known or predicted human genes, data on the tissue specificity of gene expression, the known frequencies of RE-containing alleles, and other characteristics of the corresponding human loci. The selected genes also differed in function and in specificity of transcription. This diversity provided a good basis for generalizing the results of the study. To compare the primary transcription of the two alleles of a human gene, we selected loci heterozygous for the RE insertions in at least one of the studied cell lines. As a result, we chose 59 loci and constructed a set of PCR primer pairs corresponding to unique genomic sequences adjacent to the RE sites. The principle of the assay is shown schematically in figure 9. The amounts of primary transcripts of the RE+ and RE0 gene alleles were measured by a semi-quantitative RT-PCR assay. hncDNA prepared from hnRNA isolated from the cell lines was PCR amplified with pairs of primers corresponding to intronic sequences adjacent to the sites of RE integration. The primary transcript content for each allelic pair was estimated from the amount of RT-PCR products obtained after a specified number of PCR cycles, as shown in figure 9b. The two alleles of the inspected gene are easily distinguished by the length of their RT-PCR products.
Therefore, the ratio of the numbers of PCR cycles sufficient to visualize the amplification products directly reflects the proportion of the allelic transcripts in particular cells. An example of the results obtained for the alleles of the TMC5 gene in Jurkat cells is presented in figure 9b. According to the semi-quantitative RT-PCR assay, RT-PCR products of the Alu+ allele are detected at the 36th cycle, whereas those of the Alu0 allele are detected at the 33rd cycle. This means that the primary transcription level of the Alu+ TMC5 allele in Jurkat cells is lower than that of the Alu0 allele. The reproducibility of the effect was verified by additional RT-PCR experiments using both hnRNA purified from independently cultured cell lines and different cDNA preparations. To make the analysis more accurate and to exclude possible artifacts of different priming efficiency, each primer pair was also tested on genomic DNA templates. The disproportion in allele transcripts can then be estimated by comparing the NAlu+/NAlu- ratios for genomic DNA and hncDNA templates. Here, NAlu+ and NAlu- denote the numbers of PCR cycles sufficient to detect the products of the Alu+ and Alu0 alleles, respectively. The results of the RT-PCR assay are summarized in table 5.
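The conversion of detection cycles into a fold difference can be sketched in Python as follows. The sketch assumes ideal PCR, in which the product doubles at every cycle, so that a delay of m - n cycles corresponds to a 2^(m - n)-fold lower template amount. The correction by genomic DNA is written here as a simple division of the two fold values, which is an assumption made for illustration and not necessarily the authors' exact procedure. The genomic DNA cycle numbers in the example are hypothetical.

```python
def fold_difference(cycles_re_plus: int, cycles_re_zero: int) -> float:
    """Fold excess of RE0 over RE+ template, assuming doubling at every cycle."""
    return 2.0 ** (cycles_re_plus - cycles_re_zero)


def corrected_fold_difference(cdna_plus: int, cdna_zero: int,
                              gdna_plus: int, gdna_zero: int) -> float:
    """Fold difference on hncDNA divided by that on genomic DNA, to offset
    unequal amplification of the two alleles (illustrative assumption)."""
    return fold_difference(cdna_plus, cdna_zero) / fold_difference(gdna_plus, gdna_zero)


# TMC5 in Jurkat cells: Alu+ product seen at cycle 36, Alu0 product at cycle 33.
tmc5_fold = fold_difference(36, 33)
print(tmc5_fold)  # 8.0, i.e. about eight times less Alu+ primary transcript

# Hypothetical genomic DNA control in which the Alu+ product appears one cycle later.
print(corrected_fold_difference(36, 33, 26, 25))  # 4.0
```

Because the assay is semi-quantitative and real amplification efficiency is below a perfect doubling, such values are best read as order-of-magnitude estimates.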
Table 5. Inhibition of transcription of RE+ alleles in heterozygous cell lines

Gene     RE type    RE orientation (2)   Intron position (3)
PHF21A   AluYa5     a/s                  11/35
DSCAM    AluYb8     a/s                  11/34
MYO5C    AluYa5     a/s                  25/42
LAMA2    AluYa5     a/s                  40/70
ZFPM2    AluYa5     a/s                  7/10
TSHR     AluYb8     s                    10/12
MYRIP    AluYa5     a/s                  12/19
RPC2     AluYa5     a/s                  27/30
TMC5     AluYa5     a/s                  7/33
MAP3K7   AluYc1     s                    17/25
COL4A2   AluYa5     a/s                  4/53
RPN2     AluYa5     s                    8/53
BPGM     ΔL1 (4)    s                    3/3
PCM1     ΔL1 (4)    a/s                  31/43
WDR72    iL1 (5)    a/s                  13/23
NRCAM    L1         s                    16/24
RPIA     dL1 (6)    s                    4/8
SMG1     L1         s                    58/63

Cell lines (1): Tera-1, Tera-2, NGP, HL-60, OsA, RMS, HeLa, HEK293, HT1080, Jurkat.
Cell-line values in the order in which they appear in the source (their assignment to genes and cell lines is not recoverable from the source layout): nd; ~1↓; ~1↓; nd nd; 128↓; 1; 8↓; 8↓; nd 8↓ 8↓; nd nd 8↓; nd 8↓; nd; 1; 1024↓ 8↓; ~1↓; nd; 8↓; nd; 8↓ 128↓; 8↓; nd; 1024↓; 1 1; 1 64↓; 1 1 8↓ 8↓; 8↓ 1; nd 1024↓; 1024↓ 1; nd; nd; 8192↓ 1; 1 8↓; nd; 1; 1 64↓; 64↓; 64↓ 8↓; 1; 8↓; 1.

(1) The number indicates the ratio of the allele transcript amounts, expressed as 2^m/2^n, where m is the number of cycles needed for the formation of visible RE+ PCR products and n is the number of cycles needed for the formation of visible RE0 PCR products; a decrease of RE+ transcripts relative to RE0 transcripts is designated by the arrow.
(2) Symbols "s" and "a/s" designate the forward and reverse orientation of the insert relative to the transcription direction of the gene, respectively.
(3) The number of the intron containing the RE insertion / the total number of introns in the gene.
(4) 5'-truncated L1. (5) L1 with an internal inversion. (6) L1 without the 5'UTR.
Figure 9. Strategy of the RT-PCR assay. A, Scheme of the RT-PCR assay. Polymorphic AluY inserts in hncDNA and in the Alu+ PCR fragment are depicted by grey rectangles. Positions and orientations of the locus-specific F and R primers are shown by arrows. Small black triangles denote target site duplications (TSD). AAA represents the poly(A) tail of the AluY element. Fragments of exons are depicted by truncated rectangles marked "Ex", and the intron in between by a solid line. B, Electrophoretic patterns of RT-PCR amplicons of TMC5 primary transcripts. Lanes are designated as follows: DNA, Alu+ and Alu0 amplicons of genomic DNA from the Jurkat cell line; M, a 100 bp ladder length marker (SibEnzyme); RT-, negative control. Numbers above the lanes correspond to the numbers of RT-PCR amplification cycles.
Analysis of the results shown in table 5 allows the following conclusions to be drawn:
• in most cases of detectable transcription, the hnRNA content of RE+ alleles in heterozygous cells is significantly lower than that of RE0 alleles;
• the observed disproportion in transcripts of L1-free and L1+ alleles varies from 8- to 64-fold;
• several AluY insertions can completely suppress transcription of the affected allele;
• the inhibitory effect is characteristic only of nearly full-sized L1s;
• the effect of the intronic RE insertions seems to be independent of the particular RE sequences, their orientation or their relative position within the gene;
• the disproportion in primary transcripts of the RE+ and RE0 alleles of each particular gene was not observed in all tested cell lines, and the sets of "positive" and "negative" cell lines for different genes did not coincide, which suggests that the effect of Alu insertions is cell type specific.
Thus, using a model of human cell lines heterozygous for intronic AluY or L1-Ta insertions, we discovered an inhibitory effect of recently inserted REs on human gene transcription and observed that this effect is manifested in a tissue-specific manner [93-95].
CONCLUSION
An advantage of evolutionarily young REs as genetic markers is their regulatory potential, demonstrated by a variety of recent studies. Another important feature of REs is the abundance of recent insertions spread over the human genome. Figure 10 shows the distribution of 1,500 presently known human-specific and polymorphic RE insertions on human chromosomes. The apparent co-localization of the REs and coding sequences suggests functional interactions between activated REs and human genes. Identification of such interactions and investigation of their mechanisms are among the most promising directions of modern functional genomics. In this review, we have tried to summarize current knowledge of the impact of REs on human genome functioning "in norm and in pathology". We have also sought to demonstrate the advantages of new experimental approaches to research on the role of REs in the structural and functional variability of the human genome. The presented data and their analysis allow several conclusions that could be of interest for further research in the field:
• There is an abundant set of recently integrated REs that distinguishes the human genome from the genomes of our closest relatives, the chimpanzee and other apes. These integrations frequently occur in close vicinity to various human genes and are thus potentially capable of changing the transcriptional features of the genes.
• The number of the most recent RE insertions that exist as polymorphic ones in modern human populations is much higher than was estimated by the initial analysis of the human genome draft.
• An unexpectedly large number of human-specific and polymorphic REs retain their functional activity, although each RE copy may contribute individually, depending on genomic context and cellular circumstances.
• The expected influence of REs on the human genome is not restricted to gene disruption, genomic rearrangements, or other well-described effects. Quite a few young REs might serve as active cis-regulatory elements for nearby human genes.
Figure 10. Chromosomal distribution of human-specific and polymorphic RE insertions. Brown histograms denote gene density; red lines mark positions of human-specific HERV-K LTRs; blue lines, polymorphic L1-Ta insertions; green lines, polymorphic AluY insertions.
Co-evolution of the "host" genome and its retroelements is a widely considered theory that assumes long-term interactions of expanding REs with pre-existing coding and regulatory regions. A number of recently reported data suggest the ubiquitous involvement of evolutionarily young REs in regulatory networks, although the molecular mechanisms of this interaction are still
poorly understood. At the same time, the outstanding features of human-specific and polymorphic REs make them promising candidates for the role of universal regulators that are capable of modifying human gene expression.
TERMS AND ABBREVIATIONS
RE(s) – retroelement(s); a mobile element that transposes via an RNA intermediate.
Retro(trans)position – the process of RE expansion; it includes reverse transcription of RE RNAs and integration of the cDNA copies into new loci of the host genome.
Integrase – a protein complex encoded by autonomous retroelements.
LINE – long interspersed nuclear element; non-LTR retroelements encoding their own reverse transcriptase.
LINE1 (L1) – the most abundant LINE in mammalian genomes.
L1-Ta, L1PA1, L1PA2-L1PA14 – distinct branches and subfamilies of primate-specific LINE1.
SINE – short interspersed nuclear element; a non-autonomous retroelement, typically derived from a small functional RNA, that has amplified in the genome by retrotransposition.
Alu, SVA, SINE-R – distinct classes of human SINEs.
Alu elements – primate-specific SINEs. The class includes 3 superfamilies (AluY, AluS, and AluJ) subdivided into distinct families and groups (AluYa5, AluYb8, AluSx, AluSq, etc.).
HERV – human endogenous retrovirus; autonomous LTR retroelements of primate genomes.
HERV-K, HERV-H, HERV-E, ERV-9, HERV-W – distinct HERV families that presumably originated from different exogenous retroviruses. Each family consists of both provirus-like sequences and solitary LTRs.
LTR – long terminal repeat; a duplicated regulatory element bordering the HERV provirus and including three functional regions (U3, R and U5).
hominid (Hominidae) – the primate family that includes modern humans and epibiotic species of the genus Homo.
hominoid (Hominoidea) – the superfamily that consists of hominids (Hominidae) – human; pongids (Pongidae) – chimpanzee, gorilla, and orangutan; and Hylobatidae – gibbon, siamang.
REFERENCES [1] [2] [3]
[4]
[5]
Foley R. The context of human genetic evolution. Genome Res, 1998, 8:339-347. Schwartz JH. Race and the odd history of human paleontology. Anat. Rec. B New Anat., 2006, 289B:225-240. Noonan JP, Coop G, Kudaravalli S, Smith D, Krause J, Alessi J, Chen F, Platt D, Paabo S, Pritchard JK, Rubin EM. Sequencing and Analysis of Neanderthal Genomic DNA. Science, 2006, 314(5802):1113-1118. Green RE, Krause J, Ptak SE, Briggs AW, Ronan MT, Simons JF, Du L, Egholm M, Rothberg JM, Paunovic M, Paabo S. Analysis of one million base pairs of Neanderthal DNA. Nature, 2006, 444(7117):330-336. Khaitovich P, Enard W, Lachmann M, Paabo S. Evolution of primate gene expression. Nat. Rev. Genet., 2006, 7(9):693-702.
Retroelement Insertion Polymorphism and Modulation of Human Gene Activity [6] [7] [8] [9] [10]
[11] [12] [13] [14] [15] [16] [17] [18]
[19]
[20] [21]
[22]
[23] [24]
[25]
231
Gilad Y, Oshlack A, Rifkin SA. Natural selection on gene expression. Trends Genet., 2006, 22(8):456-461. Lebedev Iu.B. Endogenous Retroviruses: A possible role in human cell function. Molecular Biology (Moscow) 34:544-555. Kazazian HH Jr Mobile elements: drivers of genome evolution. Science, 2004, 303(5664):1626-1632. Sverdlov ED. Retroviruses and primate evolution. BioEssays, 2000, 22:161-171. Medstrand P, van de Lagemaat LN, Dunn CA, Landry JR, Svenback D, Mager DL Impact of transposable elements on the evolution of mammalian gene regulation. Cytogenet Genome Res., 2005; 110(1-4):342-352. Zietkiewicz E, Richer C, Sinnett D, Labuda D. Monophyletic origin of Alu elements in primates. J. Mol. Evol, 1998. 47(2): 172-182. Jurka J, Klonowski P. Integration of retroposable elements in mammals: selection of target sites. J. Mol. Evol, 1996, 43(6):685-689. Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat. Rev. Genet, 2002, 3(5): 370-379. Labuda D, Striker G. Sequence conservation in Alu evolution. Nucleic Acids Res, 1989, 17(7): 2477-2491. Kapitonov V, Jurka J. The age of Alu subfamilies. J. Mol. Evol, 1996, 42(1):59-65. Jurka J, Milosavljevic A. Reconstruction and analysis of human Alu genes. J. Mol. Evol, 1991, 32(2):105-121. Batzer MA, Stoneking M, Alegria-Hartman M, et al. African origin of human-specific polymorphic Alu insertions. Proc. Natl. Acad. Sci. U.S.A. 1994,91(25): 12288-12292. Wallace MR, Andersen LB, Saulino AM, Gregory PE, Glover TW, Collins FS. A de novo Alu insertion results in neurofibromatosis type 1. Nature, 1991, 353(6347):864866. Tighe, P.J., et al., Inactivation of the Fas gene by Alu insertion: retrotransposition in an intron causing splicing variation and autoimmune lymphoproliferative syndrome. Genes Immun, 2002, 3 Suppl 1:S66-70. Miki, Y., et al., Mutation analysis in the BRCA2 gene in primary breast cancers. Nat. Genet, 1996, 13(2): 245-247. 
Abdelhak, S., et al., Clustering of mutations responsible for branchio-oto-renal (BOR) syndrome in the eyes absent homologous region (eyaHR) of EYA1. Hum. Mol. Genet. 1997, 6(13):2247-2255. Vidaud, D., et al., Haemophilia B due to a de novo insertion of a human-specific Alu subfamily member within the coding region of the factor IX gene. Eur. J. Hum. Genet, 1993, 1(1): 30-36. Wulff, K., et al., Molecular analysis of hemophilia B in Poland: 12 novel mutations of the factor IX gene. Acta Biochim. Pol, 1999, 46(3): 721-726. Janicic, N., et al., Insertion of an Alu sequence in the Ca(2+)-sensing receptor gene in familial hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism. Am. J. Hum. Genet, 1995, 56(4): 880-886. Bai, M., et al., Markedly reduced activity of mutant calcium-sensing receptor with an inserted Alu element from a kindred with familial hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism. J. Clin. Invest, 1997, 99(8): 1917-1925.
232
I. Z. Mamedov, S. V. Ustyugova, F. L. Amosova et al.
[26] Halling, K.C., et al., Hereditary desmoid disease in a family with a germline Alu I repeat mutation of the APC gene. Hum. Hered, 1999, 49(2): 97-102. [27] Muratani, K., et al., Inactivation of the cholinesterase gene by Alu insertion: possible mechanism for human gene transposition. Proc. Natl. Acad. Sci. U.S.A. 1991, 88(24): 11315-11319. [28] Mustajoki, S., et al., Insertion of Alu element responsible for acute intermittent porphyria. Hum. Mutat. 1999, 13(6): 431-438. [29] Sukarova, E., et al., An Alu insert as the cause of a severe form of hemophilia A. Acta Haematol. 2001,106(3): 126-129. [30] Mitchell, G.A., et al., Splice-mediated insertion of an Alu sequence inactivates ornithine delta-aminotransferase: a role for Alu elements in human mutation. Proc. Natl. Acad. Sci. USA, 1991, 88(3): 815-819. [31] Knebelmann, B., et al., Splice-mediated insertion of an Alu sequence in the COL4A3 mRNA causing autosomal recessive Alport syndrome. Hum. Mol. Genet. 1995. 4(4): 675-679. [32] Zhang Y-H, H.B.-L., Finlayson G, Deininger PL, McCabe ERB., Alu Sx insertion in a patient with benign glycerolkinase deficiency. Am. J. Hum. Genet. 1998, 63(A395). [33] Zhang, Y.H., et al., Asymptomatic isolated human glycerol kinase deficiency associated with splice-site mutations and nonsense-mediated decay of mutant RNA. Pediatr. Res. 2006. 59(4 Pt 1): p. 590-592. [34] Vervoort, R., et al., A mutation (IVS8+0.6kbdelTC) creating a new donor splice site activates a cryptic exon in an Alu-element in intron 8 of the human beta-glucuronidase gene. Hum. Genet. 1998, 103(6): 686-693. [35] Oldridge, M., et al., De novo Alu-element insertions in FGFR2 identify a distinct pathological basis for Apert syndrome. Am. J. Hum. Genet. 1999, 64(2): 446-461. [36] Lester , T., McMahon, C., VanRegemorter, N., Jones, A., Genet, S., X-linked immunodeficiency caused by insertion of Alu repeat sequences. J. Med. Gen. 1997, 34(Suppl 1): S81. 
[37] Tiret, L., et al., Evidence, from combined segregation and linkage analysis, that a variant of the angiotensin I-converting enzyme (ACE) gene controls plasma ACE levels. Am. J. Hum. Genet. 1992, 51(1): 197-205. [38] Cambien, F., et al., Deletion polymorphism in the gene for angiotensin-converting enzyme is a potent risk factor for myocardial infarction. Nature. 1992,. 359(6396): 641644. [39] Rowe, S.M., et al., Ovarian carcinoma-associated TaqI restriction fragment length polymorphism in intron G of the progesterone receptor gene is due to an Alu sequence insertion. Cancer Res. 1995. 55(13): 2743-2745. [40] Economou-Pachnis, A. and P.N. Tsichlis., Insertion of an Alu SINE in the human homologue of the Mlvi-2 locus. Nucleic Acids Res. 1985. 13(23): 8379-8387. [41] Smit, AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 1999, 9: 657-663. [42] Asch HL, Eliacin E., Fanning TG, Connolly JL, Bratthauer G, and Asch BB. Comparative expression of the LINE-1 p40 protein in human breast carcinomas and normal breast tissues. Oncol. Res. 1996, 8: 239-247. [43] Ostertag, E.M. and H.H. Kazazian, Jr.. Biology of mammalian L1 retrotransposons. Annu. Rev. Genet. 2001, 35: 501-538.
[44] Feng, Q, Moran JV, Kazazian HH Jr., and Boeke JD. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell. 1996, 87: 905-916. [45] Mathias SL, Scott AF, Kazazian HH Jr., Boeke JD, and Gabriel A. Reverse transcriptase encoded by a human transposable element. Science. 1991, 254: 1808-1810. [46] Fanning, T. and M. Singer. The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins. Nucleic Acids Res. 1987, 15: 2251-2260. [47] Moran JV. Human L1 retrotransposition: insights and peculiarities learned from a cultured cell retrotransposition assay. Genetica. 1999, 107: 39-51. [48] Malik HS, Burke WD, Eickbush TH. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 1999, 16: 793-805. [49] Barabino SM, Hubner W, Jenny A, Minvielle-Sebastia L, Keller W. The 30-kD subunit of mammalian cleavage and polyadenylation specificity factor and its yeast homolog are RNA-binding zinc finger proteins. Genes Dev. 1997, 11: 1703-1716. [50] Goodier JL, Ostertag EM, Kazazian HH, Jr. Transduction of 3'-flanking sequences is common in L1 retrotransposition. Hum. Mol. Genet. 2000, 9: 653-657. [51] Boissinot S, Furano AV. The recent evolution of human L1 retrotransposons. Cytogenet. Genome Res. 2005, 110: 402-406. [52] Khan H, Smit A, Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 2006, 16: 78-87. [53] Skowronski J. and M.F. Singer. The abundant LINE-1 family of repeated DNA sequences in mammals: genes and pseudogenes. Cold Spring Harb. Symp. Quant. Biol. 1986, 51(Pt 1): 457-464. [54] Boissinot S, Chevret P, Furano AV. L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 2000, 17: 915-928. [55] Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH, Jr. High frequency retrotransposition in cultured mammalian cells. Cell. 1996, 87: 917-927. [56] Lebedev Y. Genome-wide search for human specific retroelements. In: Sverdlov ED (editor). Retroviruses and primate genome evolution. Georgetown, TX, USA. Landes Bioscience. 2004. pp 146-163. [57] Khodosevich K, Lebedev Y, Sverdlov E. Endogenous retroviruses and human evolution. Comparative and Functional Genomics. 2002, 3:494-498. [58] Kurdyukov SG, Lebedev YB, Artamonova II, et al. Full-sized HERV-K (HML-2) human endogenous retroviral LTR sequences on human chromosome 21: map locations and evolutionary history. Gene. 2001, 273:51-61. [59] Lebedev YB, Belonovitch OS, Zybrova NV, et al. Differences in HERV-K LTR insertions in orthologous loci of humans and great apes. Gene. 2000, 247:265-727. [60] Domansky AN, Kopantzev EP, Snezhkov EV, Lebedev YB, Leib-Mosch C, Sverdlov ED. Solitary HERV-K LTRs possess bi-directional promoter activity and contain a negative regulatory element in the U5 region. FEBS Lett. 2000, 472:191-195. [61] Lavrentieva I, Khil P, Vinogradova T, Akhmedov A, Lapuk A, Shakhova O, Lebedev Y, Monastyrskaya G, Sverdlov ED. Subfamilies and nearest-neighbour dendrogram for the LTRs of human endogenous retroviruses HERV-K mapped on human chromosome
19: physical neighbourhood does not correlate with identity level. Hum. Genet. 1998, 102:107-116. [62] Mamedov I, Lebedev Y, Hunsmann G, Khusnutdinova E, Sverdlov E. A rare event of insertion polymorphism of a HERV-K LTR in the human genome. Genomics. 2004, 84:596-599. [63] Medstrand P, Mager DL. Human-specific integrations of the HERV-K endogenous retrovirus family. J. Virol. 1998, 72:9782-9787. [64] Turner G, Barbulescu M, Su M, Jensen-Seaman MI, Kidd KK, Lenz J. Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr. Biol. 2001, 11(19):1531-1535. [65] Britten RJ, Baron WF, Stout DB, et al. Sources and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. U.S.A. 1988, 85:4770-4774. [66] Jurka J and Smith T. A fundamental division in the Alu family of repeated sequences. Proc. Natl. Acad. Sci. U.S.A. 1988, 85:4775-4778. [67] Carroll ML, Roy-Engel AM, Nguyen SV, et al. Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J. Mol. Biol. 2001, 311:17-40. [68] Roy-Engel AM, Carroll ML, Vogel E, et al. Alu insertion polymorphisms for the study of human genomic diversity. Genetics. 2001, 159:279-290. [69] Otieno AC, Carter AB, Hedges DJ, et al. Analysis of the human Alu Ya-lineage. J. Mol. Biol. 2004, 342(1):109-118. [70] Carter AB, Salem AH, Hedges DJ, et al. Genome-wide analysis of the human Alu Yb-lineage. Hum. Genomics. 2004, 1(3):167-178. [71] Garber RK, Hedges DJ, Herke SW, Hazard NW, Batzer MA. The Alu Yc1 subfamily: sorting the wheat from the chaff. Cytogenet. Genome Res. 2005, 110(1-4):537-542. [72] Xing J, Salem AH, Hedges DJ, et al. Comprehensive analysis of two Alu Yd subfamilies. J. Mol. Evol. 2003, 57 Suppl. 1:S76-89. [73] Salem AH, Kilroy GE, Watkins WS, Jorde LB, Batzer MA. Recently integrated Alu elements and human genomic diversity. Mol. Biol. Evol. 2003, 20(8):1349-1361. [74] Salem AH, Ray DA, Hedges DJ, Jurka J, Batzer MA. Analysis of the human Alu Ye lineage. BMC Evol. Biol. 2005, 5(1):18. [75] Callinan PA, Hedges DJ, Salem AH, et al. Comprehensive analysis of Alu-associated diversity on the human sex chromosomes. Gene. 2003, 317(1-2):103-110. [76] Myers JS, Vincent BJ, Udall H, et al. A comprehensive analysis of recently integrated human Ta L1 elements. Am. J. Hum. Genet. 2002, 71:312-326. [77] Sheen FM, Sherry ST, Risch GM, et al. Reading between the LINEs: human genomic variation induced by LINE-1 retrotransposition. Genome Res. 2000, 10:1496-1508. [78] Mamedov I, Batrak A, Buzdin A, Arzumanyan E, Lebedev Y, Sverdlov ED. Genome-wide comparison of differences in the integration sites of interspersed repeats between closely related genomes. Nucleic Acids Res. 2002, 30:e71. [79] Buzdin A, Khodosevich K, Mamedov I, Vinogradova T, Lebedev Y, Hunsmann G, Sverdlov E. A technique for genome-wide identification of differences in the interspersed repeats integrations between closely related genomes and its application to detection of human-specific integrations of HERV-K LTRs. Genomics. 2002, 79:413-422.
[80] Buzdin A, Ustyugova S, Gogvadze E, Lebedev Y, Hunsmann G, Sverdlov E. Genome-wide targeted search for human specific and polymorphic L1 integrations. Hum. Genet. 2003, 112:527-533. [81] Buzdin A, Ustyugova S, Khodosevich K, Mamedov I, Lebedev Y, Hunsmann G, Sverdlov E. Human-specific subfamilies of HERV-K (HML-2) long terminal repeats: three master genes were active simultaneously during branching of hominoid lineages. Genomics. 2003, 81:149-156. [82] Buzdin AA, Lebedev IuB, Sverdlov ED. [Human genome-specific HERV-K intron LTR genes have a random orientation relative to the direction of transcription, and, possibly, participated in antisense gene expression regulation]. Bioorg. Khim. 2003, 29(1):103-106. [83] Kreahling J, Graveley BR. The origins and implications of Aluternative splicing. Trends Genet. 2004, 20(1):1-4. [84] Mamedov IZ, Arzumanyan ES, Amosova AL, Lebedev YB, Sverdlov ED. Whole-genome experimental identification of insertion/deletion polymorphisms of interspersed repeats by a new general approach. Nucleic Acids Res. 2005, 33(2):e16. [85] Boissinot S, Entezam A, Young L, Munson PJ, Furano AV. The insertional history of an active family of L1 retrotransposons in humans. Genome Res. 2004, 14(7):1221-1231. [86] Watkins WS, Rogers AR, Ostler CT, et al. Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms. Genome Res. 2003, 13(7):1607-1618. [87] Witherspoon DJ, Marchani EE, Watkins WS, et al. Human population genetic structure and diversity inferred from polymorphic L1 (LINE-1) and Alu insertions. Hum. Hered. 2006, 62(1):30-46. [88] Ustyugova SV, Amosova AL, Lebedev YB, Sverdlov ED. Cell line fingerprinting using retroelement insertion polymorphism. Biotechniques. 2005, 38(4):561-565. [89] Naumann S, Reutzel D, Speicher M, Decker HJ. Complete karyotype characterization of the K562 cell line by combined application of G-banding, multiplex-fluorescence in situ hybridization, fluorescence in situ hybridization, and comparative genomic hybridization. Leuk. Res. 2001, 25:313-322. [90] Lindbjerg AC, Ostergaard M, Nielsen B, Pedersen B, Koch J. Characterization of three hairy cell leukemia-derived cell lines (ESKOL, JOK-1, and hair-M) by multiplex-FISH, comparative genomic hybridization, FISH, PRINS, and dideoxyPRINS. Cytogenet. Cell Genet. 2000, 90:30-39. [91] Harris CP, Lu XY, Narayan G, Singh B, Murty VV, Rao PH. Comprehensive molecular cytogenetic characterization of cervical cancer cell lines. Genes Chromosomes Cancer. 2003, 36:233-241. [92] Roschke AV, Giovanni T, Kristen SG, et al. Karyotypic complexity of the NCI-60 Drug-Screening Panel. Cancer Res. 2003, 63:8634-8647. [93] Ustiugova SV, Amosova AL, Lebedev IuB, Sverdlov ED. A tissue-specific decrease in the pre-mRNA level of L1- and Alu-containing alleles of human genes. Bioorg. Khim. 2006, 32(1):103-106. [94] Ustyugova SV, Lebedev YB, Sverdlov ED. Long L1 insertions in human gene introns specifically reduce the content of corresponding primary transcripts. Genetica. 2006, 128(1-3):261-272.
[95] Lebedev Y.B, Amosova AL, Mamedov IZ, Fisunov GY, Sverdlov ED. Most recent AluY insertions in human gene introns reduce the content of the primary transcripts in a cell type specific manner. Gene. 2007, 390: 122-129. [96] Mamedov IZ, Amosova AL, Fisunov GY, Lebedev Y.B. A new database on polymorphic retroelements in human genome. Mol. Biol. 2008, 42(4): (in press).
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev and G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 8
BIOMEDICAL ASPECTS IN INVESTIGATIONS OF BIOCHEMICAL POLYMORPHISM OF ACTINS AND SOME ACTIN-BINDING PROTEINS
S.S. Shishkin*, L.I. Kovalev, I.N. Krakhmaleva, M.A. Kovaleva, L.S. Eremina and V.O. Popov
A.N. Bach Institute of Biochemistry, Russian Academy of Sciences
ABSTRACT
This review considers evidence for the pronounced biochemical polymorphism of actins and of various actin-binding proteins (ABP) that take part in the formation of actin microfilaments, together with data on associations between certain properties of these proteins and particular physiological (and, in extreme cases, pathological) characteristics of human muscle and other tissues. As components of microfilaments, actins and ABP perform important functions connected with cell motility and also contribute to a number of other cellular properties. Data have now accumulated on various changes (polymorphisms and mutations) in the human genes that encode actins and ABP, and this knowledge is being applied to diagnostic and other biomedical problems. Using as an example one SNP in the α-actinin 3 gene (1747 C→T), which replaces an arginine codon with a stop codon (R577X), the possibilities of single nucleotide polymorphism analysis for profiling athletes are discussed. With the beginning of the post-genomic period in biochemistry and other life sciences, increasing attention is being paid to biomedical aspects of the biochemical polymorphism of actins and ABP. Proteomic technologies play the key role in these investigations, as reflected, in particular, in goal-oriented searches for new markers of neoplastic processes. Data from a proteomic analysis of human prostate gland proteins are presented; they indicate pronounced polymorphism of transgelins in prostate tissue samples from patients with hyperplasia and malignant neoplasms.
INTRODUCTION
The first references to actin, the special muscle protein that is directly involved in muscle contraction, date to the middle of the 20th century. Since then, interest in this protein, its various isoforms and related proteins has not diminished, largely because of its connection with the study of biological motility [1-4]. Even early investigations showed that a characteristic property of actins is their ability to interact in a specific way with certain other proteins, which were named actin-binding proteins (ABP) [3,4]. In the period from the mid-1970s to the mid-1980s alone, more than 100 ABP were identified; their number has grown steadily since, and it has become clear that they form large families and groups of proteins with similar functions [1,3-6]. Many ABP families contain special domains that enable them to bind actin; in particular, the so-called calponin-homology domains (CH-domains) are widespread [6-8]. The amino acid sequence of the CH-domain of calponin is presented in figure 1, and the general characteristics of some human ABP families and groups containing CH-domains are shown in table 1. Individual representatives of these ABP are considered below.
Table 1. Some isoforms in several families and special groups of human actin-binding proteins containing calponin homology (CH) domains [6-16 and OMIM NCBI]
Names (OMIM numbers) | Members of families or groups | Tissue and/or intracellular localization | Some features of structure
One CH-domain-containing proteins (1CH)
calponins (600806, 602373, 602374) | calponins 1, 2, 3 | Smooth muscles: isoform 1; heart and nonmuscle cells: isoform 2; smooth muscles and nonmuscle cells: isoform 3 | Mm 34 kDa; the structure contains a special motif repeated three times in tandem and a phosphorylation site in the C-terminal region
transgelins (604634) | transgelins 1, 2, 3 and some variants | Smooth muscles, liver, kidneys, prostate, etc. | Mm 22 kDa; high homology with calponins
Two CH-domain-containing proteins (2CH)
α-actinins (102575, 102573, etc.) | α-actinins 1, 2, 3, 4 | Skeletal muscles, Z-discs: isoforms 2 and 3; nonmuscle cells: isoforms 1 and 4 | Mm 100-105 kDa; the structure contains four spectrin-like repeats and a C-terminal Ca2+-binding domain
dystrophin group (300377, 128240) | dystrophin, utrophin | Skeletal muscles, interaction of the cytoskeleton with the cell membrane: isoform Dp427m; neuronal cells: Dp427n | Mm 400-427 kDa; the structure contains several spectrin-like repeats, a cysteine-rich domain and a C-terminal domain
Figure 1. Amino acid sequence of calponin 1; the amino acid residues forming the CH-domain are highlighted in red [from the UniProtKB/Swiss-Prot database: P26932 (CNN1_CHICK)].
It is now well known that actin and numerous ABP are present in practically all types of eukaryotic cells, not only in muscle cells. Most cellular actin is usually found in special thin, threadlike cytoplasmic structures 6-8 nm in diameter, which are often called actin microfilaments [1,4,8]. In addition, some actin proteins can be detected in cell nuclei, where they are also organized in a particular way. Alongside actin microfilaments, which are one of the basic components of the cytoskeleton, cells also contain a pool of free, or globular, actin (G-actin). As components of actin microfilaments, actins and ABP perform important functions that are not limited to cell motility: they have been shown to take part in the deformation of cells and cell surfaces and in membrane transport, and to support a variety of other cellular properties [1,3,4].
A substantial body of evidence now exists for pronounced biochemical polymorphism of both actins and ABP [2-5]. The discovery of relationships between certain properties of these proteins and particular physiological (and, in extreme cases, pathological) characteristics of human muscle and other tissues has drawn special interest to actins and ABP and has stimulated a number of biomedical investigations [4,5,17].
At the turn of the 20th and 21st centuries, a qualitatively new stage began in the study of human actins and ABP, brought about by the transition of practically all biomolecular analysis to the post-genomic era [18-21]. The formal beginning of the post-genomic era is taken to be the publication of summary data on the genome structure of a number of eukaryotic organisms and, above all, of the human genome [20,21]. During the same period several new scientific disciplines took shape: structural genomics, transcriptomics, proteomics, etc. [18,19,22,23]. Characteristic features of the post-genomic era include wide use of the results of the international Human Genome Project and active application of post-genomic technologies to the analysis of macromolecules, which has given biochemists and experts in related disciplines both new possibilities and fundamentally important new information about human proteins [18-21]. At the same time, increasing attention is being paid in the post-genomic period to biomedical aspects of protein research, as reflected, in particular, in goal-oriented searches for new markers of neoplastic processes [24-27] and in studies of the molecular nature of myopathies, cardiomyopathies and some other hereditary diseases [17,28,29]. As a result, functional roles in health and disease have been defined for some polymorphic proteins of actin microfilaments, and the roles of a number of others are under active study [4,5,28].
POSTGENOMIC AND SOME BIOCHEMICAL APPROACHES IN INVESTIGATIONS OF POLYMORPHISM OF PROTEINS FORMING ACTIN MICROFILAMENTS
The term "polymorphism", which derives from the Greek word polymorphos and is usually translated as "varied" [30], is now widely used in different scientific disciplines. The element "morphos" is considered to be related to Morpheus (Morphus), the Greek god of dreams, who could appear to sleeping people in the most varied forms, that is, he himself showed pronounced "polymorphism" [31]. For investigations of polymorphism it is crucial that the objects or properties under study, besides being "varied", should share something in common that allows them to be treated as a single set. In general biochemistry, the study of polymorphism of proteins and other biopolymers is one of the traditional areas of research. Many of the results obtained have contributed greatly to the fundamental life sciences and have also led to a series of practically important achievements, in particular in health care. As a systematic field, this area began to take shape within enzymology in the 1950s-1970s, when active work on a wide range of isoenzymes developed [32,33]. In discovering and studying isoenzymes, the combined use of electrophoretic fractionation and methods for detecting enzymatic activity was of fundamental importance [32,33]. One successful example of the analysis of the human isoenzyme spectrum is the work on polymorphism of creatine phosphokinase, an enzyme that plays an important role in the energy supply of muscle contraction. The results of these investigations became the basis for several diagnostic tests, including tests for myocardial infarction [34-36].
The phenomenon of isoenzymes has usually been defined as the existence of proteins in physically distinguishable forms (i.e., polymorphism) that nevertheless show the same, or very similar, catalytic activity [37-39]. Accordingly, in general terms, biochemical polymorphism of proteins can be defined as the existence of proteins that differ in structure but serve the same molecular function (not only catalytic, but also transport, receptor, structural, protective, etc.) [21,37,39,40]. In traditional biochemical research, however, individual proteins were studied by isolating them step by step from various sources while monitoring a particular functional activity (as a rule, an enzymatic one), so polymorphism of enzymes remained the focus of attention for a long time. Against this background, the successful investigations of polymorphism of human actins were one of the rare exceptions [1,41]. It is now well known that biochemical polymorphism of proteins can arise from different kinds of genetic polymorphism, caused both by the existence of families of related genes (multilocus) and by polyallelism of particular genes [17,39,40,42].
It should also be noted that biochemical polymorphism of proteins in cells or tissues is quite often manifested as appreciable changes in their quantitative content [37,39,40,42]. This view is supported by numerous examples of genetically determined changes in the content of particular proteins. In particular, changes in the content (up to complete absence) of some ABP (α-actinin 3, dystrophin, etc.) have been shown to result both from single nucleotide polymorphisms and from large deletions in the corresponding genes [17,28,43,44].
The so-called biological mini-revolution of the 1980s was marked by the discovery of the discontinuous (exon-intron) structure of many genes of eukaryotic organisms (including humans), of splicing and alternative splicing, and of some other mechanisms by which genetic information is realized [40,45-48]. As a result, the formula "one gene, one polypeptide chain" [40,49] has undergone substantial revision and refinement. Many examples have convincingly demonstrated that even a single gene can determine the formation of a variety of protein products that sometimes have similar functions but differ sharply in structure, that is, it can give rise to protein polymorphism. Such polymorphism was clearly demonstrated, and subsequently confirmed repeatedly, for the typical ABP tropomyosins and troponins and for some other proteins of this group [4,47-49]. In addition to alternative splicing, a whole group of mechanisms is known that generate structural variety among the protein products of gene expression: these are post-synthetic modifications of proteins, in particular phosphorylation, glycosylation and some other modifications characteristic of different ABP [4,49-51].
The transition, which began at the end of the 20th century, from traditional biochemical and immunochemical approaches to the so-called systemic approach in protein research, together with the accumulation of data on the primary structures of proteins, their domain organization and the functional roles of particular domains, has substantially extended the potential for studying biochemical polymorphism of proteins [52-55]. The methodological basis of the systemic approach was provided by O'Farrell's two-dimensional electrophoresis, immunoblotting, microsequencing and a number of other highly effective technologies of protein analysis [52-57]. Further development of systemic protein research in the last decade of the 20th century, in parallel with wide genomic investigations, led to the creation of a special scientific discipline, proteomics, which is regarded as a component of functional genomics in the post-genomic era [22,27,58,59]. Summing up the studies of many authors [22,57,60,61], the strategy of proteomic investigation of any object can be presented schematically as a series of consecutive steps, among which the key positions are held by two-dimensional electrophoresis of proteins, their identification by mass spectrometry and the formation of computer data banks (figure 2) [59].
Figure 2. General scheme and some potential problems of the proteomic approach in protein studies (modified from [59]).
Investigations of protein polymorphism are essential both in systemic protein research and in modern proteomics, where structurally related proteins with different electrophoretic properties can be detected and identified [59,62-67]. Among the numerous examples of such investigations is our rather early publication on a polymorphic variant of one human ABP, myosin light chain 1 [68]. According to microsequencing, this variant is characterized by the replacement 145 N→H; it was detected on two-dimensional electrophoregrams by a shift of the isoelectric point relative to the basic isoform. Another example is the work published in 2006 on the detection, in human striated muscle tissue, of polymorphism of Δ3,5-Δ2,4-dienoyl-coenzyme A isomerase (the protein product of the ECH1 gene), in which mass spectrometric identification of the frequent and rare isoforms of this protein played an important role [67].
Schematically, the proteomic strategy for investigating biochemical polymorphism of proteins, whatever its cause (multilocus, polyallelism, alternative splicing and/or post-synthetic modifications), can be presented as a search for related proteins that differ qualitatively or quantitatively on electrophoretic analysis but show considerable structural similarity according to mass spectrometry (figure 3). On the whole, a large body of data indicates that qualitative (structural) and quantitative biochemical polymorphism of proteins directly determines certain physiological (and, more often, pathological) characteristics of the corresponding tissues, which makes the study of such polymorphism important for various biomedical investigations.
Figure 3. Scheme of the use of proteomic technologies for studying biochemical polymorphism of proteins arising from different causes (multilocus, polyallelism, alternative splicing and posttranslational modifications).
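The search strategy summarized in figure 3 can be illustrated with a minimal Python sketch. It assumes that each spot on a two-dimensional gel has already been assigned a protein identity by mass spectrometry, and it reports proteins represented by two or more electrophoretically distinct spots, that is, candidates for qualitative polymorphism. The `Spot` record, the tolerance values and the sample data are hypothetical and serve only as an illustration; they are not taken from the studies cited.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    spot_id: str
    protein: str    # identity assigned by mass spectrometry
    pi: float       # isoelectric point read from the 2-D gel
    mw_kda: float   # apparent molecular mass, kDa


def polymorphic_candidates(spots, pi_tol=0.1, mw_tol=1.0):
    """Return proteins found in two or more electrophoretically distinct spots.

    Two spots count as distinct when they differ by more than pi_tol in
    isoelectric point or by more than mw_tol kDa in molecular mass
    (both tolerances are assumed values).
    """
    by_protein = defaultdict(list)
    for spot in spots:
        by_protein[spot.protein].append(spot)

    candidates = {}
    for protein, group in by_protein.items():
        distinct = []
        for spot in sorted(group, key=lambda s: (s.pi, s.mw_kda)):
            if all(abs(spot.pi - d.pi) > pi_tol or abs(spot.mw_kda - d.mw_kda) > mw_tol
                   for d in distinct):
                distinct.append(spot)
        if len(distinct) > 1:
            candidates[protein] = [d.spot_id for d in distinct]
    return candidates


# Hypothetical spots: protein_A occurs at two isoelectric points,
# protein_B gives two spots that coincide within the tolerances.
spots = [
    Spot("s1", "protein_A", 8.9, 22.0),
    Spot("s2", "protein_A", 8.4, 22.0),
    Spot("s3", "protein_B", 5.3, 42.0),
    Spot("s4", "protein_B", 5.3, 42.2),
]
candidates = polymorphic_candidates(spots)
print(candidates)
```

With the hypothetical data above only protein_A is reported, since its two spots differ in isoelectric point by more than the assumed tolerance. Quantitative polymorphism could be screened in a similar way by comparing spot abundances between samples.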
BASIC ACTIN ISOFORMS OF THE HUMAN AND SOME ACTIN-BINDING PROTEINS INVOLVED IN THE FORMATION OF THIN FILAMENTS
Studies carried out in the 1970s revealed six actin isoforms in vertebrates [1,41]. Later it was shown that the human genome also contains genes encoding four further actin-related proteins [69,70]. Thus, according to current information, the main cause of the biochemical polymorphism of actin is multilocus (table 2).
Table 2. Some characteristics of several actins and actin-related proteins and of their genes in the human genome [4,41,51,71; see also www.ncbi.nlm.nih.gov, OMIM and SNP databases]
Symbols, names and tissue locations | Symbol(s) of genes and syndromes or diseases | Chromosome location of genes (OMIM numbers) | Single nucleotide polymorphisms, features of structure and diseases | Several references
α: skeletal muscle, coexpression in myocardium; location in sarcomeres and cell nucleus | ACTA1, CFTD, NEM3 | 1q42.1 (102610) | 34 SNPs, 375 a.a.; congenital actin myopathies (with inclusion bodies and with fiber-type disproportion); special form of nemaline myopathy | [72,73]
α: myocardium, skeletal and smooth muscles | ACTC, CMD1R | 15q14 (102540) | 62 SNPs, *4 amino acid substitutions; autosomal dominant (AD) dilated cardiomyopathy 1R | [74,75]
α2: smooth muscles (aortic or vascular) | ACTA2, ACTSA | 10q22-q24 (102620) | 116 SNPs, *8 amino acid substitutions | [76]
γ: smooth muscles (enteric) | ACTG2, ACTA3 | 2p13.1 (102545) | 156 SNPs, *6 amino acid substitutions; pronounced alteration of gene expression in tumors | [77]
β: cytoskeleton in muscle, neuronal and other cells | ACTB | 7p22-p12 (102630, 607371) | 279 SNPs, *24 amino acid substitutions; autosomal dominant developmental malformations, deafness and dystonia; juvenile-onset dystonia | [78]
In cross-striated muscles, two actin isoforms are dominant: skeletal α-actin and cardiac α-actin, which differ from the two actin isoforms found in smooth muscle and from the two non-muscle actin isoforms [41,51,71,81]. It should be emphasized that in human cardiac muscle [71,82,83], as well as in skeletal muscle [51,82], both of these isoforms are present simultaneously, but in the adult myocardium the share of cardiac α-actin reaches 80%.
Skeletal α-actin is encoded by the ACTA1 gene, which has been mapped to 1q42, and cardiac α-actin is encoded by the ACTC gene, located at 15q11-q14. Both genes encode proteins 377 amino acid residues long with almost identical amino acid sequences (they differ in 4 residues), but the mature polypeptides contain only 375 residues, because the initiating methionine and the cysteine in the second position are removed successively during posttranslational modification [83]. A significant number of single nucleotide polymorphisms (SNPs) have now been found in actin genes; some of them seriously affect muscle function and cause generalized myopathies, cardiomyopathies and other pathologies (see table 2).
In some smooth muscle cells the cardiac α-actin gene ACTC is expressed, whereas in others, for example in aortic smooth muscle cells, the ACTA2 gene is expressed, which encodes the special actin isoform α2 [76]. Alongside the α-isoforms, smooth muscle cells contain γ-actin, which differs noticeably in primary structure and is the product of the ACTG2 gene [41,77 and OMIM NCBI]. It is important to note that there is a good deal of information on changes in the expression of this gene during neoplastic transformation of cells [77]. Numerous data indicate that β-actin and non-muscle γ-actin are characteristic of a wide range of non-muscle cells. Non-muscle γ-actin differs from the smooth muscle isoform of γ-actin and is encoded by its own gene, ACTG1; mutations in this gene have been shown to cause special forms of deafness with autosomal dominant inheritance [79 and OMIM NCBI].
The spatial structure of G-actins has been characterized in detail [4,84]. According to the available information, about 40% of the amino acid sequence of actins forms α-helices, and the β-sheets that are also present play an important role in the formation of the four so-called pseudo-subdomains, which take part in the self-assembly of actin filaments and in the binding of various ABP. The actin subdomains are designated IA, IB, IIA and IIB, or by the numbers 1-4 [4]. In addition, one more functionally important region, the ATP-binding region or so-called nucleotide-binding cleft (NBC), has rather recently been precisely localized in the tertiary structure of actins [84]. The spatial structure of some relatively rare allelic actin isoforms (which differ from the normal isoforms by amino acid replacements) can, of course, show specific changes that are reflected in their functional properties [17].
An essential feature of G-actin molecules is their capacity for specific condensation, which leads to the formation of so-called filamentous actin (F-actin). In the condensation process (which is sometimes, not quite precisely, called actin polymerization), several stages can be distinguished: a) initiation, the relatively slow formation of actin dimers, which dissociate more readily than they enter the following stages of condensation; b) formation of a stable trimer, which acquires the capacity for further self-assembly and is often called "the nucleus of polymerization"; c) elongation, during which G-actin molecules are rapidly added to "the nucleus of polymerization" and filamentous actin proper is formed [4]. Intensive investigations of the elongation stage have shown that the formation of filamentous actin is a rather dynamic process: it involves not only the addition of monomeric actin molecules, which have previously bound ATP in their functional centre (the NBC), but also the dissociation of a certain number of G-actins [4].
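The three stages of condensation described above can be sketched as a toy nucleation-elongation model. The Python sketch below is an illustration under stated assumptions, not a description of the cited experiments: the stable trimer is treated as the nucleus, so the nucleation rate is taken to be proportional to the cube of the free G-actin concentration; elongation is treated as reversible addition of monomers at filament ends; and all rate constants (`k_nuc`, `k_on`, `k_off`) and the total actin concentration are arbitrary illustrative values.

```python
def simulate_condensation(total_um=10.0, k_nuc=1e-6, k_on=10.0, k_off=1.0,
                          dt=0.01, t_end=200.0):
    """Euler integration of a toy nucleation-elongation scheme.

    Returns a list of (time_s, free_monomer_um, polymer_um) tuples.
    All parameters are illustrative assumptions, not measured values.
    """
    monomer = total_um   # free G-actin, uM
    filaments = 0.0      # concentration of growing filaments, uM
    history = []
    steps = round(t_end / dt)
    for step in range(steps + 1):
        history.append((step * dt, monomer, total_um - monomer))
        # Stages a) and b): slow formation of a stable trimer (the nucleus).
        nucleation = k_nuc * monomer ** 3
        # Stage c): monomer addition minus monomer dissociation at filament ends.
        elongation = filaments * (k_on * monomer - k_off)
        filaments += nucleation * dt
        monomer -= (3.0 * nucleation + elongation) * dt
    return history


history = simulate_condensation()
for index in (0, 100, 1000, len(history) - 1):
    time_s, monomer_um, polymer_um = history[index]
    print(f"t = {time_s:6.1f} s  G-actin = {monomer_um:6.3f} uM  F-actin = {polymer_um:6.3f} uM")
```

In such a model polymer accumulates slowly at first (the lag corresponding to stages a and b) and then rapidly, and the free monomer concentration settles near k_off/k_on, which reflects the balance between addition and dissociation of G-actin noted above.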
S. S. Shishkin, L. I. Kovalev, I. N. Krakhmaleva et al.
It is considered that the three-dimensional structure of the resulting filamentous actin is described precisely enough by the well-known Holmes model (figure 2) [4,85,86]. This structure is described as a double helix with a diameter of 360-390 angstrom, consisting of two strands of actin protomers, each of which has a diameter of about 90-100 angstrom. It is supposed that actin monomers interact with each other perpendicularly to the axis of the helix, and that 13 protomers constitute one repeat of this helix, forming six turns with a shift of 59 angstrom in total [4,85]. Two ends can be distinguished in the structure of F-actin, the starting end and the growing end, which are called the pointed end and the barbed end, respectively. The grooves of this helical structure hold the extended molecules of tropomyosin; they form continuous chains within thin filaments and stabilize them. The tropomyosin chains are built from dimers, that is, from two α-helical polypeptides wound around each other to form a superhelical (coiled-coil) core with a diameter of about 20 angstrom and a length of about 400 angstrom. The dimers within the chain are connected head to tail. According to current understanding, each tropomyosin molecule in the thin filament is in contact with seven actin molecules and with the troponin complex, forming the so-called regulatory unit [87]. Tropomyosins have been found in thin filaments from cells of different types of differentiation, and more than 20 different isoforms of these proteins have been identified [88]. At the same time, only four tropomyosin genes have so far been found in the human genome (as well as in the genomes of other mammals studied), and all of them have been mapped to different chromosomes: TPM1 (15q22), TPM2 (9p13), TPM3 (1q21.2) and TPM4 (19q13.1).
It is important to note that, according to current understanding, the exon-intron structures of tropomyosin genes share a number of common features (figure 4); however, the details of their structure remain under investigation, given the multiple alternative ways in which their genetic information has been found to be realized. For example, the NCBI Genome view database lists 65 different models for the TPM1 gene alone, the main one of which contains 13 exons, whereas the model presented in the review by Perry S.V. (2001) [88] shows 15 exons in TPM1. As a result, the numbering of exons differs between models (and between publications), which sometimes seriously complicates comparison of results. A series of published works has shown that each tropomyosin gene can produce various tissue-specific mRNA isoforms through alternative splicing, the use of alternative promoters, and different ways of processing the 3'-terminal sequences [47,88]. This ensures the substantial variety of tropomyosin protein isoforms found in various muscle and non-muscle cells. The principal tropomyosin isoforms present in human cross-striated muscles are α-tropomyosin fast (α-fast, TMSA, the product of the TPM1 gene), β-tropomyosin (TMSB, the product of TPM2) and α-tropomyosin slow (α-slow, the product of the TPM3 gene), which is sometimes denoted TPM3 or TM-30 [51,88].
Biomedical Aspects in Investigations of Biochemical Polymorphism…
Figure 4. Schematic representation of the structures of tropomyosin genes and their protein products according to Perry S.V. (2001) [88]. A. The exon-intron structure of the TPM1, TPM2, TPM3 and TPM4 genes (from top to bottom). Rectangles represent the coding sections of exons (homologous exons are coloured identically); horizontal lines correspond to intron sequences. B. Some protein products of the TPM1 gene formed by alternative splicing. Rectangular sections of the amino acid sequences are coloured identically to the exons encoding them in figure 4A.
Alongside multilocus organization and alternative splicing, polyallelism contributes to the biochemical polymorphism of tropomyosin. To date, dozens and even hundreds of SNPs have been discovered in tropomyosin genes: 276 in TPM1, 82 in TPM2, 195 in TPM3 and 160 in TPM4 [according to data from www.ncbi.nlm.nih.gov]. It has been established that some mutations in the TPM1 gene lead to one of the forms of hypertrophic cardiomyopathy, CMH3 [89,90]; mutations in the TPM2 gene lead to nemaline myopathy, form 4, and to the distal form of arthrogryposis multiplex congenita [91,92]; and mutations in the TPM3 gene lead to nemaline myopathy, form 1 [93,94]. No information about diseases connected with mutations in the TPM4 gene has been found in the available literature;
however, information has recently appeared about intense expression of the TPM4 gene (as well as other tropomyosin genes) in cancer cells prone to metastasis [95]. It is known that some other ABP play important parts in regulating the function of tropomyosins. In cross-striated muscle tissue, these ABP are primarily the tissue-specific proteins of the troponin complex. This complex consists of three types of macromolecules, named after essential properties of the corresponding proteins: troponin C (TnC, binds calcium ions), troponin I (TnI, able to inhibit myosin ATPase) and troponin T (TnT, connects the troponin complex with tropomyosin) [50,51,87]. Significant information has already been accumulated about the biochemical polymorphism of human troponin proteins, which arises from multilocus organization as well as from polyallelism and other known mechanisms [47,50,51]. It should be noted that investigations of the polymorphism of human troponins have led to determination of the nature of some cardiomyopathies and to the creation of methods for their precise diagnosis [50,96,97]. Moreover, the tissue-specific expression of some troponin genes has made it possible to develop efficient methods for diagnosing myocardial infarction [50,88,98]. In smooth muscle cells, tissue-specific calponins and transgelins are involved in the regulation of troponin function; they are classical representatives of the ABP superfamily that contains special homologous actin-binding domains, the so-called CH-domains, already mentioned above (figure 1). It was in the N-terminal sequence of calponin that the first CH-domain was identified [99,100]. Many ABP are of significant interest to investigators because of their participation in the assembly and disassembly of actin filaments [4,5,51,101].
Among these ABP, the following are distinguished: monomer-binding ABP that prevent actin polymerization (for example, beta-thymosins); filament-depolymerizing ABP (CapZ and cofilins); ABP that bind to the ends of filaments and prevent exchange of monomers (tropomodulins); ABP that shorten filaments (gelsolin); cross-linking ABP (Arp2/3); and many others. One more special group of ABP is formed by proteins that support the interaction of thin filaments with other sarcomere and/or cytoskeletal structures (α-actinins, titin, nebulin, dystrophin, utrophin, spectrin, CapZ, etc.) [4,5,51,101]. α-Actinin is considered to be one of the main Z-disk proteins [51,102,103]. Most likely, it is the protein that basically makes up Z-filaments [104-106]. The structure of α-actinin has been determined in detail, and it has been shown that its molecules contain at least the following three domains: an N-terminal actin-binding domain, in which two calponin-like sites can be distinguished; a central core domain containing four spectrin-like repeats, each consisting of 106 amino acid residues; and a C-terminal Ca2+-binding domain with two calmodulin-like domains with EF-motifs [107,108]. The spectrin repeats and the linker sequences between them in the central domain give the α-actinin molecule considerable elasticity, which allows it to resist mechanical stress [108]. In the Z-disk, α-actinin is present as a dimer in which the two polypeptide chains are arranged in antiparallel orientation. As a result, each end of the dimer carries one Ca2+-binding and one actin-binding domain. This structural organization allows the α-actinin dimer to interact with two actin filaments simultaneously and thus to cross-link them. The proximity of the Ca2+-binding and actin-binding domains makes it possible to regulate actin binding according to the level of Ca2+.
It has been established that α-actinin can bind a number of other molecules of the I-band and Z-disk, in particular nebulin and the proteins CapZ, ALP, FATZ, ZASP, myotilin and titin, among others [51,101,102,109]. The α-actinin molecule has two titin-binding sites. The first is the calcium-binding terminal domain, which contacts the so-called Z-repeats of titin. The second is located in the region of spectrin-like domains 2 and 3 of the α-actinin core; it binds the Zq-Z4 site of the titin molecule [102,110]. The biochemical polymorphism of α-actinins is determined in many respects by the fact that the human genome (and those of some other mammals) contains four related genes (ACTN1, ACTN2, ACTN3, ACTN4) coding for the corresponding isoforms, two of which are muscle proteins (α-actinin-2 and α-actinin-3), while the other two are present in non-muscle cells [28,51,111,112]. Thus, multilocus organization clearly plays an essential role in the biochemical polymorphism of α-actinins. It is essential that all isoforms of α-actinin can interact with F-actin filaments and show significant structural similarity. In particular, it has been established that the N-terminal actin-binding domain is the most conserved in evolution, apparently because it provides the basic function of α-actinin: cross-linking of actin filaments [107]. Although the Ca2+-binding domain is also fairly conserved, only the non-muscle isoforms of α-actinin are considered to be sensitive to Ca2+, whereas α-actinin-2 and α-actinin-3 have lost the ability to bind Ca2+ [107]. It is important to note that the spectrin repeats of the core part of α-actinin were originally regarded simply as building blocks needed to construct extended rod-like molecules. Now, however, spectrin repeats are considered to be flexible platforms for various proteins [108].
Spectrin repeats and the linker sequences between them give the α-actinin molecule significant elasticity, which allows it to resist mechanical stress [108]. The genes coding for α-actinin isoforms show distinct tissue specificity of expression. In particular, α-actinin-2 is synthesized in all types of skeletal muscle fibers and in cardiac myocytes, whereas α-actinin-3 is found only in type II (fast) muscle fibers, where it can form heterodimers with α-actinin-2, which contributes to the biochemical polymorphism of these proteins [28,111]. α-Actinins, like many other sarcomere proteins, show significant allelic polymorphism [28,113,114,115]. To date, 766, 700, 129 and 496 SNPs have been discovered in the human actinin genes ACTN1, ACTN2, ACTN3 and ACTN4, respectively [according to the NCBI database of single nucleotide polymorphisms]. None of the known SNPs in ACTN1, ACTN2 and ACTN3 leads to any pathology, and so far only in ACTN4 has a mutation been discovered that causes autosomal dominant segmental glomerular sclerosis [114,115]. Nevertheless, many researchers are now paying considerable attention to one SNP in the ACTN3 gene (replacement of arginine at position 577 by a termination codon) [23,112,116,117]; studies of its allele associations have brought encouraging results, which are considered in detail below (see section IV). The list of ABP families is constantly being extended, and the significant interest in their study is due to the involvement of ABP in cell morphogenesis and in various pathological processes. For instance, active study is under way of the roles of members of one recently discovered ABP family, the so-called palladin family
("palladin/myotilin/myopalladin family"), in the formation of sarcomere structures and in the pathogenesis of some forms of myopathy [118,119].
POLYMORPHISM OF ACTININS AND ITS ROLE IN MUSCLE EFFICIENCY (STUDY OF SNP IN GENE ACTN3 WITH REPLACEMENT R577X)
In 1999, probably the first publication appeared on the existence of SNPs in the ACTN3 gene [28]. One of the SNPs revealed, which replaces an arginine codon with a termination codon, R577X (rs1815739; 1747 C>T), was assigned by the authors to exon 16 and classed as a so-called common or expressed polymorphism, since by their estimate the frequency of the rarer allele (X577) in the investigated populations of Europe, America, Asia and other regions ranged from 0.22 to 0.52. It is important to note that, as North K.N. et al. showed [28], the 577X allele in the homozygous state leads to complete absence of α-actinin-3 in fast muscle fibers. However, people with this genotype remain clinically healthy; apparently the α-actinin-2 isoform substitutes for the absent α-actinin-3. Soon afterwards, the same group of authors reported a negative association between the XX genotype and sprint performance and an accumulation of the RR genotype among elite sportsmen [112]. These data became the basis for studying the R577X (1747 C>T) polymorphism in ethnic Russians and especially in people engaged in sport professionally. The first step, and an important condition, for restriction analysis of SNPs is amplification of the gene region that contains the SNP within the recognition site of a restriction endonuclease. For SNP 1747 C>T in the ACTN3 gene, the pioneering publication used the restriction endonuclease DdeI for this purpose [28]. However, neither in that work nor in other accessible literature was it possible to find a description of the primers needed to amplify the required region of ACTN3. It was therefore necessary to analyze the base sequence of the ACTN3 gene found in the NCBI database (http://www.ncbi.nlm.nih.gov/).
It was rather unexpected that, when this sequence was compared with the corresponding mRNA sequence of α-actinin-3 (number) in order to delimit the exons, SNP 1747 C>T was found in exon 15 and not in exon 16, as had been specified in the initial publication. Next, the computer program "Oligo" was used to search for potential primers in the full sequence of exon 15 together with its distant flanking sequences. This search identified several sites potentially suitable for primer annealing [120]. The synthesized oligonucleotides were tested as primers in simple polymerase chain reactions, and the best result (absence of non-specific products and high yield, figure 5A) was obtained for the pair given below.
Direct primer - 5' cactgctgccctttctgttg 3'
Reverse primer - 5' gcaggtggcactgaccata 3'
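As an illustration only, basic properties of the primer pair printed above can be checked with a few lines of Python. The sketch below is not the procedure of the "Oligo" program; it applies the simple rule-of-thumb estimate Tm = 2(A+T) + 4(G+C), and the function names are assumptions introduced here.

```python
# Primer sequences as printed in the text (5' -> 3').
FORWARD = "cactgctgccctttctgttg"
REVERSE = "gcaggtggcactgaccata"

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    s = seq.lower()
    return 100.0 * (s.count("g") + s.count("c")) / len(s)


def rule_of_thumb_tm(seq: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    A simple estimate for short oligonucleotides, not a
    thermodynamic calculation.
    """
    s = seq.lower()
    at = s.count("a") + s.count("t")
    gc = s.count("g") + s.count("c")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in (("direct", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), "nt,",
              f"GC {gc_percent(primer):.1f}%,",
              f"Tm ~{rule_of_thumb_tm(primer)} C,",
              "binds strand", reverse_complement(primer))
```

By this rough estimate the two primers have similar melting temperatures, which is one of the usual requirements for a primer pair.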
Figure 5. Restriction analysis of the R577X (1747 C>T) polymorphism in gene ACTN3 [according to 120]. A. Amplicons of exon 15 of ACTN3 obtained using primers I and IV. Lanes 1-4 and 6-10, various investigated samples; 5, negative control (without DNA); M, standard DNA marker fragments, products of hydrolysis of pBR322 by the AluI restriction endonuclease. B. Structure of the amplified region of gene ACTN3 comprising exon 15 and flanking regions. Yellow indicates the first exon nucleotide and violet the last exon nucleotide. The annealing site of the direct primer is shown in green and that of the reverse primer in grey. Restriction sites for the DdeI endonuclease are shown in bold type; the site in the middle of the sequence includes the single nucleotide polymorphism 1747 C>T (allele T and the corresponding nucleotide are shown in red, and the sequence corresponding to the termination codon is marked by a double line); the second DdeI site (located closer to the 3'-terminus) is monomorphic. C. Products of the restriction. 1, 13, homozygosity for allele X577; 2, 3, 10, 11, homozygosity for allele R577; 4, 5, 6, 7, heterozygosity; 12, control assay (amplicon incubated in buffer without addition of DdeI); M, standard DNA marker fragments, products of hydrolysis of pBR322 by AluI.
A strong argument for choosing these primers for the test system was also provided by a preliminary analysis of the annealing sites and of the structure of the expected amplification product. As seen in figure 5B, the amplicon formed in a polymerase chain reaction with this primer pair contains not one but two recognition sites for DdeI. One of the sites contains the SNP under study, and the other is monomorphic. Thus, in this test system every amplicon treated with the DdeI restriction endonuclease should be cut at the monomorphic site, which provides a kind of internal control of restriction quality in all analyzed assays. The calculated structure of the amplicon (figure 5B) implies that the primary product of the polymerase chain reaction should be 269 nucleotide pairs long and that, after successful restriction by DdeI, a fragment 50 nucleotide pairs long will be cleaved from the 3'-terminus of the amplicons. This result was obtained in the experiments (figure 5C). According to the data presented, all amplicons treated with DdeI underwent restriction, as shown by the presence of characteristic DNA fragments with increased electrophoretic mobility in comparison with the control assay (lane 12). It was possible to distinguish unambiguously the assays with different genotypes for the R577X polymorphism (SNP 1747 C>T). The assays with genotype RR (lanes 2, 3, 10, 11) each contained one rather large DNA fragment together with a very rapidly migrating DNA fraction (apparently the product of restriction at the monomorphic site). The assays with XX homozygosity were characterized by two main fragments (lanes 1 and 13), and heterozygous assays by three fragments (lanes 4, 5, 6, 7).
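The logic of this test system can be sketched in Python as an in-silico digest. The sketch assumes the known DdeI recognition sequence CTNAG (cut after the first C) and uses a short toy amplicon invented for illustration; it is not the real 269 bp ACTN3 sequence of figure 5B, so the fragment lengths are those of the toy sequence only.

```python
import re

# DdeI recognizes C^TNAG; the lookahead also finds overlapping sites.
DDEI_SITE = re.compile(r"(?=CT[ACGT]AG)")


def ddei_fragments(seq: str) -> list[int]:
    """Lengths of the fragments left after complete DdeI digestion."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in DDEI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def toy_amplicon(allele: str) -> str:
    """Hypothetical amplicon: a polymorphic site plus a monomorphic one.

    Allele "X" (T at the SNP) creates the site CTGAG; allele "R"
    (C at the SNP) gives CCGAG, which DdeI does not cut.
    """
    polymorphic = {"R": "CCGAG", "X": "CTGAG"}[allele]
    filler = "GATTACA"  # contains no "CT", so it adds no extra sites
    return filler * 4 + polymorphic + filler * 3 + "CTCAG" + filler * 2


def band_pattern(allele_1: str, allele_2: str) -> list[int]:
    """Distinct fragment lengths expected on the gel for a genotype."""
    lengths = set(ddei_fragments(toy_amplicon(allele_1)))
    lengths |= set(ddei_fragments(toy_amplicon(allele_2)))
    return sorted(lengths, reverse=True)


if __name__ == "__main__":
    for genotype in ("RR", "RX", "XX"):
        print(genotype, band_pattern(*genotype))
```

In the toy example every genotype yields the small fragment from the monomorphic site (the internal control), while the larger bands number one for RR, two for XX and three for RX, which mirrors the pattern described for figure 5C.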
Consequently, this test system for restriction analysis of SNP 1747 C>T, which provided precise differentiation of the various genotypes, was used in further research. With the above-described modification of restriction fragment length analysis, 65 DNA samples from the control collection (a sample of healthy ethnic Russians not engaged in sport) were surveyed. The results of this examination are given in figure 6. In the investigated sample, the allele frequencies were 0.585 for 577R and 0.415 for 577X, and the distribution of genotypes was close to that expected from Hardy-Weinberg equilibrium. According to the calculations, the expected number of RR genotypes (p²n) was 22.2, and 22 were observed (33.9%); for genotype RX (2pqn) the expected number was 31.6, and 32 were observed (49.2%); for genotype XX (q²n) the figures were 11.2 and 11 (16.9%), respectively. It is important to note that the allele frequencies of SNP 1747 C>T in gene ACTN3 and the distribution of genotypes in the studied sample of ethnic Russians proved close enough to the corresponding data for Caucasians. In particular, Ahmetov I.I. et al. [117], for a sample of citizens of the Russian Federation not engaged in sport professionally, obtained frequencies of the RR, RX and XX genotypes of 36.2%, 49.7% and 14.1%, respectively. Yang N. et al. (2003), examining a sample of Australians who were considered healthy white Caucasians and were not engaged in sport professionally, determined frequencies of the RR, RX and XX genotypes very close to the values obtained for our sample: 30%, 52% and 18%, respectively [112]. In that article, the authors noted that in other
populations the distribution of genotypes may differ significantly; for instance, less than 1% of the African Bantu population has genotype XX, whereas in some Asian populations the figure is 25%. Moreover, one of the popular databases of SNPs in the human genome ("Applied Biosystems") gives a frequency of allele X for Caucasians of 0.43 (by AGI) and 0.36 (by AB), which is also close to our result of 0.415.
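The Hardy-Weinberg comparison for the control sample (22 RR, 32 RX and 11 XX among 65 people) can be reproduced with a short Python sketch. The function names are introduced here for illustration, and the chi-square statistic is an addition that the text itself does not report; 3.84 is the standard 5% critical value for one degree of freedom.

```python
def allele_frequencies(rr: int, rx: int, xx: int) -> tuple[float, float]:
    """Frequencies p (allele R) and q (allele X) from genotype counts."""
    n = rr + rx + xx
    p = (2 * rr + rx) / (2 * n)
    return p, 1.0 - p


def hardy_weinberg_expected(rr: int, rx: int, xx: int) -> tuple[float, float, float]:
    """Expected genotype counts p^2*n, 2pq*n and q^2*n."""
    n = rr + rx + xx
    p, q = allele_frequencies(rr, rx, xx)
    return p * p * n, 2 * p * q * n, q * q * n


def chi_square(observed, expected) -> float:
    """Pearson chi-square statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = (22, 32, 11)  # RR, RX, XX in the control sample (n=65)
    p, q = allele_frequencies(*observed)
    expected = hardy_weinberg_expected(*observed)
    print(f"p(577R) = {p:.3f}, q(577X) = {q:.3f}")
    print("expected RR, RX, XX:", [round(e, 1) for e in expected])
    # With three genotype classes and one estimated allele frequency
    # there is 1 degree of freedom; values below 3.84 are consistent
    # with Hardy-Weinberg equilibrium at the 5% level.
    print(f"chi-square = {chi_square(observed, expected):.3f}")
```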
Figure 6. Distribution of SNP R577X (1747 C>T) in gene ACTN3 in the sample of ethnic Russians not engaged in sport (n=65). Rank 1 – homozygotes for allele R577; rank 2 – heterozygotes; rank 3 – homozygotes for allele X577.
Thus, the close similarity between the genotype distribution and allele frequencies of SNP 1747 C>T of gene ACTN3 in the studied sample of ethnic Russians and published data for Caucasians made it possible to proceed to examination of groups of professional sportsmen. To study possible correlations between genotypes for SNP 1747 C>T in gene ACTN3 (the R577X replacement) and professional sport, 165 DNA samples from specially assembled DNA collections of sportsmen were analyzed. The distribution of the examined sportsmen by sports qualification is shown in figure 7A. As these data show, a significant number of the DNA samples studied (n=85) belonged to highly qualified professional sportsmen (candidate masters, 38; masters of sports, 33; masters of sports of the international class, 11; and merited masters of sports, 3), many of whom had at various times been members of the national teams of Russia or of the USSR. In this cohort, the abundance of XX genotypes was reduced by a factor of one and a half in comparison with the control sample, while the proportion of RX genotypes was preserved and that of RR genotypes increased correspondingly. Diagrams illustrating the distribution of the genotypes in the cohort of highly qualified professional sportsmen in comparison with the group of people not engaged in sport are given in figure 7B.
Figure 7. Distribution of genotypes for SNP R577X (1747 C>T) in gene ACTN3 in the generalized group of highly qualified professional sportsmen (n=85) and in the control group of ethnic Russians not engaged in sport (n=65). A. Distribution by sports qualification of the sportsmen examined for SNP R577X of gene ACTN3. Rank 1 - beginning sportsmen. Rank 2 - first-grade sportsmen. Rank 3 - candidate masters. Rank 4 - masters of sports. Rank 5 - masters of sports of the international class. Rank 6 - merited masters of sports. B. Distribution of genotypes for SNP R577X (1747 C>T) in gene ACTN3 in the generalized group of professional sportsmen (group 1) and in the control group of ethnic Russians not engaged in sport (group 2). Rank 1 - homozygotes for allele R577; rank 2 - heterozygotes; rank 3 - homozygotes for allele X577.
More striking differences in the distribution of genotypes for SNP R577X (1747 C>T) in gene ACTN3 were found after grouping the data by kind of sport (figure 8). First of all, the distribution of genotypes in the sample of elite swimmers (figure 8, group A) differed sharply from the control group. In this small sample there were no elite swimmers with genotype XX. In comparison with the control group, heterozygotes predominated more often and the number of RR genotypes was correspondingly increased. The frequencies of the alternative alleles 577R and 577X in the sample of elite swimmers (n=21) were 0.69 and 0.31, respectively, and differed significantly from those in the control group (p < 0.05). It was therefore concluded that allele 577R is positively correlated with high achievement in swimming and that carriers of genotype XX are possibly subject to a kind of selection in professional swimming. For the sample of elite skiers (n=16), the frequencies of alleles 577R and 577X were 0.72 and 0.28, respectively, which also differed significantly from the control (p < 0.05). Half of all the examined elite skiers had genotype RR and only one had genotype XX. As a whole, comparison of the data for the subgroup of skiers with the control sample (figure 8) indicates a certain positive association of allele 577R with high achievement in ski racing.
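The allele frequencies quoted for the skiers can be cross-checked against the genotype description given in the same paragraph. In the sketch below, the number of heterozygotes is not stated in the text; it is inferred here as the remainder of the 16 skiers, which is an assumption of this illustration.

```python
def allele_frequencies(rr: int, rx: int, xx: int) -> tuple[float, float]:
    """Frequencies of alleles 577R and 577X from genotype counts."""
    total_alleles = 2 * (rr + rx + xx)
    freq_r = (2 * rr + rx) / total_alleles
    freq_x = (2 * xx + rx) / total_alleles
    return freq_r, freq_x


n_skiers = 16
rr = n_skiers // 2        # "half of all the examined elite skiers" were RR
xx = 1                    # "only one had genotype XX"
rx = n_skiers - rr - xx   # inferred remainder, not stated in the text

freq_r, freq_x = allele_frequencies(rr, rx, xx)

if __name__ == "__main__":
    print(f"RR={rr}, RX={rx}, XX={xx}")
    print(f"577R = {freq_r:.2f}, 577X = {freq_x:.2f}")
```

Under this inference the counts reproduce the reported frequencies of 0.72 and 0.28.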
Figure 8. Polymorphism R577X of gene ACTN3 in groups of professional sportsmen: swimmers (group A, n=21), skiers (group B, n=16), oarsmen (group C, n=35), water polo players (group D, n=16), and in the control group of people not engaged in sport (group E, n=65). Rank 1 - homozygotes for allele R577; rank 2 - heterozygotes; rank 3 - homozygotes for allele X577.
In the sample of elite oarsmen (n=35), the frequencies of the alternative alleles 577R and 577X were 0.63 and 0.37, respectively. Unlike the three subgroups considered above, in the sample of elite water polo players (n=16) the frequencies of alleles 577R and 577X were 0.5625 and 0.4375, respectively, which practically did not differ from the control group. Thus, either professional participation in this team sport does not require the advantages of the RR genotype, or substantially larger samples must be examined to reveal associations of this kind of activity with either of the alternative alleles of SNP 1747 C>T in gene ACTN3. Practically in parallel with the research described above, whose results were published in 2004-2006, a number of reports by other authors appeared in the periodical literature on the association of R577X (1747 C>T) alleles with increased capacity for physical activity and/or on a reduced frequency of XX genotypes among elite sportsmen [117,121-123]. In particular, Rogozkin V.A. et al. (2004) reported that among the 10 professional swimmers in their sample, none had the XX genotype [121]. As a rule, the authors noted that significant associations of R577X (1747 C>T) alleles can be observed only with some kinds of physical activity. For example, Lucia A. et al. (2006) [123] found no differences in the frequencies of the genotypes for this polymorphism between a group of elite sportsmen engaged in so-called cyclic sports and the control. Thus, the results presented by different groups of authors are rather inconsistent, yet they indicate an association of R577X (1747 C>T) alleles with participation in some kinds of sport.
The accumulated experience and the data obtained allowed us to organize and carry out a pilot examination (genotyping) for SNP 1747 C>T in gene ACTN3 of a group of beginning swimmers (n=72) from the Children and Youth Sports School of the Olympic Reserve, St. Petersburg, who had been engaged in sport for two years on average. After the genotyping results had been summarized jointly, the trainers of the School started to use them in the preliminary selection of talented young sportsmen. Thus, research on polymorphism in gene ACTN3, which encodes one of the actin-binding proteins containing CH-domains, can become an important element in a molecular and genetic system for profiling sportsmen. It is important to note that hundreds of publications are now appearing on associations of certain SNP alleles with various positive traits, which are described collectively in the English-language literature by the terms "physical performance" and "health-related fitness" [124-126]. In fact, a special international consortium has been formed to make such searches more efficient, and as a result hundreds of SNPs relating to "physical performance" and "health-related fitness" have already been revealed [126].
SOME PROBLEMS IN PROSTATE CANCER INVESTIGATIONS AND POLYMORPHISM OF ACTIN BINDING PROTEINS
In modern biomedical research, the analysis of abnormalities of gene expression in tumor cells, and in particular in cancer cells of the prostate and other urogenital organs, is one of the most urgent problems [26,127,128]. It is quite reasonably considered that establishing the features of genome functioning in tumor cells will not only reveal the molecular bases of the pathogenesis of malignant tumors and the mechanisms behind metastatic propensity, but will also give practical health care new diagnostic markers that qualitatively improve the diagnosis of these diseases [26,128,129]. The latter is especially important for effective diagnosis of prostate cancer (PC), since these tumors are rather difficult to differentiate from benign prostate hyperplasia, which is common in men aged 50 and older [128-130]. PC is known to be a rather common disease. In the structure of oncologic pathology of men in a number of western countries, PC already holds second or third place after lung cancer and stomach cancer, and in the U.S.A. it has taken first place [131]. Study of prostate cancer dynamics over 25 years has shown steady growth practically worldwide. In particular, over this period the incidence of the disease in Canada, the U.S.A., Finland, Sweden and Japan has doubled [130]. In Russia, high growth rates of PC have also been noted (first place among oncologic diseases) [130,132], and there is obviously an extensive risk group (practically all men over 50), which makes the problem of PC not only medical but also social.
The shortcomings of the widely used diagnostic molecular marker, the so-called prostate-specific antigen (PSA), namely a significant number of false positive and false negative results [133-136], urgently call for research aimed at revealing new and more effective molecular markers of PC. Consequently, the search for such markers is under way in the U.S.A. and some other western countries, with active use of the proteomic strategy and proteomic technologies [26,135,136]. Similar research has been started in Russia [137,138]. As noted above (figure 2), an essential step in applying the systems approach and the proteomic strategy to the study of human proteins is the creation of a synthetic two-dimensional map of the proteins of the biomaterial studied, which is usually described as a protein portrait or protein profile (pattern) characterizing the object [18,22,59]. Accordingly, construction of a two-dimensional protein map of the human prostate becomes one of the most urgent tasks in the search for potentially significant diagnostic markers. It is quite evident that solving this problem requires the coordinated work of biochemists, clinicians (urologists) and researchers in some adjacent specialties. Since 2006, the A.N. Bach Institute of Biochemistry, Russian Academy of Science (the laboratory of biomedical researches), in cooperation with the Institute of Urology of the Ministry of Public Health of the Russian Federation, the Russian Medical Academy of Postgraduate Education (the department of urology) and the Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, has been carrying out proteomic
S. S. Shishkin, L. I. Kovalev, I. N. Krakhmaleva et al.
investigations of samples of prostate tissues with hyperplasia and with cancer [137,138]. To date, samples of prostate tissues with hyperplasia (n=47) and with cancer (n=89) have been analyzed by means of the earlier developed modification of two-dimensional electrophoresis [59,137,139]. As an example, figure 9 shows a photo of one of the obtained two-dimensional electrophoregrams (2DE) of prostate proteins.
The collected experimental material was used to construct a synthetic two-dimensional map of human prostate proteins. For this purpose, the distribution of protein fractions in each electrophoregram was first documented as an image, which was registered and stored as a graphic file in the .tif format. Full images of 2DE and (in some cases) of their separate sites were obtained by scanning and/or digital photography. The adequacy of the two-dimensional electrophoregrams selected for subsequent analysis was assessed by a preliminary comparison of the protein fractionation results, using the computer "video overlay" method.
Figure 9. Typical silver-stained two-dimensional electrophoregram of human prostate proteins. Landmark proteins are marked with red ovals.
Biomedical Aspects in Investigations of Biochemical Polymorphism…
Then every image was standardized by means of the Melanie program according to 15 defined points (landmarks) corresponding to precisely identified "major" protein fractions (see figure 9) and divided into 49 conditional rectangles. All the images were divided according to the modified method of Comings [140,141] by the same principle: the borders of the rectangles were formed by six standard horizontal lines, six standard vertical lines and the edges of the electrophoregram. The points for the horizontal lines were found by means of special marker proteins of known molecular weight, which were applied to every gel plate before fractionation in the second direction (SDS-electrophoresis in a plate of gradient polyacrylamide gel) [139-141]. Thus, protein fractions located on the same horizontal line have identical molecular weights. To draw the conditional vertical lines, different protein markers were applied; some of them were among the defined points chosen as landmarks (figure 10), and the others had been proposed earlier [137,141]. In parallel, all the landmark proteins were identified by mass spectrometry (see below).
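The division of an image by six horizontal and six vertical lines into 7 x 7 = 49 rectangles can be illustrated with a short sketch. It is not the authors' software: it assumes that spot and line positions are given in image pixels, that the letters A to G number the columns from left to right and the digits 1 to 7 number the rows from top to bottom (Table 3 uses labels from A1 to G7, but this orientation is an assumption), and the function name is hypothetical.

```python
from bisect import bisect_right

COLUMN_LETTERS = "ABCDEFG"  # assumed: letters run from left to right


def rectangle_label(x, y, vertical_lines, horizontal_lines):
    """Return the label (for example 'C4') of the rectangle containing a spot.

    x, y             -- spot coordinates in pixels (hypothetical units)
    vertical_lines   -- x positions of the six standard vertical lines
    horizontal_lines -- y positions of the six standard horizontal lines
    """
    if len(vertical_lines) != 6 or len(horizontal_lines) != 6:
        raise ValueError("expected six vertical and six horizontal lines")
    # The number of lines lying at or before the coordinate is the index
    # of the strip (0..6) in which the spot falls.
    column = bisect_right(sorted(vertical_lines), x)
    row = bisect_right(sorted(horizontal_lines), y)
    return f"{COLUMN_LETTERS[column]}{row + 1}"
```

With hypothetical lines at 100, 200, ..., 600 pixels on both axes, a spot at (250, 350) falls into rectangle C4, and the 49 possible labels run from A1 to G7.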
Figure 10. Synthetic two-dimensional map of human prostate proteins, showing the 49 rectangles and 558 protein fractions. Several landmark proteins are marked with red ovals. The ten proteins that distinguish prostate specimens with benign and malignant tumors are marked in blue (see below).
In the subsequent analysis, the characteristics of the drawn lines were used to calculate electrophoretic mobility, which indicates the molecular weights and isoelectric points of the corresponding proteins. As a result, every analyzed image was fragmented into 49 rectangles, which usually contained no more than 10 protein fractions; only in four rectangles did their number exceed 20. Every rectangle received a corresponding alphanumeric symbol. This fragmentation of images essentially facilitates their subsequent comparison and the construction of the final standardized two-dimensional map.
In the next step, the chosen and standardized 2DE images were compared rectangle by rectangle, and both coordinates of the spots were determined. On this basis, the most frequently encountered groups of spots were defined for each rectangle; such spots are normally taken to be those found in at least 3/4 of all the analyzed electrophoregrams. These spots were entered into the first version of the protein map of prostate tissues with hyperplasia according to their coordinates (figure 10). Every spot included in the map received a number. For uniform designation of the protein fractions placed on the synthetic two-dimensional map, the system of universal seven-element numeration [139] was used. In this system, the number of each fraction is a function of its electrophoretic characteristics: the first four digits correspond to the decimal logarithm of Mm, written as a plain number, and the last three correspond to the value of pI, also written as a plain number. For example, the protein identified earlier as AGR2 [137], with Mm 19.0 kDa and pI 9.00, received the number 4279900 in this system.
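The numbering rule can be written out as a small sketch. It assumes that Mm is taken in daltons before the logarithm (which reproduces the AGR2 example: log10(19000) is about 4.279), that both parts are rounded to the nearest integer, and that fractions with pI of 10 or more simply receive a four-digit pI part, as in several entries of Table 3. The rounding rule is an assumption: a few entries of Table 3 differ from it by one unit in the logarithm part. The function names are hypothetical.

```python
import math


def fraction_number(mm_kda, pi):
    """Build the map number of a protein fraction from Mm (kDa) and pI."""
    # First part: decimal logarithm of Mm in daltons, three decimals,
    # written without the decimal point (assumed rounding to nearest).
    log_part = round(math.log10(mm_kda * 1000.0) * 1000)
    # Second part: pI with two decimals, written without the decimal point.
    pi_part = round(pi * 100)
    return f"{log_part:04d}{pi_part:03d}"


def decode_fraction_number(number):
    """Recover approximate Mm (kDa) and pI from a map number."""
    log_part, pi_part = number[:4], number[4:]
    mm_kda = 10 ** (int(log_part) / 1000) / 1000
    return mm_kda, int(pi_part) / 100
```

For AGR2, `fraction_number(19.0, 9.00)` gives `4279900`, the number quoted in the text; decoding it returns a molecular weight of about 19.0 kDa and a pI of 9.0.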
Such objective designations of all the protein fractions included in the generalized synthetic two-dimensional map of prostate tissue proteins make it possible to compare adequately the results of proteomic analysis of different objects, especially for proteins of the same type whose electrophoretic properties are determined by post-translational modifications. In fact, the only basis for changing such a number is a refinement of the Mm or pI value of the corresponding protein fraction.
The comparative analysis of the two groups of prostate samples, with hyperplasia and with cancer, showed that the distributions of protein fractions in 2DE were rather similar for both patterns, which made it possible to generalize the data in the form of a synthetic two-dimensional map (figure 10). However, certain distinctions were established for 10 protein fractions that were present in the samples with malignant tumors but were either not detected at all or registered in much smaller amounts in the corresponding rectangles of 2DE of the samples with hyperplasia (figure 11). In parallel, these 10 proteins characteristic of the samples with malignant tumors were identified by mass spectrometry. The results are described below. In total, 558 protein fractions were registered during the computer analysis of the images and included in the synthetic standardized map (figure 10).
Figure 11. Sites of two-dimensional electrophoregrams with the ten proteins (shown by arrows) that distinguish prostate specimens with benign and malignant tumors. A - Qualitative distinctions in five proteins. B - Quantitative distinctions in five proteins.
As noted above, in conformity with the strategy of proteomic research (figure 2), the proteins entered into the map were sequentially identified by means of MALDI-TOF MS and MALDI-TOF MS/MS technologies in order to refine the constructed two-dimensional map and use it effectively. For this purpose, the zones containing separate "major" protein fractions ("spots" in 2DE) were cut out of the gels, and the gel pieces were processed (trypsin hydrolysis and peptide extraction) in accordance with the protocols of the Proteomic Research Center at the Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences [142]. Mass spectra of tryptic peptides were obtained on a Reflex III MALDI-TOF mass spectrometer (Bruker, U.S.A.) with an ultraviolet laser (336 nm) in the positive ion mode in the mass range of 500-8000 Da. Proteins were identified from the sets of peptide mass values after trypsinolysis using the Peptide Fingerprint option of the Mascot program (Matrix Science, U.S.A.), with an accuracy of MH+ mass determination of 0.01%, allowing for possible modification of cysteines by acrylamide and oxidation of methionines. The search for the corresponding proteins used the databases of the National Center for Biotechnology Information of the U.S.A. (
Table 3. Human prostate proteins identified by mass spectrometry on the synthetic two-dimensional map. The 10 proteins that distinguish prostate specimens with benign and malignant tumors are set in bold. Isoforms of transgelin are set in bold italic.
No. | Name of protein in databases (NCBI, SWISSPROT, etc.) | GenBank No. (gi) | Rectangle on the map | Mm (kDa)/pI, experimental data | Number of protein on the map
1 | *Complex of Light and Heavy Chains of Ferritin | 47125326, 182516 | A1 | 450.0/5.07 | 5653507
2 | *Complex of Light Chains of Ferritin | 182516 | B1 | 450.0/5.8 | 5653580
3 | Caldesmon 1, isoform 4 | 2498204 | B2 | 145.0/5.60 | 5161560
4 | Vinculin | 24657579 | B2 | 145.0/5.40 | 5161540
5 | Putative RNA-binding motif protein 15 long form | 14161373 (?) | B2 | 140/5.50 | 5146550
6 | Collagen alpha 2 | 1070605 | C2 | 130/6.30 | 5113630
7 | Aconitase 2, mitochondrial | 15559448 | C3 | 84.0/6.61 | 4924661
8 | Protein disulfide isomerase ER60 | 7437388 | B3 | 52.0/6.15 | 4716615
9 | Hypothetical protein or metastasis associated 1-like 1 protein | 51476663 (?) | C3 | 51.0/6.59 | 4708659
10 | Cytosol aminopeptidase (Leucine aminopeptidase) | 12643394 | C3 | 50.0/6.30 | 4699630
11 | Enolase, isoform 1 | 29792061 | C3 | 47.3/6.58 | 4674658
12 | NADP-dependent isocitrate dehydrogenase | 3641398 | C3 | 46.5/6.32 | 4667632
13 | IgHG1 protein (Immunoglobulin G Heavy Chain 1) | 51593790 | D3 | 51.0/6.90 | 4708690
14 | Glutathione Reductase | 3212536 | D3 | 50.5/6.90 | 4703690
15 | Phosphoglycerate kinase-1 | 48145549 | C4 | 45.0/6.75 | 4653675
16 | N-acetylneuraminic acid synthase (sialic acid synthase) | 12652539 | D4 | 34.0/6.85 | 4531685
17 | Tropomyosin 2 (beta) isoform 2 | 47519616 | A5 | 33.0/4.60 | 4518460
18 | Heat shock 27kDa protein 1 | 54696638 | B5 | 27.5/6.00 | 4439600
19 | IgG kappa chain | 4176418 | C5 | 27.0/6.30 | 4431630
20 | Growth-inhibiting gene 5 protein | 41350397 | D5 | 29.0/6.90 | 4462690
21 | Unknown protein contained albumin-like domain (product PRO 2675) | 7770217 | D5 | 28.5/6.92 | 4454692
22 | Phosphoglycerate mutase 1 (brain) | 56081766 | D5 | 28.5/6.85 | 4454685
23 | Variant gene 7 (TEL2 oncogene) | 56208120 | D5 | 27.5/6.80 | 4439680
24 | Triosephosphate Isomerase Chain B | 999893 | D5 | 27.0/6.80 | 4431680
25 | Peroxiredoxin 1 | 32455266 | D5 | 22.5/6.95 | 4352695
26 | Superoxide dismutase [Mn], mitochondrial | 134665 | D5 | 22.5/7.10 | 4352710
27 | Calponin 1, basic, smooth muscle | 21361120 | E5 | 31.0/7.30 | 4491730
28 | FK506-binding protein 3 variant | 62897547 | E5 | 30.0/8.90 | 4477890
29 | ATP synthase, H+ transporting, mitochondrial F1 complex, O subunit (oligomycin sensitivity conferring protein) | 54696534 | G5 | 24.5/10.0 | 43891000
30 | Myosin, light chain 9, regulatory | 6983729 | A6 | 19.8/4.80 | 4297480
31 | Transgelin | 49168456 | C6 | 22.0/6.76 | 4342676
32 | Transgelin 2 | 55960374 | C6 | 21.4/6.20 | 4330620
33 | Hypothetical protein, type II inositol-1,4,5-trisphosphate 5-phosphatase | 52545791 (?) | C6 | 21.0/6.55 | 4322655
34 | Transgelin | 49168456 | D6 | 22.0/6.86 | 4342686
35 | Transgelin | 49168456 | D6 | 22.0/7.05 | 4342705
36 | Phosphatidylethanolamine binding protein 1 | 21410340 | D6 | 21.7/7.12 | 4336712
37 | Crystallin, alpha B | 13937813 | D6 | 21.0/7.10 | 4336710
38 | Transgelin (variant) | 62897565 | D6 | 20.5/6.90 | 4311690
39 | Transgelin (variant) | 62897565 | D6 | 19.8/6.90 | 4297690
40 | Transgelin (variant) | 62897565 | D6 | 19.8/6.80 | 4297680
41 | Transgelin (variant) | 62897565 | D6 | 19.5/6.82 | 4290682
42 | Transgelin | 49168456 | E6 | 22.0/7.15 | 4342715
43 | Cysteine and glycine-rich protein 1 | 54695910 | G6 | 21.6/10.5 | 43341050
44 | Chain A, Cyclophilin B | 1310882 | F6 | 20.0/9.10 | 4301910
45 | Chain A, Galectin-1 | 42542977 | A7 | 14.0/5.20 | 4146520
46 | S100 calcium binding protein A6 | 12655153 | B7 | 9.5/5.30 | 3978530
47 | Transgelin (variant) | 62897565 | C7 | 19.5/6.35 | 4290635
48 | Transgelin (variant) | 62897565 | C7 | 18.6/6.40 | 4269640
49 | Cu,Zn superoxide dismutase | 408239 | C7 | 16.0/6.60 | 4204660
50 | Fatty acid binding protein 5 (FABP 5) | 30583737 | C7 | 16.0/6.30 | 4204630
51 | Unknown protein (homology with C-terminal of -actin) | 16306948 (?) | C7 | 16.5/6.70 | 4217670
52 | Unknown protein contained albumin-like domain (related PRO 2044) | 6650826 | C7 | 14.5/6.75 | 4161675
53 | DBI (Diazepam-binding inhibitor, Acyl-CoA-binding protein) | 48146029 | C7 | 10.2/6.2 | 4009620
54 | Cofilin 1 (non-muscle) | 15126676 | D7 | 18.8/7.0 | 4274700
55 | Peptidylprolyl isomerase A-like | 56847632 | D7 | 18.2/7.08 | 4260708
56 | Profilin 1 | 4826898 | D7 | 14.7/6.95 | 4167695
57 | Ubiquitin | 229532 | D7 | 8.5/6.82 | 3929682
58 | Peptidylprolyl isomerase A (cyclophilin A) | 13937981 | E7 | 18.5/7.30 | 4267730
59 | Chain D, Human Hemoglobin | 442753 | E7 | 14.0/8.0 | 4146800
60 | AGR2, anterior gradient protein 2 (secreted cement gland anterior gradient protein) | 37183136 | F7 | 19.0/9.0 | 4279900
61 | Chain C, Human Hemoglobin | 61679692 | F7 | 14.0/9.50 | 4146950
62 | H3 histone, family 3A | 55665435 | G7 | 15.2/11.30 | 41811130
63 | Histone 1 H2ad | 4495086 | G7 | 14.2/11.0 | 41521100
64 | Histone 1, H2aj | 7264004 | G7 | 14.1/10.9 | 41491090
65 | Histone H2B.1 | 184086 | G7 | 11.5/10.6 | 40611060
66 | Chain D, D-Dimer From Cross-Linked Fibrin | 28373951 | G7 | 10.0/10.50 | 40001050
* These complexes are presumed to be formed by covalent or other strong links between several polypeptide chains.
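The mass accuracy of 0.01% used for the peptide fingerprint search (equivalent to 100 ppm) can be illustrated with a minimal sketch of the tolerance check alone. It does not reproduce the scoring used by the Mascot program; the function names and all mass values are hypothetical and serve only as an illustration.

```python
def within_tolerance(observed_mh, theoretical_mh, relative_tolerance=1e-4):
    """True if an observed MH+ mass lies within 0.01% of a theoretical one."""
    return abs(observed_mh - theoretical_mh) <= relative_tolerance * theoretical_mh


def count_matches(observed_peaks, theoretical_peptides, relative_tolerance=1e-4):
    """Count theoretical tryptic peptides matched by at least one observed peak."""
    return sum(
        any(within_tolerance(peak, mass, relative_tolerance) for peak in observed_peaks)
        for mass in theoretical_peptides
    )


# Hypothetical peak list and hypothetical theoretical peptide masses (Da).
observed = [1000.05, 1500.3, 2000.1]
theoretical = [1000.0, 1500.0, 2000.0]
matched = count_matches(observed, theoretical)
```

At this tolerance the allowed deviation grows with mass: 0.1 Da at 1000 Da and 0.2 Da at 2000 Da, so in the hypothetical list above two of the three peptides are matched.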
Figure 12. Biochemical polymorphism of transgelin isoforms (shown by arrows) revealed by two-dimensional electrophoresis (rectangles C6-D6-C7 in figure 10) in human prostate with benign (A) and malignant (B) tumors. For comparison, the corresponding fragment of a two-dimensional gel of human myoblast proteins (C), with a single transgelin isoform, is shown.
Thus, this transcript is probably formed by alternative splicing of the TAGLN1 gene, during which the fragment coding for the 17 C-terminal amino acid residues is removed. The decisive argument in favor of this assumption would be the detection, among the tryptic peptides of any of the truncated transgelin isoforms, of a product containing the 38th amino acid residue, since the basic and "variant" transcripts differ in coding properties by the C38R replacement (along with the fact that the "variant" transcript cannot code for the C-terminal site of transgelin). However, in no case was it possible to detect this peptide in protein fractions 5-10. At the same time, the peptide corresponding to the C-terminal site of transgelin was not detected in any of fractions 5-10 either. Thus, the revealed truncated transgelin isoforms could evidently be formed through synthesis of the truncated transcript of the TAGLN1 gene and/or through posttranslational removal of the C-terminal sequence, and the participation of other posttranslational modifications is also possible.
Figure 13. Results of the mass-spectrometric identification of transgelin isoforms, shown as the location of the detected tryptic peptides on the whole amino acid sequences of transgelin (A, B) and the transgelin variant (C): A - Protein 4342715 (No. 1 in figure 12A); B - Protein 4342676 (No. 4 in figure 12A); C - Protein 4297680 (No. 6 in figure 12A). Sequence fragments corresponding to the obtained tryptic peptides are shown in capital letters, and fragments that were not found are shown in small grey letters. The polymorphic amino acid substitution in position 38 (C38R) is framed.
Besides the protein products of the TAGLN1 gene, a product of the TAGLN2 gene was identified in the investigated site (fraction No. 11, number 4330620 on the map). Comparison of the results of the transgelin analysis in prostate tissue samples with the data obtained for human myoblast cultures (figure 12C) suggests that the revealed biochemical polymorphism of transgelin may be connected with the pathological processes diagnosed in the examined patients. This conclusion is consistent with recently published data on the involvement of transgelins in the molecular mechanisms of carcinogenesis in the prostate [143].
Studies of the human genome (according to OMIM, the NCBI SNP database and other databases) show that the genome has three genes capable of coding for transgelin proteins (TAGLN1, TAGLN2, TAGLN3). A significant number of SNPs have already been revealed in each of these genes (78 in TAGLN1, 59 in TAGLN2, 162 in TAGLN3). It has also been shown that the TAGLN3 gene is expressed rather specifically in nerve tissues, and its protein product is often designated NP25 [144], while the products of the two other genes are found in a wide variety of tissues and, in particular, in prostate tissues [145,146]. However, the presented data (table 3 and figure 11) on the pronounced
biochemical transgelin polymorphism in prostate tissues with hyperplasia are possibly new. As the majority of the revealed transgelin isoforms are most likely products of one gene (tryptic peptides from 10 protein fractions corresponded to the sequence coded by the TAGLN1 gene), it can be assumed that this biochemical polymorphism is caused by alternative splicing and/or by some postsynthetic modifications.
The constructed two-dimensional map was used in a separate series of experiments aimed at studying how specifically the AGR2 protein, which is considered a potential diagnostic marker [137,147,148], is present in prostate tissues with cancer tumors. To date, AGR2 has been revealed in 83 of 89 examined patients with prostate cancer. At the same time, in two of 47 examined people with prostate hyperplasia, a minor protein fraction was detected in the zone where this protein is localized in the two-dimensional electrophoregram. It was probably AGR2, but the nature of the protein in this fraction could not be established by mass spectrometry because of its small amount.
As a whole, the construction of a two-dimensional protein map for a representative set of prostate tissue samples can be considered an important step toward creating a clinical and experimental platform for research into molecular alterations in prostate cancer. Thus, there are evidently significant opportunities for applying proteomic technologies to the study of biochemical ABP polymorphism in benign and malignant tumors, and further investigations can contribute to deciphering the molecular mechanisms of carcinogenesis.
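The AGR2 counts reported above can be turned into detection frequencies with a few lines of arithmetic. This is only an illustration of the reported numbers, not an analysis performed by the authors; it assumes, as the least favorable case, that both minor fractions found in the hyperplasia group were indeed AGR2, and the variable and function names are hypothetical.

```python
def proportion(positive, total):
    """Fraction of samples in which the protein was detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positive / total


# Counts taken from the text.
CANCER_TOTAL, CANCER_AGR2 = 89, 83
HYPERPLASIA_TOTAL, HYPERPLASIA_POSSIBLE_AGR2 = 47, 2

# Detection frequency of AGR2 among cancer samples.
cancer_rate = proportion(CANCER_AGR2, CANCER_TOTAL)
# Upper bound for hyperplasia samples (assumption: both minor fractions are AGR2).
hyperplasia_rate_upper = proportion(HYPERPLASIA_POSSIBLE_AGR2, HYPERPLASIA_TOTAL)
```

Under these assumptions AGR2 was detected in about 93.3% of the cancer samples and in at most about 4.3% of the hyperplasia samples.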
CONCLUSION
Long-term research on the biochemical polymorphism of human actins and some actin binding proteins has proved to be connected with a whole complex of medical and biological problems. The detection of dozens of SNPs in all actin genes and in ABP genes can be considered one of the most striking results obtained to date. It is important to emphasize that pathogenetic roles in the development of some myopathies and some other forms of hereditary pathology have been established for many of these SNPs. Accordingly, a fundamental medical and biological consequence has been the opportunity to use various tests for detecting these SNPs, which provides exact molecular diagnostics of certain severe diseases, including prenatal diagnostics, and makes it possible to prevent new cases of pathology in burdened families.
Over the last five years, a new important biomedical aspect of research on ABP polymorphisms has been actively developed: the search for associations between different SNPs and various positive qualities of the muscular system. The presence (or absence) of carriers of certain alleles among highly skilled professional athletes in certain kinds of sport gives grounds to hope that, in the near future, the efficiency of training can be raised by preliminary profiling of beginners using molecular genetic tests for specially selected and approved polymorphisms in ABP genes. The validity of such expectations is clearly supported by the results of studying the R577X polymorphism (1747 C→T) in the ACTN3 gene, as well as by the data on polymorphisms in dozens of other genes, which continue to be studied intensively [126,149,150].
In conclusion, it should be noted that research on biochemical ABP polymorphism in tumoral transformation has special biomedical prospects. Despite significant achievements in deciphering the molecular mechanisms of carcinogenesis, many blank spots and unsolved questions remain. In particular, great attention is drawn to characteristic changes of cytoskeletal structures in cancer cells, and especially of actin microfilaments, in whose structure various ABP isoforms are revealed. The development of postgenomic technologies, of which proteomic technologies are an important part, gives hope that new data on protein polymorphism in malignant neoplasms will be obtained. As a result, new methods of early diagnosis of tumoral diseases and new approaches to the treatment of some forms of cancer can be expected.
Thus, as a whole, research on the biochemical polymorphism of actins and ABP represents a dynamically developing field of modern human biochemistry with various biomedical aspects, which is essentially important for fundamental science and has already found application in practical health care.
The work has been supported by the Program of the Presidium of the Russian Academy of Sciences "Fundamental Science for Medicine", the project "Human Polymorphism", and JSC Moscow Committee for Science and Technology.
The authors express their sincere gratitude for cooperation in separate parts of the present work to: Professor N.K. Dzeranovov, A.V. Kazachenko, PhD, and K.I. Totrov (Institute of Urology of the Ministry of Public Health of the Russian Federation, Moscow); Associate Members of the Russian Academy of Medical Sciences, Professors O.B. Loran and I.V. Kononkov (Russian Medical Academy of Post-Diploma Education, urology department, Moscow); I.Y. Toropygin (Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow); Professor O.L. Vinogradova (Institute for Biomedical Problems of Russian Academy of Sciences, Moscow); senior trainer V.A. Izmaylov (Children and Youth Sports School of the Olympic Reserve SKA, St. Petersburg); P.Z. Khasigov, M.D. (I.M. Sechenov Moscow Medical Academy, Moscow); A.G. Plugov, post-graduate student, and A.V. Ivanov, student (Peoples' Friendship University of Russia, Moscow).
REFERENCES
[1] Sengbusch P. Molecular and Cellular Biochemistry. [Russian translation] «Mir». Moscow. 1982. V.2. P.148-199.
[2] Poglazov B.F., Levitsky D.I. Myosin and biological activity. (in Russian) «Nauka». Moscow. 1982. 160p.
[3] Pollard T.D., Cooper J.A. Actin and actin-binding proteins: a critical evaluation of mechanisms and function. Annu. Rev. Biochem. 1986. V.55. P.987-1035.
[4] dos Remedios C.G., Chhabra D., Kekic M., Dedova I.V., Tsubakihara M., Berry D.A., Nosworthy N.J. Actin Binding Proteins: Regulation of Cytoskeletal Microfilaments. Physiol. Rev. 2003. V.83. P.433-473.
[5] Huff T., Muller C.S., Otto A.M., Netzker R., Hannappel E. beta-Thymosins, small acidic peptides with multiple functions. Int. J. Biochem. Cell Biol. 2001. V.33. P.205-220.
[6] Korenbaum E., Rivero F. Calponin homology domains at a glance. J. Cell Sci. 2002. V.115. P.3543-3545.
[7] Gimona M., Mital R. The single CH domain of calponin is neither sufficient nor necessary for F-actin binding. J. Cell Sci. 1998. V.111. P.1813-1821.
[8] Morgan K.G., Gangopadhyay S.S. Invited Review: Cross-bridge regulation by thin filament-associated proteins. J. Appl. Physiol. 2001. V.91. P.953-962.
[9] Sheen V.L., Feng Y., Graham D., Takafuta T., Shapiro S.S., Walsh C.A. Filamin A and Filamin B are co-expressed within neurons during periods of neuronal migration and can physically interact. Hum. Mol. Genet. 2002. V.11. P.2845-2854.
[10] Popowicz G.M., Schleicher M., Noegel A.A., Holak T.A. Filamins: promiscuous organizers of the cytoskeleton. Trends Biochem. Sci. 2006. V.31. P.411-419.
[11] Nagase T., Miyajima N., Tanaka A., Sazuka T., Seki N., Sato S., Tabata S., Ishikawa K., Kawarabayasi Y., Kotani H., Nomura N. Prediction of the coding sequences of unidentified human genes. III. The coding sequences of 40 new genes (KIAA0081-KIAA0120) deduced by analysis of cDNA clones from human cell line KG-1. DNA Res. 1995. V.2. P.37-43.
[12] Latif N., Yacoub M.H., George R., Barton P.J., Birks E.J. Changes in sarcomeric and non-sarcomeric cytoskeletal proteins and focal adhesion molecules during clinical myocardial recovery after left ventricular assist device support. J. Heart Lung Transplant. 2007. V.26. P.230-235.
[13] Washington R.W., Knecht D.A. Actin binding domains direct actin-binding proteins to different cytoskeletal locations. BMC Cell Biol. 2008. V.9. P.10.
[14] Matsuda C., Kameyama K., Suzuki A., Mishima W., Yamaji S., Okamoto H., Nishino I., Hayashi Y.K. Affixin activates Rac1 via betaPIX in C2C12 myoblast. FEBS Lett. 2008. V.582. P.1189-1196.
[15] Hijikata T., Nakamura A., Isokawa K., Imamura M., Yuasa K., Ishikawa R., Kohama K., Takeda S., Yorifuji H. Plectin 1 links intermediate filaments to costameric sarcolemma through beta-synemin, alpha-dystrobrevin and actin. J. Cell Sci. 2008. V.121(Pt 12). P.2062-2074.
[16] Lin C.S., Shen W., Chen Z.P., Tu Y.H., Matsudaira P. Identification of I-plastin, a human fimbrin isoform expressed in intestine and kidney. Mol. Cell Biol. 1994. V.14. P.2457-2467.
[17] Gorbunova V.N., Saveleva-Vasileva E.A., Krasilnikov V.V. Molecular neurology. Part 1. Diseases of neuro-muscular system. «Intermedika». St-Peterburg. 2000. P.19-190.
[18] Archakov A.I. What lies beyond genomics?--Proteomics. Vopr. Med. Khim. 2000. V.46. P.335-343.
[19] Peltonen L., McKusick V.A. Genomics and medicine. Dissecting human disease in the postgenomic era. Science. 2001. V.291. P.1224-1229.
[20] International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001. V.409. P.860-921.
[21] Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science. 2001. V.291. P.1304-1351.
[22] Wilkins M.R., Williams K.L., Appel R.D., Hochstrasser D.F. Proteomic Research: New Frontiers in Functional Genomics (Principle and Practice). Springer Verlag. Berlin. 1997.
[23] Storck T., von Brevern M.C., Behrens C.K., Scheel J., Bach A. Transcriptomics in predictive toxicology. Curr. Opin. Drug Discov. Devel. 2002. V.5. P.90-97.
[24] Jungblut P.R., Zimny-Arndt U., Zeindl-Eberhart E., Stulik J., Koupilova K., Pleissner K.P., Otto A., Muller E.C., Sokolowska-Kohler W., Grabher G., Stoffler G. Proteomics in human disease: cancer, heart and infectious diseases. Electrophoresis. 1999. V.20. P.2100-2110.
[25] Kaiser C., von Stein O., Laux G., Hoffmann M. Functional genomics in cancer research: identification of target genes of the Epstein-Barr virus nuclear antigen 2 by subtractive cDNA cloning and high-throughput differential screening using high-density agarose gels. Electrophoresis. 1999. V.20. P.261-268.
[26] Taylor B.S., Varambally S., Chinnaiyan A.M. Differential proteomic alterations between localised and metastatic prostate cancer. Br. J. Cancer. 2006. V.95. P.425-430.
[27] Thorgeirsson S.S., Lee J.S., Grisham J.W. Functional genomics of hepatocellular carcinoma. Hepatology. 2006. V.43(2 Suppl 1). P.S145-150.
[28] North K.N., Yang N., Wattanasirichaigoon D., Mills M., Easteal S., Beggs A.H. A common nonsense mutation results in alpha-actinin-3 deficiency in the general population. Nature Genet. 1999. V.21. P.353-354.
[29] Franz W.M., Muller O.J., Katus H.A. Cardiomyopathies: from genetics to the prospect of treatment. Lancet. 2001. V.358. P.1627-1637.
[30] The biological encyclopaedic dictionary. Ed. by M.S. Gilyarov. «The Soviet encyclopedia». Moscow. 1986. 832p.
[31] Budd T. Object-oriented programming in operation. [Russian translation] «Peter Publishing». St-Peterburg. 1997. 464p.
[32] van der Veen K.J., Willebrands A.F. Isoenzymes of creatine phosphokinase in tissue extracts and in normal and pathological sera. Clin. Chim. Acta. 1966. V.13. P.312-317.
[33] Harris H., Hopkinson D.A. Handbook of enzyme electrophoresis in human genetics. Amsterdam. 1976.
[34] Shell W.E., Kjekshus J.K., Sobel B.E. Quantitative assessment of the extent of myocardial infarction in the conscious dog by means of analysis of serial changes in serum creatine phosphokinase activity. J. Clin. Invest. 1971. V.50. P.2614-2617.
[35] Vaidya H.C., Maynard Y., Dietzler D.N., Ladenson J.H. Direct measurement of creatine kinase-MB activity in serum after extraction with a monoclonal antibody specific to the MB isoenzyme. Clin. Chem. 1986. V.32. P.657-662.
[36] Rosalki S.B., Roberts R., Katus H.A., Giannitsis E., Ladenson J.H., Apple F.S. Cardiac biomarkers for detection of myocardial infarction: perspectives from past to present. Clin. Chem. 2004. V.50. P.2205-2213.
[37] Murray R., Granner D., Mayes P., Rodwell V. Biochemistry of human. 21st edition. [Russian translation] «Mir». Moscow. 1993. V.1. 381p.
[38] Schulz G., Schirmer R.H. Principles of Protein Structure. [Russian translation] «Mir». Moscow. 1982. P.284-292.
[39] Woods R. Biochemical Genetics. [Russian translation] «Mir». Moscow. 1983. 127p.
[40] Beaudet A.L., Scriver C.R., Sly W.S., Cooper D.N., McKusick V.A., Schmidtke J. Genetics and biochemistry of variant human phenotypes. In: "The Metabolic basis of inherited disease". Ed. by C.R. Scriver, A.L. Beaudet, W.S. Sly, D. Valle. 6th ed. McGraw-Hill Inf. Ser. Chem. 1989. V.1. P.3-163.
[41] Vandekerckhove J., Weber K. At least six different actins are expressed in a higher mammal: an analysis based on the amino acid sequence of the amino-terminal tryptic peptide. J. Mol. Biol. 1978. V.126. P.783-802.
[42] Shishkin S.S. Myostatin and several biochemical factors, which regulate the growth of muscle tissues on human and other highest vertebrates. Usp. Biol. Chem. 2004. V.44. P.209-262.
[43] Chamberlain J.S., Gibbs R.A., Ranier J.E., Nguyen P.N., Caskey C.T. Deletion screening of the Duchenne muscular dystrophy locus via multiplex DNA amplification. Nucl. Acids Res. 1988. V.16. P.11141-11156.
[44] Beggs A.H., Koenig M., Boyce F.M., Kunkel L.M. Detection of 98% of DMD/BMD gene deletions by polymerase chain reaction. Hum. Genet. 1990. V.86. P.45-48.
[45] Gazaryan K.G., Tarantul V.Z. Eukaryotic Genome. «Moscow University». Moscow. 1983.
[46] Lewin B. Genes. [Russian translation] «Mir». Moscow. 1987. 544p.
[47] Breitbart R.E., Andreadis A., Nadal-Ginard B. Alternative splicing: a ubiquitous mechanism for the generation of multiple protein isoforms from single genes. Ann. Rev. Biochem. 1987. V.56. P.467-495.
[48] Shishkin S.S., Kalinin V.N. Medical aspects of Biochemical and Molecular Genetics. «VINITI». Moscow. 1992. 216p.
[49] Frezal J., Munnich A., Mitchell G. One gene, several messages. From multifunctional proteins to endogenous opiates. Hum. Genet. 1983. V.64. P.311-314.
[50] Perry S.V. Troponin T: genetics, properties and function. J. Muscle Res. Cell Motil. 1998. V.19. P.575-602.
[51] Schiaffino S., Reggiani C. Molecular Diversity of myofibrillar proteins: gene regulation and functional significance. Physiol. Reviews. 1996. V.76. P.371-423.
[52] Anderson N.G., Anderson L. The Human protein index. Clin. Chem. 1982. V.28. P.739-748.
[53] Two-Dimensional Gel Electrophoresis of Proteins: Methods and Applications. Celis J.E. and Bravo R., ed. Academic Press. N.Y. 1984.
[54] Shishkin S.S. Molecular-anatomical studies in biochemical genetics. Vestn. Akad. Med. Nauk SSSR. 1985. No.1. P.78-84.
[55] Klose J. Systematic analysis of the total proteins of a mammalian organism: principles, problems and implications for sequencing the human genome. Electrophoresis. 1989. V.10. P.140-152.
[56] O'Farrell P.H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 1975. V.250. P.4007-4021.
[57] Celis J.E., Madsen P., Rasmussen H.H., Leffers H., Honore B., Gesser B., Dejgaard K., Olsen E., Magnusson N., Kiil J., et al. A comprehensive two-dimensional gel protein database of noncultured unfractionated normal human epidermal keratinocytes: towards an integrated approach to the study of cell proliferation, differentiation and skin diseases. Electrophoresis. 1991. V.12. P.802-872.
[58] Shishkin S.S. From structural to functional genomics: theoretical and applied aspects. Vestn. Ross. Akad. Med. Nauk. 2002. No.4. P.11-16.
[59] Shishkin S.S., Kovalyov L.I., Kovalyova M.A. Proteomic studies of human and other vertebrate muscle proteins. Biochemistry (Mosc). 2004. V.69. P.1283-1298.
S. S. Shishkin, L. I. Kovalev, I. N. Krakhmaleva et al.
[60] Abst. II Siena 2D-Electrophoresis Meeting "From Genome to Proteome,", Siena, Italy, 16-18 Sept. 1996. [61] Gromov P.S., Celis J.E. From genomics to proteomics. Mol. Biol. (Mosk). 2000. V.34. P.597-611 . [62] Goldman D., Merril C.R. Genetic polypeptide variation by two-dimensional electrophoresis. Ann. N Y Acad Sci. 1984. V.428. P.186-200. [63] Huang Y., von Eckardstein A., Wu S., Assmann G. Effects of the apolipoprotein E polymorphism on uptake and transfer of cell-derived cholesterol in plasma. J. Clin. Invest. 1995. V.96. P.2693-2701. [64] Arrell D.K., Neverova I., Van Eyk J.E. Cardiovascular proteomics: evolution and potential. Circ. Res. 2001. V. 27. 8. P. 763-773 [65] Nunoi H., Yamazaki T., Tsuchiya H., Kato S., Malech H.L., Matsuda I., Kanegasaki S. A heterozygous mutation of beta-actin associated with neutrophil dysfunction and recurrent infection. Proc. Natl. Acad. Sci. U. S. A. 1999. V.96. P.8693-8698. [66] Long S., Tian Y., Zhang R., Yang L., Xu Y., Jia L., Fu M. Relationship between plasma HDL subclasses distribution and lipoprotein lipase gene HindIII polymorphism in hyperlipidemia. Clin. Chim. Acta. 2006. V.366. P.316-321. [67] Kovalyov L.I., Kovalyova M.A., Kovalyov P.L., Serebryakova M.V., Moshkovskii S.A., Shishkin S.S. Polymorphism of delta3,5-delta2,4-dienoyl-coenzyme A isomerase (the ECH1 gene product protein) in human striated muscle tissue. Biochemistry (Mosc). 2006 V.71. P.448-453. [68] Laptev A.V., Shishkin S.S., Kovalyov L.I. Galyuk M.A., Musalyamov A.Ch., Egorov Ts.A. Identification of two allelic variant of isoform MHC1-V/sB (human myosin light chain). Biochem. Genet. 1993. V.31, N5/6, 253-258. [69] Chadwick B.P., Mull J., Helbling L.A., Gill S., Leyne M., Robbins C.M., Pinkett H.W., Makalowska I., Maayan C., Blumenfeld A., Axelrod F.B., Brownstein M., Gusella J.F., Slaugenhaupt S.A. 
Cloning, mapping, and expression of two novel actin genes, actinlike-7A (ACTL7A) and actin-like-7B (ACTL7B), from the familial dysautonomia candidate region on 9q31. Genomics. 1999. V.58. P.302-309. [70] Harata M., Nishimori K., Hatta S. Identification of two cDNAs for human actin-related proteins (Arps) that have remarkable similarity to conventional actin. Biochim. Biophys. Acta. 2001. V.1522. V.130-133. [71] Gunning P.P. Ponte H., Blau H., Kedes L. -Skeletal and -cardiac actin genes are coexpressed in adult human skeletal muscle and heart. Mol. Cell. Biol. 1983. V.3. P.1985-1995. [72] Agrawal P.B., Strickland C.D., Midgett C., Morales A., Newburger D.E., Poulos M.A., Tomczak K.K., Ryan M.M., Iannaccone S.T., Crawford T.O., Laing N.G., Beggs A.H. Heterogeneity of nemaline myopathy cases with skeletal muscle alpha-actin gene mutations. Ann. Neurol. 2004. V.56. P.86-96. [73] Kaindl A.M., Ruschendorf F., Krause S., Goebel H.H., Koehler K., Becker C., Pongratz D., Muller-Hocker J., Nurnberg P., Stoltenburg-Didinger G., Lochmuller H., Huebner A. Missense mutations of ACTA1 cause dominant congenital myopathy with cores. J. Med Genet. 2004. V.41. P.842-848. [74] Olson T.M., Michels V.V., Thibodeau S.N., Tai Y.-S., Keating M.T. Actin mutations in dilated cardiomyopathy, a heritable form of heart failure. Science. 1998. V.280. P.750752.
Biomedical Aspects in Investigations of Biochemical Polymorphism…
[75] Mogensen J., Klausen I.C., Pedersen A.K., Egeblad H., Bross P., Kruse T.A., Gregersen N., Hansen P.S., Baandrup U., Borglum A.D. Alpha-cardiac actin is a novel disease gene in familial hypertrophic cardiomyopathy. J. Clin. Invest. 1999. V.103. P.39-R.43. [76] Ueyama H., Bruns G., Kanda N. Assignment of the vascular smooth muscle actin gene ACTSA to human chromosome 10. Jpn. J. Hum. Genet. 1990. V.35. P.145-150. [77] Cheung V.G., Conlin L.K., Weber T.M., Arcaro M., Jen K.-Y., Morley M., Spielman R.S. Natural variation in human gene expression assessed in lymphoblastoid cells. Nature Genet. 2003. V.33. P.422-425. [78] Procaccio V., Salazar G., Ono S., Styers M.L., Gearing M., Davila A., Jimenez R., Juncos J., Gutekunst C.-A., Meroni G., Fontanella B., Sontag E., Sontag J.M., Faundez V., Wainer B.H. A mutation of beta-actin that alters depolymerization dynamics is associated with autosomal dominant developmental malformations, deafness, and dystonia. Am. J. Hum. Genet. 2006. V.78. P.947-960. [79] Zhu M., Yang T., Wei S., DeWan A.T., Morell R.J., Elfenbein J.L., Fisher R.A., Leal S.M., Smith R.J.H., Friderici K.H. Mutations in the gamma-actin gene (ACTG1) are associated with dominant progressive deafness (DFNA20/26). Am. J. Hum. Genet. 2003. V.73. P.1082-1091. [80] Chadwick B.P., Mull J., Helbling L.A., Gill S., Leyne M., Robbins C.M., Pinkett H.W., Makalowska I., Maayan C., Blumenfeld A., Axelrod F.B., Brownstein M., Gusella J.F., Slaugenhaupt S.A. Cloning, mapping, and expression of two novel actin genes, actinlike-7A (ACTL7A) and actin-like-7B (ACTL7B), from the familial dysautonomia candidate region on 9q31. Genomics. 1999. V.58. P.302-309. [81] Boheler K.R., Carrier L., De La Bastie D., Allen P.D., Komajada M., Mercadier J.J., Schwarttz K. Skeletal actin mRNA increases in the human heart during ontogenic development and is the major isoform of control and failing adult hearts. J. Clin. Invest. 1991. V.88. P323-330. 
[82] Buckingham, M., Alonso, S., Barton, P., Cohen, A., Daubas, P., Garner, I., Robert, B., Weydert, A. Actin and myosin multigene families: their expression during the formation and maturation of striated muscle. Am. J. Med. Genet. 1986. V.25. P.623634. [83] Gunning, P., Ponte, P., Okayama, H., Engel, J., Blau, H., Kedes, L. Isolation and characterization of full-length cDNA clones for human alpha-, beta-, and gamma-actin mRNAs: skeletal but not cytoplasmic actins have an amino-terminal cysteine that is subsequently removed. Molec. Cell. Biol. 1983. V.3. P.787-795. [84] Otterbein L., Graceffa P., Dominguez R. The crystal structure of uncomplexed actin in the ADP state. Science. 2001. V.293. P.708–711. [85] Holmes K.C., Popp D., Gebhard W., Kabsch W. Atomic model of the actin filament. Nature. 1990. V.347. P. 44–49. [86] Aguda A.H., Leslie D. Burtnick L,D., Robinson R.C. The state of the filament. EMBO reports. 2005. V.6. Р. 220–226. [87] Filatov VL, Katrukha AG, Bulargina TV, Gusev NB. Troponin: structure, properties, and mechanism of functioning. Biochemistry (Mosc). 1999. V.64(9). P.969-985. [88] Perry S.V. Vertebrate tropomyosin: distribution, properties and function. J. Muscle Res. Cell Motil. 2001. V.22. P.5-49. [89] Thierfeld L., Watkins H., MacRae C. Lamas R, McKenna W, Vosberg HP, Seidman JG, Seidman CE. Alpha-tropomyosin and cardiac troponin T mutations cause familial
hypertrophic cardiomyopathy: a disease of the sarcomere. Cell. 1994. Vol.77. P.701712. [90] Poutanen T., Tikanoja T., Jaaskelainen P., Jokinen E., Silvast A., Laakso M., Kuusisto J. Diastolic dysfunction without left ventricular hypertrophy is an early finding in children with hypertrophic cardiomyopathy-causing mutations in the beta-myosin heavy chain, alpha-tropomyosin, and myosin-binding protein C genes. Am. Heart J. 2006. V.151. P.725.e1-725.e9. [91] Donner K., Ollikainen M., Ridanpaa M., Christen H.J., Goebel H.H., de Visser M., Pelin K., Wallgren-Pettersson C. Mutations in the beta-tropomyosin (TPM2) gene - a rare cause of nemaline myopathy. Neuromuscul. Disord. 2002. V.12. 151-158. [92] Sung S.S., Brassington A.M., Grannatt K., Rutherford A., Whitby F.G., Krakowiak P.A., Jorde L.B. Carey J.C., Bamshad M. Mutations in genes encoding fast-twitch contractile proteins cause distal arthrogryposis syndromes. Am. J. Hum. Genet. 2003. V.72. P.681-690. [93] Laing N.G., Wilton S.D., Akkari P.A., Dorosz S., Boundy K., Kneebone C., Blumbergs P., White S., Watkins H., Love D.R., et al. A mutation in the alpha tropomyosin gene TPM3 associated with autosomal dominant nemaline myopathy. Nat. Genet. 1995. V.9. P.75-79. [94] Clarkson E., Costa C.F., Machesky L.M. Congenital myopathies: diseases of the actin cytoskeleton. J. Pathol. 2004. P.204. P.407-417. [95] Li D.Q, Wang L., Fei F., Hou Y.F., Luo J.M., Wei-Chen; Zeng R., Wu J., Lu J.S., Di G.H., Ou Z.L., Xia Q.C., Shen Z.Z., Shao Z.M. Identification of breast cancer metastasis-associated proteins in an isogenic tumor metastasis model using twodimensional gel electrophoresis and liquid chromatography-ion trap-mass spectrometry. Proteomics. 2006. V.6. P.3352-3368. [96] Miliou A., Anastasakis A., D'Cruz L.G., Theopistou A., Rigopoulos A., Rizos I., Stamatelopoulos S., Toutouzas P., Stefanadis C. Low prevalence of cardiac troponin T mutations in a Greek hypertrophic cardiomyopathy cohort. Heart. 2005. V.91. P966967. 
[97] Doolan A., Tebo M., Ingles J., Nguyen L., Tsoutsman T., Lam L., Chiu C., Chung J., Weintraub R.G., Semsarian C. Cardiac troponin I mutations in Australian families with hypertrophic cardiomyopathy: clinical, genetic and functional consequences. J. Mol. Cell. Cardiol. 2005. V.38. P.387-393. [98] Saprygin D.B., Romanov M.Yu.. Importance of troponins (I and T) for stratification and prognosis of acute coronary syndrome. Laboratory Medicine. 2002. V.5. P.14-18. [99] Thweatt R., Lumpkin C.K., Goldstein S. A novel gene encoding a smooth muscle protein is overexpressed in senescent human fibroblasts. Biochem. Biophys. Res. Commun. 1992. V.187. P.1-7 [100] Gimona M., Djinovic-Carugo K., Kranewitter W.J., Winder S.J. Functional plasticity of CH domains. FEBS Lett. 2002. V.513.P.98-106. [101] Gusev N.B. The Bases of Biochemistry of muscular tissues. In: «Muscle tissues». Ed. by Yu. S. Chentzov. «Meditzina». Moscow. 2001. P.176-267. [102] Faulkner G., Lanfranchi G., Valle G. Telethonin and other new proteins of the Z-disc of skeletal muscle. // IUBMB Life. 2001. V.51. P.275-282. [103] Pyle W.G., Solaro R.J. At the crossroads of myocardial signaling: the role of Z-discs in intracellular signaling and cardiac function. Circ. Res. 2004. V.94. P.296-305.
[104] Luther P.K. Three-dimensional structure of a vertebrate muscle Z-band: implications for titin and alpha-actinin binding. J. Struct. Biol. 2000. V.129. P.1-16. [105] Luther P.K., Padron R., Ritter S., Craig R., Squire J.M. Heterogeneity of Z-band structure within a single muscle sarcomere: implications for sarcomere assembly. J. Mol. Biol. 2003. V.332. P.161-169. [106] Luther P.K., Squire J.M. Muscle Z-band ultrastructure: titin Z-repeats and Z-band periodicities do not match. J. Mol. Biol. 2002. V.319. P.1157-1164. [107] Virel A., Backman L. Molecular evolution and structure of alpha-actinin. Mol. Biol Evol. 2004. V.21. P.1024-1031. [108] Otey C.A., Carpen O. Alpha-actinin revisited: a fresh look at an old player. Cell. Motil. Cytoskeleton. 2004. V.58. P.104-111. [109] Agarkova I., Perriard J.C. The M-band: an elastic web that crosslinks thick filaments in the center of the sarcomere. Trends. Cell Biol. 2005. V.15. P.477-485. [110] Young P., Ferguson C., Banuelos S., Gautel M. Molecular structure of the sarcomeric Z-disk: two types of titin interactions lead to an asymmetrical sorting of alpha-actinin. EMBO J. 1998. V.17. P.1614-1624. [111] Brakebusch C., Fassler R. The integrin-actin connection, an eternal love affair. EMBO J. 2003. V.22. P.2324-2333. [112] Yang N., MacArthur D.G., Gulbin J.P., Hahn A.G., Beggs A.H., Easteal S., North1 K. ACTN3 Genotype Is Associated with Human Elite Athletic Performance. Am. J. Hum. Genet. 2003. V.73. P.627–631. [113] Beggs A.H., Phillips H.A., Kozman H., Mulley J.C., Wilton S.D., Kunkel L.M., Laing N.G. A (CA)n repeat polymorphism for the human skeletal muscle alpha-actinin gene ACTN2 and its localization on the linkage map of chromosome 1. Genomics. 1992. V.13. P.1314-1315. [114] Kaplan J.M., Kim S.H., North K.N., Rennke H., Correia L.A., Tong H.-Q., Mathis B.J., Rodriguez-Perez J.-C., Allen P.G., Beggs A.H., Pollak M. R. Mutations in ACTN4, encoding alpha-actinin-4, cause familial focal segmental glomerulosclerosis. 
Nature Genet. 2000. V.24. P.251-256. [115] Aucella F., De Bonis P., Gatta G., Muscarella L.A., Vigilante M., di Giorgio G., D'Errico M., Zelante L., Stallone C., Bisceglia L. Molecular analysis of NPHS2 and ACTN4 genes in a series of 33 Italian patients affected by adult-onset nonfamilial focal segmental glomerulosclerosis. Nephron Clin. Pract. 2005. V.99. P.31-36. [116] Clarkson P.M., Devaney J.M., Gordish-Dressman H., Thompson P.D., Hubal M.J., Urso M., Price T.B., Angelopoulos T.J., Gordon P.M., Moyna N.M., Pescatello L.S., Visich P.S., Zoeller R.F., Seip R.L., Hoffman E.P. ACTN3 genotype is associated with increases in muscle strength in response to resistance training in women. J. Appl. Physiol. 2005. V.99. P.154-1.63 [117] Akhmetov I.I., Astratenkova I.V., Drughevskaya A.M., Komkova A.I., Lyubaeva E.V., Tarakin P.P., Netreba A.I., Popov D.V., Prostova A.B., Vinogradova O.L., Shenkman B.S., Rogozkin V.A. The significance of the complex analysis of factors for genetic predisposition to muscular activity of the human. Medical and biologic technologies of increase of serviceability in conditions of intense physical working. The issue of articles. (in Russian) V.2. Ed. by A.I. Grigorev. «Anita Press». Moscow. 2006. P.23-38. [118] Otey C.A., Rachlin A., Moza M., Arneman D., Carpen O. The paladin /myotilin/ myopalladin family of actin-associated scaffolds. Int. Rev. Cytol. 2005. V.246. P.31-58.
[119] Gontier Y., Taivainen A., Fontao L., Sonnenberg A., van der Flier A., Olli Carpen O., Faulkner G., Borradori L. The Z-disc proteins myotilin and FATZ-1 interact with each other and are connected to the sarcolemma via muscle-specific filamins. J. Cell Sci. 2005. V.118. P.3739-3749. [120] Krakhmaleva I.N., Shishkin S.S., Shakhovskaia N.I., Stoliarova E.B., Plugov A.G., Kniazev A.I., Khomenkov V.G., Shevelev A.B., Chernov N.N. NP-detecting DNA technologies: solving problems of applied biochemistry. Prikl. Biokhim. Mikrobiol. 2005. V.41. P.303-307. [121] Rogozkin V.A., Astratenkova I.V. Muscular activity and polymorphism of genes. Medical and biologic technologies of increase of serviceability in conditions of intense physical working. The issue of articles. (in Russian) V.2. Ed. by A.I. Grigorev. «Fon». V.1. Moscow. 2004. P.57-64. [122] Niemi A.K.., Majamaa K. Mitochondrial DNA and ACTN3 genotypes in Finnish elite endurance and sprint athletes. Eur. J. Hum. Genet. 2005. V.13. P.965-969. [123] Lucia A., Gomez-Gallego F., Santiago C., Bandres F., Earnest C., Rabadan M., Alonso J.M., Hoyos J., Cordova A., Villa G., Foster C. ACTN3 Genotype in Professional Endurance Cyclists. // Int. J. Sports Med. 2006. V.27. P880-884. [124] Perusse L., Rankinen T., Rauramaa R., Rivera M.A., Wolfarth B., Bouchard C. The Human Gene Map for Performance and Health-Related Fitness Phenotypes: The 2002 Update. Med. Sci. Sports Exerc. 2003. V.35. P.1248-1264. [125] Williams A.G., Dhamrait S.S., Wootton P.T., Day S.H., Hawe E., Payne J.R., Myerson S.G., World M., Budgett R., Humphries S.E., Montgomery H.E. Bradykinin receptor gene variant and human physical performance. J. Appl. Physiol. 2004. V. 96. P.938942. [126] Wolfarth B, Bray MS, Hagberg JM, Perusse L, Rauramaa R, Rivera MA, Roth SM, Rankinen T, Bouchard C. The Human Gene Map for Performance and Health-Related Fitness Phenotypes: The 2004 Update. Med. Sci. Sports Exerc. 2005. V. 37. P. 881-903. [127] Said J. 
Biomarker discovery in urogenital cancer. Biomarkers. 2005. V.10 Suppl 1. P.S83-86. [128] Alberti C. Prostate cancer progression and surrounding microenvironment. Int. J. Biol. Markers. 2006. V.21. P.88-95. [129] Marks L.S., Roehrborn C.G., Andriole G.L. Prevention of benign prostatic hyperplasia disease. J. Urol. 2006. V.176. P.1299-1306. [130] Cancer of prostatic gland. (in Russian) Ed. by N.E. Kushlinsky, Yu.N. Solov‘eva, M.F. Trapeznikova. «RAMS Publisher». 2002. 432с. [131] Jemal A., Siegel R., Ward E., Murray T., Xu J., Smigal C., Thun M.J. Cancer statistics, 2007. // CA Cancer J. Clin. 2007. V.57. P.43-.66. [132] Malignant tumors in Russia in 2002. Diseases and death rate. (in Russian) Ed. by V.I. Chissov, V.V. Starinsky, G.V. Petrova. «MEDpress-inform». Moscow. 2004. С.5-22. [133] Stamey T.A., Caldwell M., McNeal J.E., Nolley R., Hemenez M., Downs J. The prostate specific antigen era in the United States is over for prostate cancer: what happened in the last 20 years? J. Urol. 2004. V.172. P.1297-1301. [134] Lange P.H., Ercole C.J., Lightner D.J., Fraley E.E., Vessella R. The value of serum prostate specific antigen determinations before and after radical prostatectomy. J. Urol. 1989. V.141. P.873-879.
[135] Hudson M.A., Bahnson R.R., Catalona W.J. Clinical use of prostate specific antigen in patients with prostate cancer. J. Urol. 1989. V.142. P.1011-1017. [136] Charrier J.P., Tournel C., Michel S. et al. Differential diagnosis of prostate cancer and benign prostate hyperplasia using two-dimensional electrophoresis. Electrophoresis. 2001. V.22. P.1861-1866. [137] Kovalev L.I., Shishkin S.S., Khasigov P.Z., Dzeranov N.K., Kazachenko A.V., Toropygin I.Iu., Mamykina S.V. Identification of AGR2 protein, a novel potential cancer marker, using proteomics technologies. Prikl. Biokhim. Mikrobiol. 2006. V.42. P.480-484. [138] Kovalev L.I., Shishkin S.S., Khasigov P.Z., Dzeranov N.K., Kazachenko A.V., Kovaleva M.A., Toropygin I.Iu., Eremina L.S., Grachev S.V. New approaches to molecular diagnosis of prostatic cancer. Urologiia. 2006. №5. P..16-19. [139] Kovalyov L.I., Shishkin S.S., Efimochkin A.S., Kovalyova M.A., Ershova E.S., Egorov T.A., Musalyamov A.K. The mayor protein expression profile and two-dimensional protein database of human heart. Electrophoresis. 1995. V.16. P.1160-1169. [140] .Comings D. Two-dimensional gel electrophoresis of human brain proteins. I. Technique and nomenclature of proteins. Clin. Chem. 1982. V.28. P.782-789. [141] Puliaeva E.V., Kovalev L.I., Tsvetkova M.N., Shishkin S.S., Boldyrev N.I. A twodimensional map of proteins from left ventricle of the human heart. Biokhimiia. 1990. V.55. P.489-498. [142] Govorun V.M., Moshkovskii S.A., Tikhonova O.V., Goufman E.I., Serebryakova M.V., Momynaliev K.T., Lokhov P.G., Khryapova E.V., Kudryavtseva L.V., Smirnova O.V., Toropyguine I.Y., Maksimov B.I., Archakov A.I. Comparative analysis of proteome maps of Helicobacter pylori clinical isolates. Biochemistry (Mosc). 2003. V.68. P.4249.. [143] Yang Z., Chang Y.J., Miyamoto H., Niu Y., Chen Z., Chen Y.L., Yao J.L., di Sant'Agnese P.A., Chang C. 
Transgelin functions as a suppressor via inhibition of ARA54-enhanced androgen receptor transactivation and prostate cancer cell growth. Mol. Endocrinol. 2007. V.21. P.343-358. [144] Ren, W.-Z., Ng,G.Y.K., Wang, R.-X., Wu, P.H., O'Dowd, B.F., Osmond, D.H., George, S.R., Liew, C.-C. The identification of NP25: a novel protein that is differentially expressed by neuronal subpopulations. Molec. Brain Res. 1994. V22. P.173-185. [145] Untergasser G., Gander R., Lilg C., Lepperdinger G., Plas E., Berger P. Profiling molecular targets of TGF-beta1 in prostate fibroblast-to-myofibroblast transdifferentiation. Mech. Ageing. Dev. 2005. V.126. P.59-69. [146] Nagase, T., Miyajima, N., Tanaka, A., Sazuka, T., Seki, N., Sato, S., Tabata, S., Ishikawa, K., Kawarabayasi, Y., Kotani, H., Nomura, N. Prediction of the coding sequences of unidentified human genes. III. The coding sequences of 40 new genes (KIAA0081-KIAA0120) deduced by analysis of cDNA clones from human cell line KG-1. DNA Res. 1995. V.2. P.37-43. [147] Kristiansen G., Pilarsky C., Wissmann C., Kaiser S., Bruemmendorf T., Roepcke S., Dahl E., Hinzmann B., Specht T., Pervan J., Stephan C., Loening S., Dietel M., Rosenthal A. Expression profiling of microdissected matched prostate cancer samples reveals CD166/MEMD and CD24 as new prognostic markers for patient survival. J. Pathol. 2005. V.205. P.359-376.
[148] Smirnov D.A., Zweitzig D.R., Foulk B.W., Miller M.C., Doyle G.V., Pienta K.J., Meropol N.J., Weiner L.M., Cohen S.J., Moreno J.G., Connelly M.C., Terstappen L.W., O'Hara S.M. Global Gene Expression Profiling of Circulating Tumor Cells. Cancer Res. 2005. V.65. P.4993-4997 [149] Rankinen T., Bray M.S., Hagberg J.M., Perusse L., Roth S.M., Wolfarth B., Bouchard C. The human gene map for performance and health-related fitness phenotypes: the 2005 update. Med. Sci. Sports Exerc. 2006. V.38. P.1863-1888. [150] Wolfarth B., Rankinen T., Mühlbauer S., Scherr J., Boulay M.R., Pérusse L., Rauramaa R., Bouchard C. Association between a beta2-adrenergic receptor polymorphism and elite endurance performance. Metabolism. 2007. V.56. P.1649-1651.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev and G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 9
MOLECULAR MECHANISMS OF ADAPTATION: STRESS AND AGGRESSION
A. G. Tonevitsky1, N. V. Maluchenko*2, J. V. Shchegolkova2, M. A. Kulikova2, M. A. Timofeeva2, O. V. Sysoeva3, V. A. Shleptsova1 and A. I. Grigoriev4
1 Russian Research Institute of Physical Culture and Sport, 105005, Russia, Moscow, Elizavetinsky lane, 10
2 Biological Faculty of Moscow State University, 119991, Russia, Moscow, Leninskie gory, 1\12
3 Institute of Higher Nervous Activity and Neurophysiology, Russian Academy of Sciences, 117485
4 Institute of Medical-Biological Problems of the Russian Academy of Sciences, State Scientific Center, 123007, Russia, Moscow, Horoshevskoe av., 76-а
ABSTRACT
Molecular mechanisms of human adaptation are best studied in people whose activity is closely connected with stress conditions. This study is dedicated to the genetic bases of psycho-emotional stability in athletes. We determined the influence of polymorphisms of key genes controlling the serotoninergic, dopaminergic and renin-angiotensin systems. We have shown that genetic variants of the serotonin transporter, catechol-O-methyltransferase and angiotensin-converting enzyme are associated with different forms of aggression in athletes and can be considered markers of human adaptation to conditions of psycho-emotional stress. Our studies demonstrate the promise and validity of this approach: a genetic search for the adaptive reserves of the human organism.
INTRODUCTION
After the decoding of the human genome, when the scale of variability became obvious, individual human health acquired important medical, hygienic and ecological significance [7]. When discussing approaches to understanding and treating different diseases, it is important to emphasize the genetic uniqueness of each person, which determines physiological, metabolic and biochemical features, the norms of reaction of homeostasis and a person's adaptation abilities. Genetic individuality mainly depends on polymorphisms, that is, allelic gene variants [1, 2]. Genetic polymorphism appears due to the fixation of gene, chromosome and genomic mutations in a population. The mechanism that maintains the stability of polymorphisms depends on the different value of two or more alleles and on the different functions of the phenotypes coexisting in a population. The most widespread genetic polymorphism is the SNP (single nucleotide polymorphism). The total number of polymorphic loci among all human genomes studied to date is more than ten million [26]. Each of us carries a notable part of the whole genetic diversity of mankind, since in an individual genome not less than 25% of polymorphic loci are heterozygous, that is, represented by both alleles [41]. People are unique not only in genome structure but also in genome functioning. From the medical point of view, genetic polymorphism determines a person's type of reaction, predisposition to some hereditary and complex (polygenic, multifactorial) diseases, sensitivity to pharmacotherapy, and resistance to physical and psychic stresses. The fundamental significance of polymorphism is that it serves as a mobilization reserve, ensuring resistance to environmental stress and maintenance of the existing gene pool of the planet. The other important role of polymorphism is evolutionary. Genetic variability provides for species evolution. The unequal functional importance of different alleles results in the appearance of species better adapted to the existing environmental conditions.
The distribution of allelic frequencies in human populations reflects the history of development of these populations and the influence of natural selection factors on this process. After analysis of more than 1.6 million SNPs in human chromosomes, it was concluded that about 1,800 genes (about 7% of the human genome) have been altered by natural selection in the course of the development of human civilization. Alterations have taken place in genes responsible for protein metabolism, carbohydrate and fatty acid metabolism, resistance to different diseases, and brain activity [19, 34, 24, 36]. Fixation of such alterations is probably connected with the adaptation of populations to new dietary habits following the change from hunting and gathering to agriculture and the development of crafts. Polymorphisms are thus a molecular mechanism of adaptation to a changing environment.
Unlike evolutionary geneticists, who can observe the results of alterations in human genetic material over tens and hundreds of thousands of years, a practicing physician or biologist has only a limited period in which to understand how a particular genetic variation influences the adaptive reserves of the organism. One approach to solving this problem is to study the genetic peculiarities of people whose activity is closely related to working in stress conditions: elite athletes and astronauts [6, 4, 12].
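The notions of allele frequency and heterozygosity used in this section can be made concrete with a small calculation. The following Python sketch is illustrative only: the genotype sample and the function names are hypothetical and do not come from the studies cited here. It counts alleles at one biallelic SNP and compares the observed share of heterozygotes with the share expected under Hardy-Weinberg proportions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for diploid genotypes such as "AG"."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each genotype contributes two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Share of individuals carrying two different alleles."""
    hetero = sum(1 for g in genotypes if g[0] != g[1])
    return hetero / len(genotypes)


def expected_heterozygosity(freqs):
    """Hardy-Weinberg expectation: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical sample of ten people typed at one SNP (not real data).
sample = ["AA"] * 5 + ["AG"] * 4 + ["GG"] * 1
freqs = allele_frequencies(sample)
```

Comparing `observed_heterozygosity(sample)` with `expected_heterozygosity(freqs)` is one simple way to see whether a sample departs from the proportions expected for a freely mixing population.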
ADAPTATION TO PSYCHO-EMOTIONAL STRESS
Scientific and technical progress, the accumulation of a large amount of information that each person in modern society must take in, and social instability make psychic, spiritual and emotional health vulnerable [25]. This leads to the emergence of new pathologies: psychosomatic diseases, outbreaks of cruel crimes, chronic fatigue syndrome, depression and others. By some estimates, up to 80% of the people in some populations have various psychic deviations. Studies of adaptive reserves in top athletes are valuable for revealing genetic markers of adaptation not only in physical but also in psycho-emotional endurance. In this work, we studied genetic determinants of aggressive behavior. Aggression is apparently one of the behavioral forms contributing to species survival, defense of territory and competition. Aggression in human society has been transformed into new forms, but it remains a behavioral response to environmental challenges. Perhaps one condition of success in high-stress situations is the maintenance of a definite level of useful aggression.
PROOF OF HERITABILITY OF AGGRESSION
Predisposition to aggressive behavior is hereditary, and genes play an important role in forming aggressive behavior [27, 16]. The first systematic research on the genetic bases of psychic features was conducted in 1913 by A. Yerkes, who described the inheritance of a complex of viciousness, timidity and wildness in rats. Later, the genetic determination of aggressive behavior in animals was confirmed by comparing inbred lines of mice and rats. Studies of first-generation offspring from crosses between lines contrasting in this trait showed that some parameters of aggression are inherited additively and revealed a maternal effect. Aggressive behavior is under polygenic control: at least 17 genes are known to influence the emergence of aggression, and genetic effects can explain up to 50% of differences in temperament between people [39]. Results of twin and family studies show that individual differences in aggression within a population are considerably conditioned by the genetic features of the individual. Given the diversity of human manifestations of aggression, it is useful to confine such behavior to the conceptual framework proposed by Buss and acknowledged all over the world [29]. Assessment of aggression in twins with the Buss-Durkee Hostility Inventory attributed 40% of individual differences on the "physical aggression" scale to heredity (explained by additive effects of separate alleles), and 40%, 37% and 28% of the variability of the "indirect aggression", "irritability" and "verbal aggression" parameters, respectively [10]. What is inherited is not the predisposition to aggression itself but rather a character trait (for example, impulsiveness or striving for leadership) that raises the likelihood of aggression being expressed. The important role of biological factors in the formation of aggressive behavior coexists with the influence of environmental factors [3].
Heredity can determine an individual threshold for the activation of specific neurohumoral reactions related to aggressive behavior. The environment, in turn, determines the limits within which a person expresses aggression. A better understanding of the genetic nature of aggression requires detailed study of the physiological, neurohumoral and biochemical bases of this temperamental feature; this will promote the identification of candidate genes of aggression.
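Twin studies of the kind mentioned in this section usually estimate the genetic share of trait variance by comparing monozygotic (MZ) and dizygotic (DZ) twin pairs. The following is a minimal sketch, assuming the classical Falconer approximation; the cited works may have used other models, and the correlation values below are hypothetical, chosen only for illustration.

```python
def falconer_estimates(r_mz, r_dz):
    """Split trait variance using Falconer's approximation.

    r_mz, r_dz: trait correlations within MZ and DZ twin pairs.
    Returns (h2, c2, e2): the additive genetic, shared-environment and
    non-shared-environment shares of variance.
    """
    h2 = 2.0 * (r_mz - r_dz)   # MZ pairs share twice as many segregating alleles
    c2 = 2.0 * r_dz - r_mz     # similarity not explained by genes
    e2 = 1.0 - r_mz            # what makes even MZ twins differ
    return h2, c2, e2


# Hypothetical correlations, not data from the cited studies.
h2, c2, e2 = falconer_estimates(r_mz=0.40, r_dz=0.20)
```

The three shares sum to one by construction. The approximation assumes purely additive gene action and equal shared environments for both twin types, so it should be read as a rough guide rather than a precise estimate.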
The main parts of the brain that play an important role in the formation of aggressive behavior are: 1) the limbic system, which consists of various structures whose function is to control the main affects and emotions; 2) the cortex, which is responsible for a complex of cognitive functions important in learning, forming judgments and decision-making. The limbic system controls a person's general adaptation to the environment and emotional behavior [21]. All signals coming from the analyzers to the corresponding centers in the cortex pass through limbic structures. The subcortical structures of the limbic system are the amygdala, the septal nuclei and the anterior thalamic nucleus. There is convincing evidence of a relation between the functioning of the limbic system and aggressive behavior [28, 22]. Atrophy or damage of the amygdala in humans is associated with impulsive and aggressive behavior. Stimulation of the amygdala in animals increases the aggressive response. After destruction of the amygdala, animals lose the capacity for social in-group behavior. Other evidence of a relation between the limbic system and aggression was obtained by studying the neurotransmitters that circulate in the limbic system and influence the formation of aggressive behavior. These substances are of special interest because they are responsible for information exchange between the cortex and the various limbic structures. A study conducted by American scientists on a group of people with Intermittent Explosive Disorder [17] is of special interest for understanding the physiological bases of aggression. Sixteen million Americans suffer from this disorder. The investigation used functional magnetic resonance imaging (fMRI). The patients showed higher activity of the left amygdala in response to negative stimuli than the control group (p < 0.013). There are also data relating human aggression to the cortex.
An investigation of more than 400 people prone to violence by computed tomography and nuclear magnetic resonance revealed a high frequency of lesions located predominantly in the frontal-back part of the temporal lobe. Organic alterations in the temporal lobes and changes in bioelectric activity (slow and fast waves) of the same localization correlate with the level of aggression expressed [38]. People with lesions in the frontal lobe of the neocortex react more impulsively and aggressively to provocation and show more pronounced irritability and ill temper. Signs of left-hemisphere dysfunction in people with organic brain diseases and in people prone to aggression indicate a relation between this dysfunction and aggressive behavior.
NEUROHUMORAL BASES OF AGGRESSION: SEROTONINERGIC SYSTEM
Aggression is traditionally connected with the functioning of the serotoninergic system [14, 30]. Aggressive psychiatric patients have low levels of serotonin and of its main metabolite, 5-hydroxyindoleacetic acid (5HIAA), in the cerebrospinal fluid. Serotonin reuptake inhibitors increase serotonin concentration and reduce aggression. The serotoninergic system is complex: it comprises a family (about 15 types) of presynaptic and postsynaptic serotonin receptors, the enzymes involved in serotonin synthesis and degradation, and the serotonin transporter. The genes encoding these proteins are possible candidate genes for aggression [11]. The enzyme tryptophan hydroxylase (TPH) is responsible for serotonin synthesis: it catalyzes the rate-limiting step in the formation of this neurotransmitter, namely the 5-hydroxylation of tryptophan, the precursor of serotonin. The TPH gene is located on chromosome 11 (11p15.3-p14) and contains 11 exons. An adenine-to-cytosine substitution at position 779 (in intron 7 of the gene) is associated with aggression: carriers of the A779 allele are more aggressive and hostile than carriers of the C779 allele, whereas the latter are more nervous. The gene of the enzyme monoamine oxidase A (MAOA), located on the X chromosome, is a key gene whose variants have been implicated in behavioral characteristics. MAOA degrades serotonin and dopamine, regulating their levels in the organism. MAOA inhibitors are effective in the treatment of hyperexcitability, while MAOA deficiency leads to hyperimpulsivity and destabilization of neurotransmitter metabolism. MAOA knockout mice have been shown to be more aggressive than mice with the active gene. A variable number of tandem repeats (VNTR) polymorphism with a 30-base-pair repeat unit in the promoter region is also connected with aggression. There are alleles with 2, 3, 3.5, 4, 5 and, very rarely, 6 repeats. The frequencies of the alleles with 3, 4 and 5 repeats are ~34%, ~65% and 1%, respectively. This polymorphism influences the level of enzyme expression. The allele with five repeats is transcribed five times less efficiently than the allele with four repeats. People with reduced enzymatic activity are often impulsive and aggressive, especially if they were maltreated in childhood. The state of the receptor apparatus plays a more important role in the formation of aggression than the serotonin level itself. One of the most highly expressed serotonin receptors in the mammalian brain is receptor 1A (5HT1A), which is mainly localized in the hippocampus, the neocortex and the dorsal raphe nucleus. In neurons of the dorsal raphe nucleus it is localized presynaptically and acts as an autoreceptor that is sensitive to the extracellular serotonin concentration and blocks the release of this neurotransmitter from the neuron. In non-serotoninergic neurons, 5HT1A is localized postsynaptically.
Activation of this receptor leads to inhibition of adenylate cyclase, closure of calcium channels and opening of potassium channels, which results in membrane hyperpolarization. Thus, receptor 1A regulates the concentration of serotonin in the synaptic cleft and, consequently, the level of activation of other receptors on the postsynaptic membrane. A change in the activity of these receptors shifts behavior towards hyperexcitability; however, it can also cause depression. One of the functionally important polymorphisms of the 5HT1A gene, C1019G, is localized in the promoter region and affects the level of gene expression, since it lies directly in the binding site of the inhibitory transcription factor NUDR. Transcription of the G allele has been shown to be inhibited much less than that of the C allele, so G-allele homozygotes have much more of this receptor than C-allele carriers. G-allele carriers are characterized by higher emotional instability than C-allele carriers. Susceptibility to major depression, which is also connected with the G allele, is an extreme manifestation of this instability. Serotonin receptor 2A (5HT2A) is widely distributed in peripheral tissues. In the brain, it is expressed in regions responsible for cognitive functions, such as the prefrontal cortex, especially in pyramidal neurons and interneurons. In synapses, it is localized only on the postsynaptic membrane. Signals from the activated receptor are directed to the limbic system, the hippocampus and other brain regions responsible for the regulation of behavior, mood and fatigue. Experiments on animals have shown that individuals with a higher level of aggression have a higher density of these receptors. The T102C polymorphism is also connected with aggression. Allele T is linked with higher gene expression than allele C. T-allele homozygotes have been shown to be more aggressive.
Promoter polymorphism of the serotonin transporter (5HTT) is another genetic marker of aggressive behavior, depression and other mental disorders (figure 1) [18]. This membrane protein is responsible for active serotonin transport in brain neurons. It removes serotonin from the synaptic cleft and determines the level and duration of the signal on the postsynaptic membrane. After serotonin is released into the synapse, the transporter returns it to the presynaptic neuron, where it re-enters the neurotransmitter pool. One of the important variants influencing protein activity is the 5HTT promoter polymorphism, which exists in two forms: the L allele, consisting of 16 repeating elements, and the S allele, consisting of 14 repeating elements, each element being 20-23 base pairs long. The polymorphism leads to differences in mRNA concentration and, consequently, in the density of the membrane protein. The S allele is associated with a lower transcription level, so S-allele carriers have less protein on the membrane than L-allele carriers.
Figure 1. The structure of gene 5-HTT.
It was shown that the polymorphism of the serotonin transporter gene influences different types of aggression (indirect aggression, irritability and negativism) measured by the Buss-Durkee Hostility Inventory (figure 2). LL homozygotes have higher irritability and negativism, while their indirect aggression is much lower than that of SS carriers (p=0.02). It was proposed that S-allele carriers can control their emotional sphere: they do not demonstrate aggression at the moment they feel it (irritation, negativism) but accumulate it and express it later, indirectly. These scales were combined into one complex scale reflecting immediate expression of aggression at one pole and control of aggression at the opposite pole. This complex index included the standardized scores of the negativism and irritability scales with a positive sign and the standardized scores of the indirect aggression scale with a negative sign. There was a significant association between this complex scale and the serotonin transporter gene polymorphism (p=0.003). Further multiple pairwise comparison (post-hoc Newman-Keuls test) showed that SS homozygotes have a much lower level of aggression than the LL and LS groups (p<0.01) (figure 3). L-allele carriers express aggression more strongly, which can be of adaptive importance in a sports lifestyle. Existing inner aggression can be a stimulating factor when choosing sports, and sports competitions provide an outlet for and compensation of aggression. Investigation of athletes of other specializations shows a tendency towards a changed distribution of genotypes (table 1). On the whole, the distribution of 5HTT genotypes shows an increased frequency of the LL genotype among athletes compared with the control group. Perhaps carrying the LL genotype is an advantageous characteristic for sports. Earlier, it was shown in a large group of volunteers (N>1000) that the 5HTT promoter polymorphism modulates a person's reaction to stress conditions.
Those who inherited two L alleles (LL) of the serotonin transporter had a consistently low predisposition to depression regardless of past and present traumatic events. The influence of traumatic events on S-allele carriers (SS and SL) is much stronger than on LL homozygotes.
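The complex scale described above can be sketched as a signed sum of standardized scores. This is only an illustration, assuming z-score standardization over the sample and equal weights for the three scales; the function names and the sample scores are hypothetical and do not reproduce the authors' data or exact procedure.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize raw scale scores to zero mean and unit variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("scale has no variance")
    return [(v - m) / s for v in values]

def complex_aggression_index(negativism, irritability, indirect):
    """Composite index: +z(negativism) + z(irritability) - z(indirect aggression).

    High values correspond to immediate expression of aggression,
    low values to controlled (delayed, indirect) expression.
    Equal weighting is an assumption made for illustration.
    """
    zn = z_scores(negativism)
    zi = z_scores(irritability)
    zd = z_scores(indirect)
    return [n + i - d for n, i, d in zip(zn, zi, zd)]

# Hypothetical Buss-Durkee raw scores for four subjects (not real data).
negativism = [1, 3, 2, 4]
irritability = [2, 6, 4, 8]
indirect = [7, 3, 5, 1]
index = complex_aggression_index(negativism, irritability, indirect)
print([round(x, 2) for x in index])
```

Because every component is standardized, the index has zero mean over the sample, so genotype groups can be compared by how far their average index lies above or below zero.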
Table 1. Genotype distribution of 5HTT in different populations (%)

Population                                          LL   LS   SS
Germany (♂ N=467)                                   32   49   19
Poland (♂ N=52, ♀ N=75)                             33   53   14
Finland (♀ N=119)                                   26   51   23
Germany (♂ N=47, ♀ N=87)                            29   51   20
Russia, control group, our data (♂ N=45, ♀ N=167)   30   50   20
Sportswomen, our data (♀ N=86)                      49   35   16
Sportsmen, hockey (♂ N=35)                          49   42    9
Sportsmen, football (♂ N=26)                        65   27    8
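The genotype percentages in Table 1 can be converted to allele frequencies by gene counting: the L-allele frequency is the LL share plus half of the LS share. The sketch below assumes that the table values are percentages (each row sums to 100); the Hardy-Weinberg expectation is added as a standard textbook check and is not something the authors report. The function names and dictionary labels are chosen here for illustration.

```python
def allele_frequencies(ll, ls, ss):
    """Return (freq_L, freq_S) from genotype percentages LL, LS, SS."""
    total = ll + ls + ss
    freq_l = (ll + ls / 2) / total
    return freq_l, 1 - freq_l

def hardy_weinberg_expected(freq_l):
    """Expected genotype percentages (LL, LS, SS) under Hardy-Weinberg."""
    freq_s = 1 - freq_l
    return (100 * freq_l ** 2, 100 * 2 * freq_l * freq_s, 100 * freq_s ** 2)

# Genotype percentages (LL, LS, SS) taken from Table 1.
table1 = {
    "Russia, control group": (30, 50, 20),
    "Sportswomen": (49, 35, 16),
    "Sportsmen, hockey": (49, 42, 9),
    "Sportsmen, football": (65, 27, 8),
}

for group, (ll, ls, ss) in table1.items():
    freq_l, freq_s = allele_frequencies(ll, ls, ss)
    exp = hardy_weinberg_expected(freq_l)
    print(f"{group}: L = {freq_l:.3f}, S = {freq_s:.3f}, "
          f"HW expected LL/LS/SS = {exp[0]:.1f}/{exp[1]:.1f}/{exp[2]:.1f}")
```

For the Russian control group this gives an L-allele frequency of 0.55, against 0.665 to 0.785 in the three athlete groups, which restates the shift towards the L allele in terms of alleles rather than genotypes.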
Figure 2. The influence of 5-HTTLPR polymorphism on different types of aggression: indirect aggression, irritability and negativism.
Figure 3. The association between 5-HTT genotypes and complex scale of aggression.
Further analysis has shown that traumatic events lead to depression, suicidal ideation and even suicide attempts much more often in S-allele carriers than in LL carriers. Thus, the 5HTT polymorphism can be a genetic marker of predisposition to depression in response to traumatic life events, a kind of predictor of human psychological adaptation to conditions of high psychological stress. fMRI studies provided an explanation of this phenomenon [32]. S-allele carriers had a five times more active right amygdala than LL homozygotes. Hyperactivity of the amygdala in response to provoking emotional stimuli in S-allele carriers was confirmed by further experiments on large groups of people (N=300). This is accompanied by changes in other brain regions. In S-carriers, negative stimuli caused considerable activation of the insula, the integument and the caudate nucleus. A morphometric study of volume based on image analysis showed that S-carriers had an appreciably larger volume of the left cerebellar hemisphere, while L-carriers had a much larger volume of the left dorsal and middle frontal gyri, the left anterior cingulate gyrus, the right inferior frontal gyrus and the amygdala itself. The 5HTT polymorphism influenced the brain's regulation of amygdala activity. When activated, the amygdala sends signals to the cingulate gyrus, which responds with feedback regulatory signals that inhibit the amygdala when its activation is excessive. This "amygdala-cortex" feedback system is weaker in S-allele carriers. Activation of the amygdala itself lasts longer in S-allele carriers than in LL carriers.
NEUROHUMORAL BASES OF AGGRESSION: DOPAMINERGIC SYSTEM
As mentioned above, the serotoninergic system is traditionally considered the main regulator of aggressive behavior in animals. The role of catecholamines in the regulation of aggressive behavior remained obscure and was revealed only recently. It was discovered that hypofunction of the dopaminergic system is associated with an increase in impulsive aggressive behavior [8]. A slowdown in the utilization of dopamine and noradrenaline promotes the expression of aggression. Psychostimulants that increase dopaminergic transmission in the central nervous system, such as amphetamine and cocaine, induce psychoses with impulsive aggression. Drugs that decrease noradrenergic activity (such as propranolol) reduce the expression of aggression in humans. Apparently, aggression is determined by the balance between serotonin, which raises the threshold of the aggressive response to environmental stimuli, and catecholamines, which act in the opposite direction. One of the candidate genes of the dopaminergic system is the gene encoding the enzyme catechol-O-methyltransferase (COMT) (figure 4). This enzyme degrades dopamine and noradrenaline in the prefrontal cortex [9]. The COMT gene is localized on chromosome 22 (22q11). Codon 158 in exon 4 of the COMT gene contains a point mutation that reduces enzymatic activity by ~40%. COMT polymorphism is also connected with such temperamental characteristics as extraversion, anxiety, harm avoidance, novelty seeking and aggression. Studies of the association between the COMT gene and aggressive behavior in humans are contradictory. Some works demonstrate an association of the mutant allele (M) with increased aggression, with reliable associations mainly in men. Other work, on the contrary, showed that carriers of the wild-type allele (V) are more aggressive. Studies on animals are also contradictory. The number of fights between rats decreased notably after an artificial increase of the dopamine level in the brain (by injection of 6-hydroxydopamine).
Another study, on COMT-knockout mice, showed increased hostile behavior. Thus, there has been no unambiguous understanding of the influence of COMT gene polymorphism on aggression.
Figure 4. The structure of gene COMT.
Figure 5. The influence of Val158Met COMT polymorphism on physical aggression.
Two independent studies on groups of men and women were carried out [15]. Physical aggression of sportswomen decreases almost linearly in the order VV (wild type) > MV ≥ MM (mutant type) (p=0.008) (figure 5). COMT polymorphism does not influence aggression in the group of men (data not shown). Thus, women homozygous for the mutant allele (MM) are the least aggressive, while wild-type homozygotes (VV) are the most aggressive. V-allele carriers have a highly active enzyme; this results in a decrease in extracellular dopamine and, consequently, in weaker activation of prefrontal cortical neurons. It was shown that VV carriers are less anxious and less sensitive to pain, which can be associated with increased physical aggression. In women, the MM genotype is associated with a high score on the "harm avoidance" scale of the Cloninger inventory. Such people are characterized by fear of danger and uncertainty, anxiety and shyness. This complex of characteristics can be considered the opposite of physical aggression. The evolutionary history of the V158M mutation is a good confirmation of our results. The mutant allele has not been found in studies of V158M polymorphism in apes. This shows that the mutation is relatively new and perhaps first appeared in humans. Moreover, many data indicate that mutant-allele homozygotes have better memory and cognitive abilities than wild-type homozygotes. Therefore, we can suppose that the mutation is an evolutionary factor in the development of cognitive function in the human prefrontal cortex. During human evolution, natural selection acted mainly towards the development of mental faculties and the increasing complexity of brain structure, not towards the development of physical strength and aggression. Thus, the fixation of a mutation that promotes the development of cognitive function and reduces physical aggression, an evolutionarily outdated defensive characteristic, seems justified.
INVOLVEMENT OF RENIN-ANGIOTENSIN SYSTEM IN FORMATION OF HUMAN EMOTIONAL CONDITION
The peripheral renin-angiotensin system (RAS) plays one of the most important roles in the regulation of vascular tone and water-salt balance [41]. Recently, it was discovered that RAS components are localized in the basal ganglia, cortex, hypothalamus, forebrain, substantia nigra, brain stem, cerebellum and pineal gland, and influence neurotransmitter metabolism in the brain (figure 6). Thus, RAS can also participate in the regulation of human behavior and emotions. The brain RAS and the peripheral RAS are two separate systems; RAS components do not cross the blood-brain barrier. One of the main RAS components is angiotensin II (ATII). ATII takes part in the regulation of different links in the central nervous system: it influences the activity of neurotransmitters (serotonin, dopamine, noradrenaline, adrenaline) and of the hypothalamic-pituitary-adrenal system (vasopressin secretion, stimulation of the thirst center and the baroreflex). Some data indicate an antidepressant and anxiolytic action of ATII, which manifests itself as a decrease in anxiety, fear and emotional tension. RAS components are known to take part in the regulation of memory, learning and stress development. ATII concentration depends on the activity of angiotensin-converting enzyme (ACE), which converts ATI to ATII. ACE activity is associated with the insertion-deletion (I/D) polymorphism of the ACE gene. This polymorphism is characterized by the presence or absence of 287 base pairs in intron 16 of the gene, which lies on the long arm of chromosome 17 (17q23). Allele I is associated with low enzyme concentration, while allele D is associated with high activity. The ACE polymorphism is of special interest because it is intronic. The molecular mechanism by which an intronic polymorphism exerts its influence deserves attention, because this polymorphism significantly changes protein activity. So far, there is no agreement on how intronic polymorphisms are connected with the level of gene expression.
Some researchers consider that the insertion is linked to polymorphisms in the regulatory regions of the ACE gene [37]. Others consider that the insertion itself can influence intron splicing, since its sequence has the opposite orientation and is partly complementary to a neighbouring Alu sequence, forming a palindrome with it. Palindromes of sufficient length are known to form secondary structures that block lariat formation at the second step of splicing [35]. One hundred and eighty-nine athletes from different sports (rowing, skiing, hockey, football, volleyball, synchronized swimming), 59 women and 130 men, participated in the ACE polymorphism study. All athletes held a sports rank from First Class to International Master of Sports. The control group consisted of people not engaged in professional sports (N=212), 167 women and 45 men. The I/D polymorphism of the ACE gene was genotyped by PCR, as published before [13]. As a result, we identified an association of the I/D polymorphism with physical aggression in groups of athletes (in the group up to 15 years and the group over 15 years) (figure 7). We found that athletes with the II genotype have a lower level of physical aggression than those with ID or DD. D-allele carriers have approximately similar levels of physical aggression in the two groups (ID and DD). The influence of ACE polymorphism on aggression can possibly be explained by the interaction of RAS with the neurotransmitter systems (serotonin, dopamine, etc.) in the brain. ATII is known to decrease serotonin concentration in the subfornical organ. This effect is produced by a direct action of ATII on serotonin in this region, although the presence of ATII receptors on serotonin endings has not been confirmed.
Figure 6. Renin-angiotensin system.
Figure 7. The influence of ACE polymorphism on physical aggression.
Molecular Mechanisms of Adaptation: Stress and Aggression
Serotonin endings can also be influenced indirectly in the dorsal raphe nucleus. Hence, we can suppose that II genotype carriers, who have low ACE activity and thus a low amount of ATII, have a higher serotonin concentration than D-allele carriers, which can result in a decrease of physical aggression. A special RT-PCR-based test system was designed for wide-ranging genotyping of athletes from Olympic teams of various specialties. We analyzed ACE genotype frequencies in 900 athletes. The frequency of the II variant and the D/II ratio are shown in figure 8A,B. Only representatives of combat sports, cyclic sports and complex coordination sports are included (N=517). Our data show that representatives of combat sports and mountaineers have the lowest frequency of II and the highest D/II ratio. In view of the genetic associations of ACE with aggression and stamina, such a distribution can be explained by selection of the most aggressive athletes (D-carriers) in the combat sports and mountaineering groups.
Figure 8. A) The association between frequency of genotype II of ACE and different types of sports; B) The association between ratio of frequency D/II of ACE and different types of sports.
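The two quantities plotted in figure 8 can be computed from genotype counts as sketched below. The text does not define the D/II ratio explicitly, so the sketch assumes it is the share of D-allele carriers (ID + DD) divided by the share of II homozygotes; the D-allele frequency is given alongside for comparison. The function name, the group labels and the counts are hypothetical and are not the authors' data.

```python
def ace_summary(counts):
    """Summarize ACE I/D genotype counts given as {"II": n, "ID": n, "DD": n}."""
    total = counts["II"] + counts["ID"] + counts["DD"]
    freq_ii = counts["II"] / total
    # Assumption: "D/II" = share of D-allele carriers (ID + DD) over share of II.
    freq_d_carriers = (counts["ID"] + counts["DD"]) / total
    # Allele frequency by gene counting: each DD carries two D alleles.
    freq_d_allele = (counts["ID"] + 2 * counts["DD"]) / (2 * total)
    return {
        "freq_II": freq_ii,
        "freq_D_allele": freq_d_allele,
        "D_to_II": freq_d_carriers / freq_ii if freq_ii else float("inf"),
    }

# Hypothetical genotype counts for two sport groups (illustration only).
groups = {
    "group A": {"II": 30, "ID": 50, "DD": 20},
    "group B": {"II": 10, "ID": 50, "DD": 40},
}
for name, counts in groups.items():
    s = ace_summary(counts)
    print(f"{name}: II = {s['freq_II']:.2f}, "
          f"D allele = {s['freq_D_allele']:.2f}, D/II = {s['D_to_II']:.2f}")
```

Under this reading, a group with few II homozygotes has both a low II frequency and a high D/II ratio, which is the pattern the text reports for combat sports.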
CONCLUSION
Recognition of the genetic uniqueness of each person catalyzed the development of the concept of individual health. Genetic research is essential for studying the adaptive potential of people exposed to constant stress and for the pharmacological correction of depression. In this work, we describe studies of the genetic bases of the psycho-emotional stability of athletes, using different forms of aggression as an example. We studied the influence of polymorphisms in key genes controlling the serotoninergic, dopaminergic and renin-angiotensin systems. The distribution of 5HTT genotypes demonstrates an increased frequency of the LL genotype in athletes compared with the control group. We show that carriers of the LL serotonin transporter variant express direct aggressiveness more strongly, which can be a defense reaction to traumatic events and an advantageous characteristic for taking up sports. The 5HTT polymorphism can be considered a genetic marker of human adaptation to high psycho-emotional stress. We showed that, in sportswomen, physical aggression is most pronounced in VV homozygotes for the catechol-O-methyltransferase gene and decreases linearly in heterozygotes and mutant homozygotes. We revealed an influence of the polymorphism of ACE (a key component of the renin-angiotensin system) on the expression of aggression in athletes. We showed that II genotype carriers have a lower level of physical aggression than ID and DD carriers. The influence of the polymorphism is possibly connected with the interaction of the brain RAS with the main neurotransmitter systems. Wide-range genotyping of a group of athletes from Olympic teams of various specialties demonstrated that the DD genotype frequency is highest in combat sports. Consequently, ACE polymorphism can be considered a genetic marker not only of physical stamina and working efficiency but also of a person's psychological characteristics. Our studies demonstrate that the genetic search for the adaptive reserves of the human organism has promising applications.
ACKNOWLEDGMENTS
This study was supported by a grant from the Federal Agency of Science and Innovation of the Russian Federation (Grant 02.522.12.2001) and by the program "Fundamental Sciences to Medicine" of the Russian Academy of Sciences. We thank the "DNA-Technology" corporation (Moscow, Russia) and personally D.Ju. Trofimov and D.V. Rebrikov for designing the RT-PCR test systems. The authors are grateful to Professor A.M. Ivanitskii (Institute of Higher Nervous Activity and Neurophysiology, Russian Academy of Sciences) for advisory support.
REFERENCES
[1] Altuhov Ju.P. //Genetic processes in populations// Manual for students, ed. L.A. Ghivotovskii. 3rd edition. M.: Akademkniga, 2003. 431 p. Russian.
[2] Borinskaja S.A., Jankovskii N.K. //People and genes. Threads of fate// M.: Vek 2, Nauka segodnja, 2006. 64 p. Russian.
[3] Beron R., Richardson D. //Aggression// 2nd edition. Piter, 2001. 352 p. Russian.
[4] Vediakov AM, Tonevitskiĭ AG. //Analysis of a series of significant genetic polymorphisms in athletes// Fiziol. Cheloveka. 2006, v. 32(2), pp. 92-97. Russian.
[5] Vediakov AM, Tonevitskiĭ AG, Sysoeva OV, Portnova GV, Ivanitskii A. //Analysis of genetic polymorphism of 5-HTTLPR in athletes// Medical-biological technologies of work efficiency in condition of intensive physical performances. 2006, v. 2, pp. 97-108. Russian.
[6] Kvetnansky R, Koska J, Ksinantova L, Noskov VB, Blazicek P, Marko M, Macho L, Grigoriev AI, Vigas M. //Responses of sympathoadrenal and renin angiotensin systems to stress stimuli in humans during real and simulated microgravity// J. Gravit. Physiol. 2002, v. 9(1), pp. P79-80.
[7] Grigoriev AI, Kozlovskaya IB, Potapov AN. //Goals of biomedical support of a mission to Mars and possible approaches to achieving them// Aviat. Space Environ. Med. 2002, v. 73(4), pp. 379-384. Review.
[8] Kulikova MA, Maliuchenko NV, Timofeeva MA, Shleptsova VA, Tschegol'kova IuA, Vediakov AM, Tonevitskiĭ AG. //Polymorphisms of the main genes of neurotransmitter systems: I. the dopaminergic system// Fiziol. Cheloveka. 2007, v. 33, pp. 105-112. Review. Russian.
[9] Kulikova MA, Maliuchenko NV, Timofeeva MA, Shleptsova VA, Tschegol'kova IuA, Vediakov AM, Tonevitskiĭ AG. //Influence of polymorphism Val158Met COMT on physical aggression// Biull. Eksp. Biol. Med. 2008, v. 145, pp. 68-70.
[10] Ravich-Shcherbo IV. //Psychogenetics// Manual for students, ed. Ravich-Shcherbo IV. M.: Aspect-Press, 2000. 447 p.
[11] Timofeeva MA, Maliuchenko NV, Kulikova MA, Shleptsova VA, Tschegol'kova IuA, Vediakov AM, Tonevitskiĭ AG. //Polymorphisms of the main genes of neurotransmitter systems: II. the serotoninergic system// Fiziol. Cheloveka. 2008, v. 34, pp. 35-47. Review. Russian.
[12] Tonevitsky AG. //Importance of genetic polymorphisms of neuromediator systems for sport psychology// Molecular human polymorphism: structural and functional diversity of biomolecules, ed. S.D. Varfolomeev. M.: RUDN, 2007, pp. 600-626. Russian.
[13] Shleptsova VA, Tonevitskiĭ AG. //Influence of polymorphisms in renin-angiotensin systems on emotional status of athletes// Biull. Eksp. Biol. Med. 2008, in press. Russian.
[14] Adayev T., Ranasinghe B., Banerjee P. //Transmembrane signaling in the brain by serotonin, a key regulator of physiology and emotion// Biosci. Rep. 2005, v. 25, pp. 363-385.
[15] Chen J., Lipska B.K., Halim N., Ma Q.D., Matsumoto M., Melhem S., Kolachana B.S., Hyde T.M., Herman M.M., Apud J., Egan M.F., Kleinman J.E., Weinberger D.R. //Functional analysis of genetic variation in catechol-O-methyltransferase (COMT): effects on mRNA, protein, and enzyme activity in postmortem human brain// Am. J. Hum. Genet. 2004, v. 75, pp. 807-821.
[16] Coccaro E.F. et al. //Heritability of aggression and irritability: A twin study of the Buss-Durkee aggression scales in adult male subjects// Biol. Psychiatry. 1997, v. 41(3), pp. 273-284.
[17] Coccaro E.F., McCloskey M.S., Fitzgerald D.A., Phan K.L. //Amygdala and orbitofrontal reactivity to social threat in individuals with impulsive aggression// Biol. Psychiatry. 2007, v. 62, pp. 168-178.
[18] Hariri A.R., Holmes A. //Genetics of emotional regulation: the role of the serotonin transporter in neural function// Trends Cogn. Sci. 2006, v. 10, pp. 182-191.
[19] Hawks J, Wang ET, Cochran GM, Harpending HC, Moyzis RK. //Recent acceleration of human adaptive evolution// Proc. Natl. Acad. Sci. U.S.A. 2007, v. 104, pp. 20753-20758.
[20] Healy D, Herxheimer A, Menkes DB. //Antidepressants and violence: problems at the interface of medicine and law// PLoS Med. 2006, v. 3, p. e372.
[21] Herman J.P., Ostrander M.M., Mueller N.K., Figueiredo H. //Limbic system mechanisms of stress regulation: hypothalamo-pituitary-adrenocortical axis// Prog. Neuropsychopharmacol. Biol. Psychiatry. 2005, v. 29, pp. 1201-1213.
[22] Izquierdo A., Suda R.K., Murray E.A. //Comparison of the effects of bilateral orbitofrontal cortex lesions and amygdala lesions on emotional responses in rhesus monkeys// J. Neurosci. 2005, v. 25, pp. 8534-8542.
[23] van der Weide J., Hinrichs J.W.J. //The influence of cytochrome P450 pharmacogenetics on disposition of common antidepressant and antipsychotic medications// Clin. Biochem. Rev. 2006, v. 27, pp. 17-25.
[24] Kehrer-Sawatzki H, Cooper DN. //Understanding the recent evolution of the human genome: insights from human-chimpanzee genome comparisons// Hum. Mutat. 2007, v. 28, pp. 99-130.
[25] Kelly CM, Jorm AF, Wright A. //Improving mental health literacy as a strategy to facilitate early intervention for mental disorders// Med. J. Aust. 2007, v. 187, pp. 26-30.
[26] Kim S, Misra A. //SNP genotyping: technologies and biomedical applications// Annu. Rev. Biomed. Eng. 2007, v. 9, pp. 289-320.
[27] Miczek K.A., Maxson S.C., Fish E.W., Faccidomo S. //Aggressive behavioral phenotypes in mice// Behav. Brain Res. 2001, v. 125, pp. 167-181.
[28] Morgane P.J., Mokler D.J. //The limbic brain: continuing resolution// Neurosci. Biobehav. Rev. 2006, v. 30, pp. 119-125.
[29] Morrison S.D., Chaffin S., Chase T.V. //Aggression in adolescents: Use of the Buss-Durkee Inventory// South. Med. J. 1975, v. 68, pp. 431-436.
[30] Murphy D.L., Lerner A., Rudnick G., Lesch K. //Serotonin transporter: gene, genetic disorders, and pharmacogenetics// Molecular Interventions. 2004, v. 4, pp. 109-123.
[31] Paoloni-Giacobino A, Mouthon D, Lambercy C, Vessaz M, Coutant-Zimmerli S, Rudolph W, Malafosse A, Buresi C. //Identification and analysis of new sequence variants in the human tryptophan hydroxylase (TpH) gene// Mol. Psychiatry. 2000, v. 5, pp. 49-55.
[32] Pezawas L, Meyer-Lindenberg A, Drabant EM, Verchinski BA, Munoz KE, Hariri AR, Weinberger DR. //5-HTTLPR polymorphism impacts human cingulate-amygdala interactions: a genetic susceptibility mechanism for depression// Nat. Neurosci. 2005, v. 8, pp. 828-834.
[33] Pinna G, Costa E, Guidotti A. //Fluoxetine and norfluoxetine stereospecifically and selectively increase brain neurosteroid content at doses that are inactive on 5-HT reuptake// Psychopharmacology (Berl). 2006, v. 186, pp. 362-372.
[34] Portin P. //Evolution of man in the light of molecular genetics: a review. Part I. Our evolutionary history and genomics// Hereditas. 2007, v. 144, pp. 80-95.
[35] Smith C.W.J., Porro E.B., Patton J.G., Nadal-Ginard B. //Scanning from an independently specified branch point defines the 3' splice site of mammalian introns// Nature. 1989, v. 342, pp. 243-247.
[36] Tang BL. //Molecular genetic determinants of human brain size// Biochem. Biophys. Res. Commun. 2006, v. 345, pp. 911-916.
[37] Tomita H, Ina Y, Sugiura Y, Sato S, Kawaguchi H, Morishita M, Yamamoto M, Ueda R. //Polymorphism in the angiotensin-converting enzyme gene and sarcoidosis// Am. J. Respir. Crit. Care Med. 1997, v. 156, pp. 255-259.
[38] Tonkonogy J, Geller J. //A neuropsychiatry service in a state hospital. Adolf Meyer's approach revisited// Psychiatr. Q. 2007, v. 78, pp. 219-235.
[39] Vernon P.A. et al. //Individual differences in multiple dimensions of aggression: A univariate and multivariate genetic analysis// Twin Res. 1999, v. 2, pp. 16-21.
[40] Venter J.C. et al. //The sequence of the human genome// Science. 2001, v. 291, pp. 1304-1351.
[41] von Bohlen und Halbach O., Albrecht D. //The CNS renin-angiotensin system// Cell Tissue Res. 2006, v. 326, pp. 599-616.
In: Molecular Polymorphism of Man Editors: S. D. Varfolomyev and G. E. Zaikov
ISBN: 978-1-60741-843-6 © 2011 Nova Science Publishers, Inc.
Chapter 10
ETHNOGENOMICS: THE GENETIC HISTORY OF HUMANS WRITTEN IN CHROMOSOMAL DNA MARKERS
L.A. Zhivotovsky1* and E.K. Khusnutdinova2*
1 Vavilov Institute of General Genetics, Russian Academy of Sciences
2 Institute of Biochemistry and Genetics, Ufa Research Centre, Russian Academy of Sciences
ABSTRACT
The majority of human autosomal DNA variation is revealed with three kinds of polymorphic markers: single nucleotide polymorphisms (SNPs), microsatellite loci (STRs) and Alu repeats. In population studies, the latter two groups are still widely used, and in this paper we present some information on these DNA markers to demonstrate their ability to differentiate ethno-geographic human groups on a worldwide scale.
INTRODUCTION
The branch of human population genetics that investigates the genetic history of human ethnic groups is sometimes called ethnogenomics. Analysis of polymorphisms in non-recombining mitochondrial DNA and the Y chromosome deals with region-specific haplogroups and helped to establish the theory of the African origin of modern humans; it is very useful in investigating the haplogroup composition of ethnic groups in the historical and demographic context. However, mitochondrial DNA and the Y chromosome constitute only a small part of the genome and reflect the genetic history of humans within the maternal and the paternal lineages only. A large portion of genetic diversity in human populations is due to variation at the autosomes and the X chromosome, where the most numerous polymorphisms are nucleotide substitutions and insertions/deletions. Point mutations caused by nucleotide substitutions result in polymorphisms that occur widely in all organisms; they are referred to as single nucleotide polymorphisms (SNPs). However, the building of detailed population databases on SNPs has only just started. In view of this, in the present review we consider only variation generated by the insertion/deletion process. Among such markers, microsatellites and Alu repeats are the most widely used in population studies. Variation at these kinds of markers is caused by quite different mutation mechanisms, but both can be employed to discriminate populations and establish relationships between them. In this paper, we present general information on these DNA markers and demonstrate their ability to differentiate human populations.
MICROSATELLITES
A microsatellite is a DNA fragment with a number of tandemly repeated identical motifs, usually called repeats, which are short sequences of several (typically one to six) base pairs [1]. Microsatellites are highly polymorphic, with tens of alleles at each locus and high mutation rates [2-5]. Alleles of a microsatellite locus differ from one another in size, mainly in the number of repeats. Because microsatellite loci are small, they can be amplified by the polymerase chain reaction with highly reproducible results [6-7]. Typing of microsatellites requires only a small amount of DNA, which can be isolated even from severely degraded biological material. Microsatellites occur abundantly in all eukaryotic organisms [8] and are used for studying natural populations, agricultural animals and plants, and humans. In humans, microsatellites are dispersed throughout the genome. This permits using them, like single nucleotide polymorphisms (SNPs), for analysis of associations, linkage, and mapping of genes [9-11], and as markers of hereditary diseases [12]. Owing to the high polymorphism of microsatellites, they can be employed for personal identification and determination of biological relatedness [13]. High mutation rates at these loci lead to the accumulation of population-specific mutations, which permits detailed analysis of population structure [14-17].
Microsatellite Alleles and Their Identification
Microsatellites are scattered over virtually the whole human genome: their number reaches tens of thousands, and they occur in all chromosome segments [18]. To date, about ten thousand microsatellite loci have been examined in humans [19-21]. Depending on the repeat length, microsatellites are classified into loci with mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats. For instance, the c-proto-oncogene fes/fps (the cellular counterpart of the feline sarcoma virus oncogene) is located on human chromosome 15 and contains ATTT repeats in one of its introns. These repeats define a microsatellite locus, which is designated FES/FPS (figure 1). Microsatellite loci are highly polymorphic, i.e., each of them has many alleles (figure 2). For example, up to ten alleles of the FES/FPS locus (with repeat numbers from 7 to 15) have been found in populations, and loci with a far greater number of alleles are not exceptional.
Figure 1. Microsatellite locus FES/FPS. Locus FES/FPS, consisting of a series of tetranucleotide repeats ATTT, is located in the intron of c-protooncogene fes/fps. The figure presents the nucleotide sequence between positions 4701 and 4770, containing the ATTT repeats (the complete length of the fes/fps gene exceeds 12 kb). This particular allele has 11 repeats.
Microsatellite fragments are detected using the polymerase chain reaction (PCR), which provides a manifold increase in the copy number (amplification) of the given DNA fragment between the primer regions. Since microsatellite alleles are short, generally not exceeding 200-300 bp, even strongly degraded biological material can contain full-size intact copies of the DNA fragment examined, which gives a high chance of successful amplification. For this reason, PCR of small DNA regions, including microsatellites, has proved particularly important for forensic studies. Efficient methods have been designed for microsatellite analysis, using primers labelled with fluorescent dyes followed by detection of the reaction products on automated DNA sequencers [22].
Let us specify the definitions of microsatellite locus and microsatellite allele. A DNA region with a particular genomic localization that contains short tandem repeats is called a microsatellite locus (or microsatellite), often an STR locus (or simply STR, from Short Tandem Repeat), or SSR (Simple Sequence Repeat). An allele of an STR locus is a DNA fragment containing a certain number of the repeats examined and flanked by the primer regions. Since the choice of primers is largely arbitrary, the terms allele and locus here are purely symbolic, indicating only that this locus contains a DNA region with the repeats in question. One can choose another pair of primers flanking the same set of repeats, and this defines the same STR locus if the aim of the study is to estimate the number of repeats and their finer composition in the given DNA region. The corresponding alleles detected by different pairs of primers differ from one another by the same number of nucleotides, determined only by the difference in the positions of the terminal parts of the amplified DNA fragments.
Regardless of whether the different primer pairs yield larger or smaller amplified DNA fragments, it is important that these fragments contain the whole region with the repeats examined, which are the main characteristic of a microsatellite locus. Consequently, the locus formula typically indicates the structure of the repeats and their numbers. For example, the formula of the FES/FPS locus is [ATTT]n, where n is the repeat score; the allele at this locus shown in figure 1 is denoted [ATTT]11, as it contains eleven ATTT repeats.
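The locus formula can be illustrated with a minimal Python sketch. It counts the longest uninterrupted run of a motif in an amplified fragment and writes the allele in the [motif]n notation. The flanking sequences and the function names are hypothetical and chosen only for illustration; they are not the real FES/FPS primer regions.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Largest number of uninterrupted tandem copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def allele_formula(sequence: str, motif: str) -> str:
    """Write the allele in the [motif]n notation used in the text."""
    return f"[{motif}]{longest_repeat_run(sequence, motif)}"


# Hypothetical flanks: two primer pairs amplifying the same 11-repeat tract.
tract = "ATTT" * 11
amplicon_a = "GGAGCA" + tract + "AGCCTG"
amplicon_b = "TTGGGAGCA" + tract + "AGCCTGCCAA"

# Both fragments carry the same allele although their lengths differ.
print(allele_formula(amplicon_a, "ATTT"), len(amplicon_a))
print(allele_formula(amplicon_b, "ATTT"), len(amplicon_b))
```

In this toy case the two fragments differ in length by a constant number of flanking nucleotides, whatever the repeat number, which is why either primer pair defines the same STR locus.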
[Figure 2 panels: three histograms of the fraction of loci (vertical axis, 0 to 30) against the number of alleles per locus (horizontal axis, 4 to 32), for dinucleotide, trinucleotide and tetranucleotide loci.]
Figure 2. The number of alleles at autosomal microsatellite loci with di-, tri- and tetranucleotide repeats. The alleles were detected in more than a thousand individuals from 52 ethnic groups from regions worldwide (after Zhivotovsky et al., 2003). Most loci have eight or more alleles. Loci with dinucleotide repeats are on average more variable than loci with longer motifs.
As microsatellites are generally highly polymorphic, the overwhelming majority of people are heterozygous for them. Only a small percentage are homozygotes, carrying alleles of identical nucleotide composition at the homologous loci. The nomenclature of microsatellite loci has largely been standardized, although some inconsistencies remain. The name of an STR locus can be related to the gene in which it is located (e.g., FES/FPS) or reflect its chromosomal location (e.g., D11S1986 on chromosome 11 and DYS392 on the Y chromosome). Even the repeats can be designated differently: the motif can be written in different cyclic permutations or read from either of the complementary DNA strands. For instance, figure 1 shows that, since the region with tetranucleotide repeats is flanked by A nucleotides, the repeats in the FES/FPS locus can be denoted not only as ATTT but also as TTTA, and, based on the nucleotide sequence of the complementary strand, also as AAAT or TAAA.
The structure of microsatellite alleles can differ in complexity and show variation both in the region with repeats and in the flanking sequences (see classification in [23-24]). For example, all alleles of the FES/FPS locus have only so-called perfect, or simple, repeats: their regularity is not interrupted. However, this particular locus also exhibits variants that differ from the standard alleles by a substitution of A for C at a position in the 5'-flanking region and are detected by direct sequencing or restriction analysis. The majority of alleles of the HUMTH01 locus (intron 1 of the tyrosine hydroxylase gene) have simple repeats [AATG]n, but some of its alleles have a more complex structure, for example [AATG]5ATG[AATG]3 and [AATG]6ATG[AATG]4. These alleles contain the trinucleotide ATG between the main tetranucleotide repeats; this trinucleotide was probably derived from the main repeat via deletion. Variation in the repeat structure of this locus can also arise through substitution, insertion, or deletion of a single nucleotide. For instance, the ATG part of allele [AATG][AACT][AATG]8ATG[AATG]3 arose from the repeat AATG via the loss of nucleotide A. Finally, some complex alleles are composed of two or more types of repeats. For instance, one allele of the vWF locus, used in forensic studies, has a complex structure: [TCTA]2[TCTG]4[TCTA]3TCCA[TCTA]3. Such a complex locus can be divided into parts according to the repeat types, which can then be examined separately using the flanking and the internal primers. Generally, the allele length expressed in base pairs (without estimating the repeat number) is sufficient for comparative analysis.
However, evolutionary estimations and dating often require information on the allele size expressed in the repeat number, because a repeat motif is the principal unit of mutation in microsatellite loci. In this case, one should know the complete nucleotide sequence of a specific reference allele or, better, use the so-called ladder (a DNA marker that includes all alleles discovered at the locus in question).
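Two small Python sketches can make these conventions concrete. The first expands an allele formula into the sequence of its repeat region; the second converts a fragment length in base pairs into a repeat score, given a reference allele of known length and repeat number. The function names and the reference values (a 200 bp fragment with 10 repeats) are hypothetical, and the conversion assumes that fragments differ only in the repeat region.

```python
import re

# One token is either "[MOTIF]count" or a bare interrupting sequence such as "ATG".
TOKEN = re.compile(r"\[([ACGT]+)\](\d+)|([ACGT]+)")


def expand_formula(formula: str) -> str:
    """Expand e.g. '[AATG]5ATG[AATG]3' into the repeat-region sequence."""
    parts, pos = [], 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:
            raise ValueError(f"unexpected text at position {pos}")
        pos = m.end()
        if m.group(1):
            parts.append(m.group(1) * int(m.group(2)))
        else:
            parts.append(m.group(3))
    if pos != len(formula):
        raise ValueError(f"unexpected text at position {pos}")
    return "".join(parts)


def repeats_from_length(length_bp, ref_length_bp, ref_repeats, motif_len):
    """Return (complete repeats, leftover bases) relative to a reference allele."""
    whole, extra = divmod(length_bp - ref_length_bp, motif_len)
    return ref_repeats + whole, extra


print(len(expand_formula("[AATG]5ATG[AATG]3")))
print(len(expand_formula("[TCTA]2[TCTG]4[TCTA]3TCCA[TCTA]3")))
# Hypothetical reference allele: a 200 bp fragment carrying 10 tetranucleotide repeats.
print(repeats_from_length(208, 200, 10, 4))
print(repeats_from_length(199, 200, 10, 4))
```

A non-zero second value signals an incomplete repeat, as in alleles that carry a trinucleotide among tetranucleotide repeats.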
Mutation at Microsatellite Loci
The great majority of mutations at microsatellite loci are generated by a specific DNA replication error in the microsatellite region: slippage of DNA polymerase along the repetitive sequence. The rates of appearance of microsatellite mutations ($\mu$) are far higher than the rates of point mutations in eukaryotes. While the latter are about $10^{-9}$ to $10^{-8}$ per nucleotide position, and thus of the order of $10^{-6}$ per gene, the mutation rate of microsatellites is much higher, ranging from $10^{-6}$ to $10^{-2}$ [25]. Because these mutations change the number of repeats, variability of STR loci is characterized by the repeat number, and the key method of genotyping is fragment analysis, which detects the allele size. Figure 3 presents a genealogy featuring a mutation that appeared de novo in a microsatellite locus of the Y chromosome.
[Figure 3 diagram: male genealogy of the family. Individuals named in the diagram: Ai-do Lelu, Lev, Petr, Atka, Nakoti, Husu, Nikolai, Yuri, Efim, Andrei, Boris, Oleg, Sergei and Aleksei. The typed individuals carry alleles of 35 or 36 repeats.]
Figure 3. An example of a de novo mutation in an STR locus of the Y chromosome: the male genealogy of a family with haplogroup N-P63 of the Forest Nentsy (raw data from L.P. Osipova, T.M. Karafet and M. Hammer). The repeat scores at the CDY locus are given for the typed individuals; these individuals carried identical alleles at all other loci examined. As the appearance of two independent identical mutations in brothers is unlikely, these results suggest that the ancestor of the family, Ai-do Lelu, and his sons, Lev and Petr, had the same Y-haplotype with a 35-repeat allele at the locus. This allele mutated to 36 repeats in Lev's son, Nakoti, who transmitted it to his sons, Andrei and Boris. The pathway of the allele is shown by the dotted line.
Analysis of more than 5,000 autosomal microsatellites with dinucleotide repeats showed that the average mutation rate is $\mu = 6.2 \times 10^{-4}$ per locus per generation [19]. Mutation rates at loci with tri- and tetranucleotide repeats are lower [26]. The majority of microsatellite mutations increase or decrease the repeat number by exactly one. However, some mutant alleles lose or gain two or even more repeats (figure 4). Therefore, in addition to the mutation rate $\mu$, a further parameter of the mutation process at microsatellite loci is the mutational variance, which measures the variation of the repeat number generated by mutation.
[Figure 4 plot: histogram of the number of events (vertical axis, 0 to 120) against the mutational shift in repeat score (horizontal axis, -3 to +5).]
Figure 4. An example of the mutation spectrum in microsatellite loci (after Xu et al., 2000). The authors examined 287,786 parent-offspring locus transmissions in complete families at 273 microsatellites with tetranucleotide repeats. Among these, 508 deviations from Mendelian inheritance were found, of which 236 were confirmed by retyping as mutations that had occurred in germline cells. As seen in the figure, in most of the mutations the repeat number changed by exactly one; however, in 16.4% of the mutations the change was two or more repeats. The mutational variance was 1.83 (here the mutational shift in the repeat number ranged from -3 to +5).
Mutational variance is

$$\sigma_m^2 = \frac{1}{n} \sum_i n_i \, i^2 ,$$

where $n$ is the number of mutations examined, $i$ is the mutational shift in the repeat number, and $n_i$ is the number of mutations with the repeat number shifted by $i$. Under symmetric mutation, in which the mean shift values in the minus and plus directions are equal,

$$\frac{1}{n} \sum_{i=-3}^{5} n_i \, i = 0 .$$

If mutations change the allele size by $\pm 1$ only, then $\sigma_m^2 = 1$; in the case of greater mutational changes (as in figure 4), $\sigma_m^2 > 1$. In evolutionary studies, the fundamental quantity is the product of the two mutational parameters ($\mu$ and $\sigma_m^2$): $w = \mu \sigma_m^2$ [27-29], which was termed the effective mutation rate [30-31]. This is explained by the fact that the higher the number of appearing mutant alleles separated from the parental allele by several repeats (i.e., the higher the value of $\sigma_m^2$), the faster the increase in the total population variability and the rate of population differentiation by the repeat number. Based on the data on more than five thousand autosomal microsatellite loci with dinucleotide repeats [19], the mutational variance $\sigma_m^2$ was estimated as about 2.5 [32].
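As a worked illustration of these definitions, the following Python sketch computes the mutational variance from a table of mutational shifts and then the effective mutation rate. The shift counts are hypothetical and are not the data of figure 4; the values of the mutation rate and mutational variance in the last step are the dinucleotide estimates quoted in the text.

```python
def mutational_variance(shift_counts):
    """sigma_m^2 = (1/n) * sum_i n_i * i^2, with shift_counts = {shift i: count n_i}."""
    n = sum(shift_counts.values())
    return sum(count * shift ** 2 for shift, count in shift_counts.items()) / n


def mean_shift(shift_counts):
    """(1/n) * sum_i n_i * i; zero under symmetric mutation."""
    n = sum(shift_counts.values())
    return sum(count * shift for shift, count in shift_counts.items()) / n


def effective_mutation_rate(mu, sigma2_m):
    """w = mu * sigma_m^2."""
    return mu * sigma2_m


# Hypothetical spectra (counts of observed mutations by shift in repeat number).
single_step = {-1: 50, +1: 50}
multi_step = {-2: 5, -1: 40, +1: 45, +2: 8, +3: 2}

print(mutational_variance(single_step))   # single-step mutations only
print(mutational_variance(multi_step))    # larger than 1 once multi-step changes occur
print(mean_shift(single_step))

# Dinucleotide estimates quoted in the text: mu = 6.2e-4 [19], sigma_m^2 about 2.5 [32].
print(effective_mutation_rate(6.2e-4, 2.5))
```

With the quoted dinucleotide values, the product is about $1.55 \times 10^{-3}$ per locus per generation.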
Comparison of population variation at autosomal dinucleotide microsatellite loci with that at autosomal loci with tri- and tetranucleotide repeats in the same subjects provided estimates of the average effective mutation rates across tri- and tetranucleotide loci: $0.85 \times 10^{-3}$ and $0.93 \times 10^{-3}$ in one study [31] and $0.71 \times 10^{-3}$ and $0.70 \times 10^{-3}$ in the other [17]. Mutation rates can vary significantly across loci [29, 33, 34]. Moreover, alleles of different size at the same locus can mutate at different rates. For instance, longer alleles may show higher mutation rates, sometimes mutating preferentially to alleles with a lower repeat number [35-38], which may increase the variance of the repeat number as the average allele size increases [39]. The preferential mutation of longer alleles into smaller ones and vice versa may act as an evolutionary mechanism keeping microsatellite variation within certain limits [36, 40-43]. Selection against the carriers of large alleles, e.g., those associated with severe diseases, can also constrain microsatellite variation.
Worldwide Differentiation of Human Populations
To date, genome-wide panels of human autosomal microsatellite loci have been developed (see http://research.marshfieldclinic.org/genetics/sets/combo.html), allowing gene mapping and the study of the evolution of human populations on a worldwide scale. Using one of these panels, a comparative study of populations from various world regions was conducted [16-17], aimed at answering the question of whether it is possible to determine the regional origin of a particular individual from his or her DNA sample (figures 5, 6). The samples included more than one thousand members of 52 ethnic groups from various world regions (Equatorial, South, and North Africa; West, Central, and East Asia; Europe, Oceania, Central and South America). For the study, 377 microsatellite loci were selected that presumably were not associated with any adaptive traits or external differences. A certain number K (from two to five) was specified; then, the sample of more than one thousand individuals was subdivided into K "DNA groups" according to their similarity at the selected 377 DNA markers (without information on their ethnic assignment). After the formation of the DNA groups, the composition of individuals from particular ethnic groups in each of them was estimated. (The procedure of assigning an individual to a DNA group was as follows: at the specified K, all DNA characters were classified into K "characteristic" sets according to their microsatellite similarity. Then, each individual was characterized by quantities that can be interpreted as the percentage of characters belonging to a particular group; see [16] for more details.)
Figure 5. Classification of the individuals from 52 populations into genotypic groups based on their similarity at 377 microsatellite loci (from Rosenberg et al., 2002).
The results of the study were rather striking: the individuals fell almost without error into groups according to their geographic origin (figure 5). At K = 2, all Africans and west Eurasians fell into one group, and the rest into the other. At K = 3, the first group divided into individuals of African and of west Eurasian origin; at K = 4, Amerindians separated from the Asian group; and at K = 5, the Oceanic group separated [16]. Thus, without any reference to external traits (such as skin colour), microsatellite loci allowed all individuals, chosen worldwide, to be subdivided into a few large geographic groups and the continental origin of each individual to be identified. It is important to emphasize that some individuals had a certain percentage of markers characteristic of other groups, which can indicate gene flow between, and/or common origin of, geographically close populations. What was the situation within the large geographic groups? Could finer population/geographic subdivision be achieved on the basis of the DNA markers? The answer to this question is in general positive, but in some cases the method failed. As follows from figure 5, the west Eurasians were rather distinctly subdivided into the peoples of Europe, the Near East, and Central/South Asia, but within these groups the separation was not very clear, especially in Europe. In each European ethnic group, there were many individuals that could be assigned to another European ethnic group by their DNA. In Equatorial and South Africa, the analysis isolated the two oldest human hunter-gatherer groups, the Pygmies (Biaka and Mbuti) and the San. However, three Bantu-speaking tribes fell into one cluster. The Amerindians were distinctly differentiated into the tribes examined in this study: Pima and Maya from Central America, Colombians from northern South America, and Surui and Karitiana from the Amazon basin. The two Oceanian peoples studied were also clearly separated from each other.
Figure 6. Positions of the samples from major world regions in the axes of three principal components, PCs (after Zhivotovsky et al., 2003). Diamonds: Central Africa (solid diamonds designate hunter-gatherers; white diamonds, Bantu-speaking peoples); asterisks: western Eurasia (Uygur and Hazara are indicated by crosses); solid dots: East Asia; open dots: America; triangles: Australia.
Analysis of the principal components also revealed clear separation of the ethnic groups according to their geographic origin (figure 6). The principal components were estimated from the 52×52 matrix of pair-wise coefficients of correlation between the mean repeat numbers in the samples. Specifically, the population samples examined form large geographic clusters: South Africa, West Eurasia, East Asia, Oceania, and America. The position of the populations within each cluster corresponds to their population status. For example, the Bantu-speaking African peoples proved to be genetically closer to one another than to the hunter-gatherer tribes San and Mbuti; on the diagram, San and Mbuti were positioned at the edge of the African cluster, relatively separate from the other Pygmy tribe, Biaka. The latter tribe was located closer to the subcluster of the Bantu peoples (see [17] for a detailed explanation). Western Eurasia, which includes the Near East, Europe, Central and South Asia and North Africa, was clearly differentiated from the other large geographic groups and had pronounced internal structure. The Uygur (the sample was taken from western China) and the Hazara occupy positions intermediate between western Eurasia and East Asia, reflecting their relatedness to Mongolians. The ethnic groups of East Asia formed a separate cluster, which also had distinct internal structure. Samples from Oceania separated from the East Asians. The Amerindian populations were distant from all the remaining groups. One of the Amazonian populations, the Surui, was more distant from the other American populations, which might be explained by reproductive isolation and genetic drift caused by the extremely small size of the tribe. By contrast, the Maya population from Central America was closer to the non-American clusters than the other American populations, which might reflect the impact of gene flow to the American continent in the post-Columbian era.
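The principal-component step can be sketched in plain Python. This is a toy illustration under stated assumptions, not the procedure of [17]: the mean repeat numbers are invented for four hypothetical samples (A1, A2, B1, B2) at five loci; each locus is centred on its mean across samples before correlations are taken (an assumption, so that differences in typical allele size between loci do not dominate); and only the leading component is extracted, by power iteration.

```python
import math


def center_by_locus(mean_repeats):
    """Subtract from each locus its mean over samples (an assumption of this sketch)."""
    n_pop, n_loci = len(mean_repeats), len(mean_repeats[0])
    locus_means = [sum(row[l] for row in mean_repeats) / n_pop for l in range(n_loci)]
    return [[row[l] - locus_means[l] for l in range(n_loci)] for row in mean_repeats]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """Sample-by-sample matrix of pair-wise correlations."""
    return [[pearson(a, b) for b in rows] for a in rows]


def leading_component(matrix, iterations=200):
    """Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(matrix)
    v = [float(i + 1) for i in range(k)]  # arbitrary non-uniform starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue, v


# Hypothetical mean repeat numbers: rows are samples A1, A2, B1, B2; columns are loci.
mean_repeats = [
    [10.2, 15.1, 8.0, 12.3, 20.1],
    [10.3, 15.0, 8.1, 12.2, 20.0],
    [11.0, 14.2, 8.9, 11.5, 21.0],
    [11.1, 14.1, 9.0, 11.4, 21.1],
]
corr = correlation_matrix(center_by_locus(mean_repeats))
eigenvalue, loadings = leading_component(corr)
print(round(eigenvalue / len(corr), 3))   # share of total variance on the first PC
print([round(x, 2) for x in loadings])    # A samples and B samples get opposite signs
```

The sign of an eigenvector is arbitrary, so only the grouping of the loadings is meaningful, not which group is positive.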
Correspondence of Genetic Data to Anthropological Classification
Do the results of this worldwide study correspond to the image of ethnic groups in anthropological terms? People inhabiting different geographic regions form ethnic groups (from the Greek 'ethnos', people), which differ from one another in life style, language, customs and beliefs, and sometimes in external appearance. The language, customs and religion are usually shared by all members of an ethnic group. However, these can change within virtually one generation: after moving to a foreign country, adult immigrants have to learn its language, but for their children and grandchildren this language becomes native; people can also change their religion (recall the coming of Christianity to pagan Russia). The culture (language, religion, customs, social behaviour, life style) constitutes the atmosphere of an ethnic group, and children born in it learn the cultural traditions from their parents and other adults. However, people vary not only in cultural traits but also in genetically inherited ones, which are transmitted from parents to children regardless of what these children see and learn (we leave aside cases of gene-culture inheritance). In contrast to cultural properties, genetic traits change very slowly, over many generations. The difference in inherited external traits is greatest among the indigenous people of different continents: they may differ in skin, hair and eye colour; shape of the body, skull or nose; hair texture (straight, wavy or curly) and length; and other traits. In the traditional anthropological classification, geographically distant, "continental" groups of humans are called races, and the traits distinguishing them, racial traits. Most racial traits are variable, and a single person of a given race rarely exhibits all of the traits inherent to this race. Furthermore, races are not absolutely isolated from one another.
Even at the dawn of humankind, various tribes mixed with each other. Each of the major races is heterogeneous, consisting of small anthropological races differing from each other in a number of traits. For instance, Amerindians, who belong to the Mongoloid race, lack the epicanthus (an overhanging fold of the upper eyelid), and their facial profile is similar to that of Caucasoids; the Oceanians are dark-skinned but differ from the Africans: by hair length, dental structure, and fingerprints they are closer to the Mongoloids; the Caucasoid race consists of several anthropological types as well. Each of the major and minor races includes many intermediate groups.
The subdivision of humankind into major and minor anthropological races by skin pigmentation, hair texture, skull shape, etc., is also a subdivision of the ethnic groups by the geographic areas where they live and their ancestors lived, and the latter is in general agreement with the above results on the geographic subdivision of ethnic groups into genetic groups based on "neutral" DNA markers, i.e., characters that are not related to any morphological or physiological features. Why do classifications of human populations by such different character systems as DNA composition and external morphology yield such similar results in their geographic assignment? How did modern human races and within-race ethnic groups appear? Based on DNA variability in modern humans, one can estimate the size of the ancestral population that, according to the hypothesis of African origin, gave rise to the whole of humankind. According to genetic estimates, this ancestral population was extremely small; on the basis of our data on the same 377 DNA markers, its size was about two thousand people [17]. This does not mean that there were no other human populations on the Earth at that time. But the current genetic diversity of all people of the world stemmed from that small group; other populations did not leave their genetic traces in the present world population. Comparing DNA markers of the aboriginal people from South Africa, we estimated that intense differentiation and complex demographic processes started approximately 70-150 thousand years ago. This was accompanied by the appearance of diverse populations within Africa. Then, 50-100 thousand years ago, waves of migrants began "spilling" over the African borders and "flowing" over other continents [17], which is in agreement with other estimates. Over tens of thousands of years, humans were migrating and adapting to local environments. Imagine that a group of people came to Southeast Asia and settled there for many generations.
Then part of them moved further, forming a new local population (a future ethnic group), which, however, had a common history and a common origin with the parental group, and thus their DNA was more similar to each other than to that of the inhabitants of a different geographic region, even within the same continent. As to the populations of different continents (the future major races), since the separation of the race ancestors from the common progenitors, their DNA accumulated different mutations, and the accumulation of genetic differences was further promoted by isolation and genetic drift because of the huge geographic distances between them. This is the situation observed for neutral DNA markers. For instance, consider allele #275 of the tetranucleotide locus D9S1120, which is present in all American populations examined and absent in the non-American populations. This allele is nearly fixed in the Surui tribe from the Amazon River basin (frequency 0.97), probably because of genetic drift; its frequency in the other American populations is about 0.2-0.3: 0.3 in Maya, 0.22 in Pima, 0.19 in Colombians, and 0.25 in another Amazonian population, Karitiana. The wide distribution of this allele within the continent suggests that it arose in the founders of the aboriginal American people. Allele 275 of the D9S1120 locus can be considered a genetic marker of the native American populations. No markers as clearly specific as allele #275 for American populations have been found for other populations or regional groups, although some population-specific alleles occurred fairly often. For example, there are two private alleles in South Africa whose frequency exceeds 10%, whereas the populations of hunter-gatherers have their own specific alleles, one of which occurs at a frequency of about 16%. In the San population, numerous private alleles were found, including two occurring at frequencies exceeding 30%. In the populations of Oceania, the frequencies of two private alleles are over 10%, while each of the populations of this region has specific private alleles occurring at relatively high frequencies. In contrast, in West Eurasia and East Asia private alleles are rare, having frequencies below 3%. In populations of South African "farmers", none of the private alleles reaches a frequency of 5%, except one allele, whose frequency is 8%. No single discriminating marker was found in any of the above populations or regional groups, but these groups can be distinguished by combinations of hundreds of nonspecific alleles.
As to the between-race differences in external, "racial", traits, it is likely that they developed in parallel with, though independently of, the neutral DNA differences, in the course of short-term evolution as adaptations to the environmental conditions, diet, and landscape of the particular continents or geographic regions; likewise, languages and cultures developed independently of those in other geographic regions. However, not only the processes of population separation affected the formation of ethnic groups. Massive migration, interracial marriages and admixture can rapidly, within several generations, decrease the genetic differences that had been formed by the short-term evolution. Thus, the major anthropological races and intra-racial ethnic groups are a real, but not rigidly fixed, category, which does not divide people by essential, profound biological properties. Ethnic differences in anthropological characters and DNA variation should be interpreted in the historical, evolutionary context, in relation to short-term geographic/environmental adaptation and isolation by distance.
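The notion of a private allele used in this section (an allele observed in one regional group and absent elsewhere) can be expressed as a short Python sketch. The frequencies of allele 275 at D9S1120 are those quoted above for the American populations. The data layout, the pooled "Elsewhere" entry and the lumping of all remaining alleles into a single "other" class are simplifying assumptions made only for illustration.

```python
def private_alleles(freqs_by_region):
    """Return {allele: region} for alleles observed in exactly one region.

    freqs_by_region maps region -> population -> allele -> frequency.
    """
    regions_seen = {}
    for region, populations in freqs_by_region.items():
        for freqs in populations.values():
            for allele, frequency in freqs.items():
                if frequency > 0:
                    regions_seen.setdefault(allele, set()).add(region)
    return {a: next(iter(r)) for a, r in regions_seen.items() if len(r) == 1}


# Frequencies of allele 275 at D9S1120 quoted in the text.
american = {"Surui": 0.97, "Maya": 0.30, "Pima": 0.22, "Colombian": 0.19, "Karitiana": 0.25}

freqs_by_region = {
    "America": {pop: {"275": f, "other": 1 - f} for pop, f in american.items()},
    # Assumption: one pooled non-American entry in which allele 275 is absent.
    "Elsewhere": {"pooled": {"275": 0.0, "other": 1.0}},
}

print(private_alleles(freqs_by_region))

# Mean frequency outside the Surui, where drift has nearly fixed the allele.
others = [f for pop, f in american.items() if pop != "Surui"]
print(round(sum(others) / len(others), 2))
```

An allele flagged in this way is private only relative to the samples examined; wider sampling could reveal it elsewhere.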
ALU REPEATS
Alu repeats are the most widespread family of short interspersed repetitive elements (SINEs) in the human genome [44, 45]. SINEs include numerous families of elements with an average size of 300 bp, which are similar in nucleotide composition. They mainly consist of tandem repeats of GC-rich sequences separated by poly-A tracts. Alu repeats have over 500,000 copies per haploid genome, constituting about 3-5% of the human genome (on average, one copy per 6 kb). These sequences owe their name to the fact that most of them contain the site AGCT, the target of the restriction endonuclease AluI, located 170 bp from the beginning of the sequence. There are two models of the dispersal of Alu elements over the genome: the transposon model and the master gene model [44]. The transposon model states that many SINEs can generate new elements that are able to transpose. The new sequences acquire differences from the original ones in the course of amplification. The sequence of the Alu family members changes by accumulation of mutations. The amplification rate (the insertion number as a function of time) is not exponential, as would be expected if the transposon model were valid. Actually, the amplification rate of Alu elements is highly variable. The current amplification rate is estimated to be one fixed insertion per 5,000 years. This is about 100-fold slower than 40-50 Myr ago. According to the master gene model, most SINEs were derived from a single or several active loci. This model assumes a linear rate of amplification, controlled by a single master gene. Mutations in master genes generate new subfamilies and are the main cause of differences in amplification rates.
Alu repeats, constituting more than 10% of the whole genome and moving in it by means of retroposition, apparently played a major role in genome evolution. Based on their historical age, these repeats are classified into several subfamilies. Elements of some of the subfamilies represented in the human genome were incorporated at new loci relatively recently and are polymorphic in present-day human populations. The Alu element copies whose transposition occurred during the ethnic divergence of modern humans serve as helpful markers in evolutionary genetic studies. The youngest is subfamily AluYa5/8, some of whose members have retained the ability for retroposition. The Alu repeats of this subfamily are polymorphic, and their movement over the genome coincided with the dispersal and ethnic divergence of humans. About 1,000-2,000 such polymorphic insertions have been found in the human genome. The main advantages of Alu sequences are as follows: high stability of Alu elements; a low level of de novo insertions; the absence of a mechanism for removing Alu repeats from a specific locus, which allows each Alu insertion to be regarded as an independent event; and technically simple genotyping of Alu insertions [46-48]. Moreover, the character of movement of Alu elements permits unambiguous identification of the initial (absence of the Alu insertion) and the final (insertion) allelic state of the locus. In other words, in contrast to most other polymorphic diallelic systems, for Alu repeats the ancestral state and the direction of mutation are always known. These features make Alu insertional polymorphism an attractive model for analysing genetic history and structure, as well as for evaluating population differentiation. Polymorphic Alu repeats are currently widely used for characterizing the genetic structure of modern populations and investigating the evolution of human populations.
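Because an Alu locus is diallelic (insertion present or absent), its basic population statistics are simple to compute. The Python sketch below estimates the insertion frequency from genotype counts and the expected heterozygosity under Hardy-Weinberg proportions. The genotype counts and function names are hypothetical and serve only as an illustration.

```python
def insertion_frequency(n_ins_ins, n_ins_abs, n_abs_abs):
    """Frequency of the insertion allele from counts of the three genotypes."""
    total_chromosomes = 2 * (n_ins_ins + n_ins_abs + n_abs_abs)
    return (2 * n_ins_ins + n_ins_abs) / total_chromosomes


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a diallelic locus."""
    return 2 * p * (1 - p)


# Hypothetical sample of 100 individuals typed at one polymorphic Alu locus:
# 30 insertion homozygotes, 50 heterozygotes, 20 homozygotes without the insertion.
p = insertion_frequency(30, 50, 20)
print(p, expected_heterozygosity(p))
```

Since the absence of the insertion is the ancestral state, the value of p is directly the frequency of the derived allele in the sample.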
Studies of the genetic diversity of human populations using Alu repeats are conducted in several research centres in the United States and Europe. The results of these studies have been used to estimate the genetic diversity of Alu insertions in the world population and confirm the main migration routes by which modern humans colonized the globe [49]; to verify genetic relationships among various groups of Amerindians [45] and among several Caucasian ethnic groups [50, 51]; to characterize the dispersal and evolution of populations in North Eurasia [52]; and to reconstruct the historical genetic associations among some indigenous ethnic groups of Hindustan [53]. We have studied polymorphism of Alu repeats on the basis of global data on their distribution in various world populations, with special reference to differentiation of the populations from the Volga-Ural region and Central Asia. As an important part of Eastern Europe, the Volga-Ural region is of obvious interest from the population genetic viewpoint because of the ethnic history of the peoples that currently inhabit its territory [54]. In earlier studies of the genetic structure of the ethnic groups from the Volga-Ural region, the following DNA markers were used: restriction fragment length polymorphisms [55, 56]; polyallelic systems, namely mini- and microsatellites (VNTRs, STRs) [57, 58]; and polymorphisms of mtDNA and the Y chromosome [59, 60]. Based on the results of these studies, the genetic structure of the populations was characterized, and the genetic differentiation and the contributions of the Caucasoid and Mongoloid components to the gene pool of the peoples of the Volga-Ural region were estimated [61, 62]. As for the Central Asian populations, until recently they were left out of modern population and evolutionary genetic research based on highly informative DNA techniques.
In spite of the abundant ethnographic, linguistic and anthropological data, the genetic structure of the Central Asian ethnic groups and their relationships with one another and with the peoples of the Volga-Ural region remained unclear. For instance, according to anthropological and
Ethnogenomics: The Genetic History of Humans…
linguistic evidence, the South Ural and Cis-Ural populations were closely ethnically related to those from Central Asia, particularly the Cis-Aralian region. Moreover, elements of Central Asian origin can be traced in the historic legends as well as in the material and spiritual culture of most ethnic groups of the Volga-Ural region. The prevalence of tribal names of the Kypchack epoch in ethnonymic parallels of Bashkirs, Uzbeks, Turkmens, and Karakalpaks also suggests significant similarity of the late Central Asian ethnic substrate of these peoples [63].
To assess the informativeness of Alu insertion polymorphisms as population genetic markers and to analyse genetic diversity and relationships among the peoples, we examined polymorphism of nine Alu insertions (Ya5NBC5, Ya5NBC27, Ya5NBC148, Ya5NBC182, Ya5NBC361, ACE, ApoA1, PV92, and TPA25) in populations of the Volga-Ural region, Central Asia and other world regions. The material for genetic analysis of the populations from the Volga-Ural region and Central Asia was collected during expeditions in 1993-2002. The study used 737 DNA samples collected from representatives of the following ethnic groups: Bashkirs from the Burzyanskii district of Bashkortostan (Trans-Uralian) (n = 25); Mishari Tatars from the Al'met'evskii district of Tatarstan (n = 76); Mountain Mari from Mari El Republic, Ioshkar-Ola (n = 88); Moksha Mordovians from the Staroshaiginskii district of Mordovia (n = 60); Udmurts from the village of Sharkanovo, Sharkanskii district of Udmurtia (n = 80); Komi-Permyaks from the villages of Yus'ma and Kosa, Komi-Permyatskii Autonomous Okrug (n = 80); and indigenous inhabitants of Central Asia: Kazakhs (n = 83) and Uigurs (n = 63) from Kazakhstan and Uzbeks from Uzbekistan (n = 72). The data on Alu polymorphisms in populations of other world regions were taken from the literature.
Table 1. Description of analysed Alu loci

DNA locus    Chromosome localisation    Sizes of alleles (bp)    References
Ya5NBC27     11                         591/265                  http://www.genetics.urah.edu/swatkins/pub/Alu_primes
Ya5NBC148    20                         505/193                  http://www.genetics.urah.edu/swatkins/pub/Alu_primes
ACE          17q23                      490/190                  Tiret et al., 1992 [64]
ApoA1        11q13                      400/110                  Batzer et al., 1994 [65]
PV92         16                         450/130                  Batzer et al., 1994 [65]
TPA25        8                          457/134                  Yang-Feng et al., 1986 [66]
Ya5NBC5      2p22.2                     497/195                  http://www.genetics.urah.edu/swatkins/pub/Alu_primes
Ya5NBC182    7p13                       563/287                  http://www.genetics.urah.edu/swatkins/pub/Alu_primes
Ya5NBC361    7p21.3                     545/279                  http://www.genetics.urah.edu/swatkins/pub/Alu_primes
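The allele sizes in Table 1 suggest how a PCR result translates into a genotype: the longer product carries the Alu insertion and the shorter one does not. The following is a minimal sketch of that logic, not the authors' procedure; the function name and the sizing tolerance `tolerance_bp` are hypothetical, and the sizes are copied from Table 1 on the assumption that the larger value of each pair is the insertion allele.

```python
# Allele sizes in bp from Table 1: (allele with insertion, allele without).
ALLELE_SIZES_BP = {
    "Ya5NBC27": (591, 265),
    "Ya5NBC148": (505, 193),
    "ACE": (490, 190),
    "ApoA1": (400, 110),
    "PV92": (450, 130),
    "TPA25": (457, 134),
    "Ya5NBC5": (497, 195),
    "Ya5NBC182": (563, 287),
    "Ya5NBC361": (545, 279),
}


def call_genotype(locus, band_sizes_bp, tolerance_bp=15):
    """Classify a sample as '+/+', '+/-' or '-/-' from observed band sizes.

    tolerance_bp is a hypothetical gel-sizing tolerance, not a value
    taken from the chapter.
    """
    ins_size, empty_size = ALLELE_SIZES_BP[locus]
    has_ins = any(abs(b - ins_size) <= tolerance_bp for b in band_sizes_bp)
    has_empty = any(abs(b - empty_size) <= tolerance_bp for b in band_sizes_bp)
    if has_ins and has_empty:
        return "+/-"   # heterozygote: both products amplified
    if has_ins:
        return "+/+"   # homozygote for the insertion
    if has_empty:
        return "-/-"   # homozygote for the ancestral (empty) state
    raise ValueError(f"no band matches the expected sizes for {locus}")


print(call_genotype("ACE", [490, 190]))
```

Because the empty site is always the ancestral state, the '-' allele in such a call can be read directly as the ancestral allele.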
The Alu insertions were genotyped using polymerase chain reaction (PCR) with primers described earlier (Table 1). Statistical analysis of the results (allele frequencies and their errors, the observed and expected heterozygosities, goodness-of-fit to Hardy-Weinberg proportions, coefficient of genetic differentiation, pairwise population comparison) was conducted using the GENEPOP v. 1.2 software package [67]. Phylogenetic trees were constructed using the neighbour-joining procedure implemented in the PHYLIP v. 3.5c
software package [68]; consensus dendrograms were built using the TreeView program [69]. Factor analysis was performed by the method of principal components implemented in the STATISTICA v. 5.5 package [70].
All Alu insertions examined proved to be polymorphic in all of the populations, with insertion frequencies ranging from 0.110 at locus Ya5NBC5 in Mountain Mari to 0.914 at locus ApoA1 in Tatars. The highest insertion frequency was found at the ApoA1 locus, in populations of both the Volga-Ural region and Central Asia (on average 0.85 and 0.73, respectively). Among the Volga-Ural populations, the highest ApoA1 frequency was found in Tatars (0.914) and Komi-Permyaks (0.913), and among the Central Asian ethnic groups, in Uzbeks (0.903). The lowest insertion frequency was observed in Uigurs (0.548), which conforms to the literature data on a lower frequency of this insertion in Asian populations (0.856) as compared to European populations (0.965) [71-73]. At the TPA25 locus, the Central Asian populations were homogeneous in Alu insertion frequency (0.464-0.492). In the Volga-Ural populations, population differentiation was found at this locus: the insertion frequency varied from 0.176 in the Mountain Mari population to 0.5 in Udmurts and Tatars. According to Watkins et al. [73], the frequency of this insertion in world populations varied from 0.193 in Africa and 0.397 in Asia to 0.583 in Europe and 0.5957 in India. High variation in the frequency of this insertion is observed in Asia, from 0.250 in Malaysia to 0.500 in Japan and Cambodia; the insertion is completely absent in some African populations. In European populations, the frequencies of this insertion are fairly high, ranging from 0.444 in Finns to 0.75 in Poles. These results indicate heterogeneity of the Volga-Ural populations in general and at this polymorphic locus in particular [74].
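The per-locus statistics listed in the methods (allele frequency and its error, observed and expected heterozygosity, goodness-of-fit to Hardy-Weinberg proportions) can be sketched for a single diallelic Alu locus as follows. This is an illustration under stated assumptions rather than a reproduction of the GENEPOP analysis: a simple chi-square test with one degree of freedom stands in for whatever test the package applied, the expected heterozygosity is the plain 2pq estimate, and the function name and the genotype counts in the usage line are hypothetical.

```python
import math


def locus_summary(n_ins_ins, n_ins_empty, n_empty_empty):
    """Summarise one diallelic locus from genotype counts (+/+, +/-, -/-)."""
    n = n_ins_ins + n_ins_empty + n_empty_empty
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_ins_ins + n_ins_empty) / (2 * n)   # insertion frequency
    q = 1.0 - p
    se = math.sqrt(p * q / (2 * n))               # binomial error over 2n chromosomes
    h_obs = n_ins_empty / n                       # observed heterozygosity
    h_exp = 2 * p * q                             # expected under Hardy-Weinberg
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ins_ins, n_ins_empty, n_empty_empty)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"p_ins": p, "se": se, "h_obs": h_obs, "h_exp": h_exp,
            "chi2": chi2, "p_value": p_value}


# Hypothetical genotype counts, for illustration only.
print(locus_summary(20, 40, 20))
```

For a diallelic locus the expected heterozygosity 2pq cannot exceed 0.5, which is reached when both alleles have a frequency of 0.5.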
Loci Ya5NBC148, PV92 and Ya5NBC27 are the most informative markers of population differentiation. These Alu insertions may be termed "Asian", as they occur at low frequencies in European populations: 0.063 (Ya5NBC27), 0.199 (Ya5NBC148), and 0.234 (PV92), whereas in Asian populations their frequencies reach 0.397 (Ya5NBC27), 0.420 (Ya5NBC148), and 0.857 (PV92) [71, 72]. Among the European populations studied earlier, the Ya5NBC27 Alu insertion was detected only in the Finnish population, in which it occurred at a frequency of 0.325 [71]. This Alu insertion was found in all populations that we examined; its highest frequency was found in Komi-Permyaks and Udmurts (0.412 and 0.45, respectively). Komi and Udmurts belong to the Permian group of the Finnish branch of the Finno-Ugrian language family. This close relatedness of the languages suggests close genetic relationships of these ethnic groups with one another, on the one hand, and with the Finnish population, on the other. The lowest frequency of this allele was found in the Mountain Mari population (0.186), which is also assigned to the Finno-Ugric language family but has a lower Mongoloid proportion in the mitochondrial genome (7%) than Komi (17.1%) and Udmurts (31%), according to previous studies [59]. The remaining groups of indigenous populations examined show homogeneity, which suggests their common origin. At locus Ya5NBC148, the Alu insertion occurs at the highest frequency in Uigurs (0.389) and at the lowest in Moksha Mordovians (0.175), which corresponds to a gradient of decreasing insertion frequency from east to west. According to the results of previous studies of a hypervariable mtDNA segment, the proportion of the Mongoloid component also decreases from east to west, being 62% in Kazakhs, 55% in Uigurs, 52% in Uzbeks, 12% in Tatars, and 2% in Mordovians [75]. The results for the PV92 locus
are in agreement with the literature data on the "Asian" origin of this insertion. For the Volga-Ural populations, which are more Caucasoid in anthropological type, the frequency of this Alu insertion was on average 0.235, whereas it reached 0.506 in the populations of Central Asia. The Alu insertion in the ACE gene occurred at a fairly high frequency (on average 0.514) in all populations examined except Mountain Mari, in which this insertion was detected at the lowest frequency (0.250). According to V.A. Stepanov, the frequency of this insertion was high in the peoples of North Eurasia (0.565), varying over a wide range both in Caucasoids and in Mongoloids of Asia and the New World [76].
Data on the frequencies of the Ya5NBC182, Ya5NBC361, and Ya5NBC5 Alu insertions are lacking in the literature. The frequency distribution in the samples examined showed population heterogeneity at the Ya5NBC361 locus: the Finno-Ugric and Altaic groups of peoples were clearly differentiated. The insertion frequency in the former group did not exceed 0.291, whereas in the latter it reached 0.532. The insertion frequency at the Ya5NBC182 locus varied from 0.415 to 0.651 in all of the populations except Komi-Permyaks and Udmurts, in which it reached 0.744-0.781. This high insertion frequency in these two populations might reflect their common origin and prolonged contacts during their formation. No regular trend was found in the distribution of the Ya5NBC5 Alu insertion frequency in relation to the proportions of the Caucasoid and Mongoloid components in the gene pools. The lowest frequencies of this Alu insertion were observed in Uzbeks (0.368), who have a high proportion of Mongoloid admixture, and in the Moksha Mordovian population (0.297), which occupies the westernmost position among the Volga-Ural populations examined and is more Caucasoid according to mtDNA and nuclear data (2% of the Mongoloid component) [59].
The distribution of allele frequencies at the loci studied indicates significant genetic diversity of the populations. The mean observed heterozygosity for the nine Alu insertions varied from 0.326 in Mountain Mari to 0.445 in Kazakhs and Uigurs. At some loci, this parameter reaches 0.5, which is the maximum for a diallelic locus. The highest diversity of Alu insertions was found for loci Ya5NBC182, Ya5NBC361, ACE, PV92, and TPA25. The Central Asian populations showed high gene diversity: the mean observed heterozygosity was 0.44, whereas in the Volga-Ural populations it did not exceed 0.375. The relatively low heterozygosity of the Volga-Ural populations is explained by the low intrapopulation diversity of the Mountain Mari sample at five loci. However, the Volga-Ural populations exhibited high interpopulation diversity: the coefficient of genetic differentiation Fst averaged over nine loci (Fst = 0.061) proved 2.5-fold higher than the corresponding index calculated for the Central Asian populations (Fst = 0.024). This relatively high Fst value in the Volga-Ural populations results from differences in Alu insertion frequencies at loci Ya5NBC361 (Fst = 0.143), ACE (Fst = 0.089), Ya5NBC182 (Fst = 0.083), TPA25 (Fst = 0.073), and Ya5NBC27 (Fst = 0.059), whereas the Central Asian populations showed high Fst only for the ApoA1 locus (Fst = 0.154). Analysis of genetic differentiation of the populations grouped by language detected a higher Fst in Finno-Ugrians (0.066) than in Turkic peoples (0.032). Thus, the populations studied proved highly differentiated with respect to both linguistic affiliation and geographic location. As compared to the world populations, the interpopulation genetic differences of the populations from the Volga-Ural region were higher than those of populations from Central
Asia, Europe and Southeast Asia [71, 77], which is in agreement with linguistic and anthropological evidence on the complex ethnogenesis of the peoples of this region of Russia, characterized by extremely intense contacts and interactions among groups with different ethnic and racial characteristics [76, 78]. A pair-wise comparison of the populations for the distribution of insertion frequencies at nine loci did not detect significant differences between Udmurts and Komi-Permyaks, Uzbeks and Kazakhs, or Tatars and Bashkirs (p < 0.01), which supports anthropological and linguistic evidence on the common origin of these ethnic groups. Significant differences were found between the other populations. The Turkic and Finno-Ugric linguistic groups were found to differ significantly in the allele frequencies of the nine Alu insertions taken together (p < 0.001) and at each individual locus except Ya5NBC27. These results indicate heterogeneity and genetic subdivision of the regions and linguistic groups examined.
To estimate the relationships among the populations examined, we used the method of principal components and phylogenetic analysis. The positions of these populations in the space of two principal components are shown in figure 8. The first two principal components account for 62% of the allele frequency variation. In accordance with the linguistic classification, two population clusters can be identified: one includes the Turkic-speaking populations, the other, the Finno-Ugrian ones. These data conform to the results of the analysis of population differentiation described above.
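The coefficient of genetic differentiation Fst used above can be illustrated for one diallelic locus. The sketch assumes Wright's definition, Fst = (Ht - Hs)/Ht, with unweighted means over populations; the estimator implemented in the software used for the analysis may differ, so the value printed here is not expected to match those reported in the text. The function name is hypothetical, and the two frequencies in the usage line are the ApoA1 insertion frequencies quoted earlier for Uzbeks and Uigurs.

```python
def wright_fst(insertion_freqs):
    """Wright's Fst for one diallelic locus from per-population frequencies.

    Uses unweighted means over populations and no sample-size correction;
    this is a simplification, not the estimator used in the chapter.
    """
    k = len(insertion_freqs)
    if k < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(insertion_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)                  # Ht: pooled heterozygosity
    h_within = sum(2 * p * (1 - p) for p in insertion_freqs) / k   # Hs
    if h_total == 0:
        return 0.0                                     # locus fixed everywhere
    return (h_total - h_within) / h_total


# ApoA1 insertion frequencies quoted in the text for Uzbeks and Uigurs.
print(round(wright_fst([0.903, 0.548]), 3))
```

Fst is 0 when all populations share the same allele frequency and 1 when they are fixed for different alleles, which is why it serves as a measure of interpopulation diversity.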
Figure 7. Dendrogram of genetic relationships among the populations of the Volga-Ural region and Central Asia. [Tree labels recovered from the figure: Ancestral, Uighurs, Bashkirs, Tatars, Komi, Udmurts, Uzbeks, Kasakhs, Mordva, Mari; scale bar 0.1.]
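The pair-wise population comparison mentioned in the text can be sketched for one locus as a test of homogeneity of allele counts between two samples. A chi-square test on a 2 x 2 table is used here as a simple stand-in, since the exact procedure applied by the software is not described; the function names are hypothetical. The usage lines take the ApoA1 insertion frequencies and sample sizes quoted in the text for Uzbeks and Uigurs and assume that every individual was typed at this locus.

```python
import math


def allele_counts(freq_ins, n_individuals):
    """Convert an insertion frequency and sample size to chromosome counts."""
    total = 2 * n_individuals
    n_ins = round(freq_ins * total)
    return n_ins, total - n_ins


def compare_two_populations(counts_a, counts_b):
    """Chi-square test (1 d.f.) on a 2 x 2 table of allele counts."""
    a_ins, a_empty = counts_a
    b_ins, b_empty = counts_b
    row_a, row_b = a_ins + a_empty, b_ins + b_empty
    col_ins, col_empty = a_ins + b_ins, a_empty + b_empty
    if min(row_a, row_b, col_ins, col_empty) == 0:
        raise ValueError("table has an empty row or column")
    n = row_a + row_b
    chi2 = n * (a_ins * b_empty - a_empty * b_ins) ** 2 / (
        row_a * row_b * col_ins * col_empty)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


uzbeks = allele_counts(0.903, 72)   # frequency and n as quoted in the text
uigurs = allele_counts(0.548, 63)
chi2, p_value = compare_two_populations(uzbeks, uigurs)
print(f"chi2 = {chi2:.1f}, significant at 0.01: {p_value < 0.01}")
```

A comparison over all nine loci would need the per-locus results to be combined, which this sketch does not attempt.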
When the populations were compared with world populations using principal component analysis for five loci, they divided according to their geographic position: the populations from the Volga-Ural region fell into the same cluster as Finns, which supports Finno-Ugrian-Turkic interactions during the last millennium, whereas the Central Asian populations clustered with the populations from Southeast Asia. The first principal component explained 60.2% of the frequency variation, and the first two components together, 79.2% (figure 9).
Figure 8. Positions of the populations studied in the axes of two principal components (PCs).
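The principal component analysis of allele frequencies referred to in the text can be sketched as follows. This is an illustrative pure-Python version based on power iteration over the covariance matrix of the loci, not the STATISTICA procedure used by the authors; the function names are hypothetical and the frequency matrix in the usage lines is invented for the example.

```python
import math


def _mat_vec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def principal_components(freqs, n_components=2, n_iter=1000):
    """Return (explained variance ratios, population scores).

    freqs: one row per population, one insertion frequency per locus.
    """
    n_pops, n_loci = len(freqs), len(freqs[0])
    means = [sum(row[j] for row in freqs) / n_pops for j in range(n_loci)]
    centred = [[row[j] - means[j] for j in range(n_loci)] for row in freqs]
    cov = [[sum(c[i] * c[j] for c in centred) / (n_pops - 1)
            for j in range(n_loci)] for i in range(n_loci)]
    total_var = sum(cov[j][j] for j in range(n_loci))
    if total_var == 0:
        raise ValueError("allele frequencies do not vary among populations")
    ratios, axes = [], []
    for _component in range(n_components):
        vec = [float(j + 1) for j in range(n_loci)]   # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(n_iter):                   # power iteration
            nxt = _mat_vec(cov, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        eigval = sum(v * w for v, w in zip(vec, _mat_vec(cov, vec)))
        ratios.append(eigval / total_var)
        axes.append(vec)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(n_loci)]
               for i in range(n_loci)]
    scores = [[sum(c[j] * axis[j] for j in range(n_loci)) for axis in axes]
              for c in centred]
    return ratios, scores


# Invented frequencies for four populations at three loci (illustration only).
example = [[0.20, 0.45, 0.85], [0.25, 0.50, 0.90],
           [0.40, 0.30, 0.60], [0.45, 0.35, 0.55]]
ratios, scores = principal_components(example)
print([round(r, 3) for r in ratios])
```

The explained variance ratios correspond to the percentages of allele frequency variation attributed to each component, and the scores are the coordinates that would be plotted on the two axes.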
The results of the phylogenetic analysis are presented as a consensus dendrogram in figure 7. The populations examined clustered pair-wise in four groups: Tatars-Bashkirs, Komi-Udmurts, Uzbeks-Kazakhs, and Mordovians-Mari, which is in good agreement with the data on their common origin. In general, this clustering reflected the histories of the populations and their interactions during ethnogenesis rather than their geographic closeness. The populations of Uzbeks and Kazakhs were closer to Tatars and Bashkirs than to their geographic neighbours Moksha Mordovians and Mari, while Uigurs occupy an isolated position, which reflects their late dispersal in Central Asia and limited ethno-cultural contacts with the other ethnic groups studied. These results showed the efficiency of this marker system for investigating genetic diversity and relationships among ethnic groups from various regions and linguistic groups. Loci Ya5NBC27, Ya5NBC148, Ya5NBC182, Ya5NBC361, ACE, ApoA1, PV92, and TPA25 proved the most informative. The interpopulation genetic differences were greater among the populations from the Volga-Ural region than among those from Central Asia, Europe and Southeast Asia. The populations examined showed high differentiation with respect to both linguistic affiliation and geographic position.
Figure 9. Positions of the populations examined and world populations in the axes of principal components.
CONCLUSION
Because of space limitations, we could not dwell on finer details of the population structure related to the combinatorial nature of polymorphic autosomal markers. In particular, these markers are helpful in addressing inbreeding in populations with high endogamy, personal identification, evaluation of relatedness, etc. These issues need special consideration and require approaches other than estimating genetic similarity or dissimilarity and clustering samples. Nevertheless, the features of variation of autosomal markers discussed above show that these markers make it possible to solve some problems of ethnogenomics.
REFERENCES
[1] Tautz D. Notes on the definition and nomenclature of tandemly repetitive DNA sequences. In: DNA Fingerprinting: State of the Science. Pena S.D.J., Chakraborty J.T., Epplen J.T. and Jeffreys A.J. (eds), Basel, Birkhäuser Verlag, 1993, pp. 21-28.
[2] Levinson, G. and G.A. Gutman. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol., 1987, vol. 4, pp. 203-221.
[3] Jeffreys, A.J., N.J. Royle, V. Wilson, and Z. Wong. Spontaneous mutation rates to new length alleles at tandem repetitive hypervariable loci in human DNA. Nature, 1988, vol. 332, pp. 278-281.
[4] Kelley, R., M. Gibbs, A. Collick, and A.J. Jeffreys. Spontaneous mutation at the hypervariable mouse minisatellite locus Ms6-hm: flanking DNA sequence and analysis of germline and early somatic mutation events. Proc. R. Soc. Lond., 1991, Series B, vol. 245, pp. 235-245.
[5] Henderson, S.T., and T.D. Petes. Instability of simple sequence DNA in Saccharomyces cerevisiae. Mol. Cell. Biol., 1992, vol. 12, pp. 2749-2757.
[6] Weber, J., and P. May. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet., 1989, vol. 44, pp. 388-396.
[7] Weber, J. L., and C. Wong. Mutation of human short tandem repeats. Hum. Mol. Genet., 1993, vol. 2, pp. 1123-1128.
[8] Kashi, Y., Y. Tikochinsky, E. Genislav, F. Iraqi, A. Nave, J.S. Beckman, Y. Gruenbaum, and M. Soller. Large restriction fragments containing poly-TG are polymorphic in a variety of vertebrates. Nucleic Acids Res., 1990, vol. 18, pp. 1129-1132.
[9] Holmans P. Nonparametric linkage. In: Handbook of Statistical Genetics, D.J. Balding et al. (eds.), John Wiley, 2001, pp. 487-505.
[10] Ott J. Analysis of human genetic linkage. Baltimore and London, Johns Hopkins University Press, 1999.
[11] Kong, A., D.F. Gudbjartsson, J. Sainz et al. (16 co-authors). A high-resolution recombination map of the human genome. Nat. Genet., 2002, vol. 31, pp. 241-247.
[12] Ashley, C. T., and S. T. Warren. Trinucleotide repeat expansion and human disease. Annu. Rev. Genet., 1995, vol. 29, pp. 703-728.
[13] Evett, I.W., and B.S. Weir. Interpreting DNA Evidence. Sunderland, Sinauer, 1998.
[14] Bowcock A.M., Ruiz-Linares A., Tomfohrde J., Minch E., Kidd J.R., Cavalli-Sforza L.L. High resolution of human evolutionary trees with polymorphic microsatellites. Nature, 1994, vol. 368, pp. 455-457.
[15] Jorde, L. B., Watkins, W. S., Bamshad, M. J., Dixon, M. E., Ricker, C. E., Seielstad, M. T., and Batzer, M. A. The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am. J. Hum. Genet., 2000, vol. 66, pp. 979-88.
[16] Rosenberg, N.A., J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A. Zhivotovsky, M.W. Feldman. Genetic structure of human populations. Science, 2002, vol. 298, pp. 2381-2385.
[17] Zhivotovsky, L.A., N.A. Rosenberg, and M.W. Feldman. Features of evolution and expansion of modern humans inferred from genome-wide microsatellite markers. Amer. J. Hum. Genet., 2003, vol. 72, pp. 1171-1186.
[18] Zhivotovsky, L.A. Microsatellite variation in human populations and methods of its study. Inform. Herald All-Russ. Soc. Genet. Breeders, 2006, vol. 10, no. 1, pp. 74-96.
[19] Dib C., Faure S., Fizames C. et al. (14 co-authors). A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature, 1996, vol. 380, pp. 152-154.
[20] Weber J.L., Broman K.W. Genotyping for human whole-genome scans: past, present, and future. Adv. Genet., 2001, vol. 42, pp. 77-96.
[21] The Mammalian Genotyping Service. http://research.marshfieldclinic.org/genetics/sets/combo.html.
[22] Ziegle J.S., Su Y., Corcoran K.P., Nie L., Mayrand P.E., Hoff L.B., McBride L.J., Kronick M.N., Diehl S.R. Application of automated DNA sizing technology for genotyping microsatellite loci. Genomics, 1992, vol. 14, pp. 1026-1031.
[23] Urquhart, A., Kimpton, C.P., Downes, T.J. and Gill, P. Variation in short tandem repeat sequences - a survey of twelve microsatellite loci for use as forensic identification markers. Int. J. Leg. Med., 1994, vol. 107, pp. 13-20.
[24] Gill, P., Kimpton, C.P., Urquhart, A., Oldroyd, N.J., Millican, E.S., Watson, S.K. and Downes, T.J. Automated short tandem repeat (STR) analysis in forensic casework - a strategy for the future. Electrophoresis, 1995, vol. 16, pp. 1543-1552.
[25] Schlotterer C. Evolutionary dynamics of microsatellite DNA. Chromosoma, 2000, vol. 109, pp. 365-71.
[26] Chakraborty R., Kimmel M., Stivers D.N., Davison L.J., Deka R. Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA, 1997, vol. 94, pp. 1041-1046.
[27] Slatkin, M. A measure of population subdivision based on microsatellite allele frequencies. Genetics, 1995, vol. 139, pp. 457-462.
[28] Zhivotovsky, L. A., and M. W. Feldman. Microsatellite variability and genetic distances. Proc. Natl. Acad. Sci. USA, 1995, vol. 92, pp. 11549-11552.
[29] Di Rienzo, A., P. Donnelly, C. Toomajian, B. Sisk, A. Hill, M. L. Petzl-Erler, G.K. Haines, and D.H. Barch. Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics, 1998, vol. 148, pp. 1269-1281.
[30] Zhivotovsky, L.A. A new genetic distance with application to constrained variation at microsatellite loci. Mol. Biol. Evol., 1999, vol. 16, pp. 467-471.
[31] Zhivotovsky, L. A., L. Bennett, A. M. Bowcock, and M. W. Feldman. Human population expansion and microsatellite variation. Mol. Biol. Evol., 2000, vol. 17, pp. 757-767.
[32] Feldman M.W., Kumm J., Pritchard J.K. Mutation and migration in models of microsatellite evolution. In: Microsatellites: Evolution and Applications, Goldstein D.G., Schlotterer C. (eds.), Oxford Univ. Press, 1999, pp. 98-115.
[33] Cooper, G., W. Amos, R. Bellamy, M.R. Siddiqui, A. Frodsham, A.V.S. Hill, and D. C. Rubinsztein. An empirical exploration of the (δμ)² genetic distance for 213 human microsatellite markers. Am. J. Hum. Genet., 1999, vol. 65, pp. 1125-1133.
[34] Zhivotovsky L.A., D.B. Goldstein, M.W. Feldman. Genetic sampling error of distance (δμ)² and variation in mutation rate among microsatellite loci. Mol. Biol. Evol., 2001, vol. 18, pp. 2141-2145.
[35] Zhang, L., E. P. Leeflang, J. Yu, and N. Arnheim. Studying human mutations by sperm typing: instability of CAG trinucleotide repeats in the human androgen receptor gene. Nat. Genet., 1994, vol. 7, pp. 531-535.
[36] Xu X., M. Peng, Z. Fang, and X. Xu. The direction of microsatellite mutations is dependent upon allele length. Nat. Genet., 2000, vol. 24, pp. 396-399.
[37] Huang, Q. Y., F. H. Xu, H. Shen, H. Y. Deng, Y. J. Liu, Y. Z. Liu, J. L. Li, R. R. Recker, and H. W. Deng. Mutation patterns at dinucleotide microsatellite loci in humans. Am. J. Hum. Genet., 2002, vol. 70, pp. 625-634.
[38] Dupuy B.M., Stenersen M., Egeland T., Olaisen B. Y-chromosomal microsatellite mutation rates: Differences in mutation rate between and within loci. Human Mutation, 2004, vol. 23, pp. 117-124.
[39] Kayser M., Kittler R., Erler A., Hedman M., Lee A.C., Mohyuddin A., Mehdi S.Q., Rosser Z., Stoneking M., Jobling M.A., Sajantila A., Tyler-Smith C. A comprehensive survey of human Y-chromosomal microsatellites. Am. J. Hum. Genet., 2004, vol. 74, pp. 1183-1197.
[40] Garza, J. C., M. Slatkin, and N. B. Freimer. Microsatellite allele frequencies in humans and chimpanzees, with implications for constraints on allele size. Mol. Biol. Evol., 1995, vol. 12, pp. 594-603.
[41] Nauta, M.J., and E.J. Weissing. Constraints on allele size at microsatellite loci: implications for genetic differentiation. Genetics, 1996, vol. 143, pp. 1021-1032.
[42] Feldman, M. W., A. Bergman, D. D. Pollock, and D. B. Goldstein. Microsatellite genetic distances with range constraints: analytic description and problems of estimation. Genetics, 1997, vol. 145, pp. 207-216.
[43] Zhivotovsky, L.A., M.W. Feldman, and S.A. Grishechkin. Biased mutations and microsatellite variation. Mol. Biol. Evol., 1997, vol. 14, pp. 926-933.
[44] Deininger P., Batzer M., Hutchison C. 3rd, Edgell M. Master genes in mammalian repetitive DNA amplification. Trends Genet., 1992, vol. 8, pp. 307-312.
[45] Novick G., Novick C., Yunis J., et al. Polymorphic Alu insertions and the Asian origin of Native American populations. Hum. Biol., 1998, vol. 70, pp. 23-39.
[46] Batzer M.A., Stoneking M., Alegria-Hartman M. et al. African origin of human-specific polymorphic Alu insertions. Proc. Natl. Acad. Sci. USA, 1994, vol. 91, pp. 12288-12292.
[47] Roy A., Caroll M., Kass D. et al. Recently integrated human Alu repeats: finding needles in the haystack. Genetica, 1999, vol. 107, pp. 149-161.
[48] Rowold D., Herrera R. Alu elements and the human genome. Genetica, 2000, vol. 108, pp. 57-72.
[49] Stoneking M., Fontius J.J., Clifford S.L., Soodyall H., Arcot S.S., Saha N., Jenkins T., Tahir M.A., Deininger P.L., Batzer M.A. Alu insertion polymorphisms and human evolution: Evidence for a larger population size in Africa. Genome Res., 1997, vol. 7, pp. 1061-1071.
[50] Nasidze I., Risch G., Robichaux M., Sherry S.T., Batzer M.A., Stoneking M. Alu insertion polymorphisms and the genetic structure of human populations from the Caucasus. Eur. J. Hum. Genet., 2001, vol. 9, pp. 267-272.
[51] Yunusbayev B., Kutuev I., Khusainova R., Guseinov G., Khusnutdinova E. Genetic structure of Dagestan populations: A study of 11 ALU-insertion polymorphisms. Human Biology, 2006, vol. 78, pp. 465-476.
[52] Khitrinskaya I. Yu. Genetic diversity of the indigenous population of Siberia and Central Asia at polymorphic Alu insertions. Abstract Cand. Sci. Dissertation, Tomsk, 2003.
[53] Majumder P., Roy B., Banerjee S., Chakraborty M., et al. Human-specific insertion/deletion polymorphisms in Indian populations and their possible evolutionary implications. Eur. J. Hum. Genet., 1999, vol. 7, pp. 435-446.
[54] Kuzeev R.G. Peoples of middle Volga region and South Urals. Moscow, Nauka, 1992, 345 pp.
[55] Khusnutdinova E.K., Khidiyatova I.M., Viktorova T.V., Fatkhlislamova R.I., Ivashchenko T.E. Allele polymorphism of DNA loci MET and D7S23, linked to the gene for cystic fibrosis in populations of the Volga-Ural region. Russ. J. Genet., 1997, vol. 33, no. 6, pp. 889-894.
[56] Khusnutdinova E.K., Khidiyatova I.M., Viktorova T.V., Fatkhlislamova R.I., Limborska S.A. Analysis of DNA polymorphism detected by genome fingerprinting based on phage M13 in populations of the Volga-Ural region. Russ. J. Genet., 1999, vol. 35, no. 4, pp. 509-515.
[57] Khusnutdinova E.K., Pogoda T.V., Khidiyatova I.M., Galeeva A.R., Grinchuk O.V., Limborska S.A. Analysis of polymorphism of the hypervariable apoprotein B gene locus in the populations of the Volga-Ural region. Russ. J. Genet., 1996, vol. 32, no. 12, pp. 1678-1682.
[58] Fatkhlislamova R.I., Khidiyatova I.M., Khusnutdinova E.K., Popova S.N., Slominsky P.A., Limborska S.A. Analysis of polymorphism of CTG repeats in the myotonic dystrophy gene in populations of the Volga-Ural region. Russ. J. Genet., 1999, vol. 35, no. 7, pp. 988-993.
[59] Bermisheva M.A., Viktorova T.V., Tambets K., Villems R., Khusnutdinova E.K. Diversity of mitochondrial DNA haplogroups in peoples from the Volga-Ural region of Russia. Mol. Biol. (Moscow), 2002, vol. 36, no. 6, pp. 905-906.
[60] Bermisheva M.A., Viktorova T.V., Khusnutdinova E.K. Analysis of polymorphism of Y-chromosomal diallele loci in populations from the Volga-Ural region. Russ. J. Genet., 2001, vol. 37, no. 7, pp. 1002-1007.
[61] Khusnutdinova E.K., Khidiyatova I.M., Viktorova T.V., Fatkhlislamova R.I., Galeeva A.R., Limborska S.A. Genetic distances and taxonomic analysis of populations from the Volga-Ural region inferred from DNA polymorphism data. Russ. J. Genet., 1999, vol. 35, no. 7, pp. 982-987.
[62] Khusnutdinova E.K., Viktorova T.V., Fatkhlislamova R.I., Galeeva A.R. Estimation of the relative contribution of the Caucasoid and Mongoloid components in the formation of the ethnic groups from the Volga-Ural region based on DNA polymorphism data. Russ. J. Genet., 1999, vol. 35, no. 7, pp. 1-6.
[63] Kuzeev R.G. Peoples of the Volga and the Cis-Ural regions. Moscow, Nauka, 1985, 308 pp.
[64] Tiret L., Riget B., Viskivis S. et al. Evidence from combined segregation and linkage analysis, that a variant of the angiotensin I-converting enzyme (ACE) gene controls plasma ACE levels. Am. J. Hum. Genet., 1992, vol. 51, pp. 197-205.
[65] Batzer M., Stoneking M., Alegria-Hartman M. et al. African origin of human-specific polymorphic Alu insertions. Proc. Natl. Acad. Sci. USA, 1994, vol. 91, pp. 12288-12292.
[66] Yang-Feng T., Opdenakker G., Volekaert G. et al. Human tissue-type plasminogen activator gene located near chromosomal breakpoint in myeloproliferative disorder. Am. J. Hum. Genet., 1986, vol. 39, pp. 79-87.
[67] Rousset F. Inferences from spatial population genetics. In: Balding D., Bishop M., Cannings C. (eds.), Handbook of Statistical Genetics. John Wiley, 2001, pp. 239-269.
[68] Felsenstein J. PHYLIP, version 3.5. Seattle, Univ. Washington, 1993.
[69] Page R.D.M. TreeView version 1.6.1. 2000.
[70] StatSoft. STATISTICA for Windows (Computer program manual). Tulsa, StatSoft, 1999. http://www.statsoft.com.
[71] Watkins W.S., Ricker C.E., Bamshad M.J. et al. Patterns of ancestral human diversity: An analysis of Alu-insertion and restriction-site polymorphism. Am. J. Hum. Genet., 2001, vol. 68, pp. 738-752.
[72] Batzer M., Arcot S., Phinney J. et al. Genetic variation of recent Alu insertions in human populations. J. Mol. Evol., 1996, vol. 42, pp. 22-29.
[73] Watkins W.S., Rogers A.R., Ostler C.T., Wooding S., Bamshad M.J., Brassington A.M.E., Carroll M.L., Nguyen S.V., Walker J.A., Prasad B.V.R., Reddy P.G., Das P.K., Batzer M.A., Jorde L.B. Genetic variation among world populations: Inferences from 100 Alu insertion polymorphisms. Genome Res., 2003, vol. 13, pp. 1607-1618.
[74] Khusainova R.I., Akhmetova V.L., Kutuev I.A., Salimova A.Z., Lebedev Yu.B., Khusnutdinova E.K. Genetic structure of the peoples of the Volga-Ural region inferred from the data on Alu polymorphisms. Russ. J. Genet., 2004, vol. 40, no. 4, pp. 21-28.
[75] Khusnutdinova E., Bermisheva M., Malyarchuk M. et al. Towards a comprehensive understanding of the East European mtDNA heritage in its paleogeographic context. Conf. "Human Origins and Disease". Cold Spring Harbor, 2002, p. 90.
[76] Stepanov V.A. Ethnogenomics of the North Eurasian population. Tomsk, Pechatnaya manufactura, 2002, 244 pp.
[77] Kutuev I., Khusainova R., Karunas A., Yunusbaev B., Fedorova S., Lebedev Yu., Hunsmann G., Khusnutdinova E. From east to west: Patterns of genetic diversity of populations living in four Eurasian regions. Hum. Hered., 2006, vol. 61, pp. 1-9.
[78] Alekseev V.P. Geography of human races. Moscow, Nauka, 1974, 351 pp.
INDEX A access, 12 accessibility, 150 accounting, 7, 187 accuracy, viii, 4, 16, 147, 263 acetic acid, 284 acetylcholine, 2, 21 acetylcholinesterase, 22, 24 achievement, 2, 96 acrocentric chromosome, 146, 147, 154, 155, 156, 168, 171 activation energy, 11 active site, 3, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 20, 21, 25, 43, 45, 47, 60 acute intermittent porphyria, 207, 232 ADAM33, 101, 102, 103, 104, 122, 123, 127 adaptation, 83, 184, 187, 192, 193, 195, 281, 282, 283, 284, 288, 294, 311 adaptations, 311 adenine, 34, 114, 285 adenovirus, 89 ADH, 2 adhesion, 42, 83, 89, 105, 271 adjustment, 21 adolescents, 296 ADP, 275 adrenaline, 291 adulthood, 35 advantages, 78, 133, 213, 228, 256, 312 Adygei, 189, 190 Africa, 131, 181, 182, 306, 307, 308, 310, 313, 321 age, 37, 103, 114, 166, 172, 179, 184, 206, 231, 233, 312 ageing, 145, 146, 166, 172, 208 aggregation, 18 aggression, 281, 283, 284, 285, 286, 288, 289, 290, 292, 293, 294, 295
aggression scales, 295 aggressive behavior, 283, 284, 286, 289 aggressiveness, 294 agonist, 116, 118, 126 agriculture, 179, 197, 282 AIDS, 182, 184, 196, 197 airway hyperresponsiveness, 102 airways, 114 alanine, 63, 81 albumin, 264, 266 alcohol, 1, 2, 33, 36, 93 alcoholism, 34, 35 alcohols, 34, 53 aldehydes, 34 aldosterone, 45 algorithm, 7 alimentation, 311 allergens, 102 allergic asthma, 109 allergic rhinitis, 127 allergy, 103, 105, 124, 126 ALT, 97 alternatives, 41 alters, 275 alveolar macrophage, 113, 114, 126 amino acids, 3, 6, 7, 8, 10, 11, 17, 18, 20, 21, 24, 25, 30, 47, 59, 63, 64, 84, 130 amphibians, 147 amygdala, 296 amyotrophic lateral sclerosis, 63 androgen, 98, 279, 320 anemia, 36, 59 aneuploidy, 219 angiogenesis, 80, 81, 98 angiotensin converting enzyme, 1 angiotensin II, 45, 291 aniridia, 53 annotation, 13 anthropologists, 190
anthropology, 129, 176, 200 antibody, 272 antigen, 81, 89, 101, 102, 257, 272, 278, 279 antioxidant, 31 antisense, 235 antitumor, 25, 42 antitumor agent, 25, 42 anxiety, 289, 290, 291 APC, 207, 232 apoptosis, 79, 80, 81, 82, 83, 84, 86, 87, 89, 93, 94, 98, 99, 166 applications, 4, 191, 216, 294 aptitude, 3, 31 architecture, 21 arginine, 7, 87, 237, 249, 250 arthritis, 146, 166, 172 arthrogryposis, 247, 276 Asia, 130, 131, 134, 135, 136, 140, 141, 188, 250, 306, 307, 308, 309, 310, 311, 313, 314, 315, 316, 317, 321 aspartic acid, 9, 11, 18, 20 asphyxia, 25, 26 aspiration, 283 assessment, 191, 192, 272 assignment, 210, 306, 310 asthma, ix, 101, 102, 103, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127 ataxia, 88 atherosclerosis, 22, 31, 45, 55, 96 athletes, 278, 281, 282, 283, 286, 291, 293, 294, 295 atoms, 3, 9, 35, 60 atopic asthma, 103, 125, 126 atopy, 109, 125, 126 ATP, 172, 265 attachment, 171 Australia, 102, 308 authors, 15, 25, 64, 78, 114, 179, 210, 212, 215, 218, 241, 250, 252, 256, 270, 294, 305, 319 autosomal dominant, 244, 245, 249, 275, 276 autosomal recessive, 207, 232 avoidance, 289, 290 Azerbaijanians, 182
B BAC, 155, 156 background, viii, 26, 35, 37, 47, 54, 56, 63, 102, 163, 240 bacteria, 6, 30, 54 bacterial infection, 35 bacteriophage, 198 Balkans, 179 Balkars, 136
banks, 12, 241 barriers, 137 basal ganglia, 291 base pair, 45, 130, 230, 286, 291, 300, 303 basophils, 105 Bcl-2 proteins, 87 behavior, ix, 2, 10, 21, 283, 284, 285, 289 Belarus, 179, 180, 193, 201 beliefs, 309 benign, 46, 232, 257, 259, 262, 264, 267, 269, 278, 279 benign prostatic hyperplasia, 278 bile, 34 bile acids, 34 biliary obstruction, 35 binding, 16, 25, 26, 34, 41, 42, 59, 60, 87, 90, 105, 149, 159, 161, 162, 163, 166, 171, 208, 233, 237, 238, 245, 248, 249, 263, 264, 265, 266, 269, 270, 271, 276, 277, 285 biochemistry, 237, 240, 270, 278 bioinformatics, viii biological activity, 270 biological systems, 4, 8 biomarkers, 272 biomedical applications, 296 biosynthesis, 57 biotechnology, 176 blocks, 16, 87, 147, 148, 155, 200, 201, 285 blood, 21, 22, 25, 26, 31, 35, 37, 45, 47, 53, 56, 59, 114, 146, 166, 291 blood group, 22 blood plasma, 25, 26, 31 blood pressure, 45 blood transfusion, 22 blood vessels, 56 blood-brain barrier, 291 bonds, 5, 8, 13, 20, 60, 149 bone, 35, 37, 172 bone form, 35 bone growth, 35 bone marrow, 172 bones, 35 Bosnians, 140 bradykinin, 45 brain, 32, 47, 166, 172, 265, 279, 282, 284, 285, 286, 288, 289, 291, 292, 294, 295, 296, 297 brain activity, 282 brain size, 297 brain stem, 291 brain structure, 291 branching, 235 breast cancer, 77, 89, 93, 95, 96, 97, 100, 207, 231, 276
breast carcinoma, 232 bronchial asthma, ix, 124, 126 bronchial hyperresponsiveness, 123, 126, 127 bronchodilator, 118 bronchus, 126 brothers, 304 buffer, 103, 251 building blocks, 130, 249 Byelorussia, 192
C Ca2+, 248, 249 calcification, 35 calcium, 83, 231, 248, 249, 265, 285 Cambodia, 314 Canada, 257 cancer, viii, 35, 43, 46, 47, 77, 78, 79, 80, 81, 82, 83, 84, 86, 89, 92, 93, 95, 96, 97, 98, 99, 100, 145, 146, 193, 207, 235, 248, 257, 258, 260, 269, 270, 272, 276, 278, 279 cancer cells, 80, 81, 83, 86, 89, 248, 257, 270 cancer progression, 78, 98, 278 candidates, 14, 15, 124, 218, 230 carbohydrate, 8, 282 carboxylic groups, 18 carcinogenesis, ix, 77, 78, 79, 80, 82, 84, 89, 90, 96, 268, 269, 270 carcinoma, 43, 81, 84, 89, 98, 207, 232, 272 cardiac muscle, 244 cardiomyopathy, 64, 244, 247, 274, 275, 276 cardiovascular disease, viii, 31, 45 cardiovascular system, 22, 45 catalysis, 1, 3, 4, 5, 7, 8, 9, 10, 11, 16, 17, 20, 21, 46 catalyst, 21 catalytic activity, 1, 2, 3, 6, 8, 25, 26, 31, 53, 59, 63, 240 catalytic properties, 15, 22, 56 catalytic reaction, 46 catecholamines, 34, 289 Caucasian population, 125, 182, 197 Caucasians, 47, 129, 252, 253 Caucasus, 131, 134, 135, 136, 141, 142, 188, 189, 190, 199, 321 causation, 285 C-C, 175, 181 CD30, 155, 170 cDNA, 205, 207, 208, 225, 230, 271, 272, 275, 279 cell culture, 166 cell cycle, 42, 81, 84, 85, 86, 87, 88, 89, 94, 167 cell death, 89 cell line, 166, 170, 172, 173, 208, 219, 220, 225, 226, 227, 228, 235, 271, 279
cell lines, 166, 170, 172, 173, 208, 219, 220, 225, 226, 228, 235 cell organelles, 145 cement, 266 Central African Republic, 200 Central Asia, 131, 134, 135, 136, 140, 141, 142, 312, 313, 314, 315, 316, 317, 321 Central Europe, 15, 188, 190 central nervous system, 289, 291 centromere, 147, 155 cerebellum, 289, 291 cerebrospinal fluid, 284 cervical cancer, 235 challenges, 283 changing environment, 282 channels, 285 character, 5, 8, 11, 15, 83, 91, 152, 188, 283, 310, 312 chemical bonds, 4 chemical reactions, 10, 21 chemokine receptor, 103, 197 chemokines, 113 chicken, 78, 89 childhood, 116, 125, 285 children, viii, 35, 37, 101, 102, 103, 105, 120, 124, 126, 166, 276, 309 chimpanzee, 159, 168, 203, 204, 209, 210, 211, 228, 230, 296 China, 309 Chinese women, 46 chloramphenicol resistance, 138 chlorine, 43, 45, 54 chloroform, 103, 131 cholestasis, 35 cholesterol, 34, 187, 274 cholinesterase, 232 chromatography, 276 chromosomal abnormalities, 219 chronic diseases, 102 chronic fatigue syndrome, 283 chronic myelogenous, 166, 173 chronology, 138 chymotrypsin, 10, 18, 20 circulation, 47 cirrhosis, 35 civilization, 282 class, 3, 8, 9, 21, 35, 41, 57, 155, 163, 206, 210, 230, 253, 254, 291, 319 classes, 3, 6, 8, 17, 31, 34, 41, 147, 204, 206, 230 classical mechanics, 10 classification, 8, 34, 41, 129, 303, 309, 315 cleavage, 4, 147, 149, 150, 153, 169, 233 climate, 192
climatic factors, 184 clone, 79, 152, 153, 155, 156, 163 cloning, 45, 103, 123, 141, 151, 154, 210, 272 close relationships, 131 closure, 285 clustering, 165, 192, 317, 318 clusters, 132, 145, 146, 147, 148, 150, 151, 152, 154, 156, 159, 161, 163, 164, 165, 168, 308, 315 CNS, 297 coagulation, 18 coagulation factors, 18 coding, 83, 87, 102, 107, 123, 131, 145, 172, 181, 209, 228, 229, 230, 231, 249, 267, 271, 279, 289 codon, 47, 53, 81, 88, 99, 193, 207, 237, 249, 250, 251 coenzyme, 242, 274 cognitive abilities, 291 cognitive function, 284, 285, 291 cohort, 93, 253, 276 colitis, 35 collagen, 83 colon, 32, 43, 47 colon cancer, 43, 47 colonization, 200, 312 color, iv, 162, 205, 251, 259, 268 complementary DNA, 302 complexity, 184, 235, 303 components, 16, 20, 21, 40, 54, 83, 89, 91, 92, 100, 120, 134, 147, 163, 178, 188, 239, 291, 308, 312, 313, 315, 317, 318, 322 composition, 8, 15, 21, 42, 299, 301, 302, 306, 310, 311 compounds, 2, 5, 25, 26, 31, 32, 34, 92 computation, ix, 3 computer technology, 6 concentration, 2, 5, 35, 43, 45, 59, 114, 124, 147, 149, 150, 205, 284, 285, 291, 292, 293 conception, 294 concrete, 303, 306 condensation, 245 conduction, 42 conference, 68 confidence, 14, 17, 106 confidence interval, 106 configuration, 9, 96 conformity, 11, 16, 263 consensus, 13, 149, 158, 208, 210, 313, 317 consent, 103, 131 conservation, 9, 170, 231 construction, 155, 175, 211, 218, 249, 257, 258, 269 consumption, 98 control, ix, 77, 80, 81, 83, 84, 85, 86, 88, 93, 94, 95, 96, 103, 105, 109, 112, 113, 116, 118, 123, 125,
130, 141, 146, 161, 162, 166, 167, 227, 240, 251, 252, 253, 254, 255, 256, 275, 283, 284, 286, 294 control group, 93, 96, 103, 105, 109, 113, 116, 254, 255, 256, 284, 286, 294 controversies, 114 convergence, 3 conversion, 46, 57, 152, 156, 165, 170 coronary artery disease, 31 correlation, 14, 15, 43, 45, 96, 109, 129, 137, 138, 182, 183, 184, 194, 197, 201, 255, 289, 308 correlation coefficient, 182 correlations, 3, 22, 129, 132, 182, 184, 195, 253 cortex, 284, 285, 289, 291, 296 cortical neurons, 290 coughing, 102 creatine, 240, 272 creatine phosphokinase, 240, 272 crystal structure, 275 CTA, 104 cultivation, 219 cultivation conditions, 219 cultural barriers, 137 cultural influence, 196 cultural tradition, 309 culture, 16, 130, 166, 180, 190, 192, 309, 313 curiosity, 34 cycles, 225, 226, 227 cyclooxygenase, 1, 2, 46, 48, 50, 52 cystic fibrosis, 321 cytochrome, 87, 92, 100 cytogenetics, 219 cytokines, 101, 105, 113, 125 cytomegalovirus, 138 cytoplasm, 64, 87, 88 cytosine, 47, 105, 285 cytoskeleton, 42, 238, 239, 244, 271, 276
D Dagestan, 136, 321 data analysis, 124, 141 data set, 131 database, 1, 9, 13, 14, 15, 17, 36, 37, 130, 217, 236, 239, 246, 249, 263, 273, 279 death, 80, 278 death rate, 278 decay, 26, 53, 232 decoding, vii, 2, 269, 270, 282 defects, 57, 88, 98 deficiency, 54, 56, 81, 82, 169, 182, 207, 232, 272, 285 definition, 12, 14, 256, 318 deformation, 239 degradation, 53, 85, 86, 88, 90, 92, 98, 284
dementia, 26, 31 demographic change, 184 dendritic cell, 120 Denmark, 102 density, 146, 169, 229, 272, 285, 286 deoxyribonucleic acid, 2 dephosphorylation, 40, 41, 42 depolymerization, 275 depression, 285, 286, 288, 296 derivatives, 25, 180 destruction, 284 detection, vii, viii, ix, 1, 3, 5, 6, 7, 9, 10, 13, 15, 17, 64, 102, 138, 147, 155, 169, 172, 176, 209, 213, 217, 220, 234, 240, 242, 269, 272, 301, 312 developed countries, viii diabetes, 31, 43, 45, 54, 96, 195 diagnosis, 77, 78, 96, 279 diagnostic markers, 257 dietary habits, 282 differentiation, 22, 77, 83, 89, 101, 102, 129, 143, 180, 196, 246, 252, 273, 306, 310, 312, 313, 314, 315, 316, 317, 321 diffraction, 14 diffusion, 196, 197 dilated cardiomyopathy, 244, 274 diploid, 147, 169, 220 direct action, 292 direct observation, 146, 198 disadvantages, 14 discipline, 176, 241 disclosure, 176 discrete data, 188 discrimination, 16 discs, 238, 276 disease gene, 200, 275 diseases, viii, ix, 1, 3, 13, 14, 15, 16, 22, 35, 36, 45, 53, 54, 59, 64, 79, 92, 93, 96, 102, 103, 145, 146, 176, 195, 207, 208, 213, 240, 244, 247, 257, 269, 270, 276, 282, 283, 284, 300, 306 disequilibrium, 125, 156, 171, 178, 192, 193, 194, 200, 201 disintegrin, 101, 123, 127 disorder, 322 dispersion, 312 disposition, 17, 25, 26, 35, 47, 56 dissociation, 45, 245 disturbances, 78, 80, 84 divergence, 129, 155, 159, 169, 179, 198, 203, 204, 206, 209, 210, 312 diversity, 131, 134, 138, 140, 144, 176, 177, 179, 180, 188, 191, 192, 193, 195, 196, 225, 231, 234, 235, 282, 283, 295, 299, 310, 312, 313, 315, 317, 319, 321, 322, 323
division, 80, 81, 93, 129, 146, 234, 259 DNA damage, 87, 88, 89, 98 DNA lesions, 85 DNA polymerase, 103, 303 DNA repair, 78, 89, 94 DNA testing, 176 doctors, 103 domain structure, 42, 87, 90, 241 donors, 103, 118, 119, 121, 122, 166 dopamine, 100 dopaminergic, 281, 295 dosage, 26, 172 draft, 210, 228 drawing, 7 Drosophila, 170 drug metabolism, 13 drugs, 25, 26, 45, 47, 92, 93 duplication, 155, 170, 193, 194, 204 duration, 25, 286 dyes, 301 dynamics, 159, 167, 171, 172, 176, 191, 257, 275, 320 dystonia, 244, 275
E E.coli, 12 East Asia, 139, 142, 306, 308, 311 Eastern Europe, 130, 135, 187, 188, 191, 195, 199 editors, 137, 141, 142 Education, 257, 270 Efficiency, 205, 250 EGF, 81 elaboration, 78, 176, 241 elasticity, 248, 249 election, 177, 184 electrophoresis, 146, 163, 241, 258, 259, 267, 272, 273, 274, 276, 279 elongation, 43, 147, 185, 186, 245 elucidation, 177 emotion, 295 emotional health, 283 emotional responses, 296 emotional stability, 281, 294 emotions, 284, 291 employment, 195 encoding, 43, 64, 82, 83, 89, 90, 92, 93, 126, 175, 178, 230, 244, 247, 276, 277 encryption, vii endocrinology, 35, 46 endonuclease, 149, 150, 169, 207, 215, 233, 250, 251, 252, 311 endothelial cells, 43 endothelium, 25, 105
endurance, 45, 278, 280, 283 energy, 7, 10, 11, 12, 82, 178 entropy, 17, 18, 25, 37, 47, 63, 64 environment, 11, 180, 184, 204, 220, 284 environmental conditions, 187, 311 environmental factors, 92, 93, 101, 102, 192, 194, 195 environmental stimuli, 289 enzymatic activity, 4, 6, 8, 31, 240, 285, 289 enzymes, ix, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 16, 17, 18, 19, 20, 21, 32, 40, 41, 42, 47, 53, 55, 59, 64, 83, 92, 100, 113, 130, 240, 284 epiphysis, 291 epithelial cells, 225 epithelium, 81 Epstein-Barr virus, 272 equilibrium, 138, 252 erythrocytes, 21, 53, 59 erythropoietin, 81 ester, 25, 30 estimating, 147, 301, 303, 318 estrogen, 35 ethanol, 34 ethnic background, 114 ethnic groups, 24, 34, 101, 103, 113, 114, 116, 120, 123, 143, 175, 186, 188, 190, 191, 193, 201, 212, 299, 302, 306, 308, 309, 310, 311, 312, 313, 314, 315, 317, 322 ethnicity, 112, 136, 194 ethyl alcohol, 34 etiology, 103, 206 euchromatin, 170 eukaryotic cell, 40, 147, 219, 239 Eurasia, 131, 136, 137, 139, 140, 142, 143, 178, 191, 195, 200, 308, 311, 312, 314 Europe, 130, 131, 137, 141, 142, 143, 144, 178, 179, 182, 184, 187, 188, 191, 196, 197, 250, 306, 307, 308, 312, 314, 315, 317 evolution, viii, 4, 8, 34, 53, 98, 129, 138, 139, 145, 147, 154, 156, 159, 167, 170, 171, 177, 178, 195, 198, 203, 204, 206, 208, 209, 210, 217, 229, 230, 231, 233, 234, 274, 277, 282, 291, 296, 306, 311, 312, 318, 319, 320, 321 excretion, 36 exons, 22, 36, 54, 57, 59, 187, 205, 212, 227, 246, 247, 250, 285 experts, 239 exploration, 320 extraction, 59, 103, 263, 272 extravasation, 46 extraversion, 289
F false negative, 257 false positive, 257 familial dysautonomia, 274, 275 familial hypercholesterolemia, 169 family, 1, 7, 16, 17, 18, 31, 42, 53, 59, 63, 79, 83, 85, 87, 90, 92, 96, 130, 137, 156, 165, 171, 182, 188, 189, 192, 206, 209, 210, 230, 232, 233, 234, 235, 249, 266, 277, 283, 284, 304, 311, 314 family members, 206, 311 family studies, 283 farmers, 311 FAS, 207 fasting, 43 fatigue, 285 fatty acids, 46, 92, 282 fear, 290, 291 fibroblasts, 105, 166, 172, 225, 276 fibrosis, 321 filament, 248, 271, 275 filters, 147, 148 fingerprints, 309 Finland, 257, 287 first generation, 283 fish, 147 fitness, 256, 280 fixation, 16, 17, 133, 172, 208, 282, 291 fluid, 284 fluorescence, 220, 235 fluoride ions, 26 fluorine, 27 focal segmental glomerulosclerosis, 277 focusing, 6 football, 291 Football, 287 forecasting, viii, 1, 15, 16, 64 formaldehyde, 34, 172 formula, 241, 301 fragments, 7, 14, 84, 101, 103, 104, 105, 146, 150, 151, 152, 153, 154, 155, 157, 166, 169, 172, 177, 211, 215, 218, 220, 251, 252, 263, 268, 301, 312, 319 framing, 284 France, 102 free activation energy, 11 free radicals, 59, 63 free rotation, 20 free variation, 8 frequencies, 12, 92, 96, 105, 107, 109, 112, 113, 114, 116, 118, 120, 123, 132, 135, 156, 177, 180, 181, 182, 183, 184, 186, 187, 188, 193, 194, 213, 218, 225, 253, 255, 256, 310, 313, 314, 315, 320, 321
frequency distribution, 106, 107, 108, 110, 111, 112, 113, 114, 115, 116, 117, 119, 120, 121, 122, 180, 186, 188, 192, 194, 314 frontal cortex, 285 frontal lobe, 284 functional analysis, 203, 204 fungi, 30 fusion, 146, 154
G gametogenesis, 146 gastrointestinal tract, 47 GC-content, 218, 219 gel, 105, 146, 168, 259, 267, 273, 276, 279 gene expression, 15, 42, 78, 81, 85, 86, 98, 113, 114, 124, 148, 212, 220, 225, 230, 231, 235, 241, 244, 245, 257, 275, 285, 291 gene mapping, 217, 306 gene pool, 134, 139, 141, 175, 176, 177, 178, 179, 187, 188, 191, 195, 197, 282, 312 gene promoter, 47, 55, 86, 105, 125, 126, 145 generalization, 225 generation, viii, 2, 82, 127, 155, 159, 165, 177, 217, 273, 305, 309 genetic disease, 3, 14 genetic disorders, 206 genetic diversity, 193, 196, 282, 299, 310, 312, 313, 315, 317, 319, 323 genetic drift, 133, 177, 182, 192, 310 genetic factors, 195 genetic information, vii, 241, 246 genetic linkage, 319 genetic marker, 101, 103, 123, 184, 187, 192, 194, 195, 217, 228, 283, 286, 288, 294, 310, 313 genetic mutations, 26 genetic traits, 309 genetics, viii, ix, 103, 123, 124, 125, 141, 175, 185, 188, 191, 192, 193, 195, 199, 200, 272, 273, 297, 299, 306, 319, 322 genomic instability, 88 genomics, vii, ix, 176, 200, 204, 228, 239, 241, 271, 272, 273, 274, 297, 299 genotype, viii, 26, 45, 46, 86, 96, 99, 100, 102, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 182, 250, 252, 253, 255, 256, 277, 290, 292, 293, 294 geography, 131, 133, 134, 137, 144, 176 germ line, 208 Germany, 102, 287 germline mutations, 198 gingivitis, 53 gland, 35, 43, 46, 57, 78, 91, 237, 266, 278 glaucoma, 53
glucocorticoid receptor, 105 glucose, 31, 43, 81 glutathione, 34, 92, 103 glycerol, 232 glycine, 7, 265 glycogen, 90 glycolysis, 81, 82 glycoproteins, 83 glycosylation, 241 government, iv GPRA, 103 Greeks, 191 growth, 18, 42, 43, 77, 78, 79, 80, 81, 85, 89, 90, 91, 98, 99, 103, 105, 123, 146, 167, 257, 273, 279 growth factor, 18, 42, 80, 81, 89, 105 growth rate, 80, 257 guanine, 47, 114 guidance, viii
H haemopoiesis, 45 hair, 235, 309, 310 hairy cell leukemia, 235 half-life, 90 hammer, 150 haploid, 311 haplotypes, 15, 109, 134, 143, 178, 179, 180, 192, 194, 196, 197, 201, 209, 214, 304 harbors, 145, 165 HDL, 2, 30, 274 head and neck cancer, 201 healing, 35 health, 100, 176, 240, 256, 280, 282, 294 health services, 102, 257 heart disease, ix, 47, 64, 199 heart failure, 274 height, 11 heme, 53, 55, 169 heme oxygenase, 169 hemisphere, 284, 289 hemoglobin, 53 hemolytic anemia, 59 hemophilia, 207, 231, 232 hemorrhage, 47 hepatitis, 35 hepatocellular carcinoma, 272 hepatocytes, 35 heterochromatin, 147 heterogeneity, 80, 103, 105, 116, 145, 164, 165, 192, 198, 314, 315 high density lipoprotein, 2, 30, 31 hippocampus, 285 Hispanics, 109
histidine, 7, 10, 18, 20 histogram, 229 histone, 166, 266 HIV, 184, 196, 197 HIV/AIDS, 184 HIV-1, 196, 197 HLA, 102 homeostasis, 45, 77, 80, 282 hominids, 210, 230 homogeneity, 156, 206, 313, 314 homologous chromosomes, 156, 169, 171 homozygote, 290 hormone, 34, 55 host, 205, 207, 208, 229, 230 hot spots, 86 House, 94 human behavior, 291 human brain, 279, 297 human genome, vii, 2, 12, 66, 77, 78, 89, 152, 155, 159, 164, 168, 170, 171, 175, 176, 177, 178, 185, 200, 203, 204, 206, 208, 209, 210, 211, 212, 216, 217, 228, 234, 236, 239, 244, 246, 249, 253, 268, 271, 273, 282, 296, 297, 300, 311, 312, 319, 321 human immunodeficiency virus, 181 human leukocyte antigen, 102 human papilloma virus, 85, 89 human sciences, viii, ix Hungary, 179 Hunter, 70 hunting, 282 hybrid, 151 hybridization, 147, 149, 150, 152, 166, 172, 211, 212, 215, 220, 235 hydrogen, 4, 7, 16, 53, 54, 57, 60, 149 hydrogen bonds, 60, 149 hydrogen peroxide, 53, 54, 57 hydrolases, 8, 9, 11, 17, 20, 30, 35 hydrolysis, 21, 25, 32, 35, 43, 45, 92, 149, 251, 263 hydroxyl, 11, 92 hypercalcemia, 207, 231 hypercholesterolemia, 169 hyperlipidemia, 274 hyperparathyroidism, 35, 207, 231 hyperplasia, 257, 258, 260, 269, 278, 279 hypersensitivity, 88, 150 hypertension, 45, 46, 53, 195 hypertrophic cardiomyopathy, 247, 275, 276 hypertrophy, 45, 172, 276 hypoplasia, 53 hypothalamus, 291 hypothesis, 25, 97, 136, 141, 142, 154, 194, 203, 204, 208, 310 hypothyroidism, 36
hypoxia, 81, 94, 97, 98 hypoxia-inducible factor, 81, 98
I icterus, 35 identification, viii, ix, 5, 6, 8, 9, 10, 12, 16, 77, 96, 150, 170, 176, 185, 204, 208, 210, 212, 214, 216, 234, 235, 241, 242, 259, 260, 263, 268, 272, 279, 283, 300, 318, 319 identity, 7, 9, 30, 31, 47, 60, 217, 219, 234 idiopathic, 64 IFN, 126 IL-13, 105 IL-6, 126 image, 16, 24, 25, 26, 29, 32, 34, 36, 37, 40, 47, 50, 52, 54, 55, 56, 57, 63, 258, 259, 260, 289, 309 image analysis, 289 images, 1, 258, 259, 260 immigrants, 309 immune response, 77, 120, 166 immune system, 88 immunodeficiency, 166, 181, 207, 232 immunogenicity, 99 immunoglobulin, 102, 124, 126 immunoreactivity, 113 immunostimulatory, 166, 172 impacts, 21, 25, 296 impulsive, 284, 285, 289 impulsivity, 285 in situ hybridization, 235 in vitro, 172, 208 in vivo, 45, 99 inbreeding, 283, 318 incidence, 96, 257 independence, 16 India, 26, 141, 314 indication, 22, 25, 49, 51, 179, 185 indices, 108, 111, 180 indigenous, 140, 309, 312, 313, 314, 321 individual character, 166 individual differences, viii, 283 individuality, 282 inducible enzyme, 47 induction, 81, 82, 84, 87, 90, 91, 97, 105 infarction, 45, 46, 207, 232, 240, 248, 272 infection, 35, 181, 182, 196, 274 infectious disease, 272 inferences, 196, 235 inflammation, 47, 54, 105, 113 inflammatory disease, 101 inflammatory mediators, 102 inflammatory responses, 113 information exchange, 284
informed consent, 103, 131 inheritance, 137, 138, 156, 169, 171, 177, 192, 245, 283, 305, 309 inhibition, 25, 26, 86, 90, 113, 279, 285 inhibitor, 11, 41, 43, 85, 98, 171, 266 initiation, 80, 86, 171, 245 insects, 147, 152 insertion, 5, 56, 63, 152, 167, 176, 181, 206, 209, 210, 212, 215, 217, 218, 219, 220, 224, 226, 228, 231, 232, 234, 235, 291, 300, 303, 311, 312, 313, 314, 315, 321, 322, 323 instability, 80, 89, 99, 283, 285, 320 insulin, 31, 41, 43, 81 integration, 78, 83, 205, 207, 210, 211, 212, 215, 230, 234 interaction, 10, 11, 12, 42, 45, 60, 85, 86, 88, 89, 90, 91, 93, 101, 102, 116, 131, 149, 181, 194, 204, 229, 248, 292, 294 interactions, 175, 206, 228, 229, 238, 277, 296, 315, 316, 317 intercellular contacts, 83 internet, 217 Internet, 250 interphase, 146, 220 interrelations, 129, 191 interval, 13, 95 intervention, 296 intestine, 32, 35, 47, 93, 271 introns, 187, 206, 212, 217, 218, 219, 226, 235, 236, 300 inversion, 226, 313 ions, 6, 26, 43, 45, 248, 263 ischemia, 59, 96 isochromosome, 79 isolation, 124, 146, 309, 310, 311 isoleucine, 18 isozymes, 34 Italy, 274
J Japan, 12, 13, 26, 32, 37, 43, 141, 142, 257, 314 jaundice, 35 joints, 47
K K+, 215 karyotype, 84, 220, 235 karyotyping, 172, 220 Kazakhstan, 313 keratinocytes, 273 ketones, 34 kidney, 47, 271
kidneys, 47, 238 kinetic methods, ix kinetics, 4, 5
L lactate dehydrogenase, 81 landscape, 131, 136, 137, 140, 311 language, 129, 130, 133, 136, 137, 141, 142, 144, 192, 309, 314 languages, 129, 131, 132, 136, 137, 138, 143, 311, 314 Laos, 139 large intestine, 93 lateral sclerosis, 63, 73 learning, 15, 16, 284, 291 left hemisphere, 284, 289 left ventricle, 279 lesions, 85, 284, 296 leukemia, 54, 84, 88, 166, 173, 207, 235 life sciences, 237, 240 lifespan, 129 lifestyle, 192, 286 ligand, 16, 42 limbic system, 284, 285 limitation, 102 line, 7, 14, 150, 151, 152, 177, 215, 219, 220, 221, 222, 225, 227, 235, 251, 304 linearity, 5 linkage, 102, 107, 125, 156, 178, 192, 200, 201, 232, 245, 277, 300, 319, 322 links, 60, 167, 185, 266, 271, 291 lipases, 20 lipoproteins, 2, 187 liquid chromatography, 276 literacy, 296 liver, ix, 25, 30, 32, 34, 35, 53, 59, 89, 92, 238 liver cells, 53, 92 liver disease, ix local anesthetic, 26 localization, 46, 63, 104, 153, 156, 218, 228, 238, 269, 277, 284, 301 longitudinal study, 172 low risk, 45, 93, 96 low-density lipoprotein, 187 lung cancer, 43, 47, 98, 257 lung disease, 125 lung function, 107, 109, 112, 113, 123, 127 Luo, 98, 276 lying, ix lymphatic system, 83 lymphocytes, 42, 43, 149 lymphoid, 91 lymphoid tissue, 91
lymphoma, 155, 170
M machinery, 204, 205, 208 macromolecules, ix, 239, 248 macrophages, 32, 47, 103, 113, 114, 120, 126 magnesium, 42 magnetic resonance, 9, 284 magnetic resonance imaging, 284 maintenance, 80, 81, 105, 124, 154, 240, 257, 282, 283 majority, 6, 15, 102, 107, 135, 148, 152, 178, 190, 192, 209, 210, 269, 299, 302, 303, 305 Malaysia, 314 malignant hypertension, 46 malignant tumors, 77, 93, 257, 260, 269 mammal, 41, 206, 273, 285 management, 103, 124 mapping, 103, 150, 193, 199, 200, 211, 212, 217, 274, 275, 300, 306 marrow, 172 Mars, 295 mass spectrometry, ix, 241, 242, 243, 269, 276 maternal inheritance, 177 mathematics, ix matrix, 7, 35, 64, 83, 159, 171, 308 maturation, 146, 167, 275 measurement, 272 measures, 192 media, 282 median, 180, 181 Mediterranean, 137, 196, 199 medulloblastoma, 207 meiosis, 165 MEK, 90 membranes, 21, 46, 87, 92 memory, 291 men, 45, 55, 142, 168, 178, 180, 257, 289, 290, 291 Mendeleev, 65 mental capacity, 3 mental disorder, 286, 296 mental health, 296 Mesopotamia, 134 messages, 45, 256, 273 meta-analysis, 126 metabolism, 2, 13, 21, 25, 31, 32, 35, 64, 77, 81, 89, 92, 98, 99, 282, 285 metabolites, 31, 34, 92 metabolizing, 100 metalloproteinase, 43, 101 metals, 20 metastasis, 35, 80, 82, 83, 98, 248, 264, 276
methanol, 34 methodology, 21 methylation, 78, 145, 150, 166, 172, 205, 206 Mg2+, 42 mice, 42, 91, 99, 113, 283, 285, 289, 296 microgravity, 295 microsatellites, 147, 163, 168, 176, 180, 181, 185, 196, 217, 300, 301, 302, 303, 305, 312, 319, 320 migration, 83, 105, 177, 179, 184, 271, 311, 312, 320 minisatellites, 185, 187 missense mutation, 63 mitochondria, 82, 87, 98, 130, 178 mitochondrial DNA, 130, 138, 139, 141, 143, 175, 177, 299, 322 mitosis, 41, 88, 146 MMP, 83 mobility, 20, 34, 163, 237, 238, 239, 252, 260 model, 9, 11, 79, 138, 146, 159, 165, 180, 228, 246, 275, 276, 311, 312 model system, 146 models, 80, 98, 116, 154, 157, 246, 311, 320 modern society, vii, viii, 283 modification, 3, 4, 8, 13, 90, 242, 245, 258, 263 molecular biology, 97, 124 molecular mass, 41, 53, 55 molecular medicine, viii molecular weight, 33, 35, 46, 53, 124, 259, 260 molecules, 4, 5, 8, 16, 21, 42, 53, 92, 105, 130, 149, 178, 245, 246, 248, 249, 271 momentum, 210 Mongols, 130, 136 monoclonal antibody, 272 monomers, 159, 166, 205, 246, 248 morphogenesis, 249 morphology, 310 mortality, 31, 56 mosaic, 152, 170 Moscow, viii, 1, 65, 72, 77, 97, 102, 103, 138, 141, 142, 145, 168, 169, 170, 171, 172, 175, 195, 197, 199, 200, 203, 231, 270, 272, 273, 276, 277, 278, 281, 294, 321, 322, 323 motif, 148, 175, 238, 264, 303 motives, 148, 157, 159, 163 motor activity, 54 motor neuron disease, 63, 73 motor neurons, 63 movement, 151, 312 mRNA, 43, 94, 99, 113, 225, 232, 235, 246, 275, 295 mtDNA, 130, 131, 132, 133, 134, 135, 136, 138, 139, 140, 141, 142, 177, 178, 181, 190, 312, 314, 315, 323 multidimensional, 10, 188, 190, 191 muscle strength, 277
muscles, 43, 47, 238, 244, 246 muscular dystrophy, 154, 273 muscular tissue, 242, 248, 276 mutagenesis, ix, 8, 10, 80, 86 mutant, 25, 26, 77, 78, 82, 84, 86, 96, 181, 182, 198, 231, 232, 289, 290, 294, 305 mutation rate, 78, 130, 131, 177, 185, 195, 198, 300, 303, 305, 306, 318, 320 mycosis fungoides, 172 myocardial infarction, 45, 46, 207, 232, 240, 248, 272 myocardium, 244 myogenesis, 123 myoglobin, 177, 185 myopathy, 244, 247, 274, 276 myosin, 242, 248, 274, 275, 276
N NAD, 33, 34 natural selection, 133, 201, 282, 291 neck cancer, 201 necrosis, 102, 114, 126 neocortex, 284, 285 nephropathy, 45 nerve, 21, 22, 25, 268 nervous system, 289, 291 network, 3, 131, 139, 180, 181 neural function, 296 neuroblasts, 225 neurofibroma, 79 neurogenesis, 25 neurological disease, 186 neuronal cells, 238 neurons, 21, 63, 146, 271, 285, 286, 290 neuropsychiatry, 297 neurotransmission, 21 neurotransmitter, 284, 285, 286, 292, 294, 295 neutrophils, 54, 120 New Zealand, 102 next generation, 79, 97 nicotinamide, 34 nitron, 13, 45 NMR, 14 nondisjunction, 171 nonsense mutation, 272 North Africa, 306, 309 North Caucasus, 131, 134, 136, 189 novelty, 289 nuclear genome, 178 nuclei, 166, 239 nucleic acid, 5, 14, 146 nucleolus, 145, 147, 166, 167, 168, 171
nucleotides, ix, 30, 148, 156, 163, 164, 176, 178, 185, 208, 301, 303 nucleus, 16, 42, 43, 88, 147, 178, 244, 245
O observations, 3, 220 obstruction, 35 Oceania, 306, 307, 308, 310 oligomerization, 85 oligomers, 42 oligosaccharide, 46 omission, 185 oncogenes, 99 opacity, 53 operon, 147 opiates, 273 opportunities, vii, 4, 10, 155, 177, 269 order, 7, 9, 15, 154, 182, 183, 206, 212, 303 organism, viii, ix, 1, 2, 3, 13, 21, 25, 26, 34, 35, 36, 45, 53, 54, 64, 77, 80, 81, 82, 85, 92, 145, 146, 148, 165, 167, 178, 273, 281, 282, 285, 294 organizing, 171 orientation, 11, 16, 226, 235, 248 osteomalacia, 35 osteoporosis, 37, 96 ovarian cancer, 89, 96, 100 ovaries, 93 overlay, 258 oxidation, 34, 53, 92, 263 oxidative damage, 59 oxidative stress, 82 oxygen, 53, 81, 82, 98 oxygen consumption, 98
P p53, 175, 192, 193, 194, 201 pain, 290 paleontology, 230 palindrome, 291 pancreas, 93 pancreatic cancer, 98 parallel, 78, 88, 211, 256, 259, 260, 311 paralysis, 26 parameter, 9, 16, 107, 147, 166, 305, 315 parameters, 9, 16, 107, 109, 176, 183, 184, 195, 255, 256, 283, 305 parents, viii, 130, 166, 309 pathogenesis, 102, 113, 125, 250, 257 pathology, viii, 53, 103, 186, 195, 228, 249, 257, 269 pathophysiology, 123 pathways, 78, 82, 87, 90, 172, 312
PCR, 93, 101, 103, 147, 151, 152, 155, 157, 169, 197, 209, 210, 211, 212, 215, 217, 218, 219, 225, 226, 227, 292, 293, 294, 301, 313 peptides, 13, 41, 263, 267, 268, 269, 270 performance, vii, 4, 7, 256, 278, 280 perinatal, 36 peripheral blood, 103, 113, 166 peripheral blood mononuclear cell, 113 permit, 79, 209 peroxide, 53, 54, 57 personality, ix pH, 59, 103 phage, 322 pharmaceuticals, 3 pharmacogenetics, 116 pharmacogenomics, viii pharmacology, 176 pharmacotherapy, 282 phenol, 103, 131 phenotype, 15, 78, 79, 80, 82, 98, 113 phosphates, 67 phosphorylation, 35, 40, 41, 82, 85, 88, 90, 130, 241 phylogenetic tree, 210 physical activity, 256 physical aggression, 283, 290, 292, 293, 294, 295 physics, ix physiology, ix, 3, 295 pigmentation, 310 PKs, 163 plants, vii, 4, 6, 53, 147, 148 plasma, 21, 30, 31, 32, 43, 64, 232, 274, 322 plasminogen, 322 ploidy, 147 PM3, 276 point mutation, 78, 159, 289, 303 Poland, 179, 231, 287 polarization, 7, 285 pollen, 127 pollutants, 102 polyacrylamide, 105, 259 polymerase, 103, 147, 151, 163, 167, 171, 199, 250, 252, 273, 300, 301, 303, 313, 319 polymerase chain reaction, 103, 199, 250, 252, 273, 300, 301, 313, 319 polymerase chain reactions, 250 polymerization, 245, 248 polypeptide, 1, 3, 6, 7, 8, 11, 16, 20, 21, 22, 35, 43, 53, 63, 241, 248, 263, 266, 274 polyploidy, 147 polyunsaturated fat, 46 polyunsaturated fatty acids, 46 population size, 321 porphyria, 207, 232
positive correlation, 182, 255 potassium, 285 power, vii, 240 precipitation, 182, 183 prediction, vii, 67, 210 preeclampsia, 35 prefrontal cortex, 289, 291 pregnancy, ix pressure, 46, 182, 186, 192, 248, 249 prevention, 84, 103, 124 primate, 149, 152, 203, 204, 206, 208, 209, 210, 211, 230, 231, 233 priming, 225 principal component analysis, 134, 316 PRINS, 220, 235 probability, 7, 15, 17, 18, 23, 25, 28, 38, 39, 44, 48, 57, 58, 62, 79, 86, 96, 133, 159, 220, 263 probe, 148, 149 production, 100, 105, 107, 113, 126, 167, 215 progesterone, 232 prognosis, 166, 173, 276 program, 87, 102, 105, 148, 153, 250, 259, 263, 294, 313, 322 programming, 272 project, viii, 2, 12, 13, 15, 239, 270 proliferation, 22, 78, 80, 83, 85, 86, 89, 90, 96, 273 promoter, 22, 47, 55, 86, 87, 91, 98, 104, 105, 113, 114, 120, 125, 126, 145, 148, 149, 150, 151, 155, 156, 159, 161, 163, 169, 205, 207, 233, 285, 286 propagation, 159, 203, 204, 206 properties, 2, 4, 8, 10, 15, 16, 21, 27, 31, 47, 64, 92, 166, 186, 190, 191, 195, 237, 238, 239, 242, 245, 248, 260, 267, 273, 275, 309, 311 propranolol, 289 prostaglandins, 46, 92 prostate, 81, 93, 98, 100, 238, 257, 258, 259, 260, 262, 263, 264, 267, 268, 269, 272, 278, 279 prostate cancer, 81, 98, 100, 257, 269, 272, 278, 279 prostate specific antigen, 278, 279 prostatectomy, 278 proteases, 18, 83 protein analysis, 241 protein family, 18, 30 protein kinases, 40, 42, 43 protein sequence, 12, 17, 18 protein structure, ix, 1, 3, 4, 6, 7, 9, 13, 17, 27 proteolysis, 41, 71 proteome, 237, 241, 243, 279 proteomics, vii, 239, 241, 242, 257, 274, 279 pseudogene, 152, 170, 206 psychological stress, 288 psychology, ix, 295 psychoses, 289
psychosomatic, 283 public health, 102, 240, 257, 270 purification, 59
Q quantitative estimation, 15 quantum chemistry, 10, 12 quantum mechanics, 10 quantum-chemical calculations, 10, 11 query, 13
R race, 129, 309, 310, 311 racial differences, 92 radiation, 25, 86, 88, 182, 183, 184 radicals, 6, 59, 63, 82 range, 34, 42, 56, 59, 81, 114, 147, 154, 184, 193, 195, 203, 210, 213, 220, 263, 294, 314, 321 reaction mechanism, 12 reaction rate, 5, 11 reactions, 4, 5, 8, 10, 20, 21, 46, 87, 92, 250, 283 reactivity, 7 reading, 207, 263 reagents, 2, 4 reason, 15, 16, 56, 165, 244, 301, 303 receptors, 41, 42, 102, 116, 166, 284, 285, 292 recognition, 99 recombination, 80, 84, 86, 89, 98, 130, 147, 150, 154, 165, 169, 171, 172, 177, 185, 201, 206, 209, 319 recommendations, iv reconstruction, 3, 130, 131 recovery, 271 reflection, 14 regression, 188 regulation, 21, 45, 46, 78, 81, 83, 84, 90, 91, 93, 94, 96, 98, 125, 146, 154, 159, 167, 204, 205, 206, 211, 212, 231, 235, 248, 271, 273, 285, 289, 291, 296 regulators, 42, 98, 230 relationship, 93, 100, 114, 197 relatives, 12, 79, 97, 103, 228 reliability, 15, 96 religion, 309 renal cell carcinoma, 81, 98 renin, 281, 294, 295, 297 repair, 77, 78, 80, 81, 84, 89, 93, 94, 99, 165 repetitions, 13 replacement, 7, 8, 10, 13, 15, 16, 20, 22, 23, 24, 25, 26, 28, 31, 34, 37, 38, 39, 43, 44, 46, 47, 48, 49, 51, 52, 53, 56, 57, 58, 61, 62, 63, 64, 142, 143, 237, 242, 249, 250, 253, 267
replication, 80, 88, 89, 146, 150, 159, 165, 303 repression, 98 reserves, viii, 281, 282, 283, 294 residues, 13, 14, 15, 16, 18, 20, 21, 41, 42, 43, 46, 59, 60, 64, 116, 239, 245, 248, 263 resistance, 27, 43, 54, 88, 90, 138, 182, 184, 197, 277, 282 resolution, 3, 11, 130, 136, 146, 191, 199, 273, 296, 319 respect, 9, 100, 187, 204, 238, 249 respiratory, 56, 82, 101, 102, 105 responsiveness, 116, 127 restriction enzyme, 104 restriction fragment length polymorphism, 149, 232 retention, 154 reticulum, 32, 41, 46, 92 retinoblastoma, 79 retinol, 34 retinopathy, 45 retrovirus, 230, 233, 234 retroviruses, 204, 208, 209, 230, 233, 234 returns, 286 reverse transcriptase, 80, 207, 230 rheumatoid arthritis, 166, 172 rhinitis, 127 ribonucleic acid, 2, 80 ribosomal RNA, ix, 130, 145, 169, 170, 172, 173 ribosome, 146, 167 rickets, 35, 36 rights, iv risk, viii, 26, 31, 34, 37, 43, 45, 46, 47, 53, 55, 63, 64, 77, 86, 88, 92, 93, 95, 97, 98, 100, 101, 103, 107, 112, 116, 120, 123, 124, 169, 207, 232, 257 risk factors, 31, 45, 101, 103, 123, 124 RNA, 2, 5, 12, 15, 130, 145, 147, 163, 167, 169, 170, 171, 172, 173, 205, 206, 207, 208, 230, 232, 233, 264, 286 Roses, 74 Rouleau, 73, 74 RUS, 191 Russia, v, viii, ix, 1, 77, 101, 102, 103, 129, 131, 136, 142, 145, 146, 175, 180, 187, 192, 193, 194, 196, 197, 198, 200, 201, 203, 253, 257, 270, 278, 281, 287, 294, 309, 315, 322
S sampling, 252, 253, 255, 256, 269, 320 sampling error, 320 sarcoidosis, 297 satellite, 147, 166, 168, 170, 172 satellites, 147 saturation, 147, 204 scaffolds, 277
scaling, 188, 190, 191 Scandinavia, 143, 192 schizophrenia, 146, 166, 172 scientific progress, 283 screening, 105, 125, 157, 272, 273 search, 3, 78, 124, 141, 203, 209, 210, 211, 215, 217, 233, 235, 250, 257, 281, 289, 294 searches, 213, 237, 239, 243, 256, 269 searching, 8, 148, 152, 159, 163, 257, 263 secretion, 43, 126, 291 segmental glomerulosclerosis, 277 segregation, 82, 138, 165, 232, 322 selective serotonin reuptake inhibitor, 100 senescence, 97, 172 sensing, 231 sensitivity, 11, 22, 31, 47, 150, 213, 215, 265, 282 separation, 10, 307, 308, 310, 311 sepsis, 35 sequencing, 12, 13, 66, 130, 150, 151, 155, 159, 175, 176, 204, 209, 211, 212, 215, 271, 273, 303 serine, 9, 10, 11, 41, 42, 81, 87, 90 serotonin, 25, 100, 281, 284, 285, 286, 289, 291, 292, 293, 294, 295, 296 serum, 31, 35, 45, 105, 116, 120, 124, 125, 126, 166, 172, 272, 278 severe asthma, 109, 112, 114, 116, 118, 120, 123, 126 severity, 107, 112, 113, 114, 116, 118, 123, 125 sex, 177, 181, 234 sex chromosome, 234 shape, 22, 137, 309, 310 shares, 248 shock, 264 shortness of breath, 102 shoulders, ix shyness, 283, 290 sialic acid, 264 Siberia, 130, 135, 139, 140, 143, 189, 193, 321 signal peptide, 13, 47 signaling pathway, 99 signals, 42, 80, 83, 88, 90, 91, 129, 142, 206, 284, 289 single-nucleotide polymorphism, 98, 107, 114, 126 skeletal muscle, 43, 244, 274, 276, 277 skeleton, 36 skin, 99, 103, 120, 273, 307, 309, 310 skin cancer, 99 skin diseases, 273 smoking, 92, 102 smooth muscle, 123, 126, 238, 244, 265, 275, 276 SNP, 2, 12, 13, 14, 15, 16, 24, 37, 46, 53, 66, 217, 237, 244, 245, 247, 249, 250, 252, 253, 254, 255, 256, 268, 269, 282, 296
social behaviour, 309 social life, viii sodium, 47 software, 15, 105, 124, 141, 194, 313 somatic cell, 43, 78, 81, 96, 171 somatic mutations, 78, 87, 91, 99 sorption, 8, 11, 16, 17 South Africa, 307, 308, 310 South Asia, 197, 307, 308 Southeast Asia, 138, 140, 310, 315, 316, 317 Southern blot, 152, 165 space, 7, 8, 9, 11, 21, 59, 60, 315 Spain, 25, 102 specialists, 78 speciation, 149, 210 species, 30, 130, 131, 137, 145, 147, 148, 149, 159, 176, 203, 204, 211, 219, 230, 282, 283 spectroscopy, 14, 259, 260, 264 spectrum, 25, 32, 34, 187, 191, 240, 305 sports, 250, 252, 253, 254, 255, 260, 269, 286, 291, 293, 294 St. Petersburg, 256, 270 stability, 2, 15, 16, 26, 27, 31, 60, 84, 86, 89, 217, 282, 312 stabilization, 20 statistics, 102, 188, 278 stem cells, 97 steroids, 32 stimulus, 288 stock, 22 stomach, 32, 43, 93, 257 storage, 4, 175 strategies, 203, 209, 282 strategy, 80, 123, 210, 219, 241, 243, 257, 263, 296, 320 stratification, 201, 276 stress, 25, 63, 86, 168, 281, 282, 286, 291, 294, 295, 296 structural changes, 11, 93 structural gene, 21, 196 structural modifications, ix structure formation, 7 students, 294, 295 subdomains, 245 subgroups, 116, 118, 133, 137, 210, 256 substitution, ix, 81, 86, 94, 176, 179, 268, 303 substitutions, ix, 81, 84, 87, 93, 109, 147, 149, 159, 164, 165, 176, 181, 193, 206, 244, 300 substrates, 5, 7, 10, 16, 25, 26, 27, 31, 32, 34, 40, 41, 42 summer, 13, 15 Sun, 70, 126, 139 supply, 240
suppression, 47, 54, 98, 211, 215, 218 survey, 5, 180, 319, 320 survival, 81, 98, 279, 283 susceptibility, 77, 78, 98, 99, 102, 123, 124, 125, 195, 296 Sweden, 257 switching, 159 symbols, 150, 218 symptoms, 100, 102 synapse, 286 synaptic gap, 285, 286 syndrome, 22, 56, 85, 146, 166, 172, 182, 207, 231, 232, 276, 283 synergid, 91 synthesis, 4, 34, 46, 47, 93, 113, 125, 141, 146, 166, 167, 168, 171, 205, 212, 220, 267, 284
T T cell, 105 tandem repeats, 146, 156, 171, 177, 185, 197, 198, 199, 220, 285, 301, 311, 319 targets, viii, 3, 41, 279 TCC, 104, 148 teeth, 37 telangiectasia, 88 telomere, 81, 154, 155, 167 telophase, 145, 146 temperature, 182, 183, 184 tempo, 233 temporal lobe, 284 tension, 291 territory, 175, 191, 194, 312 texture, 309, 310 TGF, 279 therapy, 25, 55, 103, 118 threonine, 41, 42, 81, 90 thrombin, 18 thymine, 105 thyroid, 2, 57 thyroid gland, 57 thyrotoxicosis, 35 tics, 123 time periods, 206 tissue, 35, 77, 80, 81, 83, 91, 219, 225, 228, 235, 242, 244, 246, 248, 249, 272, 274, 322 tissue homeostasis, 80 TLR9, 166 TNF, 103, 114, 116, 126 TNF-α, 103 tobacco, 93 toxic effect, 67 toxic substances, 54 toxicity, 25, 31
toxicology, 272 trachea, 126 tracking, 130, 131, 141 traditions, 309 training, 9, 81, 269, 277 traits, 125, 190, 306, 307, 309, 311 transcription, 42, 43, 85, 87, 145, 146, 148, 149, 150, 151, 153, 154, 156, 157, 159, 161, 162, 163, 164, 165, 166, 167, 168, 171, 172, 205, 207, 212, 225, 226, 227, 228, 230, 235, 285, 286 transcription factors, 42, 43, 159, 163 transcriptomics, 239 transcripts, 146, 170, 205, 207, 225, 226, 227, 228, 235, 236, 267 transduction, 208 transfer RNA, 130 transformation, 5, 6, 16, 21, 27, 78, 80, 81, 84, 86, 92, 129, 245, 270 transformations, 2, 26 transfusion, 22 transition, viii, 10, 105, 205, 239, 241 transitions, 170 translation, 145, 207, 270, 272, 273 translocation, 83, 84, 97, 154 transmembrane glycoprotein, 83 transmission, 22, 83, 289 transport, 35, 64, 82, 92, 187, 239, 240, 286 traumatic events, 286, 288, 294 trees, 16, 313, 319 trends, 175, 176, 315 tribes, 130, 131, 136, 137, 203, 307, 308, 309 triggers, 123 trypsin, 18, 20 tryptophan, 285, 296 tumor, 54, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 91, 93, 95, 96, 97, 98, 99, 102, 114, 166, 175, 193, 219, 257, 259, 262, 264, 267, 276 tumor cells, 54, 79, 80, 81, 82, 83, 97, 257 tumor metastasis, 276 tumor necrosis factor, 102, 114 tumor progression, 80, 81, 91 tumorigenesis, 167 tumors, 77, 79, 80, 81, 82, 83, 84, 90, 91, 93, 96, 97, 99, 244, 257, 260, 269, 278 Turks, 129, 136 twins, 283 type 2 diabetes, 43 tyrosine, 41, 42, 90, 303 Tyrosine, 41 tyrosine hydroxylase, 303
U Ukraine, 179, 180, 196
ulcerative colitis, 35 ultrastructure, 277 UN, 143 uncertainty, 17 uniform, 260 United States, 278, 312 updating, 252 USSR, 253 UV, 88, 105, 152 UV light, 105 UV-radiation, 88 Uzbekistan, 313
V variability, 18, 31, 34, 131, 134, 140, 146, 154, 156, 163, 164, 165, 167, 175, 176, 177, 178, 185, 187, 188, 190, 191, 192, 200, 203, 228, 263, 282, 283, 303, 306, 310, 320 variables, 182 variance, 305, 306 variations, 1, 2, 4, 9, 12, 13, 64, 151, 169, 176, 178, 285 vascular dementia, 31 vascular system, 81 vascular wall, 83 vasopressin, 291 vasopressor, 45 vector, 16 ventricle, 279 vertebrates, 145, 146, 157, 244, 273, 319 vessels, 45, 56, 105 video, 258 village, 313 violence, 284 viruses, 78, 89 visualization, 14, 105, 188 vitamin D, 35 volleyball, 291 vulnerability, 283
W web, 14, 277 Western Europe, 130, 179, 198 Western Siberia, 193 wheat, 169, 234 wheezing, 102 wild type, 290 women, 31, 35, 37, 77, 93, 96, 100, 277, 290, 291 workers, 114, 215, 217
X X chromosome, 300 X-ray, 9, 14 X-ray diffraction, 14
Y Y chromosome, 130, 170, 178, 179, 180, 185, 196, 198, 299, 302, 303, 304, 312 yeast, 147, 148, 233
Z zinc, 43, 233